Contributor(s): Than Grove, Bill McGrath, Steven Weinberger
When there is a bulk of bibliographic data to be entered into SPT, it can be imported en masse into SPT through the Administration tool. However, to do so, the data must be arranged in the proper order. THL has written a series of macros embedded in an Excel document to assist in the importation of bibliographic data into SPT. This page describes how that is done, assuming that the data has already been entered into the computer or downloaded from a larger database.
Note for THL staff: as of December 2009, the import function only works through the interface on dev (and possibly on staging but not sure).
The data map describes the correspondences between the fields in the original data (the column names in the “Bibl Data” or original data worksheet) and the fields in SPT (represented by the “SPT Data” worksheet). To do this, the columns in the original data worksheet must be labeled, that is, the first row of each column must contain the name of that field, as with the SPT Data sheet. Assuming that this is the case, one can use the macro to create a data map:
The data map sheet has three columns. The first column lists all the SPT fields. The second column has what the macro has chosen as the corresponding field in the original data based on similarity of field name. The third column contains the field (column) number of the original data field that corresponds to this SPT field. (The field number for the SPT field is the Row number minus 1.)
The routine will highlight fields that have been matched twice by adding a background color to the cell of both matches. Matching colored cells are the two fields that match the same SPT field. Also, after the list of fields, the macro will list unmatched SPT and original data fields. Unmatched SPT fields will have “------” in the second and third columns. Any conflicts or omissions should be resolved at this point, though not all the data in the original data necessarily needs to be imported, and every SPT data field does not necessarily need a value, except for the important one such as Title, Category, etc. A macro has been created to eliminate cell coloring by pressing Shift+Ctrl+z. Make sure that there is only a single match for every SPT field and that all the desired fields in the original data are accounted for.
Converting the original data into SPT format involves two step: rearranging the original data columns into the SPT order and then preparing the data for upload to the SPT database. Both these tasks are dealt with by a single macro that converts the data and exports it to a Microsoft Word document, saving it as a text only file. This is necessary because exporting data from Excel sometimes adds unnecessary quotation markings into string fields. To be sure that the Excel and Word interface is activeated, do the following:
Then run the macro:
The macro will then create the new sheet and copy the original data columns over to the new sheet in the SPT order. When this is done, it will ask you to verify the name of the “Classifications” column/field. It will then proceed to clean up the data for SPT import and convert diacritics to HTML entities. In the end, it will open a Microsoft Word document, copy the resulting data into it, and ask you if it can save the document, showing its name
Before importing this resulting document into SPT, check it in a text editor (I use jEdit) to make sure that it is complete. Make sure the text editor can show you line numbers. Each new line should begin with the name of a resource. If there are multiple lines with the name of the same resource, that's o.k. because that's how SPT records data records with multiple entries in a field. There can be 2 or more lines that are nearly identical. But, if there is an obvious break between lines so that the first field in the second line is not the title of a resource, this should be corrected. Also, if there is a blank line at the end of the document, this should be deleted. Although Tibetan unicode fonts appear as ??? in notepad, they can be added and saved using jEdit. If entering data using Tibetan fonts, make sure they appear in jEdit and they will import into SPT. The same is true for diacritics; they will appear as character entities in the tab-delineated document, but can be changed using the search-replace function in jEdit. Finally, due to the recent name change of our website, you will need to change the name of the resource "THDL Resource" to "THL Resource." The new system uses this new name and will not recognize "THDL Resource." This must be done after the Excel document has been converted into a tab-delineated text document, however, in order for the Excel macro to convert successfully.
Importing to SPT is done in the following way:
SPT will give you an error if the data is not properly formatted and it will tell you the line number where the problem is. One needs to correct this and go through the process of re-importation again. If the error occurs somewhere in the middle of importation, then when one starts the import again, SPT will ignore those records already imported and will not import them twice. One puzzling error that sometimes shows up is “Duplicate Control Names”. This refers to the case when there are two control names (e.g., the names of authors, translators, editor). Errors may also occur if the text document is not encoded correctly. Make sure that the encoding is UTF-8 (shown in the lower right-hand corner of jEdit) and not UTF-8Y.
This must be done for the records to appear in user searches of the SPT database. This is done in the following way:
SPT will continue to update the screen to alert you of the progress of each rebuild. Rebuilds can only be done one at a time.