Importing Data Binto Spt

THL Toolbox > Reference > Bibliographies > Importing Data into SPT

Importing Data into SPT

Contributor(s): Than Grove, Bill McGrath, Steven Weinberger

When there is a bulk of bibliographic data to be entered into SPT, it can be imported en masse into SPT through the Administration tool. However, to do so, the data must be arranged in the proper order. THL has written a series of macros embedded in an Excel document to assist in the importation of bibliographic data into SPT. This page describes how that is done, assuming that the data has already been entered into the computer or downloaded from a larger database.

Note for THL staff: as of December 2009, the import function only works through the interface on dev (and possibly on staging but not sure).

Setting up the SPT Import.xls Document

Download the excel file from here (June 24, 2009).
Open “SPT Import.xls”. There will be two spread sheets: “Bibl Data” and “SPT Data”. The first is for the bibliographic data to be imported. The second contains just a single row with the headers for all the fields in a single THL SPT record. If the SPT Data spreadsheet appears to be out of date, one can export a record from SPT and use the field headers from it to replace the values in this row.
Import your data into the “Bibl Data” sheet through this process:
1. Select that sheet
2. Import your data by going to Data → Import External Data → Import Data and navigating to the tab-delimited text file that contains your data. Answer the importation questions it asks and tell Excel to begin entering your data in column $A$1. Your data will appear in the “Bibl Data” sheet.
3. You can rename this sheet whatever you wish, by right-clicking on its tab and choosing Rename.
4. Save the “SPT Import.xls” under a new name indicating the data being imported.
With the basic data document set up, you can now proceed to Creating a Data Map.

Creating a Data Map

The data map describes the correspondences between the fields in the original data (the column names in the “Bibl Data” or original data worksheet) and the fields in SPT (represented by the “SPT Data” worksheet). To do this, the columns in the original data worksheet must be labeled, that is, the first row of each column must contain the name of that field, as with the SPT Data sheet. Assuming that this is the case, one can use the macro to create a data map:

To initiate the Data Map routine, press Shift+Ctrl+M
A dialog box will pop up asking “Choose Data Worksheet”. From the drop down list, choose the name of the worksheet with your original data in it, and press “Enter”.
The macro will then create a third worksheet called “Data Map”, which will contain the correspondences between the original fields and the SPT fields

The data map sheet has three columns. The first column lists all the SPT fields. The second column has what the macro has chosen as the corresponding field in the original data based on similarity of field name. The third column contains the field (column) number of the original data field that corresponds to this SPT field. (The field number for the SPT field is the Row number minus 1.)

The routine will highlight fields that have been matched twice by adding a background color to the cell of both matches. Matching colored cells are the two fields that match the same SPT field. Also, after the list of fields, the macro will list unmatched SPT and original data fields. Unmatched SPT fields will have “------” in the second and third columns. Any conflicts or omissions should be resolved at this point, though not all the data in the original data necessarily needs to be imported, and every SPT data field does not necessarily need a value, except for the important one such as Title, Category, etc. A macro has been created to eliminate cell coloring by pressing Shift+Ctrl+z. Make sure that there is only a single match for every SPT field and that all the desired fields in the original data are accounted for.

Cleaning Up the Excel File

Author/Creator Column

When there is more than one author, these need to be separated by a semicolon with no space before or after it. Do not use commas to separate authors. Do not use "and" to separate authors.
Scan the Excel file for variant versions of the same author name. For example, if you find the following

Dr. Donald Lopez
Donald S. Lopez, Jr.
Donald S. Lopez, Ph.D.

Then you need to standardize the name and make a list of variant names to enter in the author controlled-name entry in SPT after importation.
1. Search SPT author controlled names to see if a version of the name is already in SPT. If it is, use that version of the name.
2. If no versions of the name are already in SPT, then search the Library of Congress authority list of names. Enter data in the Excel in the format First Name Last Name rather than the LOC format Last Name, First Name. For example, the authority name for Don Lopez is:
  Lopez, Donald S., 1952-
  Thus, for a work in the Excel document of which the author is Don Lopez,
  Donald S. Lopez, 1952-
  is the controlled name that would be assigned as author/creator even if the actual work has
  Donald S. Lopez, Jr.
  Note: when searching the Library of Congress authority files, be sure to select “Name Authority Headings” as the type of heading.
Make a list of all variant versions of the name.
Change all variant versions of the name in the Excel file to the authority name.
After importing the records into SPT, find the name in the author/creator controlled name database and add the variant names to the “Variant Name” field. If there is more than one variant name, separate them with a semicolon and space.
If you don't find the name either in SPT or in the Library of Congress authority file, decide which name to use, change all versions of the name to that version, and record the variant names as described above.

Converting the Data

Converting the original data into SPT format involves two step: rearranging the original data columns into the SPT order and then preparing the data for upload to the SPT database. Both these tasks are dealt with by a single macro that converts the data and exports it to a Microsoft Word document, saving it as a text only file. This is necessary because exporting data from Excel sometimes adds unnecessary quotation markings into string fields. To be sure that the Excel and Word interface is activeated, do the following:

Go to Tools → Macros → Visual Basic Editor (or, press Alt + F11)
From the VB editor window, choose Tools → References
Make sure that “Microsoft Word 11.0 Object Library” is checked
Click OK, and close the VB editor window

Then run the macro:

Press Shift+Ctrl+C
A dialog box will ask again for the name of the data sheet. Choose the sheet name and press Enter
A second dialog box will ask for the name of the data map sheet. Choose the sheet name and press Enter
A third dialog box will ask you to enter a name for the new reorganized sheet. Enter this and press Ok

The macro will then create the new sheet and copy the original data columns over to the new sheet in the SPT order. When this is done, it will ask you to verify the name of the “Classifications” column/field. It will then proceed to clean up the data for SPT import and convert diacritics to HTML entities. In the end, it will open a Microsoft Word document, copy the resulting data into it, and ask you if it can save the document, showing its name

Importing Data into SPT

Before importing this resulting document into SPT, check it in a text editor (I use jEdit) to make sure that it is complete. Make sure the text editor can show you line numbers. Each new line should begin with the name of a resource. If there are multiple lines with the name of the same resource, that's o.k. because that's how SPT records data records with multiple entries in a field. There can be 2 or more lines that are nearly identical. But, if there is an obvious break between lines so that the first field in the second line is not the title of a resource, this should be corrected. Also, if there is a blank line at the end of the document, this should be deleted. Although Tibetan unicode fonts appear as ??? in notepad, they can be added and saved using jEdit. If entering data using Tibetan fonts, make sure they appear in jEdit and they will import into SPT. The same is true for diacritics; they will appear as character entities in the tab-delineated document, but can be changed using the search-replace function in jEdit. Finally, due to the recent name change of our website, you will need to change the name of the resource "THDL Resource" to "THL Resource." The new system uses this new name and will not recognize "THDL Resource." This must be done after the Excel document has been converted into a tab-delineated text document, however, in order for the Excel macro to convert successfully.

Importing to SPT is done in the following way:

Login to SPT. Note: as of December 2009, the import function only works through the interface on dev (and possibly on staging but not sure).
Click on Administration
Choose Import Data
At the bottom of that window, use the Browse button to locate the file with the cleaned up data
In the drop down list above that, choose “Create Records OK for Viewing”
Press Import button and wait....

SPT will give you an error if the data is not properly formatted and it will tell you the line number where the problem is. One needs to correct this and go through the process of re-importation again. If the error occurs somewhere in the middle of importation, then when one starts the import again, SPT will ignore those records already imported and will not import them twice. One puzzling error that sometimes shows up is “Duplicate Control Names”. This refers to the case when there are two control names (e.g., the names of authors, translators, editor). Errors may also occur if the text document is not encoded correctly. Make sure that the encoding is UTF-8 (shown in the lower right-hand corner of jEdit) and not UTF-8Y.

Rebuilding Search and Recommender Databases

This must be done for the records to appear in user searches of the SPT database. This is done in the following way:

Go to SPT browse and login to SPT
Go to Administration
Choose Rebuild Search Database
Click Yes when it asks if you are sure. The rebuild can take on the order of 1 hour to complete.
Go to Administration
Choose Rebuild Recommender Database

SPT will continue to update the screen to alert you of the progress of each rebuild. Rebuilds can only be done one at a time.