Scanning Journals

Scanning Journals and Creating PDFs

Contributor(s): Ben Deitle, Steven Weinberger.

There are several methods for scanning journals to create PDFs. Which one you use depends on the physical characteristics of the journal and the desired quality of the finished PDF. There are three types of scanners in the E-text and Geostat center in Alderman Library: the Epson XL10000, the Fujitsu ScanPartner 15C (which uses the Fujitsu Limited TWAIN driver), and the Epson Perfection 4870.

Brief Directions for Auto-feed Scanning Using Fujitsu Limited Twain (ScanPartner 15C)

The most time-efficient option for scanning is to use the Fujitsu ScanPartner because of its auto-feed feature. However, this can only be used if you have a loose-leaf document, or you are able to cut the binding off the journal. If so, proceed in the following way:

  1. Open Adobe Acrobat Professional.
  2. Select "Create PDF," and inside this window choose the desired scanner (Fujitsu Limited TWAIN Driver) and the original document's format (Double-sided or Single-sided). Check "Adapt compression to page content" and select "Acrobat 6.0 and later" for compatibility. Before clicking "OK" make sure the document, or at least the first page, is loaded into the auto-feed tray face down with the top of the page loading first.
  3. In the scan configuration window, most of the default settings are fine. Change resolution to "300 x 300" dpi. You may also want to change the brightness setting depending on the darkness of the text in your document.
  4. Click "Scan" and begin scanning. If you are working with thin or low quality paper, it may be necessary to place pages one by one into the auto-feed tray to avoid misfeeds.
  5. If scanning a double-sided document, when the front sides finish scanning, flip the document over and place it in the auto-feed tray. In the pop-up window, check the "Back side of page xx" option. Click "Next" and the back sides will scan.
  6. When the scan is complete your document will appear in a new window. Scroll through the document to make sure the pages are all there and in the correct order.
  7. Crop pages if necessary by opening the crop tool from the tool menu, or right clicking a page thumbnail in the Pages sidebar.
  8. Add metadata in the Document Properties (Control+D). Under Description, enter "full issue" in the Title field, and in the Subject field enter the name of the journal, volume, number, and date. For example: Kailash, Volume 7, Number 2, June-Oct 1983. Click "OK."
  9. Save the file according to file naming conventions.
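To get a feel for what the "300 x 300" dpi setting means, here is a small sketch of the arithmetic. The page size used (8.5 x 11 inches) is only an assumed example and does not come from the scanner settings; substitute the dimensions of your own journal.

```python
def pixel_dimensions(width_in, height_in, dpi=300):
    """Return the (width, height) in pixels of a page scanned at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


# Assumed example page: 8.5 x 11 inches, scanned at 300 dpi.
width_px, height_px = pixel_dimensions(8.5, 11)
print(width_px, height_px)  # 2550 3300
```

Each page is a fairly large image at this resolution, which is one reason the compression settings in step 2 matter.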

Detailed Directions for Auto-feed Scanning Using Fujitsu Limited Twain (ScanPartner 15C)

  1. Open Adobe Acrobat Professional.
  2. Select "Create PDF." This can be done from the taskbar, or from the file menu. Within "Create PDF" select "From Scanner".
  3. At this point a new window opens with some configuration choices.
  4. First, choose the scanner from the drop-down menu. In this case "Fujitsu Limited TWAIN Driver."
  5. Second, choose single-sided or double-sided depending on the format of your original document.
  6. Under the "Options" menu make sure the "Adapt compression to page content" box is checked and in the compatibility drop-down menu select "Acrobat 6.0 and later". (If the e-text center at some point purchases a newer version of Acrobat with higher compression, you'll probably want to select that accordingly.) You can also move the compression/quality bar depending on your quality needs. For most journals, this can go all the way to the Higher Compression side.
  7. Place the document to be scanned into the auto-feed tray with the top of the page going into the auto-feed first and the side to be scanned down. The scanner should begin a preliminary load of the first page in preparation for a scan. (It is important to do this before hitting the "Scan" button, as the program will otherwise assume you are doing a flatbed scan. Although this can be corrected in the next window, the auto-format for a flatbed scan that comes up can be a hassle to change.)
  8. Hit the "Scan" button in the window.
  9. Another window now appears with a variety of configuration choices. If you did not load your document into the tray before now, it will be preset to scan with the flatbed. You can change this under the "Scan type:" prompt by choosing "ADF" (auto document feed) from the drop-down menu. You can also change the resolution to "300 x 300" dpi. Generally, most of the other fields can be left as they are, though depending on the darkness of the printing, you may want to take the "brightness" level down to 104 or 96 (darker). (You can also set the page size here, but you'll have to reset it if your scan job is interrupted for any reason. I find it is easier to just crop all the pages to the desired size after the scanning is complete.)
  10. Hit the "Scan" button and the scanning commences!
  11. If scanning a double-sided document, when the front sides have all been scanned a prompt window will appear. Check the "Back of sheet xx" circle. Load the document into the tray starting with the back side of the last page. (The scanner will scan the back sides from last page to first page and then collate them with the fronts. So you only need to turn over the stack of sheets and load it back into the tray, making sure the last page is at the bottom and the top of the page is loading first. Actually very simple.) After loading the document, hit the "Next" button. The back sides of the pages will then scan.
  12. When scanning is finished, your new PDF document will appear. At this point, it is often a good idea to scroll down through the document to make sure all the pages are there in correct order.
  13. If anything needs to be rescanned (for example, a page comes out black), it can be done now by creating a PDF the same way as before, but checking the "Append To Current Document" circle in the "Destination" menu of the initial window.
  14. Once all scanning is complete, open the "Pages" side tab of your PDF, to get a string of thumbnails of your pages. Select any pages that were rescanned (they should all be at the end of the document) by clicking on them and drag them up to their proper place. Right click on pages to be deleted and select "Delete Pages" from the menu (make sure you have only selected the pages you want deleted).
  15. If you want to crop your document to a certain size, right click on any page and select "Crop Pages." To crop the entire document, under the Page Range menu check the "All" circle. Change the margins to the desired cropped size and hit "OK." If you have cropped too much, you can press Control+Z and undo the crop.
  16. Save your document (see file naming conventions). This will be the whole issue file for the issue. I usually save my document by selecting "Reduce File Size" under the File menu. It takes about the same amount of time as a usual save, and it ensures your document is sufficiently compressed (just remember to select "Acrobat 6.0 and later" from the drop-down menu of the initial Reduce File Size window).

NOTE: Poor quality paper, or very thin paper (such as that commonly used for journals produced in Asia), may not always auto-feed correctly. With these types of paper, if the pages are loaded into the tray all at once, the auto-feed has a tendency to take more than one page at a time. To avoid misfeeds, it may be necessary to place the pages into the auto-feed tray one by one or a few at a time. Misfeeds can ruin the scan job, because the pages will collate incorrectly, which is not easily fixed.
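To see why a misfeed is so damaging, it can help to model the collation described in step 11. The following is only a rough sketch of the idea, not how Acrobat itself is implemented: the function name `collate` and the page numbers are made up for illustration. Fronts are scanned first sheet to last, backs are scanned last sheet to first, and the two passes are then interleaved.

```python
def collate(fronts, backs_as_scanned):
    """Interleave front sides with back sides that were scanned last-to-first."""
    if len(fronts) != len(backs_as_scanned):
        # A double feed drops a side from one pass, so the counts no longer match.
        raise ValueError("front and back counts differ; a sheet probably misfed")
    backs = list(reversed(backs_as_scanned))  # restore first-to-last order
    pages = []
    for front, back in zip(fronts, backs):
        pages.extend([front, back])
    return pages


# Three sheets: fronts are pages 1, 3, 5; the flipped stack feeds backs 6, 4, 2.
print(collate([1, 3, 5], [6, 4, 2]))  # [1, 2, 3, 4, 5, 6]
```

If the feeder pulls two sheets at once during either pass, every pairing after that point is off by one sheet, which is why it is usually easier to rescan than to repair the order by hand.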

Separating a Journal Issue into Individual Article Files

Once a full issue has been scanned, it needs to be broken down into smaller files containing the front matter, the articles, the back matter, and any other sections. While working on these steps, be sure to keep your whole issue file intact.

  1. Open the whole issue file you created from scanning.
  2. Click on the "Pages" side tab of your document's window. The Pages sidebar makes it easy to select the pages of the various sections.
  3. Click on the very first page (or cover as the case may be) in the sidebar. This will select that page and mark it as such with a blue highlight ring around the thumbnail.
  4. Scroll down, still in the Pages sidebar, to where the front matter ends. This may include things such as the cover, title page, editorial data, contents, list of illustrations or plates, notes about contributors, and preface or foreword. It is generally everything up to the first page of the first article.
  5. Hold down the Shift key and click on the last thumbnail page of this section. This will select and highlight all the pages in the section.
  6. Right click on one of the highlighted pages. A menu of tools will pop up. Select "Extract Pages" from the menu. Another window will open verifying the pages to be extracted. Click "OK." A new window will open with the extracted pages.
  7. Open the Pages sidebar in this window, and scroll through to make sure all your pages are there. At this point, you can delete any blank pages in the section (I have used the convention of leaving blank pages in the whole document file, but deleting them from the separated files). Just select the blank page, or pages, right click on one, and select "Delete Pages." A window will appear confirming your deletion. Click "OK."
  8. Now add metadata to this document. You can select "Document Properties" from the File menu, or just press Control+D, to bring up the Document Properties window. Select "Description" from the left sidebar. Then fill out the fields for Title, Author, and Subject. The Subject field is used for the journal title, volume and number of the issue, and date. For articles, also include page numbers. For example: Bulletin of Tibetology, Volume 3, Number 2, June 1966, pp 8-19. In the Title field enter the title of the article or a description of the section, like "front matter" or "full issue" (for whole issue files). In the Author field enter the author(s) of the article, first name first and then last name, with multiple authors separated by a comma or "and". For example: John Henry and Polly Ann Henry. Or, James Madison (trans.).
  9. When finished adding Document Properties, click "OK."
  10. Now save your file using correct file name standards.
  11. Go on to the next section, and repeat the process. Select the first page of the section by clicking on the thumbnail of that page. Then scroll down to the last page of the section and click on it while holding down the Shift key. Right click on a page and select "Extract Pages." Add the necessary metadata to the Document Properties (Control+D) and then save the file with the correct file name.

Tip: If you fill out the Subject field within Document Properties for the whole issue first, then whenever you extract pages from it, this field will already be filled out in the extracted pages file and you only need to add the relevant page numbers to the Subject field.

Another Tip: I find it helpful to leave the front matter file open and put it down in the corner of the screen with the table of contents page showing as I separate the rest of the issue. This is a nice little reference to guide you as you extract articles from the full issue.
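If you are preparing the metadata for many articles, it may help to generate the Subject text consistently before pasting it into Document Properties. Here is a minimal sketch that follows the pattern of the examples above; the function name `subject_field` and its parameters are hypothetical and are not part of Acrobat.

```python
def subject_field(journal, volume, number, date, pages=None):
    """Build Subject text: journal, volume, number, date, and optional page range."""
    parts = [journal, f"Volume {volume}", f"Number {number}", date]
    if pages is not None:
        first, last = pages
        parts.append(f"pp {first}-{last}")
    return ", ".join(parts)


# Whole issue file: no page numbers.
print(subject_field("Kailash", 7, 2, "June-Oct 1983"))
# Kailash, Volume 7, Number 2, June-Oct 1983

# Single article: same text with the page range added.
print(subject_field("Bulletin of Tibetology", 3, 2, "June 1966", pages=(8, 19)))
# Bulletin of Tibetology, Volume 3, Number 2, June 1966, pp 8-19
```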

Optimize the PDF

Optimizing the PDF in most cases will improve the quality and readability of the scan.

  1. Save the PDF under a different name by adding "-opt" before the .pdf extension.
  2. Pull down the Document menu and select Optimize.
  3. After it finishes optimizing, check the quality against the original PDF and use whichever is better.
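The renaming in step 1 can be sketched as follows, in case you want to prepare the "-opt" names for a batch of files. This only computes the new name; it does not copy or optimize anything, and the function name `optimized_name` is made up for illustration.

```python
from pathlib import Path


def optimized_name(pdf_path):
    """Return the same path with "-opt" inserted before the file extension."""
    path = Path(pdf_path)
    return path.with_name(f"{path.stem}-opt{path.suffix}")


print(optimized_name("bot_03_02_01.pdf"))  # bot_03_02_01-opt.pdf
```

Keeping the original file alongside the "-opt" copy is what makes the comparison in step 3 possible.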

File Naming Conventions

Files should be given short descriptive names in the following format:

  • JournalName_VolumeNumber_IssueNumber_ArticleNumber(or a descriptive word)

Use underscores between pieces of information. If a journal has a long title, it sometimes helps to abbreviate it. For example, suppose you have scanned volume 3, number 2 of the Bulletin of Tibetology, which has a cover and contents section, several articles, a notes and topics section, a book review, and then the back material. You would name these sections as follows:

  • bot_03_02_front
  • bot_03_02_01
  • bot_03_02_02
  • bot_03_02_03
  • bot_03_02_notes
  • bot_03_02_reviews
  • bot_03_02_back
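The pattern in this list can be sketched in a few lines. This is only an illustration: the function name `file_name` is hypothetical, and the two-digit zero padding is an assumption read off the examples above rather than a stated rule.

```python
def file_name(journal, volume, issue, section):
    """Build a name; `section` is an article number (int) or a descriptive word (str)."""
    tail = f"{section:02d}" if isinstance(section, int) else section
    return f"{journal}_{volume:02d}_{issue:02d}_{tail}"


# Sections of Bulletin of Tibetology, volume 3, number 2, in order.
sections = ["front", 1, 2, 3, "notes", "reviews", "back"]
for section in sections:
    print(file_name("bot", 3, 2, section))
```

Running this prints the same seven names as the list above, from bot_03_02_front through bot_03_02_back.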

Sometimes a journal, like the Journal of the Tibet Society, only has volumes, so then just put the volume number.

  • jts_02_front
  • jts_02_01
  • jts_02_02

Sometimes a journal uses the year or issue number like a volume number and then has numbers for each year. For these, put the year instead of the volume number, and then the issue number:

  • JournalName_Year_IssueNumber_ArticleNumber, or
  • JournalName_Number_ArticleNumber

If an issue spans more than one volume or number, use a hyphen. For example, an issue of Ancient Nepal is designated as numbers 53-56, so name the files:

  • ancient_nepal_53-56_front
  • ancient_nepal_53-56_01
  • ancient_nepal_53-56_02, and so forth
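The variations above (volume only, year plus issue, and hyphenated spans) can be covered by one small helper. As before, this is just a sketch under assumptions: the names `number_part` and `file_name` are invented for illustration, and the two-digit padding is inferred from the examples.

```python
def number_part(value):
    """Format one number, or a (first, last) pair joined by a hyphen."""
    if isinstance(value, tuple):
        first, last = value
        return f"{first:02d}-{last:02d}"
    return f"{value:02d}"


def file_name(journal, numbers, section):
    """Join the journal, any volume/year/issue numbers, and the section with underscores."""
    tail = section if isinstance(section, str) else f"{section:02d}"
    parts = [journal] + [number_part(n) for n in numbers] + [tail]
    return "_".join(parts)


print(file_name("jts", [2], "front"))             # jts_02_front
print(file_name("ancient_nepal", [(53, 56)], 1))  # ancient_nepal_53-56_01
print(file_name("bot", [3, 2], "notes"))          # bot_03_02_notes
```

Passing the numbers as a list means the same function works whether a journal has a volume and an issue, a year and an issue, or a single running number.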

Finally, remember that the scanner is your friend, even when it crumples your document and jams.

Provided for unrestricted use by the Tibetan and Himalayan Library.