Processing Finished Scans For Inclusion Into Online Catalog

Contributor(s): Than Grove

The high-resolution TIFF page scans are too large to be used effectively online, so they need to be converted into smaller, web-usable images, in this case JPEGs. While more advanced software such as ImageMagick and Zoomify remains to be investigated, for now the page-scan TIFFs are converted into medium-sized JPEG images using Photoshop's batch processing. At present, we are making only a single derivative at a readable screen size.

The basic qualities sought for the converted images are:

  1. They are readable (sharp enough and dark enough)
  2. They fit within larger screens (around 1200-1400 px wide)
  3. They are manageable in a web context (file size of less than 150 KB per image, ideally).
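
The two measurable targets in the list above (width and file size) can be checked automatically once a test conversion has been run. The following is a minimal sketch, not part of the original workflow: the function name is hypothetical, and it assumes that 1 KB means 1024 bytes. Readability still has to be judged by eye.

```python
# Thresholds taken from the list above; the names are hypothetical.
MIN_WIDTH_PX = 1200
MAX_WIDTH_PX = 1400
MAX_FILE_SIZE_BYTES = 150 * 1024  # "less than 150 KB", assuming 1 KB = 1024 bytes


def check_web_targets(width_px, file_size_bytes):
    """Return a list of problems; an empty list means both targets are met."""
    problems = []
    if not MIN_WIDTH_PX <= width_px <= MAX_WIDTH_PX:
        problems.append(
            f"width {width_px} px is outside {MIN_WIDTH_PX}-{MAX_WIDTH_PX} px"
        )
    if file_size_bytes >= MAX_FILE_SIZE_BYTES:
        problems.append(
            f"file size {file_size_bytes} bytes is not under {MAX_FILE_SIZE_BYTES}"
        )
    return problems


# A 1300 px wide, 120,000 byte image meets both targets.
print(check_web_targets(1300, 120_000))  # → []
```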

As page size, format, and color differ from printing to printing, each edition of each collection will have its own specific guidelines for how to process its scans. These will be described on separate pages.

However, the basic principles of the conversion are consistent throughout.

Basic Guidelines For Converting

Any conversion process should involve the following steps, in roughly this order:

  • Conversion of Mode to Grayscale (if images are black and white)
  • Rotating Image (if necessary)
  • Cropping of consistent extra white-space around page text
  • Contrast and other quality adjustments to make the text clearer
  • Resizing the Image
  • Saving as JPEG

Conversion of Format (Only if Black and White)

If the TIFFs are black and white, they are stored in Bitmap mode. If these are saved directly as JPEGs, the result is a pixelated, jagged image. Before saving as a JPEG, the mode of the image needs to be changed to Grayscale. In Photoshop, this is done under the Image menu, in the Mode submenu. When you switch modes, Photoshop asks what ratio to use. Leave the Ratio at "1" and click the OK button.
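
To illustrate what the mode change does to the data: a Bitmap image has only two possible values per pixel, while a Grayscale image allows 256 levels, which gives the later resizing step intermediate grays to work with. The sketch below is an assumption about the mapping at a ratio of 1, not a description of Photoshop's internals; it treats 0 as black ink and 1 as white paper, and some file formats store the opposite polarity.

```python
def bitmap_to_grayscale(rows):
    """Map 1-bit pixels to 8-bit gray values, one output pixel per input pixel.

    Assumes 0 means black ink and 1 means white paper.
    """
    return [[255 if pixel else 0 for pixel in row] for row in rows]


# A tiny hypothetical 4 x 2 bitmap.
page = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
]
print(bitmap_to_grayscale(page))
# → [[255, 255, 0, 255], [255, 0, 0, 255]]
```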

Rotating Image (If Necessary)

With some machine scanners, where the page is fed into the machine from the edge, the resulting images are oriented vertically. During the conversion process, they need to be rotated 90 degrees, either clockwise or counterclockwise depending on how they were scanned. This is done under the Image menu, in the Rotate Canvas submenu.
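
As a small illustration of the two rotation directions, the sketch below rotates a grid of pixel values stored as a list of rows. It is only a model of the operation; the function names are hypothetical, and a real batch would use Photoshop or an image library.

```python
def rotate_clockwise(rows):
    """Rotate a grid of pixels 90 degrees clockwise."""
    return [list(column) for column in zip(*rows[::-1])]


def rotate_counterclockwise(rows):
    """Rotate a grid of pixels 90 degrees counterclockwise."""
    return [list(column) for column in zip(*rows)][::-1]


grid = [
    [1, 2, 3],
    [4, 5, 6],
]
print(rotate_clockwise(grid))         # → [[4, 1], [5, 2], [6, 3]]
print(rotate_counterclockwise(grid))  # → [[3, 6], [2, 5], [1, 4]]
```

Rotating clockwise and then counterclockwise returns the original grid, which is a quick way to confirm that the two directions are not mixed up.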

Note: I have found one instance where a whole volume was scanned with the odd and even pages oriented differently, so the odd pages needed to be rotated clockwise and the even pages counterclockwise. If such a situation occurs, the odd and even images need to be sorted into separate folders and a different batch process run on each (at least in Photoshop 7).
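
Sorting the odd and even images by hand is tedious for a whole volume. The sketch below shows one way to split a list of file names by page parity. The file naming pattern is hypothetical, and the code assumes that the last run of digits before the extension is the page number and that its parity matches the scan orientation, so check this against a few images first.

```python
import re


def page_number(filename):
    """Return the last run of digits before the extension as an integer."""
    stem = filename.rsplit(".", 1)[0]
    runs = re.findall(r"\d+", stem)
    if not runs:
        raise ValueError(f"no page number found in {filename!r}")
    return int(runs[-1])


def split_odd_even(filenames):
    """Return (odd_pages, even_pages), each sorted by file name."""
    odd, even = [], []
    for name in sorted(filenames):
        (odd if page_number(name) % 2 else even).append(name)
    return odd, even


# Hypothetical file names for illustration only.
names = ["vol01_0002.tif", "vol01_0001.tif", "vol01_0003.tif"]
odd, even = split_odd_even(names)
print(odd)   # → ['vol01_0001.tif', 'vol01_0003.tif']
print(even)  # → ['vol01_0002.tif']
```

The two lists can then be moved into separate folders, one per batch process.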

Cropping of Extra White-Space

If there is consistently a large amount of white-space around the border of the image, it should be cropped. Cropping the borders allows the resulting text to be larger, because more of the screen width is filled with text. However, because the images are processed in a batch, and looking at each image is impractical when there are large numbers, one should be conservative in the amount of white-space cropped. Page scanning, whether done by machine or by hand, results in some images that are slightly askew or aligned somewhat differently within the image "window". One should choose a crop size that takes away white-space from every image without cropping any text.

Cropping is accomplished in Photoshop by going to the Image menu and changing the Canvas Size. One can adjust the horizontal and vertical size of the canvas. If either or both are made smaller, this does not resize the image but crops it. Photoshop will warn you of this.
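
The arithmetic behind a conservative crop can be sketched as follows. This is an illustration under stated assumptions, not the Photoshop procedure itself: it assumes the canvas is shrunk around its center, and the text bounding boxes are hypothetical measurements taken by hand from a few sample pages. All boxes are (left, top, right, bottom) in pixels.

```python
def centered_crop_box(width, height, new_width, new_height):
    """Return (left, top, right, bottom) for a canvas shrunk around its center."""
    if new_width > width or new_height > height:
        raise ValueError("new canvas must not be larger than the image")
    left = (width - new_width) // 2
    top = (height - new_height) // 2
    return (left, top, left + new_width, top + new_height)


def crop_is_safe(crop_box, text_boxes):
    """True if every sampled text bounding box lies fully inside the crop box."""
    left, top, right, bottom = crop_box
    return all(
        t_left >= left and t_top >= top and t_right <= right and t_bottom <= bottom
        for t_left, t_top, t_right, t_bottom in text_boxes
    )


# Hypothetical 6000 x 1200 px scan cropped to a 5600 x 1000 px canvas.
box = centered_crop_box(6000, 1200, 5600, 1000)
print(box)  # → (200, 100, 5800, 1100)

# Hypothetical text extents measured on two slightly misaligned sample pages.
samples = [(350, 180, 5650, 1020), (300, 150, 5700, 1050)]
print(crop_is_safe(box, samples))  # → True
```

If any sample fails the check, the canvas size is too aggressive for the batch and should be enlarged.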

Contrast and Other Adjustments

In some instances, xylograph pages are partly faded, either due to wear or because the ink was running low on the block, or else the contrast between the ink and the color of the page is not great enough. It is possible to counteract some of these problems by adjusting the contrast and/or brightness. This is done under the Image menu, in the Adjustments submenu, with the Brightness/Contrast choice. Exact settings need to be experimented with on several sample images to arrive at values that work acceptably for the whole collection. Using the Preview option allows you to see exactly how the settings are affecting the image before you press OK.
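
To give a sense of what such an adjustment does to individual pixels, the sketch below applies a simple linear brightness and contrast formula to an 8-bit gray value. The formula is a common textbook simplification and an assumption here; it is not Photoshop's own algorithm, and its parameters do not correspond to Photoshop's slider values.

```python
def adjust_gray(value, brightness=0, contrast=1.0):
    """Stretch a gray value around mid-gray (128), shift it, and clamp to 0-255.

    contrast > 1.0 pushes darks darker and lights lighter;
    brightness is added after the stretch.
    """
    scaled = (value - 128) * contrast + 128 + brightness
    return max(0, min(255, round(scaled)))


# Faded ink (gray value 50) becomes darker, pale paper (200) becomes lighter.
print(adjust_gray(50, contrast=1.5))   # → 11
print(adjust_gray(200, contrast=1.5))  # → 236
```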

Photoshop has many other settings, some of which may be appropriate for particular editions. Any such adjustments should be made at this stage, before the image is resized to a smaller size.

Resizing the Image

The images all need to be of a consistent size that fits entirely on a large screen and remains readable on all screens. In the tests done to date, a width of 1200 to 1600 px seems best for fitting on a screen. However, some pages with large borders require an even larger image, such as 2000 or 2200 px wide. In Photoshop, the image size is changed through the Image menu and the Image Size option. As long as the "Constrain Proportions" checkbox is checked, one need change only the width, and the height will adjust itself correspondingly. Usually, the resulting height of the page scans is between 200 and 250 px.
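
The effect of constraining proportions is simple arithmetic: the height is scaled by the same factor as the width. A minimal sketch, with hypothetical scan dimensions chosen only for illustration:

```python
def constrained_size(width, height, new_width):
    """Return (new_width, new_height) with the original aspect ratio preserved."""
    new_height = max(1, round(height * new_width / width))
    return new_width, new_height


# A hypothetical 6000 x 1000 px page scan resized to 1400 px wide.
print(constrained_size(6000, 1000, 1400))  # → (1400, 233)
```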

Saving as JPEG

The last step in the conversion is to save the image as a JPEG. This is done in the File menu, through either the Save for Web option or the Save As option. In Photoshop 7, the Save for Web option does not work properly with batch conversion, so one should use Save As. This has probably been fixed in later versions of Photoshop. In either case, when one saves a JPEG, Photoshop will ask for the Image Quality, which affects both the clarity of the image and the size of the resulting file. One should aim for as small a file size as possible without noticeably sacrificing clarity. Usually, a setting of 7 or 8 is best for JPEG image quality, but again this should be determined by testing with several sample images.
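
The testing described above can be recorded and decided systematically. The sketch below picks, from a set of trial conversions, the quality setting with the smallest file size among those judged readable and under the size target. The function name and the trial figures are hypothetical; the "readable" flag stands for a judgment made by eye on the sample images.

```python
def pick_quality(trials, max_size_kb=150):
    """Pick the readable setting with the smallest file size under the limit.

    trials: list of (quality, average_size_kb, readable) tuples.
    Returns the chosen quality, or None if no trial qualifies.
    """
    candidates = [
        (size_kb, quality)
        for quality, size_kb, readable in trials
        if readable and size_kb < max_size_kb
    ]
    if not candidates:
        return None
    return min(candidates)[1]


# Hypothetical results from converting a few sample pages at each setting.
trials = [
    (5, 80, False),   # small, but text judged too blurry
    (7, 110, True),
    (8, 135, True),
    (10, 190, True),  # clear, but over the size target
]
print(pick_quality(trials))  # → 7
```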

This page is provided courtesy of the Tibetan and Himalayan Library.