Guidelines for Creating Digital Images
We recognize that collections differ in how they are used and accessed, and that institutions serve different purposes and clientele; both will likely affect how and why collections are digitized. Decisions on image quality and resolution should be based on the needs of users, how the images will be used and the nature of the materials you are scanning (dimensions, color, tonal range, format, material type, etc.). The quality and condition of the original (such as the quality of the shooting or processing technique in the case of photographs) affect the resolution at which you capture and the resulting quality of the digital image. These are not hard and fast recommendations for every collection and every institution. As a rule, the key to quality imaging is not to capture at the highest resolution possible, but to scan at a level that matches the informational content of the original.
The following guidelines contain the recommended settings for the creation of master, access and thumbnail images. Be aware that combining increased bit depth with increased resolution will result in large to extremely large image files. These files may be difficult and/or costly to manage. Because of storage concerns and time considerations, it may be necessary to reduce the recommended resolution and/or bit depth when creating master images. Carefully consider institutional needs and capabilities, however, before departing from recommended best practices.
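To make the storage concern concrete, the following is a minimal sketch of how uncompressed file size grows with resolution and bit depth. The function name and the 8 x 10 inch example are illustrative assumptions, and the estimate covers pixel data only (no file headers, metadata or embedded targets).

```python
def uncompressed_size_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Estimate the pixel-data size of an uncompressed scan, in bytes."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to a whole byte


# Hypothetical example: an 8 x 10 inch original scanned at 600 ppi.
for label, bits in [
    ("1 bit bitonal", 1),
    ("8 bit grayscale", 8),
    ("16 bit grayscale", 16),
    ("48 bit color", 48),
]:
    size_mb = uncompressed_size_bytes(8, 10, 600, bits) / 1_000_000
    print(f"{label}: {size_mb:.1f} MB")
```

Under these assumptions the same page ranges from about 3.6 MB as a bitonal scan to about 172.8 MB as a 48 bit color scan, which is why bit depth and resolution should be weighed together against available storage.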
Institutions that capture images at the recommended bit depth and edit them in Photoshop will need one of the Creative Suite (CS) versions. Older Photoshop versions do not offer the full suite of editing tools for images captured at 16 bit or greater.
Master Files
Good digital imaging projects scan a high-quality master or archival image and then derive multiple versions in smaller sizes or alternative formats for a variety of uses. There are compelling preservation, access and economic reasons for creating an archival-quality digital master image: it provides an information-rich, research-quality surrogate. A high-quality master image will make the investment in the image capture process worthwhile. Since user expectations and technologies change over time, a digital master must be available and rich enough to accommodate future needs and applications. The master image should be the highest quality you can afford; it should not be edited or processed for any specific output; and it should be uncompressed. Intensive quality control should be applied in creating master image files. Any errors made while creating the master file will necessitate going back to the scanner or camera to capture another image.
Master digital images should be stored in a file format that supports the fidelity and long-term preservation of the image. The master image file should have the following characteristics:
- Non-proprietary / open source
- Uncompressed
- Technical metadata captured, as part of file structure
- Target included, when applicable
- Color management applied, when necessary
The format most frequently recommended for master digital images is the Tagged Image File Format (TIFF). Another format currently being considered for master digital images is JPEG 2000, a standard developed by the Joint Photographic Experts Group (JPEG).
Service Master Files
The service master is an optimized working copy of the master file that can be used as a source for all subsequent derivatives. It is also used to create print publications. Like master files, service masters should be stored in a non-proprietary, uncompressed file format that supports the fidelity and long-term preservation of the image. The service master is generally optimized in the following ways:
- Rotated, if necessary
- Cropped, if necessary
- Tone levels optimized
- Colors balanced, if necessary
- Sharpened using “unsharp mask,” depending on the goals of the project
- Target cropped, when applicable
Derivative Files
Derivative files are created from the service master and are used for general Internet or network access. Derivative files typically include an access image, which is sized to fit within the screen of an average monitor or other delivery mechanism, and a thumbnail image, which is small enough to load quickly and is linked to the larger access image. With the proper image editing software, it is not necessary to subject source materials to multiple scans.
File formats using lossy compression are commonly used when creating derivative files. Derivatives are also generally optimized for computer monitor viewing so that visual details may be viewed as clearly as possible.
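As a rough illustration of how derivative sizes can be worked out from a service master without rescanning, the sketch below scales pixel dimensions to a target long dimension while preserving the aspect ratio. The function name is hypothetical; the 600 and 150 pixel targets come from the access and thumbnail recommendations in the source-type tables later in these guidelines. The actual resampling would be done in image editing software.

```python
def fit_long_dimension(width_px, height_px, target_long_px):
    """Return (width, height) scaled so the longer side equals target_long_px.

    The aspect ratio is preserved, and images already at or below the
    target are returned unchanged rather than enlarged.
    """
    long_side = max(width_px, height_px)
    if long_side <= target_long_px:
        return width_px, height_px
    new_width = max(1, round(width_px * target_long_px / long_side))
    new_height = max(1, round(height_px * target_long_px / long_side))
    return new_width, new_height


# Hypothetical service master of 4800 x 6000 pixels.
access_size = fit_long_dimension(4800, 6000, 600)     # (480, 600)
thumbnail_size = fit_long_dimension(4800, 6000, 150)  # (120, 150)
print(access_size, thumbnail_size)
```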
File Naming Conventions
Systematic file naming is important for system compatibility and interoperability and for demonstrating ownership of the digital asset. It is critical that your file names are unique, and it is recommended that they follow an established convention to assure consistency and ease of use. File naming recommendations include:
- Use lowercase letters of the Latin alphabet and the numerals 0 through 9.
- Avoid punctuation marks other than underscores and hyphens.
- Begin each file name with a two- to three-character acronym representing the institutional name followed by a second two- to three-character acronym representing the department or unit name (when applicable).
- Follow the institutional and departmental acronyms with an object ID. The object ID consists of any unique numbering scheme already in use to represent the object or, if no such number exists, a short description representing the item.
- File names should be limited to 31 characters, including the three-character file extension.
- If burning to CD-ROM, file names should be limited to 11 characters, including the three-character file extension, in case a recipient’s computer does not support long filenames.
- Use a single period as a separator between the file name and the three-letter extension.
- Include a part designator after the object ID, when applicable.
When selecting a file naming convention, think long-term. Select a system that will outlast staff involved in the current project. Consider the number of files your institution will ultimately be managing. Remember human error: if technicians will be assigning file names manually, how easy will it be to make a mistake? Remember also that file names do not take the place of metadata. Keep them simple and straightforward.
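Because manual naming invites mistakes, a small script can assemble and check names against the recommendations above. The following is a minimal sketch, not a prescribed tool: the function names, the underscore used to join the parts, and the sample acronyms ("xyz", "sc") are all assumptions made for illustration.

```python
import re

MAX_LENGTH = 31  # total length, including the three-character extension

# Lowercase letters, digits, underscores and hyphens, then a single
# period and a three-character extension.
VALID_NAME = re.compile(r"[a-z0-9_-]+\.[a-z0-9]{3}")


def check_file_name(name):
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if not VALID_NAME.fullmatch(name):
        problems.append("only a-z, 0-9, '_' and '-' plus one period and a "
                        "three-character extension are allowed")
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    return problems


def build_file_name(institution, object_id, extension, unit="", part=""):
    """Join institution, unit, object ID and part designator into a name."""
    parts = [institution, unit, object_id, part]
    stem = "_".join(p.lower() for p in parts if p)
    name = f"{stem}.{extension.lower()}"
    problems = check_file_name(name)
    if problems:
        raise ValueError(f"{name!r}: " + "; ".join(problems))
    return name


# Hypothetical institution "xyz", unit "sc", object 000123, part 001.
print(build_file_name("xyz", "000123", "tif", unit="sc", part="001"))
```

With the sample values, the script prints xyz_sc_000123_001.tif. Running existing names through check_file_name before ingest is one way to catch spaces, capital letters or over-long names early.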
Watermarks
What is a watermark? A watermark is information stored in or on a digital file, which allows creators to attach copyright or branding information to images, audio and video files and documents. Watermarks are applied to images in hopes of reducing misuse or unauthorized distribution of images.
There are two types of watermarks in use for digital image files, visible and electronic:
Visible watermarks are applied on top of the image, very much like a seal is applied to an official document. Often these watermarks consist of the name of the institution that owns the file, that institution’s official seal or some other identifying logo. In all cases, visible watermarks cover a portion of the image file. Visible watermarks do not stop users from downloading files, and they can be removed, depending on the complexity, size and color value of the design. The biggest drawback to visible watermarks is the obstruction of parts of the image, making the use of that image file less appealing to some researchers.
Electronic watermarks are embedded in the image file, and they are invisible. They usually use a numeric code licensed by an electronic watermarking firm. The numeric code is specific to the institution that owns the files. Electronic watermarks are usually applied as part of the filter mechanism in programs like Adobe Photoshop. In some cases, upon very close inspection, the file will appear grainy after an electronic watermark is applied. Electronic watermarks do not stop users from downloading files, and they can be easily overcome through manipulation.
The cost of watermarking varies. Visible watermarking is virtually free, but invisible watermarking can be costly. If you are debating whether or not to watermark, consider controlling the use of your images by limiting the quality and size of publicly accessible files. Be sure metadata concerning ownership and copyright information travels with your images, either in embedded file information or associated metadata records.
An advantage of using watermarks, visible or electronic, is that they can assist in controlling the use and distribution of images. A disadvantage is that, when embedded, they can potentially degrade the image, and when layered on top, they will most certainly obscure image content. Watermarks may suggest the institution’s intent to protect its collections, but they do not prevent theft or misuse.
Guidelines for Source Type
Text
When scanning text documents, base the spatial resolution on the size of the text found in the document. Documents with smaller printed text may require higher resolutions and bit depths than documents that use large typefaces. Projects that will have Optical Character Recognition (OCR) applied may wish to test pages at several resolutions to find the most satisfactory results. Images that produce the best results for OCR may not be pleasing to the eye, so separate scans may be required for OCR and for human display.
Projects with large amounts of textual materials, particularly hard-to-read materials such as manuscripts, should provide transcriptions of the materials in addition to the digital image. Access to textual material can be further enhanced through SGML/XML markup schemes such as the Text Encoding Initiative (TEI). As rekeying text can be cost prohibitive, projects considering transcriptions should investigate including Optical Character Recognition (OCR) software in their toolkit.
| Text | Master | Access | Thumbnail |
| --- | --- | --- | --- |
| File Format | TIFF | JPEG | JPEG |
| Bit Depth | 1 bit bitonal | 8 bit grayscale | 8 bit grayscale |
| Spatial Resolution | Adjust scan resolution to produce a minimum pixel measurement across the long dimension of 6,000 lines for 1 bit files and 4,000 lines for 8 – 16 bit files | 150 – 200 ppi | 144 ppi |
| Spatial Dimensions | 4000 – 6000 pixels across the long dimension | 600 pixels across the long dimension | 150 – 200 pixels across the long dimension |
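The master resolution for text is expressed as a pixel count rather than a fixed ppi, so the scanner setting has to be derived from the size of the page. A minimal sketch of that arithmetic follows; the function name and the 11 inch page are illustrative assumptions, and the result is rounded up so the minimum pixel count is always met.

```python
import math


def text_master_ppi(long_dimension_in, bit_depth):
    """Return the lowest whole ppi that meets the minimum pixel count.

    Uses the minimums from the text table: 6,000 pixels across the long
    dimension for 1 bit files and 4,000 pixels for 8 to 16 bit files.
    """
    target_pixels = 6000 if bit_depth == 1 else 4000
    return math.ceil(target_pixels / long_dimension_in)


# Hypothetical page with an 11 inch long dimension.
print(text_master_ppi(11, 1))  # 546
print(text_master_ppi(11, 8))  # 364
```

In practice the computed value would be rounded up again to the nearest setting the scanner actually offers.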
Photos
Photographs can present many digitization challenges. We recommend digitizing from the negative (or the earliest generation of the photograph) to yield a higher-quality image. However, in the case of photographs developed according to artist specifications, the photograph itself should be digitized rather than the negative.
When considering whether to capture sepia-tone photographs in color or black and white, we recommend digitizing them as color images to create a more accurate image. Digitize the backs of photographs as separate image files if there is significant information on the back of the photo (which may be of interest to users) that may not be included elsewhere. If an image of the verso of the photograph is available, the digital image will serve as a more successful surrogate for the original.
| Photographs | Master | Access | Thumbnail |
| --- | --- | --- | --- |
| File Format | TIFF | JPEG | JPEG |
| Bit Depth | 16 bit grayscale | 8 bit grayscale | 8 bit grayscale |
| Spatial Resolution | 400 – 800 ppi | 150 – 200 ppi | 144 ppi |
| Spatial Dimensions | 4000 – 8000 pixels across the long dimension, depending on size of original, excluding mounts and borders | 600 pixels across the long dimension | 150 – 200 pixels across the long dimension |
Graphics
Graphics include the various techniques used to reproduce words and images from originals such as engraving, lithography, line art, graphs, diagrams, illustrations, technical drawings and other visual representations. Nearly all graphics will be two dimensional and should be scanned using the following guidelines.
| Graphic Materials | Master | Access | Thumbnail |
| --- | --- | --- | --- |
| File Format | TIFF | JPEG | JPEG |
| Bit Depth | 16 bit grayscale | 8 bit grayscale | 8 bit grayscale |
| Spatial Resolution | 600 – 800 ppi | 150 – 200 ppi | 144 ppi |
| Spatial Dimensions | 6000 – 8000 pixels across the long dimension, excluding mounts and borders | 600 pixels across the long dimension | 150 – 200 pixels across the long dimension |
Artwork / 3-Dimensional Objects
For projects where the physical dimensions of the non-3D artwork match the equipment available, the following standards can be used. If scanning photographic copies of objects and artifacts, see the recommended requirements in the photo and film charts in these guidelines.
| Artwork | Master | Access | Thumbnail |
| --- | --- | --- | --- |
| File Format | TIFF | JPEG | JPEG |
| Bit Depth | 48 bit color | 24 bit color | 24 bit color |
| Spatial Resolution | Device maximum | 300 ppi | 144 ppi |
| Spatial Dimensions | 100% of original | 600 pixels across the long dimension | 150 – 200 pixels across the long dimension |
Maps
Scanning maps may involve items that vary widely in size, condition and amount of detail. Small maps may fit easily onto a flatbed scanner, while large plat maps may need to be scanned in sections using a large format scanner or captured by a camera. The size of the resulting image can become a problem not only for storage but also for viewing, serving over the web and processing.
Smaller maps (less than 36 inches on the longest dimension) should be digitized at 600 ppi, 48-bit color or 16-bit grayscale if possible. For larger maps, 300-400 ppi may be more practical. If it becomes necessary to digitize a map in sections and stitch the image together in Photoshop, keep both the original images of the sections as well as the combined image.
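The size threshold above can be expressed as a short lookup, shown here as a sketch only. The function names are hypothetical, and treating a map of exactly 36 inches as "larger" is an assumption, since the guideline only says "less than 36 inches" for the smaller category.

```python
def suggested_map_ppi(long_dimension_in):
    """Return a (low, high) ppi range for a map's long dimension in inches."""
    if long_dimension_in < 36:
        return (600, 600)  # smaller maps: 600 ppi if possible
    return (300, 400)      # larger maps: 300-400 ppi may be more practical


def long_dimension_pixels(long_dimension_in, ppi):
    """Pixel count across the long dimension at a given resolution."""
    return round(long_dimension_in * ppi)


# Hypothetical 24 inch and 48 inch maps.
for size_in in (24, 48):
    low, high = suggested_map_ppi(size_in)
    print(f"{size_in} in: {low}-{high} ppi, "
          f"{long_dimension_pixels(size_in, low)} to "
          f"{long_dimension_pixels(size_in, high)} pixels across")
```

For these sample sizes, a 48 inch map at 300 ppi yields the same 14,400 pixels across as a 24 inch map at 600 ppi, which illustrates why the lower resolution can be the more practical choice for large sheets.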
| Maps | Master | Web | Thumbnail |
| --- | --- | --- | --- |
| File Format | TIFF | JPEG | JPEG |
| Bit Depth | 16 bit grayscale | 8 bit grayscale | 8 bit grayscale |
| Spatial Resolution | 600 ppi | 150 – 200 ppi | 144 ppi |
| Spatial Dimensions | 6000 – 8000 pixels across the long dimension | 1078 pixels across the long dimension | 150 – 200 pixels across the long dimension |
Film
For duplicates (negatives, slides, transparencies), match the original size. However, if original size is not known, the following recommendations are supplied: For a copy negative or transparency, scan at a resolution to achieve 4000 pixels across the long dimension. For duplicates, follow the scanning recommendations for the size that matches the actual physical dimensions of the duplicate.
Master scans of camera originals may be captured and saved in RGB, particularly those negatives that contain color information as a result of staining, degradation or intentional color casts. Derivative files could later be reduced to grayscale in the scanning software or during post-processing editing.
| Film | Master | Access | Thumbnail |
| --- | --- | --- | --- |
| File Format | TIFF | JPEG | JPEG |
| Bit Depth | 16 bit grayscale | 8 bit grayscale | 8 bit grayscale |
| Spatial Resolution | Resolution to be calculated from actual image format and/or dimensions: approx. 2800 ppi for 35mm originals, ranging to approx. 600 ppi for 8 x 10 inch originals | 150 – 200 ppi | 144 ppi |
| Spatial Dimensions | 4000 – 8000 pixels across long dimension of image area, depending on size of original and excluding mounts and borders | 600 pixels across the long dimension | 150 – 200 pixels across the long dimension |
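The film resolutions follow from the physical size of the image area: a small frame needs a much higher ppi than a large sheet to reach a comparable pixel count. The sketch below checks the two figures quoted in the table. The format dimensions are nominal values supplied here as assumptions (a 36 x 24 mm frame for 35mm film, and the full 10 inch side for 8 x 10 sheets); measure the actual image area before scanning.

```python
MM_PER_INCH = 25.4

# Nominal long dimension of the image area, in inches (assumed values).
FILM_LONG_DIMENSION_IN = {
    "35mm": 36 / MM_PER_INCH,  # 36 x 24 mm frame
    "8x10": 10.0,              # 8 x 10 inch sheet film
}


def pixels_across_long_dimension(film_format, ppi):
    """Pixel count across the long dimension of the image area."""
    return round(FILM_LONG_DIMENSION_IN[film_format] * ppi)


print(pixels_across_long_dimension("35mm", 2800))  # 3969
print(pixels_across_long_dimension("8x10", 600))   # 6000
```

Under these assumptions, 2800 ppi on a 35mm frame lands just under 4,000 pixels and 600 ppi on an 8 x 10 inch sheet gives 6,000 pixels, consistent with the approximate figures and the 4000 – 8000 pixel range in the table.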
