The realities of digitization
When we were discussing the issues of copyright, open access and digitization of collections in class, someone made a comment that essentially said, “Why don’t museums and libraries and archives just get on with it and digitize their collections?” Making collections accessible through digitization is more complicated than scanning and uploading images or documents, and with this post I want to provide a real-world look at the process.
First off, I would like to thank Special Collections librarian Susan Graham and Special Collections archivist Lindsay Loeper for giving me such a great interview; student interns Megan and Nitish for answering additional questions and allowing me to take pictures while they worked on digitizing photographs; and the Special Collections chief curator for approving my request to take photographs in Special Collections.
I made two visits to Special Collections at UMBC’s Albin O. Kuhn Library. On my first visit, I conducted an interview with Susan Graham and Lindsay Loeper. On my second visit, I took pictures and asked some additional questions of Susan and the student interns.
The following post is a compilation of the interviews with some irrelevant material edited out for the sake of space.
The digitization process
Susan and Lindsay explained that Special Collections owns two 12”x17” flatbed scanners that are used to scan documents, photographs and transparencies such as slides and negatives. A special book scanner is used for books and other bound materials. Larger items that won’t fit on any of the scanners can be photographed with their large-scale camera setup.
Most scanning of photographs takes place in-house and is usually done by interns. Each photograph takes about 1-2 minutes to scan.
The scan is checked for quality and cropped if necessary, and basic metadata is added, again by the student interns.
The descriptive metadata is added by either Susan or Lindsay, sometimes by staff from Bibliographical and Metadata Services (a department of the library) and occasionally by grad students. The quality of the scan and the metadata is checked, the collection is created, and it is then uploaded to the asset management system, ContentDM.
Documents can also be scanned on the flatbed scanners. Bound materials are scanned on the book scanner.
Images from documents and books are put together in a PDF, which is then run through OCR (optical character recognition) software to make the text searchable. A final check is done for image quality and text accuracy, the metadata is added, and the item is then uploaded.
Books and documents require higher-definition images, which take more time: about 5-15 minutes each, plus extra time for lighting and for adjusting and cropping the images.
How items are chosen for digitization
Susan said that the chief curator decides the priorities, keeping in mind the value of the item; whether the content is important; whether it supports the mission of the library; the item’s uniqueness; its condition; whether the item has already been digitized by another institution; whether the library owns the proper equipment to scan or photograph the item; and whether the item supplements another collection, spurring collaboration and making more resources available to researchers. The potential use is also considered – whether it supports the teaching needs of UMBC. Lindsay added that the emphasis is on university-owned items where UMBC owns the intellectual property rights, permission can be obtained, or the copyright has expired and the contents are in the public domain. If these rights aren’t owned or can’t be obtained, then the item can’t be made available through digitization.
An excellent example of copyright issues is the recent acquisition of the research files and manuscript of former UMBC professor Joseph L. Arnold, who died before his latest book could be published. He had assembled 35 cartons of research papers on the history of Baltimore, which are organized by subject. Special Collections is collaborating with his widow to put his manuscript online sometime in 2015. However, because his research papers consist mostly of newspaper clippings, copies of journal articles and the like, they can never be made available because of copyright restrictions.
The collections are open to the public and anyone from anywhere can make a request for copies of items for either personal use or publication. Individual items often get added to the digital collections because of research requests. Special Collections will scan the items to their digital standards and add them to their digital collections – that’s how a lot of the content was added to the Selected Images from Various Collections. These items might not have otherwise been made available online.
Susan and Lindsay said that use determines priority for items to be digitized. The library uses Google Analytics to track usage, and Special Collections tracking fits into the library’s tracking as well. So collections that are accessed most often are given priority but still have to go through the same decision-making criteria as anything else. (The Lewis Hine Collection and the Hughes Company Glass Negative Collection are probably the two most accessed collections according to Susan.)
How digitization has been funded
Susan says that she’s unsure of how digitization was funded before she came to Special Collections but since then they have used an assortment of methods. To my surprise, they haven’t used grants for any specific projects but I was impressed by some of the creative methods they have used.
Lindsay worked with the Digital Maryland program to have two collections digitized, described and made available through the Digital Maryland interface at no cost to the library. Digital Maryland is a statewide program that works only with Maryland-based materials. Lindsay said it was a great fit for them and hopes to work with the program again in the future.
A good portion of what is available in the digital collections consists of UMBC materials, such as the school newsletter, The Retriever Weekly, and commencement and theatre department programs. Some student staff members from The Retriever Weekly have internships where they come in to help digitize past copies of the newsletter. Another time, the university funded the digitization of commencement programs (by supplying the students to work on the project) as part of its outreach programs.
Special Collections is also the repository for groups like the American Association of Immunologists and the International Union of Immunological Societies, among others, who provide funding for digitization of their collections.
My favorite story was about the Hugh Bolton Jones letters. A woman was doing genealogy research and discovered that the ancestor she was researching was mentioned in the Bolton Jones letters. She contacted Special Collections to get paper copies of the letters; instead they negotiated a reduced price and she paid for digitization of the whole collection of letters. She was happy to get good digital copies and the whole collection was made available and accessible. I think this was a great solution that made everyone happy.
Special Collections just acquired a large Maryland folk life collection that includes objects, recordings, photographs and documents and this may be an instance where they would apply for a grant in order to digitize the entire collection at once.
Summing it all up
I hope this quick look at the process and issues behind digitization answers some questions as to why more items aren’t digitized and made available online.
Intellectual property rights are one of the main considerations in determining whether to digitize an item and make it available online.
Having the necessary resources is the other main consideration. Aside from doing the physical scanning and adding metadata, there is the consideration of storage. Lindsay made the point that preservation-quality storage of digital assets costs more. Susan added that the more items they have digitized, the more it costs for their digital asset management service. These are two costs that have to be addressed in their budget every year that most people wouldn’t think about.
Just one more thing – the sheer number of objects can make digitization impossible. Special Collections owns over 2 million (yes, that says million!) photographs dating from the beginnings of photography up to contemporary photographs. If it takes 2 minutes per photograph just for scanning, it would take ... well, you do the math! (I think it would be over 7 years, 24 hours a day, 365 days a year, just to scan.)
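For anyone who would rather let a computer do the math, here is a minimal sketch in Python. It uses only the figures quoted above (2 million photographs at 2 minutes each) and assumes scanning never stops; the function name scanning_years is just an illustrative choice.

```python
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def scanning_years(photo_count, minutes_per_photo):
    """Years of round-the-clock scanning needed for a collection."""
    total_minutes = photo_count * minutes_per_photo
    minutes_per_year = MINUTES_PER_HOUR * HOURS_PER_DAY * DAYS_PER_YEAR
    return total_minutes / minutes_per_year


# 2 million photographs at the slower end of the 1-2 minute range
years = scanning_years(2_000_000, 2)
print(f"{years:.1f} years of nonstop scanning")
```

That works out to roughly 7.6 years, and it covers scanning only. Quality checks, cropping and metadata would all come on top of that, and no one scans around the clock.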