Wellyopolis

January 28, 2008

Amateur digitization for academics, updated

This is mostly an old post, which I've been trying to update, but the updates won't stick on the old post so I'm redoing it as a new post, with a new title. It's probably been the most popular post on this blog, useful to people around the world, and not just historians.

This query about buying a digital camera stimulated me to put finger to keyboard and jot down my collected wisdom about using a digital camera for your research. Some of what I say will pertain mostly to historians (that will be the references to the mysterious "archives", a term that conveys a lot to historians and perhaps diddly to others), but the basic idea of substituting digital photography for photocopying will have general applicability for a lot of people.

Getting my caveats up front, I should note that, like photocopying itself, photographing material you could just be reading, taking notes on and being done with is one of those productive forms of procrastination that feel like work but don't get the real job (writing) done.

That aside, what I outline here really can save time and money over a period of a couple of years. Digital photography is a lot quicker than photocopying (time is money); you can file your documents more compactly, which can be worth a lot if you anticipate moving, or are moving, homes or offices; and if you name your files or folders well (and use shortcuts/aliases) you can file your materials more effectively. Some people may ask, what about scanners? Don't bother, is my opinion. Scanners take much longer to record their image, are potentially more damaging to the documents, and are larger and heavier, making them far less convenient for traveling to archives. Not to mention, ever tried taking a family photo with a scanner?

The bottom-line figure for historians to keep in mind is that if you are photographing quickly, and not stopping to examine and select material, you can photograph up to 400 pages an hour. A linear foot of archival material is approximately 2000 pages, or about five hours of camera time at that rate. Thus, allowing for distractions and breaks to prevent RSI etc., you could photograph a linear foot of archival material in an eight-hour day. Do your own calculation here on how long it would take you to work through this material by reading and taking notes. If you can photograph material I think it quickly becomes an economical option for a lot of research.
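For anyone who wants to run that calculation, here is a minimal Python sketch of it. The 400 pages an hour and 2000 pages per linear foot are the figures quoted above; the function names and the two minutes per page for reading and note-taking are illustrative assumptions only, so substitute your own pace.

```python
PAGES_PER_HOUR = 400          # upper-end photographing rate quoted above
PAGES_PER_LINEAR_FOOT = 2000  # approximate figure quoted above


def hours_to_photograph(linear_feet, pages_per_hour=PAGES_PER_HOUR):
    """Hours of pure camera time for a collection of the given size."""
    return linear_feet * PAGES_PER_LINEAR_FOOT / pages_per_hour


def hours_to_read(linear_feet, minutes_per_page):
    """Hours needed to read and take notes in the archive at your own pace."""
    return linear_feet * PAGES_PER_LINEAR_FOOT * minutes_per_page / 60


# One linear foot, photographed versus read on site.
# minutes_per_page=2 is a guess for illustration, not a measured rate.
print(f"Photographing: {hours_to_photograph(1):.1f} hours")
print(f"Reading and note-taking: {hours_to_read(1, minutes_per_page=2):.1f} hours")
```

The photographing figure excludes breaks, which is why a linear foot fills most of a working day in practice.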

The cost-benefit calculation of photographing the documents and returning home, versus going to the archives and reading the material there, will depend on your situation. Most importantly, the archive or library has to allow self-copying with a digital camera. This is becoming more common, but may depend on precisely what you are looking at in a particular place. As always, contact the archivist before you go! Other variables to consider in deciding whether to hit the archives, photograph and return include:


  1. What is the cost (time/money) of spending time at the archives? The higher the cost of research trips the more you want to consider the short trip to photograph material. It might be less obvious that the slower you read, the more you should consider the "photograph and run" approach to archival visits.
  2. Other ways of thinking about archival time versus time with your photographed images are:

    • How intensely are you taking notes from something? If you're basically transcribing a page, well, photographing is a lot quicker than sitting in an archive far from home. Taking dictation from dead people, as it were. Though I do grant that typing direct quotations from your sources is an unparalleled way of internalizing the sources you're looking at. In short, if you are doing more than a couple of lines summary of every page you look at, consider photographing it for posterity and note taking later. If you're looking at making some sort of systematic database of whatever (probates, wills, laundry lists, surveys) don't do data entry in the archives if you can avoid it. Photograph it and take it home. This also allows you to double key some of your entries if you have the time and inclination to do so. And once your data is all entered you can verify any strange entries.
    • How accurate do your notes have to be? If you write "taht" for "that" there's minimal damage to your research. Indeed, what with modern standards that we shouldn't even sic basic errors like that, maybe none. But if you change a 39-year-old to a 93-year-old on a census schedule (for example) suddenly someone who was a wife and mother looks like perhaps she should be a great-grandmother and mother-in-law in the same house. That's quite a change. In other words, the more accurate your notes have to be, or the easier it is to make errors while taking notes quickly, the more you want to photograph.
    • Do you know in advance what you're looking for? The less you know what it is you're looking for, the more it helps to photograph the documents for later perusal, in case your initial note-taking focused on the "wrong" thing. Lists and tables and the like are prime candidates for photographing as they defy easy, accurate and quick summary in notes. If it's in a table it's already a summary so you often can't just take one or two figures from it, as you might summarize a page of an argument in a couple of sentences. If you've ever reproduced a table of figures from an archival source in your notes you'll know what I mean: it takes a while. You have to count the columns and rows, and then decide which way to read the table to enter the data accurately etc. Photograph it and take it home.

  3. Are you going to want to follow up leads you find in material you've copied? How well have you identified beforehand what you are going to copy? The most productive "hit the archives and copy" trips are those where you know precisely what you want to copy before you go, and aren't likely to be needing to use other collections.
  4. How much do you need to look at? If you only have a small amount of material to work through the traditional approach to visiting the archives should work. The larger the collection, the more you probably want to copy.
  5. Are you looking at images or small text that is difficult to read? Being able to view an enlargement of your material can be really, really useful in some situations. With a photograph you are not limited to the 200% enlargement you could get on a photocopier.
  6. Are you going to use it again? The more you are going to re-use a particular page, the more you want to photograph it.
  7. Do you anticipate giving presentations about your research where you might want to illustrate what you are talking about? Being able to show a slide of the sources you are using can be very interesting for conference presentations, and especially when you have images. As best I can tell from talking to archivists, displaying an image in a conference presentation does not constitute reproduction that requires permission, since there is no permanent copy of the item being distributed. You should check this for yourself for 'your' collections, but digital photography can open up new possibilities for what you include in teaching and conference presentations.

If you have decided to hit the archives to photograph material, what follows is potted practical advice on how to go about it. It bears repeating: check with the archivist that you can do this before you start ...

Camera: To reproduce archival material or modern printed books and journals a camera with a "document" mode is ideal. The Nikon Coolpix range has this feature. Personally, I have been using the Coolpix 5900 which (of course, one year later) has been superseded by the 5600 which you can pick up for $250-300. Apparently Sony also has cameras with this setting. I have been very pleased with the Nikon as it is small and lightweight, while still having a large LCD screen. The 5900 has a 5 megapixel default setting, which is just about ideal for document photography.

Flash and macro settings: The document mode mentioned above defaults to black and white images with no flash. Many archives want you to avoid flash to protect the sources. However, if you're photographing modern material (journals/books) you may choose to use a flash to get better contrast. Beware of glossy pages and make sure that if you are using flash it is not reflecting on the pages. Many older books have non-glossy text and then glossy photographs, so be sure to be aware of this if you are photographing books with the flash on. If you get a camera without a document mode, you want to be sure you can turn the flash off, set it to black and white, and use a close-up or macro setting. This will allow you to focus closely on the pages and get high quality reproductions of the documents.

Memory cards: If you are copying a lot of material you will want high capacity memory cards. On a 5 megapixel document setting, each image is about 950 KB, depending on how complicated the image is. Just for comparison, a regular colour photo will be about 2/3 larger again. The image for a nearly blank piece of paper might be as small as 700 KB, but if there's lots of text then it might be around 1 MB. A 1 GB card can hold up to 1300 document images. Your needs will vary, so this is only a guide.
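If you want to size a card for your own trip, the arithmetic can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes decimal units (1 GB = 1,000,000 KB) and no file-system overhead, so a real card will hold somewhat fewer images than it reports.

```python
def images_per_card(card_gb, image_kb):
    """Approximate number of images of a given size that fit on a card.

    Assumes 1 GB = 1,000,000 KB and ignores file-system overhead,
    so treat the result as an upper bound.
    """
    return int(card_gb * 1_000_000 // image_kb)


# Image sizes quoted above: nearly blank page, typical page, dense text
for size_kb in (700, 950, 1000):
    print(f"{size_kb} KB per image: about {images_per_card(1, size_kb)} images on a 1 GB card")
```

On these assumptions a 1 GB card holds roughly 1000 images at 1 MB each and roughly 1400 at 700 KB each, so a figure of 1300 sits toward the optimistic end of the range.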

Power source: A lightweight camera (like the Nikon Coolpix range) runs on rechargeable lithium batteries which run out relatively quickly. If you are using the battery you'll be lucky to make 400 images before having to change the battery or stop (for several hours) to recharge it. The bottom line is that if you are going to be photographing a lot of pages in a short period of time, then you need at least two batteries so you can be charging one while you are using the other, or buy a power adapter for the camera. A power adapter is relatively cheap, and can be purchased separately from the camera. Unless you are going to urgently photograph a lot of documents in a short period of time (e.g., you are at an archive for one day and can't return easily if you don't finish) start with a couple of batteries, and purchase the power adapter if there's a demonstrated need. Of course, if you have a research grant you need to spend on equipment ...

Copy stand or tripod: Tripods are widely available and with a little fiddling can be set up in such a way that you get good images. However, if you are going to be doing a lot of photography of sources, consider buying a portable copy stand. You can get a good one for approximately $70 (or see here, at buy.com). Note that you will also need a piece of cardboard to lay over the legs of the copy stand to put your documents on so they lie flat under the camera. The huge advantage of a copy stand is that the documents lie flat under the camera. Many tripods can only be configured to photograph the documents at a slight angle, reducing readability and accurate reproduction. If you have a copy stand you can—if you make good copies—do your own reproductions for publication (though be sure to get permission to publish). Many archives charge $10 (at least) for photographic reproductions of material suitable for publication. You don't have to do this many times to exceed the cost of the copy stand. A copy stand is not something any one person will be using all the time, so you might consider seeing if your department could purchase one for loan to people who need one.

How the copy stand works
Since I first published this post, people have asked the most questions about the copy stand. Hopefully these pictures will illustrate it better. As you can see the camera is looking directly down upon the documents, which is difficult to achieve with a tripod, unless you have a tripod arm. The height of the copy stand is adjustable. With the Testrite CS-7 I've been using I can photograph A3 or legal paper by having the camera at the highest point.

Document photography with the copy stand proceeds most rapidly with loose leaf paper. The procedure is simple. Put the paper on the stand, photograph, move the next piece of paper on, photograph ... repeat. Doing this it is straightforward to achieve 300-400 pages per hour, though this gets tiring.

Books are slower, since you sometimes have to hold the books open at a particular page. Although this means getting partial images of your hands beside the document text, it is quicker than using beaded book weights to hold each page down.

Source information: Make sure that you include information on the source in the image, so you know where the material came from. If you know ahead of time what collections you will be photographing material from you can print out reference information that you cut into strips to lay beside the documents when you photograph them. These strips of paper should include the collection and library and other information. You can leave space on the paper to add any document-specific information with pencil, erase it, and use the same paper for the next document.

Transferring images and organizing files: If you are concerned with making the most of your time in the archives, wait until the end of the day to transfer images from the camera to your computer. If you have multiple images it can take quite a while, as most cameras transfer data via USB which is not that fast.

Once you have the images on your computer, it really is up to you to organize as you see fit. Since hard disk and other computer failures are more frequent than house fires, whatever you do should include backing up your images at least once. This need not be too complicated or expensive. If you are at a university, you should have access to some form of network server storage provided by the university that is backed up regularly and reliably (onto tapes and stored offsite ideally). This should probably be your first option for a backup. Don't rely on CDs or DVDs for long-term storage unless you want to be spending your time rotating disks and checking that one set hasn't failed etc etc ... Network storage is the way to go as your house is unlikely to burn down at the same time as the university does. If it does you are probably living in an area with geothermal risks or hurricane activity. Or Chicago in 1871.
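For those who would rather script the backup than drag folders around, here is a minimal Python sketch of copying a folder of images to a second location and checking that every copy matches its original. The function names and the example paths are hypothetical, and it assumes Python 3.8 or later and that the backup location (a mounted network drive, say) appears as an ordinary folder.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file, read in chunks so large images don't fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source, destination):
    """Return relative paths of files that are missing or differ in the backup."""
    source, destination = Path(source), Path(destination)
    problems = []
    for original in sorted(source.rglob("*")):
        if original.is_file():
            relative = original.relative_to(source)
            copy = destination / relative
            if not copy.is_file() or file_digest(original) != file_digest(copy):
                problems.append(relative.as_posix())
    return problems


def back_up(source, destination):
    """Copy the whole folder tree, then verify it. An empty list means all is well."""
    # dirs_exist_ok (Python 3.8+) lets you re-run this onto an existing backup
    shutil.copytree(source, destination, dirs_exist_ok=True)
    return verify_backup(source, destination)


# Example call (hypothetical paths):
# back_up("research/photos", "/Volumes/university-share/photos-backup")
```

Running verify_backup on its own later is a cheap way to check that a backup has not quietly gone bad.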

Software for organizing files: As described below, I organize my photos into folders and don't worry about renaming them. I sort by the original name within the folder, so that I then browse them in the order they were taken. On Windows I found the best way to browse through photos like this is with Picasa. This is also now the case with the Mac.

It goes almost without saying that if you are going to be doing your research from digital images like this you need a double monitor.

Backing up is the most important thing everyone should do with their images. Beyond that my advice, for what it's worth, is that you find a way of organizing your files that does not take too much time, while still allowing you to find things quickly. You could spend a lot of time renaming all your files from the default digital camera name (DSCNxxxx.jpg, for example) or you could spend it doing something more productive. My approach, which has worked well for the more than 15,000 images I have for my research, particularly for documents from archival collections, is to group images into folders with usefully descriptive names. Sometimes a folder relates to just one document, and may only have a few images (pages) in there. Sometimes a folder will initially relate to a whole collection (e.g., all the photographs from a particular magazine over twenty years). When I examine the material in more depth I may create more folders. (Once documents are in folders, renaming them from DSCNxxxx.jpg to "something more meaningful xx.jpg" is relatively straightforward. If you're using OS X, see here. Also pretty quick on Unix. I can't speak as competently to what's possible in Windows.)
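For those comfortable with a little scripting, the renaming step can also be sketched in Python, which runs on Windows, OS X and Unix alike. This is one possible approach and not the method linked above; the function name, the example folder and the prefix are all hypothetical. It defaults to a dry run, so nothing on disk changes until you ask it to.

```python
from pathlib import Path


def rename_images(folder, prefix, dry_run=True):
    """Rename the .jpg files in `folder` to '<prefix> 001.jpg', '<prefix> 002.jpg', ...

    Files are numbered in order of their original names, which for camera
    defaults such as DSCN0001.jpg is the order in which they were taken.
    Returns a list of (old_name, new_name) pairs. Nothing is changed on
    disk unless dry_run is False.
    """
    folder = Path(folder)
    images = sorted(
        (p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".jpg"),
        key=lambda p: p.name,
    )
    width = max(3, len(str(len(images))))  # at least three digits: 001, 002, ...
    targets = [
        (old, old.with_name(f"{prefix} {number:0{width}d}.jpg"))
        for number, old in enumerate(images, start=1)
    ]
    # Check everything first so a clash never leaves a folder half renamed
    for old, new in targets:
        if new != old and new.exists():
            raise FileExistsError(f"{new.name} already exists; choose another prefix")
    if not dry_run:
        for old, new in targets:
            if new != old:
                old.rename(new)
    return [(old.name, new.name) for old, new in targets]


# Example (hypothetical folder): preview first, then rename for real
# print(rename_images("photos/magazine", "magazine 1955"))
# rename_images("photos/magazine", "magazine 1955", dry_run=False)
```

Because the new names keep the original sequence, browsing the folder by name still shows the pages in the order they were photographed.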

When I am working with the images, principally what I am doing is reading and taking notes into Word documents. At the moment, for each of my five dissertation chapters I have between five and twenty Word files with my notes on variously defined sub-topics for the chapter. Basically, this is the old historian's method of separate thematic note cards, but just done in Word so I can search it. I annotate my notes with both the original source citation and the name of the image file I have of the source. By having the original source citation right there, when I'm writing I can add in the footnote immediately without opening the image file again. But if I want to go back and re-examine the image of the source I can quickly find the name of the file too. This approach works well for loose leaf material from archives.

If you have photographed articles or whole books (old ones, of course, out of copyright) then the folders and original images approach can still be used, but making Acrobat files is even better. This allows you to have just one file for a whole article or book, which you can then organize by adding bookmarks for navigation, and using Acrobat's editing features to add your own comments and annotations. Acrobat can be had for $88 academic pricing. This is only worth the money if you have enough documents you'll be wanting to combine into one file to keep together.

OCR: One extension to this way of working that I am beginning to explore is the possibility of optical character recognition from photographs. If you have photographs of printed or typed sources then this may be something worth exploring to save re-typing information. My guess is that you would need to have a project where you need to re-type quite a lot of data to make this worthwhile. In my case, I have some printed tables that I want in a database. Because of the uniform layout of the material it should be possible to use OCR.

Adding it all up: To undertake your own personal digitization project you are looking at spending about $500-600 upfront.

Item                      Cost
Camera                    $300
1GB memory card            $80
Copy stand                 $50
Extra battery              $40
Optional to start with:
Power adapter              $40
Acrobat                    $90
TOTAL                     $600

I have estimated these costs at somewhat above what you could end up paying so that the comparison with photocopying and spending time at the archives is conservative. Switching to digital photography costs money up front, but the savings in time and money over a period of a couple of years can be substantial. When you consider that most archives charge at least 10 cents per page for photocopying, and often more (50 cents is not uncommon) you are starting to break even between 2000 and 4000 pages copied, even without accounting for your time and travel expenses. Indeed, it's the time savings that can really make digital photography the economical option. If you can turn a two week research trip into a one week research trip, and save six nights at a mid-range hotel and meals on the road there's your $600 and more repaid just like that. One problem is that some funding sources for graduate students and faculty are rigid (backward or asinine, perhaps) in the categories of expenditure they allow. That is to say that travel and accommodation expenses will be paid without questions, but equipment purchases are not permissible. A reasoned statement of how equipment purchases will save money in the long run, and a willingness to make equipment available for colleagues can change minds.
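The break-even point is easy to work out for whatever your archive charges. Here is a minimal Python sketch; the function name is hypothetical, and it counts only photocopying charges against the $600 total above, leaving out time and travel.

```python
def break_even_pages(equipment_dollars, cents_per_page):
    """Photocopied pages whose cost equals the equipment outlay.

    Both arguments are whole numbers (dollars and cents respectively).
    """
    equipment_cents = equipment_dollars * 100
    # Round up: a partial page still has to be paid for in full
    return -(-equipment_cents // cents_per_page)


for cents in (10, 15, 30, 50):
    pages = break_even_pages(600, cents)
    print(f"{cents} cents a page: break even at {pages} pages")
```

On these assumptions the 2000 to 4000 page range corresponds to per-page charges of roughly 30 down to 15 cents; at 10 cents a page it takes 6000 pages, and at 50 cents only 1200.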

Trivial practical hints: Spending all day photographing documents can be mind-numbingly dull. Bring your headphones and set iTunes to shuffle so that you have something else to think about. Repetitive strain injury is not impossible. Take a break every hour or so, even if you are blitzing through and photographing a box quickly. While CDs are not recommended for long-term storage they can be used for short-term backup while you're away from home. Then if your laptop dies you haven't lost all your work to date, just one day of work.

Other sources of useful information
Columbia: "Going digital in the archives"
Journal for Maritime Research: Historical research in the 'digital era'
George Mason's Electronic Researcher website
American Historical Association: Taking a Byte Out of the Archives: Making Technology Work for You

Notes: Edited on 1 June to add references to multiple file renaming tips.
Update, 27 February 2007: This discussion at eh.net on the economic history mailing list is incredibly valuable. Note, in particular, the recommendation to go for ISO and image stabilization over megapixels as criteria for cameras that are good in the archives.

Posted by eroberts at January 28, 2008 8:23 AM