The video and sound engineers at the Cornell Lab of Ornithology's Macaulay Library -- billed as the world's largest archive of animal sounds and associated video -- are in the process of digitizing their entire collection. Within six months, they hope to post much of their world-renowned collection online for researchers, educators and others to use.
The engineers have had to invent standards for digital preservation where none existed, and they must retune their strategies as technology evolves every few years. For example, with the help of Sony and Canon, the library now has $150,000 high-definition video cameras to keep pace with industry standards and build a collection that stays relevant for years to come.
The task is staggering. The library contains 170,000 sound recordings of 67 percent of the world's birds, and it has rapidly growing holdings of insects, fish, frogs and mammals, as well as some 28,000 video clips of 3,000 species. And the recording types and formats vary widely -- some audio clips date back to 1929.
Well aware that technology morphs constantly, lab engineers strive to create a digital record now that contains enough information to make it data-worthy for researchers and transferable to future systems that have yet to be invented.
"If we scrimp now, it won't be me and the people now who see it, it'll be the people 100 years from now who are trying to move everything to the next format, who start to get these pixelation errors or digital artifacts, or whatever, and they will curse my name, and that I don't want," said Marc Dantzker, curator for the Visual Media Collections at the Macaulay Library.
For video, the engineers need very high data rates -- the speeds at which data can transfer from one device to another -- and file formats that are not proprietary and whose codes are openly accessible and published. While the audio engineers have for the most part created their own standards, digitizing video is still an emerging technology, and Dantzker and his colleagues feel at times like they are groping in the dark.
To have high-resolution video recordings that match the quality of the original, the technology needs a few years to catch up. Hard drives that store information must become cheaper so that data can be stored uncompressed -- which takes up more disk space but ensures that information is not lost during a compression process. Until then, the library engineers have created an interim system that allows room to grow into new technologies.
"What we have preserved on disk is a very high-quality copy, but it is compressed, so inherently it is not as good as a high-resolution uncompressed copy," said Dantzker. Each recording has been time coded so the timing matches exactly with the original. When new technology arrives, the engineers will be able to maintain a clip's place in the archive by rerecording the master in that time slot, using new tools to store perfect (lossless) digital copies of video that retain all the information of the original.
"It's the state-of-the-art from archival through delivery," said Bob Grotke, the library's supervising engineer, who masterminded the system with other library staff. For example, early on, Grotke and colleagues decided to transfer existing recordings to DVD-ROM, with its higher storage capacity, when everyone else was using CD-ROM, and to record at 96 kilohertz and 24 bits -- rates that were much higher than the standards of the time.
Ever since the Macaulay Library started digitizing its massive audio collection in 1999, it has been inventing its own systems. As a result, the library has been at the forefront of developing current accepted standards.
Grotke explained that in an analog world sounds are continuous, while digital recordings take a 'snapshot,' or sample, of sound at a certain rate. A rate of 96 kilohertz means a sample of the sound is taken 96,000 times per second. A depth of 24 bits means higher resolution than, say, a compact disc, which uses only 16 bits, because more bits add detail and accuracy to each sound sample -- similar to how a digital photograph's resolution increases with more dots per inch, because the extra dots add detailed information to the picture. For some sounds, like the calls of bats or the echolocation calls of marine mammals, higher sampling rates are necessary to preserve all the information of the original recording and to produce a copy that will stand the test of time when new technologies emerge.
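The arithmetic behind those figures can be sketched in a few lines of Python. This is an illustrative calculation only, not the library's own tooling: the function names are hypothetical, the 44.1 kilohertz compact disc sampling rate is a standard figure that does not appear in the article, and the half-the-sampling-rate limit is the usual Nyquist rule of thumb.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels=1):
    """Data rate of uncompressed (PCM) audio in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels // 8


def quantization_levels(bits_per_sample):
    """Number of distinct amplitude values one sample can take."""
    return 2 ** bits_per_sample


def highest_capturable_frequency_hz(sample_rate_hz):
    """Nyquist limit: only frequencies below half the sampling rate survive."""
    return sample_rate_hz / 2


# Archive settings from the article: 96 kilohertz, 24 bits.
archive_rate = pcm_bytes_per_second(96_000, 24)
# Compact disc: 16 bits (from the article) at 44.1 kilohertz (assumed standard value).
cd_rate = pcm_bytes_per_second(44_100, 16)

print("archive, bytes per second per channel:", archive_rate)
print("compact disc, bytes per second per channel:", cd_rate)
print("amplitude levels at 24 bits:", quantization_levels(24))
print("amplitude levels at 16 bits:", quantization_levels(16))
print("highest frequency at 96 kHz:", highest_capturable_frequency_hz(96_000))
```

Each extra bit doubles the number of amplitude levels, so 24 bits gives 256 times as many levels as 16 bits, while the higher sampling rate raises the highest frequency that can be captured, which is what matters for ultrasonic calls such as those of bats.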
The DVDs are stored in 12 "jukeboxes," each about half the size of a refrigerator. Each holds 480 DVDs, all connected to the local area network, making individual high-resolution recordings available at workstations at the lab. Three and a half of the jukeboxes are in use, holding 4 terabytes of digitized audio (a terabyte is a trillion bytes -- enough space to store 300 feature-length movies), which represents the roughly one-third of the audio collection that has been digitized so far. The library's digital video collection already takes up roughly 2.5 terabytes of disk space, and that covers only about 60 percent of the current video holdings. Once an uncompressed solution is found, the current video holdings will require roughly 100 terabytes to store perfect copies.
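A rough back-of-the-envelope check of these storage figures, written as a short Python sketch, shows what they imply. The variable names are hypothetical, the inputs are the rounded numbers quoted above, and the derived values (average data per disc, implied compression ratio) are estimates that the article itself does not state.

```python
TB = 10 ** 12  # the article's definition: a terabyte is a trillion bytes
GB = 10 ** 9

# Audio jukeboxes (figures from the article).
jukeboxes_total = 12
dvds_per_jukebox = 480
jukeboxes_in_use = 3.5
audio_bytes_stored = 4 * TB

dvd_slots_total = jukeboxes_total * dvds_per_jukebox
dvds_in_use = jukeboxes_in_use * dvds_per_jukebox
# Derived estimate: average amount of audio held on each disc in use.
avg_gb_per_dvd = audio_bytes_stored / dvds_in_use / GB

# Video holdings (figures from the article).
video_tb_stored = 2.5            # compressed copies on disk today
video_fraction_digitized = 0.60  # share of current holdings covered
video_tb_uncompressed = 100      # projected size of perfect copies

# Derived estimates: compressed size of all current holdings, and the
# compression ratio implied by comparing it with the uncompressed projection.
video_tb_compressed_all = video_tb_stored / video_fraction_digitized
implied_compression_ratio = video_tb_uncompressed / video_tb_compressed_all

print("DVD slots across all jukeboxes:", dvd_slots_total)
print("DVDs in use:", dvds_in_use)
print("average GB per DVD in use:", round(avg_gb_per_dvd, 2))
print("compressed TB for all current video:", round(video_tb_compressed_all, 2))
print("implied compression ratio:", round(implied_compression_ratio, 1))
```

Under these assumptions, the interim compressed copies occupy only a small fraction of the space that lossless copies would need, which is consistent with the engineers waiting for cheaper disks before storing video uncompressed.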
However, with the cost of hard disks falling rapidly and maintenance costs rising, the library is abandoning the jukebox system and has begun storing the entire collection directly on hard disks.
"The shift from analog to digital archival and distribution has revolutionized the Macaulay Library's operations at all levels: We are faster, vaster, more accessible and better indexed in the digital world than we could ever have become in the analog one," said Jack Bradbury, the Robert G. Engel Professor of Ornithology and director of the Macaulay Library. "The constant change in technology keeps us on our toes, but we have become quite talented at inventing what we cannot find off the shelf or forging partnerships where teamwork is the better solution."