Guest post by Michael Masterson
Thomas Smith is Co-founder and CEO of Gado Images, a San Francisco-based company that works with archives worldwide to help them digitize and monetize their visual history. Gado’s partner collections include Johns Hopkins University, the Afro American Newspapers, Silicon Valley Historical Association, Stuart Lutz Historic Documents, and many more.
Since it’s Black History Month, your motivation for founding Gado in 2010 is particularly relevant.
My background is in cognitive science and cultural anthropology. Before founding Gado Images, I was working on an oral history project in East Baltimore, Maryland. We were gathering valuable interviews, but we couldn’t find any historical images to illustrate the neighborhood we were studying. At first, I thought the images simply didn’t exist. Then one day, I joined a researcher from Johns Hopkins University on a visit to the Afro American Newspapers. The Afro is the longest continuously operating, family-owned African-American newspaper in the world. Founded in the 1890s by a former slave, the paper has been around for over 120 years.
What I found at their Baltimore headquarters was an archive of 1.5 million photographs, including thousands of photos of the neighborhood I was studying. The Afro’s collection has been called one of the best African-American history archives in the world, but the resources simply weren’t there to digitize it; when I visited, only about 5,000 images had been digitized in the paper’s entire history.
I co-founded Gado Images to help organizations like the Afro digitize and monetize their archives. We ultimately worked with the paper to scan about 120,000 images from their collection. More than 11,000 are now online and generating revenue for the paper. Since 2010, we’ve expanded tremendously. We now work with institutions, photographers and collectors worldwide, and provide a turnkey service for anyone who wants to digitize, annotate and monetize their archives.
Gado digitizes and creates metadata using the CMP (Cognitive Metadata Platform) for all different types of media. Tell us about the process.
The Cognitive Metadata Platform (CMP)™ is our proprietary platform for annotating our partners’ imagery. CMP uses neural networks, natural language processing, and facial recognition to automatically find and tag significant people, places, and objects in our images. The process begins when our partner archives submit new images. As soon as the images come in, the CMP begins by using facial recognition to check them for major personalities, pulling from a database of over 60,000 personalities, both historical and contemporary.
The system then uses Optical Character Recognition to pull any text out of the image. This can be crucial; at the Afro, for example, most images had typed or handwritten notes from the original photographer pasted on the back. The CMP can read these notes automatically. It can even pull text from street signs or other sources in the image itself.
Finally, the CMP uses object/landmark recognition to find important objects (like cars or buildings), specific landmarks (like the Statue of Liberty or Venice’s Bridge of Sighs), and brands (like a Coca-Cola bottle or Wells Fargo logo) in each image.
Once all these inputs are in place, the CMP uses natural language processing to condense them into a list of marketplace-ready keywords. It can even automatically write a sentence-length caption for each image. Of course, the CMP can also take in metadata from human captioners; we have a professional captioning team that adds further metadata and research to many of our images.
The end results are images that are better tagged, more searchable and more valuable on commercial licensing marketplaces.
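To make the final condensing step more concrete, here is a minimal Python sketch of how outputs from facial recognition, OCR, and object recognition might be merged into one keyword list. It is an illustration only, not the CMP's actual code: the `ImageSignals` class, the `condense_keywords` function, the tiny stopword list, and the sample values are all hypothetical, and simple string handling stands in for real natural language processing.

```python
from dataclasses import dataclass, field


@dataclass
class ImageSignals:
    """Hypothetical container for the raw outputs of the three recognition stages."""
    faces: list[str] = field(default_factory=list)    # names from facial recognition
    ocr_text: str = ""                                # text read from the image or its back
    objects: list[str] = field(default_factory=list)  # objects, landmarks, brands


# Tiny illustrative stopword list; a real system would rely on an NLP library.
STOPWORDS = {"the", "a", "an", "of", "at", "in", "on", "and", "by", "for", "to"}


def condense_keywords(signals: ImageSignals, limit: int = 10) -> list[str]:
    """Merge all signals into one de-duplicated keyword list."""
    # Split the OCR text into lowercase words and drop punctuation and stopwords.
    ocr_terms = [word.strip(".,;:!?\"'()").lower() for word in signals.ocr_text.split()]
    ocr_terms = [word for word in ocr_terms if word and word not in STOPWORDS]

    keywords: list[str] = []
    seen: set[str] = set()
    # Named people come first, then recognized objects, then loose OCR terms.
    for term in [*signals.faces, *signals.objects, *ocr_terms]:
        key = term.lower()
        if key not in seen:  # case-insensitive de-duplication
            seen.add(key)
            keywords.append(term)
    return keywords[:limit]


# Made-up sample values, for illustration only.
sample = ImageSignals(
    faces=["Jane Doe"],
    ocr_text="Courthouse steps, Baltimore",
    objects=["courthouse", "car"],
)
print(condense_keywords(sample))
```

Putting people ahead of objects and loose OCR words is just one plausible ordering; a production system would more likely weight each term by the confidence of the recognizer that produced it.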
How do you handle distribution and monetization?
We work with 20+ marketplaces worldwide to distribute and monetize our partners’ content. These include partnerships with leading media organizations including Getty Images, Alamy, and Universal Images Group, as well as niche marketplaces like Sheet Music Plus. Once our partners’ content is online, we actively promote it to image buyers and photo editors.
We pride ourselves on our free research capabilities, and we can generally turn around research requests from image buyers within 24 hours. If we don’t have a particular image in our collection, we often acquire an original print of the image or artifact, digitize it at our lab here in San Francisco, and have imagery available to the buyer in 7 to 10 days. We work with all kinds of formats, from early glass plate negatives to mid-century ephemera to 8mm Kodachrome films. This means that we always have new, unique materials available for creatives worldwide.
What sort of unique content do you have in your partner collections?
We have strong coverage of African-American history, from pre-history through slavery, the American Civil War, turn-of-the-century African-American life, Civil Rights, 20th-century African-American entertainers, and even contemporary images of movements like Black Lives Matter.
We also have strong coverage of medical/scientific topics, including thousands of electron micrographs, public health materials, and unique medical imagery from the Special Collections of the Johns Hopkins University. Other topic areas include the Vietnam War, 20th-century Americana, California history, and military history.
Finally, how are you working with universities and other institutions through your Digital Humanities Consulting?
Through our Digital Humanities practice, we work with large organizations to develop digitization and monetization programs for their collections. We do everything from evaluating collections for their commercial potential, to recommending equipment, to developing training materials for organizations’ own staff, to placing our staff members with partner organizations to help kickstart their digitization efforts.
Our Digital Humanities services make us a true turnkey operation; even if a partner’s collections are entirely unprocessed, we can work with them to transform their materials into a modern, digital, fully-annotated archive which is used around the world and generates revenue to support their organization’s mission.