In our recent live chat with Ted Han and Jaimi Dowdell, data editor Tom Meagher and I learned a bunch we didn’t know about DocumentCloud, a web app that makes it easy for journalists to analyze, annotate and publish documents.
If you’re thinking of it as a competitor to Scribd, stop now – I made the same mistake. DocumentCloud differs in a few fundamental ways, in large part because it caters specifically to journalists, thanks to stewardship by Investigative Reporters & Editors and early funding by the Knight Foundation. As a result, it has a number of “killer” features for reporters and editors dealing with document collections.
With this in mind, we put together a list of five ways DocumentCloud can be used to support journalism in a local newsroom. Find those below.
Five Ways DocumentCloud Can Support Journalism
- Maintain private notes on a document. In a newsroom years ago, you might be able to tell how info-dense a document was by the number of Post-It Notes attached. DocumentCloud lets you do the same thing, without the risk of paper cuts. Private notes can be attached to any section of a document, where they also get organized on a dedicated ‘Notes’ tab.
- Collaborate on annotations and publish them with the document. Not all notes are meant to be private. One of the neatest uses of DocumentCloud I’ve seen is the ‘Show Sources’ feature ProPublica built to integrate annotations into story presentation – great example here. Think of it as showing your work.
- Extract structured data using OpenCalais. DocumentCloud processes documents with OpenCalais, a Thomson Reuters service that discovers “entities” (people, places, organizations, terms, etc.) mentioned in text. You can also add custom key/value pairs to a document and then search against those. Both features are accessible through the API, so you can tap them with nothing more than a simple GET request.
- Generate a timeline from dates in a document. Did I lose you at ‘API’? Don’t worry! There’s an even easier way to use structured data extracted by OpenCalais. DocumentCloud automatically generates a timeline based on dates found in your document.
- Manage a large collection of documents. DocumentCloud is great for analyzing a single document, but it’s especially handy when you want to examine a large collection of documents (like 24,000 pages of Sarah Palin’s emails, for instance). DocumentCloud makes it easy to embed a collection, create a timeline, or extract entities from a set of documents. Combined with the product’s collaborative functionality, the result is a tool that is very helpful for newsrooms tackling major reporting projects.
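For the curious, here is a rough sketch of what that "simple GET request" might look like in Python. Treat the details as assumptions rather than gospel: the endpoint URL, the `q` and `per_page` parameters, and the `key: "value"` search syntax are my best understanding and are not spelled out in this post, and the helper names are made up for illustration. Check DocumentCloud's API documentation before relying on any of it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: the search endpoint lives at this URL. Verify against the
# current DocumentCloud API documentation before using it.
SEARCH_ENDPOINT = "https://www.documentcloud.org/api/search.json"


def build_search_url(query, per_page=10, endpoint=SEARCH_ENDPOINT):
    """Build the GET URL for a search (parameter names are assumed)."""
    params = urlencode({"q": query, "per_page": per_page})
    return f"{endpoint}?{params}"


def key_value_query(key, value):
    """Format a custom key/value pair as a search term (assumed syntax)."""
    return f'{key}: "{value}"'


def search(query, per_page=10):
    """Send the GET request and return the decoded JSON response."""
    with urlopen(build_search_url(query, per_page)) as response:
        return json.load(response)


# Hypothetical usage (requires network access, so it is left commented out):
# results = search(key_value_query("agency", "FEMA"))
```

Building the URL separately from sending the request makes it easy to print and inspect exactly what you are asking for before you hit the server.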
If you’d like access to the tool, which is available only to journalists, email DocumentCloud and ask your editor to do the same. All they require is permission to include participating newsrooms on their list of document contributors.
Have questions? Leave them in the comments and Tom or I will get back to you ASAP!