docWorks is a software program used by some of the most renowned libraries in Australia (National Library of Australia) and worldwide (Library of Congress, National Library of Finland, British Library, Harvard University and University of California), as well as by publishing houses and companies, to digitise and convert their valuable library holdings and archives for easy access, searchability and long-term preservation. The term “Digital Library” falls a little short of describing the actual result. “Digital” refers to the transformation of printed documents into digital images, i.e. scanning pages and receiving JPEGs, TIFFs or PDFs. However, to create a truly searchable digital library, these images need to be “converted” into intelligent units using OCR (text recognition) and zoning (e.g. the identification of different articles on a newspaper page). docWorks is the only software that bundles all necessary conversion steps in a single, smooth workflow.
- Single, smooth workflow with central control center
- Time savings due to streamlined and automated process
- No expensive errors from false copying or lost data shipments
- Consistent, standardized output
- Easily scales up, from thousands to millions of pages
How does docWorks work?
docWorks “converts” scanned images. It identifies the information contained in scanned pages (such as text and structure), saves this information in an XML file and adds this file to the image. The two essential conversion steps are OCR and segmentation of the document into logical units (articles, chapters, etc.). Only OCR makes a scanned page searchable, while zoning and structure recognition ensure that only relevant search results are displayed. For instance, if no zoning or structure is applied, a multiple-word search within newspapers might return thousands of results, because the individual search words are found scattered across an entire newspaper page. Segmenting the page into its different articles ensures that a result is returned only when all search words occur in the same article.
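The effect of segmentation on search results can be illustrated with a minimal Python sketch. It is not docWorks code and does not reflect the docWorks XML format: the `Page` and `Article` classes, the `search_pages` and `search_articles` functions and the sample texts are all hypothetical, invented here only to show why matching per article is more precise than matching per page.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical logical unit produced by zoning (not a docWorks type)."""
    title: str
    text: str


@dataclass
class Page:
    """Hypothetical scanned page holding the OCR text of its articles."""
    number: int
    articles: list[Article]


def words(text: str) -> set[str]:
    """Lower-case the text and strip surrounding punctuation from each word."""
    return {w.strip(".,;:!?\"'()").lower() for w in text.split()} - {""}


def search_pages(pages: list[Page], query: str) -> list[int]:
    """Without zoning: a page matches if every term occurs anywhere on it."""
    terms = words(query)
    if not terms:
        return []
    hits = []
    for page in pages:
        page_words = set().union(*(words(a.text) for a in page.articles))
        if terms <= page_words:
            hits.append(page.number)
    return hits


def search_articles(pages: list[Page], query: str) -> list[tuple[int, str]]:
    """With zoning: all terms must occur within the same article."""
    terms = words(query)
    if not terms:
        return []
    return [
        (page.number, article.title)
        for page in pages
        for article in page.articles
        if terms <= words(article.text)
    ]


# Invented sample data: two newspaper pages.
PAGES = [
    Page(1, [
        Article("Harbour bridge opens",
                "The new bridge across the harbour opened on Saturday."),
        Article("Wheat prices",
                "Wheat prices rose sharply after the flood."),
    ]),
    Page(2, [
        Article("Flood damages bridge",
                "The flood damaged the old bridge near the mill."),
    ]),
]

if __name__ == "__main__":
    print(search_pages(PAGES, "flood bridge"))     # [1, 2]
    print(search_articles(PAGES, "flood bridge"))  # [(2, 'Flood damages bridge')]
```

In this toy data, page 1 matches the page-level search only because “bridge” and “flood” appear in two unrelated articles. The article-level search drops that false hit and returns only the article that contains both words.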