GiDoc: Description

GIDOC (Gimp-based Interactive transcription of old text DOCuments) is a computer-assisted transcription prototype for handwritten text in old documents. It is a first attempt to provide integrated support for interactive-predictive page layout analysis, text line detection and handwritten text transcription. GIDOC is built on top of the well-known GNU Image Manipulation Program (GIMP) and uses standard techniques and tools for handwritten text preprocessing and feature extraction, HMM-based image modelling and language modelling.
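
As a rough illustration of what HMM-based image modelling involves, the sketch below scores a sequence of observations against a small left-to-right HMM using the standard forward algorithm. It is a generic toy, not GIDOC's built-in code or HTK's: the function name, the 3-state model and all probabilities are hypothetical, and observations are assumed to be already quantised into discrete symbols. Practical handwriting recognisers typically use continuous (e.g. Gaussian mixture) emission densities over feature vectors extracted from the text line image.

```python
"""Toy forward algorithm for a discrete-observation HMM (illustrative only)."""

from typing import Sequence


def forward_likelihood(
    initial: Sequence[float],
    transition: Sequence[Sequence[float]],
    emission: Sequence[Sequence[float]],
    observations: Sequence[int],
) -> float:
    """Return P(observations | model) computed with the forward algorithm.

    initial[i]:       probability of starting in state i
    transition[i][j]: probability of moving from state i to state j
    emission[i][k]:   probability that state i emits symbol k
    observations:     non-empty sequence of symbol indices
    """
    n_states = len(initial)
    # alpha[i] = P(o_1 .. o_t, state at time t = i), initialised for t = 1.
    alpha = [initial[i] * emission[i][observations[0]] for i in range(n_states)]
    for symbol in observations[1:]:
        # Sum over all predecessor states, then account for the new emission.
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n_states))
            * emission[j][symbol]
            for j in range(n_states)
        ]
    # Marginalise over the final state.
    return sum(alpha)


# Hypothetical 3-state left-to-right model (each state may only loop or move
# forward). All numbers are made up for illustration.
INITIAL = [1.0, 0.0, 0.0]
TRANSITION = [
    [0.6, 0.4, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
EMISSION = [  # two quantised feature symbols: 0 and 1
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
]

if __name__ == "__main__":
    # Likelihood of observing symbols 0, 1, 1 under the toy model.
    print(forward_likelihood(INITIAL, TRANSITION, EMISSION, [0, 1, 1]))
```

The left-to-right topology mirrors the usual way character models are built in HMM-based handwriting recognition, where states advance through the strokes of a character as the feature window slides along the text line; in a full system, character HMMs are concatenated into word and line models and combined with a language model during decoding.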

In GIDOC, HMM-based image modelling and language modelling are carried out with built-in software. Optionally, HMM-based image modelling can instead be carried out with the well-known, freely available Hidden Markov Model Toolkit (HTK), which can be downloaded from the HTK3 web site. Similarly, language modelling in GIDOC can be performed with the SRI Language Modeling Toolkit (SRILM), which is available from the SRI STAR Lab under an open-source community license.