GiDoc First Example

This step-by-step guide shows how to train a system from an example image. First open the sample image (it ships with the GiDoc package, in the doc directory), then proceed as follows:

  1. Set up project preferences. Click on the "Preferences" option in the GiDoc menu (Fig. 11).

    Figure 11: GiDoc menu.
    Image example-menu

  2. Start a new project by clicking the blank page icon in the upper left corner, then enter the project name (Fig. 12).

    Figure 12: Project preferences.
    Image example-preferences-name

  3. Fill in the preferences in the Preprocessing tab as shown in Fig. 13.

    Figure 13: Preprocessing preferences.
    Image example-preferences-preprocess

  4. Also fill in the preferences in the Training tab as shown in Fig. 14.

    Figure 14: Training preferences.
    Image example-preferences-training

  5. Set the fourth preferences tab, Recognition, as indicated in Fig. 15.

    Figure 15: Recognition preferences.
    Image example-preferences-recognition

  6. Once the preferences are set, define the text block in the image. To do so, select the Rectangle Select Tool and draw a selection around the text block. Then click on Block Detection in the GiDoc menu (see Fig. 16).

    Figure 16: Block region selected.
    Image example-block-marked

  7. A new block (a closed path) now appears in the image. The next step is to detect the text lines within that block by clicking on the Line Detection entry in the GiDoc menu (see Fig. 17).

    Figure 17: Line detection.
    Image example-linedetection

  8. You should obtain a good approximation to the true text baselines, but you will have to fine-tune them with the Path Tool (see Fig. 18).

    Figure 18: Corrected line detection.
    Image example-linedetection-corrected

  9. Open the interactive transcription dialog and manually transcribe a few lines (see Fig. 19).

    Figure 19: Interactive transcription dialog.
    Image example-transcription

  10. Now train the system by selecting the Training entry in the GiDoc menu. If Gimp was opened from a console, the training progress output shown in Fig. 20 should be written to the console.

    Figure 20: Training progress output.
    Image example-training-output

  11. The system models have now been trained and saved in the ".gidoc/example" directory. Open the transcription interface again and recognise the remaining blank lines. To do so, press "Ctrl + R" or click the line number on the left (see Fig. 21).

    Figure 21: Recognition.
    Image example-transcription-recognition
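
If recognition in step 11 does not produce any output, it can help to check that training in step 10 actually wrote files to the ".gidoc/example" directory. The Python sketch below is only an illustration and is not part of GiDoc: the function name list_trained_files is hypothetical, and the location of ".gidoc" is an assumption (the example run looks in the user's home directory; pass another base directory if your setup differs). The names and formats of the model files are not documented here, so the script simply lists whatever it finds.

```python
from pathlib import Path


def list_trained_files(base_dir, project="example"):
    """Return the relative paths of all files under <base_dir>/.gidoc/<project>.

    Hypothetical helper, not part of GiDoc. An empty list means the
    directory is missing or holds no files.
    """
    model_dir = Path(base_dir) / ".gidoc" / project
    if not model_dir.is_dir():
        return []
    return sorted(
        p.relative_to(model_dir).as_posix()
        for p in model_dir.rglob("*")
        if p.is_file()
    )


if __name__ == "__main__":
    # Assumption: ".gidoc" lives in the user's home directory.
    files = list_trained_files(Path.home())
    if files:
        print("\n".join(files))
    else:
        print("No trained files found; check the training output in the console.")
```

An empty listing suggests that training did not finish, in which case the console output described in step 10 is the first place to look.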

giDoc Team