Executing the OCR Plug-in

This topic describes how to recognize characters in a document image page and convert them into text data by using the OCR plug-in. Binders are handled in the same manner.

Executing OCR

Procedure

Select a document in Desk, then click the [OCR] plug-in button in the toolbar or the task toolbar. You can select multiple documents.

The [OCR] dialog box appears.

Note

[Perform OCR & Convert to Word Document] in the [Useful Features (Ver. 9)] tab on the task toolbar is enabled when you set [Word(*.docx)] for [Output format] in the [OCR Advanced Settings] dialog box.

Click [Setting].

The [OCR Setting] dialog box appears.

Specify the desired settings. If necessary, click [OCR Detailed Settings] to set specific information.

The [OCR Advanced Settings] dialog box appears.

Select the desired options and click [OK].

The [OCR Setting] dialog box reappears.

Click [OK].

The [OCR] dialog box reappears.

Click [Begin].

Processing starts and the progress of the processing is displayed in the [OCR] dialog box.

If you stop the OCR processing, the document that was being processed at that time is restored to its original state. Any documents that have already been processed are not restored to their original state.

When you select [Specify the area and recognize] in the [OCR Advanced Settings] dialog box, the [Specify Areas and Recognize] dialog box appears. Click [Layout Analysis] to run an auto layout analysis. Alternatively, set the target area manually, and click [Start].

If [Show the progress of OCR processing] is selected in the [OCR Advanced Settings] dialog box, a dialog box appears showing the progress of recognition.

Note

You can also execute the OCR processing for documents or binders in a read-only folder.
If [Perform preprocessing only (Do not perform OCR process)] is cleared and [Automatically Rotate Page] is selected, pages with annotations are neither rotated nor processed by OCR, and processing moves on to the next page.
You cannot manipulate documents in Desk during the OCR processing.
The previous recognition result is overwritten by the new one and is discarded.
The maximum number of characters that can be recognized is 20,000 per page. If a page contains more than 20,000 characters, an error occurs and the process aborts. In this case, you may be able to avoid the error by reducing noise, or by excluding images or noise using the [Specify Areas and Recognize] dialog box before the OCR processing.
A message appears if an error occurs during the processing. Depending on the error, you may cancel the processing or move to the next page or document. Information on pages or documents that were skipped appears in [State] in the [OCR] dialog box after all processing is completed.
The processing will be aborted if:
- free disk space is insufficient
- memory is inadequate
- an OCR software error occurs
Processing will move to the next page or document if:
- the document is write-protected
- an annotation is attached to the page (only when [Automatically Rotate Page] is selected)
- the document is a DocuWorks document for which document editing and copying are prohibited
- the document is protected by a password
- the document is a PDF portfolio
- the document is neither a DocuWorks file nor a PDF document
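
The abort and skip rules above can be summarized as a small decision table. The following Python sketch is illustrative only: the condition labels and the classify function are hypothetical names, not DocuWorks identifiers, and the assumption that an abort condition takes priority over a skip condition is ours, not something this topic states.

```python
from enum import Enum


class Outcome(Enum):
    ABORT = "abort"      # the whole OCR run stops
    SKIP = "skip"        # processing moves to the next page or document
    PROCESS = "process"  # the page or document is processed normally


# Hypothetical labels for the conditions listed in the note above.
ABORT_CONDITIONS = {"disk_full", "out_of_memory", "ocr_software_error"}
SKIP_CONDITIONS = {
    "write_protected",
    "annotated_page_with_auto_rotate",
    "editing_and_copying_prohibited",
    "password_protected",
    "pdf_portfolio",
    "unsupported_file_type",
}


def classify(conditions):
    """Return the outcome for one page or document given its condition labels."""
    found = set(conditions)
    # Assumption: an abort condition wins over a skip condition.
    if found & ABORT_CONDITIONS:
        return Outcome.ABORT
    if found & SKIP_CONDITIONS:
        return Outcome.SKIP
    return Outcome.PROCESS
```

For example, under these assumptions a password-protected document is classified as a skip, while a document with no listed condition is processed normally.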

Specifying a recognition area

You can specify the areas to be processed by OCR.
Select [Specify the area and recognize] in the [OCR Advanced Settings] dialog box, and start the OCR plug-in.

Procedure

When you start the OCR plug-in, the [Specify Areas and Recognize] dialog box appears.
To specify the recognition area automatically, click [Layout Analysis].

A rectangular frame appears automatically on the displayed document image.
To specify the recognition area manually, drag the mouse to draw a rectangular frame on the image.
After creating the frame automatically or manually, you can move or resize it by selecting and dragging it.

Click [Start].

Saving a recognition result

Select [Output the recognition results as a file] in the [Output to File] tab of the [OCR Advanced Settings] dialog box, and start the OCR plug-in.
The recognition results can be saved in the following file formats: text, RTF, Excel, CSV, and Word.