[OCR Advanced Settings] dialog box
This dialog box appears when you click [OCR Detailed Settings] in the [OCR Setting] dialog box, or in [Scan and Import] under [DocuWorks Desk Options] in the [Preferences] dialog box. It appears when the built-in OCR is used.
Use this dialog box to specify the settings for OCR.
[General] tab
[Recognition mode]
Select whether recognition gives priority to speed or to accuracy.
The default is [Standard].
[Recognized Characters]
[Recognition language]
Select the language or languages to be recognized.
The default settings are the display language for DocuWorks and [English].
Note
If the display language is set to Simplified or Traditional Chinese, both the [Simplified Chinese] and [Traditional Chinese] check boxes are selected.
[Insert Blank Characters]
This item is enabled when only [Japanese] is selected in [Language]. When this check box is selected, the OCR process inserts blank characters in place of the space or tab characters in the original.
If the OCR process is configured to recognize English, it always inserts blank characters between words.
By default, this check box is selected.
[Output alphanumerics and symbols as single-byte characters]
This item is enabled when only [Japanese] is selected in [Language]. When this check box is selected, all alphanumeric characters and symbols in the original are output as half-width (single-byte) characters.
By default, this check box is selected.
[Document Layout]
[Document type]
Specify the elements that make up the document.
The default setting is [Autodetect layout].
[Columns]
Specify the column layout of the original to be recognized.
The default setting is [Auto Detect].
[Show the progress of OCR processing]
Displays the progress of recognition while the original is being processed.
By default, this box is selected.
[Specify the area and recognize]
Specifies whether to set the recognition areas of pages manually. When this check box is selected, the [Specify Areas and Recognize] dialog box appears, in which you can specify the recognition areas. You can also use this option to partially change the areas detected by the automatic layout analysis feature of OCR.
[Perform automatic deskew]
Corrects the skew of the document in preparation for OCR. The deskew result is not applied to the document itself.
By default, this box is selected.
[Output to File] tab
This tab is not displayed in the dialog box shown when you click [OCR detailed setting] in [Scan and Import] under [DocuWorks Desk Options] in the [Preferences] dialog box.
[Output the recognition results as a file]
Saves the OCR result to a file in the specified format.
By default, this box is cleared.
[Output format]
Specify the file format used when the OCR result is output to a file.
The default setting is [RTF (*.rtf)].
[Save options]
[Image output]
Outputs the areas specified as figures during OCR processing. You cannot select this item when [RTF (*.rtf)], [Excel (*.xlsx)], or [Word (*.docx)] is specified in [Output format].
By default, this box is selected.
[Reproduce the layout]
Reproduces the layout of the original. This option can be specified when [RTF (*.rtf)] or [Word (*.docx)] is selected in [Output format]. When this check box is cleared, only plain text is output, with the character size preserved. If the [Image output] check box is selected, images are output at the end of the page.
By default, this box is selected.
[Output with borders]
Outputs the borders and rule lines included in the recognition results to the file. This option can be specified when [RTF (*.rtf)] or [Word (*.docx)] is selected in [Output format] and the [Reproduce the layout] check box is selected.
By default, this check box is selected.
Note
When this check box is selected, the output includes underlines and rule lines that are not defined as a table area. Rule lines that are defined as a table area are output regardless of whether the check box is selected.
A "table area" is an area that is recognized as a table when [Document type] under [Document Layout] on the [General] tab is set to [Auto Detect] or [Tables].
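The conditions under which [Reproduce the layout] and [Output with borders] can be specified may be easier to follow as a small sketch. The Python below is illustrative only and is not part of DocuWorks; the function name and the dictionary keys are hypothetical, and the format strings are copied from the descriptions above.

```python
# Illustrative sketch only: these names are hypothetical, not a DocuWorks API.
# Formats for which [Reproduce the layout] can be specified, per the text above.
LAYOUT_FORMATS = {"RTF (*.rtf)", "Word (*.docx)"}


def save_option_availability(output_format: str, reproduce_layout: bool) -> dict:
    """Return which of the two save options can be specified."""
    # [Reproduce the layout] is available only for RTF and Word output.
    layout_available = output_format in LAYOUT_FORMATS
    # [Output with borders] additionally requires [Reproduce the layout]
    # to be selected.
    borders_available = layout_available and reproduce_layout
    return {
        "Reproduce the layout": layout_available,
        "Output with borders": borders_available,
    }


print(save_option_availability("Word (*.docx)", reproduce_layout=False))
```

In this sketch, Word output with [Reproduce the layout] cleared leaves [Reproduce the layout] available but [Output with borders] unavailable.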
[Where the file is saved]
- [Specify when OCR processing]
The [Save As] dialog box appears when OCR processing of all pages of a document has finished. Specify the folder in which to save the file.
However, if you select [Output by page] in [File Output Unit], the [Save As] dialog box appears each time OCR processing of a single page finishes.
By default, this box is selected.
- [Save As in the specified folder]
Saves the file in the folder previously specified.
The file name is the name of the processed document with its extension removed, followed by the extension of the specified output format.
If you select [Output by page] in [File Output Unit], the file is saved in the specified folder each time OCR processing of a single page finishes. The second and subsequent pages are assigned file names according to the options [Delimiter at the end of the document] and [Digits for number at end of document name] in [Document Operation] under [DocuWorks Desk Settings] in the [Preferences] dialog box.
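As a rough illustration of the naming rule described above, the following Python sketch derives output file names from a document name. It is not DocuWorks code. The function and parameter names are hypothetical, the example values for the delimiter and the number of digits are arbitrary, and the assumption that each later page is numbered with its page number is a guess, since the text above does not state which number is appended.

```python
from pathlib import PurePath


def output_file_names(document_name, output_ext, page_count=1,
                      by_page=False, delimiter="-", digits=2):
    """Sketch of the naming rule; every name here is hypothetical."""
    # Remove the document's own extension, then add the output extension.
    stem = PurePath(document_name).stem
    first = f"{stem}{output_ext}"
    if not by_page:
        # One file per document.
        return [first]
    names = [first]
    # ASSUMPTION: the second and subsequent pages get the delimiter
    # followed by the zero-padded page number.
    for page in range(2, page_count + 1):
        names.append(f"{stem}{delimiter}{page:0{digits}d}{output_ext}")
    return names


print(output_file_names("report.xdw", ".rtf"))
print(output_file_names("report.xdw", ".rtf", page_count=3, by_page=True))
```

The actual names depend on the values set in the [Preferences] dialog box, so treat the numbering above as a placeholder.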
[File Output Unit]
- [Output by document file]
Outputs one file per document file.
By default, this check box is selected.
- [Output by page]
Outputs one file per document page.