In Athento you can define words or expressions, called white words, that are used to classify documents. A white word is a textual expression that tells Athento to assign a certain form to a document when that expression is found in the document's OCR or text.
For each form, you can define a set of global white words. To add global white words, use the Classification tab in the Form Settings.
If you have different classification templates for the same form, you must define white words for each of them separately. To do so, open the template and use its Classification tab.
How to pick white words?
- White words are expressions, so you can combine several words and use wildcards like ?.*
- Expressions are case sensitive.
- Each white word is evaluated individually, and the algorithm classifies the document as soon as it finds a match. The remaining expressions are not evaluated.
- The more restrictive the expression, the better. For example, if we use only the white word "CONDITIONS" to classify the special conditions of a policy, Athento may find false matches in the general conditions, since the text CONDITIONS also appears in them. In this case, it is better to use "SPECIAL CONDITIONS".
- Athento looks for an exact match of the white word, but the fuzziness parameter makes it possible to tolerate a certain level of difference between the white word and the expression found. The fuzziness should be 0-1 when we want an exact match, and a higher value when we want to allow a bigger difference between the white word and the extracted expression. Fuzziness can only be configured from the advanced backend.
- It is good practice to check how the expression is extracted in different documents of the same type.
- From the Features option of a document, you can see which expression was used to classify that document.
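The evaluation rules above (case-sensitive expressions, each one checked individually, stop at the first match) can be illustrated with a short Python sketch. This is not Athento's implementation: the `classify` function, the `WHITE_WORDS` mapping, the form names and the "INVOICE" expression are all hypothetical, and the sketch assumes that white words behave like regular expressions.

```python
import re

# Hypothetical white words per form. Form names and expressions are
# illustrative only and are assumed to behave like regular expressions.
WHITE_WORDS = {
    "Special Conditions": ["SPECIAL CONDITIONS"],
    "Invoice": ["INVOICE No.? ?[0-9]+"],
}


def classify(text, white_words=WHITE_WORDS):
    """Return (form, expression) for the first white word found, else None."""
    for form, expressions in white_words.items():
        for expression in expressions:
            # No re.IGNORECASE flag, so matching is case sensitive.
            if re.search(expression, text):
                # Stop at the first match; later expressions are not evaluated.
                return form, expression
    return None


print(classify("POLICY 123 - SPECIAL CONDITIONS"))
print(classify("policy 123 - special conditions"))  # lowercase text does not match
```

Returning the matching expression together with the form mirrors the idea of checking afterwards which expression was responsible for a classification.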
What are the black words for?
Black words play the opposite role to white words and help prevent false positives. For example, "GENERAL CONDITIONS" could be a black word when classifying Special Conditions documents.
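As a rough illustration of how a black word can cancel a white word match, the following Python sketch rejects a form whenever any black word is found in the text. The exact way Athento combines the two lists is not described here, so the veto logic, the `accept_form` function and its parameters are assumptions made for the example.

```python
import re


def matches_any(expressions, text):
    """True if at least one expression is found in the text (case sensitive)."""
    return any(re.search(expression, text) for expression in expressions)


def accept_form(text, white_words, black_words):
    """Assumed rule: a form is accepted only if a white word matches
    and no black word matches."""
    if matches_any(black_words, text):
        return False  # a black word vetoes the form
    return matches_any(white_words, text)


white = ["CONDITIONS"]
black = ["GENERAL CONDITIONS"]

print(accept_form("SPECIAL CONDITIONS OF THE POLICY", white, black))  # True
print(accept_form("GENERAL CONDITIONS OF THE POLICY", white, black))  # False
```

With the broad white word "CONDITIONS" alone, both texts would be accepted; the black word removes the false positive without narrowing the white word.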