How to create different data extraction templates for the same form

Athento lets you configure multiple field extraction templates for the same form. This is useful when the documents you want to extract data from have different structures and the fields do not always appear in the same position.

Prerequisites

For data extraction to work, the space containing the documents to be processed must have the following automation tasks active:

Extract page number.
Extract text or Extract OCR.
Classify by Fuzzy Text Similarity.
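
As a rough illustration of the checklist above, the following sketch verifies a list of active task names before you start building templates. The task names are copied from this article; the function and the idea of having the active tasks available as a plain list are assumptions for illustration, not part of Athento's API.

```python
# Task names as listed in this article. How a space exposes its active tasks
# is NOT documented here; a plain list of names is assumed for illustration.
REQUIRED_ALL = {"Extract page number", "Classify by Fuzzy Text Similarity"}
REQUIRED_ANY = {"Extract text", "Extract OCR"}


def missing_prerequisites(active_tasks):
    """Return the prerequisites that are not met; an empty list means ready."""
    active = set(active_tasks)
    problems = sorted(REQUIRED_ALL - active)
    # Only one of the two text extraction tasks is needed.
    if not (REQUIRED_ANY & active):
        problems.append("Extract text or Extract OCR")
    return problems


print(missing_prerequisites(["Extract page number", "Extract OCR"]))
```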

Creating extraction templates

Templates are created in the form configuration, on the "Extraction Templates" tab.

Click on the "New extraction template" button. Next, the system will ask you for a name for the template. Click on "Create" to finish.

Setting up classification for the template

Click on the name of the template to access it. Then click on the "Classification" tab.

Next, define the "whitewords". Whitewords are expressions that you know will appear in the text of the document and that Athento can use to classify it. Click on the "Add whiteword" button.

Then add the word or expression you want to use. You can use a dot (.) as a wildcard for any character that may vary in the extracted text.

Blackwords

Athento also lets you define expressions that help reduce false-positive classifications. These expressions are useful when you have templates that are very similar to each other.

With "blackwords" you tell Athento that, if it finds any of those words in the text, it must not classify the document under this template.
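
To make the interplay of whitewords, dot wildcards, and blackwords more concrete, here is a minimal sketch of the idea. It is not Athento's implementation: the function names are hypothetical, case-insensitive matching is an assumption, and so is the rule that every whiteword must be found (the article does not say whether one or all are needed). Athento's own classifier is based on fuzzy text similarity and may behave differently.

```python
import re


def compile_expression(expression):
    """Compile a whiteword or blackword into a regular expression.

    A dot matches any single character; every other character is literal.
    Case-insensitive matching is an assumption made for this sketch.
    """
    pattern = "".join("." if ch == "." else re.escape(ch) for ch in expression)
    return re.compile(pattern, re.IGNORECASE)


def matches_template(text, whitewords, blackwords=()):
    """Return True if the text qualifies for the template.

    Assumed rule: any blackword found in the text vetoes the template;
    otherwise every whiteword must be found.
    """
    if any(compile_expression(word).search(text) for word in blackwords):
        return False
    return bool(whitewords) and all(
        compile_expression(word).search(text) for word in whitewords
    )


# The OCR misread "Invoice" as "Inv0ice"; the wildcard absorbs the error.
ocr_text = "ACME Supplies  Inv0ice No 2022-118  Total 120.00"
print(matches_template(ocr_text, ["INV.ICE", "ACME"]))
print(matches_template("ACME Credit note for invoice 118",
                       ["INV.ICE"], blackwords=["Credit note"]))
```

In this sketch the wildcard is useful for characters that OCR tends to confuse, while the blackword keeps a credit note from being classified under an invoice template that shares the same whitewords.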

Setting up field extraction

In the "Template" tab, you must upload a sample document for your template.

If you make a mistake and upload the wrong file, you can use the trash can button to delete it.

Then, on the template image, hold down the left mouse button and draw a box around the data you want to extract.

In the dropdown menu that appears below the selected area, select the field that the drawn area corresponds to and click on the check icon to save your selection.

If you make a mistake, you can clear the selection by clicking on the trash can. The "<" and ">" symbols open and close this menu so that you can work on your templates more comfortably.

Once you save your changes, on the right side of the screen you will see information about the location of the field.

From here you can also use the following controls to make your work easier:

Eye icon: lets you hide the selected area on the template. This is useful when you have two pieces of data very close to each other.
Trash can icon: lets you delete the selected area.
Nut icon: lets you set a default value for the field when a document is classified with this template.
User icon with the nut: only available to superusers; gives access to the advanced administration of the field.
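
As a mental model of what a drawn area represents, the sketch below stores a field's location as a page number plus a box, and collects the recognized words that fall inside it. Everything here is hypothetical: the `Zone` class, the use of coordinates normalized to the page size, and the word tuples are assumptions made for illustration and do not describe how Athento stores or evaluates areas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """Hypothetical extraction area: a box on one page, in 0..1 coordinates."""
    field: str
    page: int
    left: float
    top: float
    width: float
    height: float

    def contains(self, x, y):
        """Return True if the point (x, y) lies inside the box."""
        return (self.left <= x <= self.left + self.width
                and self.top <= y <= self.top + self.height)


def text_in_zone(words, zone):
    """Join the words whose center lies inside the zone, in reading order.

    `words` is an iterable of (page, x_center, y_center, text) tuples,
    an assumed shape for OCR output.
    """
    hits = [w for w in words if w[0] == zone.page and zone.contains(w[1], w[2])]
    hits.sort(key=lambda w: (w[2], w[1]))  # top to bottom, then left to right
    return " ".join(w[3] for w in hits)


zone = Zone(field="invoice_number", page=1,
            left=0.60, top=0.10, width=0.30, height=0.05)
words = [
    (1, 0.20, 0.12, "Invoice"),   # same page, outside the box
    (1, 0.65, 0.12, "INV-001"),   # inside the box
    (2, 0.65, 0.12, "Annex"),     # same position, different page
]
print(text_in_zone(words, zone))
```

The page number is part of the zone, which is why the same box drawn on another page extracts different data.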

Setting a default value for a field when a document is classified with the template

Use this feature when you know that the value of a field is always the same for a given template. For example, for an invoice from a given supplier, the supplier's identification number (CIF, VAT, RUT, etc.) will not change.

Use the nut icon to access this feature. In the dialog box that opens, set the default value that the field will take when a document is classified with this template. Do not forget to save the changes.
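
The effect of a default value can be pictured with a short sketch. The function name and the field names are hypothetical, and the rule that the template default takes precedence over an extracted value is an assumption; the article only states that the field takes the default value when the document is classified with the template.

```python
def apply_template_defaults(extracted, defaults):
    """Return the field values after applying a template's default values.

    Assumption for this sketch: a default set on the template wins over
    whatever was extracted for the same field.
    """
    result = dict(extracted)   # copy so the caller's data is not modified
    result.update(defaults)
    return result


# Illustrative values only; "supplier_vat" is a made-up field name.
extracted = {"invoice_number": "INV-001", "total": "120.00"}
defaults = {"supplier_vat": "B12345678"}
print(apply_template_defaults(extracted, defaults))
```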

Extract data on pages other than the first

Use the page controller to move between the document pages and define the extraction areas as you did on the first page.

Inherit the dynamic expression of the field in the extraction template

If you defined a dynamic expression in the field configuration, by default it is not applied to the field when a document is classified with a template. The field configuration for that template is applied instead.

You can copy or inherit the dynamic expression from the option shown in the following screenshot.