Operation to extract metadata in a smart way with Athenea (OpenAI)

Athento allows you to extract metadata intelligently with Athenea. You can use this functionality with or without this operation.

To use it without the operation, see the article “Use intelligent field extraction without activating an operation”.

The “Extract metadata intelligently with OpenAI” operation gives you more flexibility.

op_intelligent_extract_metadata.py

This operation will intelligently extract all the fields specified in the “Metadata to be extracted” parameter.

If no field is specified, all fields will be extracted.

The operation can be used in two ways:

Using the OCR or document text. This text is sent to OpenAI to obtain the values of the fields.
Sending the document pages to OpenAI.

In the first case, the text extraction/OCR operations must already have been run on the document for the extraction to work.

The advantage of the second option is that sending the pages avoids the loss of context that can occur when only the text or OCR output is sent.

Extract fields without OCR

If you do not want to use OCR, in “Extraction mode” choose “Document”.


In “Pages” you can choose the pages from which the fields will be extracted. To do so, enter the page numbers separated by commas. You can also specify a range of pages using a hyphen. For example, to take into account page 1, page 3 and pages 10 to 15, enter:

1, 3, 10-15
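
To make the format concrete, the following Python sketch shows one way a value like the one above could be expanded into individual page numbers. It is only an illustration and not Athento's implementation: the function name parse_pages and its handling of duplicates, stray commas and invalid ranges are assumptions.

```python
from __future__ import annotations


def parse_pages(spec: str) -> list[int]:
    """Expand a page specification such as "1, 3, 10-15" into page numbers.

    Hypothetical helper for illustration only; how Athento itself
    interprets the "Pages" parameter is not documented here.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas or an empty value
        if "-" in part:
            # A hyphen marks an inclusive range, e.g. "10-15"
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)


if __name__ == "__main__":
    print(parse_pages("1, 3, 10-15"))
```

With the example value, the sketch yields pages 1, 3, 10, 11, 12, 13, 14 and 15: the range includes both of its end pages.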

For more information on how to give the AI context about what you want to extract in each field, and thus obtain a more accurate extraction, see the article “Intelligent extraction of fields using Athenea (AI)”.

IMPORTANT: For this operation to work it is NOT necessary to configure the “Custom extract path” field for each field.
