Data extraction results do not depend only on the capabilities of the software's OCR or text extraction mechanisms. To a large extent, they depend on the nature of the documents from which the data is to be extracted.
For example, born-digital documents that have not been converted to images have better data extraction rates than scanned documents. Smudges, watermarks or backgrounds often make data extraction more difficult. A photograph of a document is harder to process than a scanned image because of lighting and perspective distortion.
Thus, the success of data extraction will depend on how much control we have over the documents from which we intend to extract information.
In addition, OCR technologies themselves are not infallible, so even with good-quality documents there may be some margin of error.
For all these reasons, Athento cannot commit to a specific success rate for data extraction. What it can certainly do is help to partially automate the process of extracting information from documents, reducing the effort and time spent.