Athento can migrate approximately 100 million documents per day. This figure may vary depending on the instance type, the infrastructure and other factors.
To perform a migration of this type, follow these steps:
- Request SFTP access to the repository path where the documents will be stored. This is normally a path such as /var/www/athentose/media/uploads/repo_XXX/team_YYYY/space_ZZZ/migration_folder.
- Using an SFTP client, upload the documents to this path. The upload may take some time, but you can continue with the next steps of the migration even if the binaries have not finished uploading.
- In this path, place a file called "XXX_migration.csv", where XXX must be replaced by a unique name for this migration.
- The file "XXX_migration.csv" must have the following columns:
  - filepath: the path of the file to be migrated, relative to the "migration_folder" folder. This field may or may not include the file name.
  - filename: the name of the binary file to migrate.
  - As many additional columns as there are metadata fields to migrate, using the metadata name as the column header.
  It is recommended to use | as the column separator.
- In the advanced administration, go to "Bulk migration config" and add a configuration for the space where the bulk migration is needed. It is enough to fill in the "Serie" field, because the remaining fields have default values.
- When the migration finishes, a file called "XXX_migration.csv_output.csv" will be generated in the migration path, listing the documents created and the UUID generated for each of them.
IMPORTANT: Under no circumstances should the documents deposited in the SFTP path be deleted; otherwise, information may be lost.
Example header for the migration CSV file
filepath|filename|metadata.contract|metadata.invoice_number|metadata.invoice_total
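The CSV described above can also be generated with a script. The following is a minimal Python sketch, assuming you have a local copy of migration_folder with the same structure as the SFTP path. `build_rows`, `write_migration_csv` and the metadata mapping are hypothetical helpers, not part of Athento. The sketch writes filepath without the file name (one of the two forms the column allows), uses | as the separator and UTF-8 encoding, and refuses values that contain the separator or a line break.

```python
import csv
import os
from typing import Dict, List

# Metadata columns from the example header above; adapt them to your space.
EXAMPLE_METADATA_COLUMNS = ["metadata.contract", "metadata.invoice_number",
                            "metadata.invoice_total"]


def build_rows(root: str,
               metadata: Dict[str, Dict[str, str]],
               metadata_columns: List[str]) -> List[List[str]]:
    """Walk a local copy of migration_folder and return one CSV row per file.

    filepath is the directory relative to `root` (without the file name).
    `metadata` (hypothetical input) maps "relative/dir/name.pdf" to
    {column: value}; files without an entry get empty metadata values.
    """
    rows: List[List[str]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        rel_dir = os.path.relpath(dirpath, root)
        rel_dir = "" if rel_dir == "." else rel_dir.replace(os.sep, "/")
        for name in sorted(filenames):
            if "_migration.csv" in name:  # skip the CSV itself and its _output file
                continue
            key = f"{rel_dir}/{name}" if rel_dir else name
            values = metadata.get(key, {})
            rows.append([rel_dir, name] + [values.get(c, "") for c in metadata_columns])
    return rows


def write_migration_csv(path: str, rows: List[List[str]],
                        metadata_columns: List[str]) -> None:
    """Write a UTF-8, |-separated migration CSV; reject values that would break it."""
    for row in rows:
        for value in row:
            if "|" in value or "\n" in value or "\r" in value:
                raise ValueError(f"value {value!r} contains '|' or a line break")
    with open(path, "w", encoding="utf-8", newline="") as fh:
        # Unix line endings are an assumption of this sketch.
        writer = csv.writer(fh, delimiter="|", lineterminator="\n")
        writer.writerow(["filepath", "filename"] + metadata_columns)
        writer.writerows(rows)
```

For example, calling `write_migration_csv("migration_folder/invoices2023_migration.csv", build_rows("migration_folder", meta, EXAMPLE_METADATA_COLUMNS), EXAMPLE_METADATA_COLUMNS)` (with a hypothetical `meta` dictionary and migration name) would produce a file whose first line is the example header above.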
Recommendations
- Use UTF-8 encoding for both the CSV file and the document paths.
- Do not store more than 50,000 items in a single folder; use subfolders when a bulk upload contains more documents.
- Check the header format, respecting upper and lower case.
- Check the CSV formatting, making sure the separator character is not used within any field value.
- Keep in mind that document names are case-sensitive. For example, if a document is called invoice.pdf in the SFTP folder, the CSV file must use exactly the same name; it cannot be Invoice.pdf or invoice.PDF.
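Several of these recommendations can be checked before the migration starts. The following Python sketch is one way to do it, not an Athento tool: `check_migration` is a hypothetical helper meant to run against a local copy of migration_folder (or on the server, if shell access is available), and it only reports problems. It assumes | as the separator and treats filepath as already including the file name when its last segment equals filename.

```python
import csv
import os
from typing import Dict, List, Set

MAX_ITEMS_PER_FOLDER = 50_000


def check_migration(root: str, csv_name: str) -> List[str]:
    """Return a list of human-readable problems found in root/csv_name."""
    problems: List[str] = []
    with open(os.path.join(root, csv_name), "rb") as fh:
        raw = fh.read()
    # 1. The CSV must be valid UTF-8.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"CSV is not valid UTF-8: {exc}"]

    # QUOTE_NONE: quotes are literal, so a | inside a value shows up as an extra column.
    rows = list(csv.reader(text.splitlines(), delimiter="|", quoting=csv.QUOTE_NONE))
    if not rows:
        return ["CSV is empty"]
    header = rows[0]
    # 2. Header names are case-sensitive (a leading UTF-8 BOM is reported here too).
    if header[:2] != ["filepath", "filename"]:
        problems.append(f"header must start with filepath|filename, got {header[:2]}")

    listings: Dict[str, Set[str]] = {}
    for line_no, row in enumerate(rows[1:], start=2):
        # 3. Separator used inside a value (or a malformed line).
        if len(row) != len(header):
            problems.append(f"line {line_no}: {len(row)} fields, expected {len(header)}")
            continue
        filepath, filename = row[0], row[1]
        parts = [p for p in filepath.split("/") if p]
        if parts and parts[-1] == filename:  # filepath already includes the name
            parts = parts[:-1]
        directory = os.path.join(root, *parts)
        if directory not in listings:
            listings[directory] = set(os.listdir(directory)) if os.path.isdir(directory) else set()
        # 4. Compare against the real listing, so the check stays case-sensitive
        #    even on case-insensitive filesystems.
        if filename not in listings[directory]:
            problems.append(f"line {line_no}: {filename!r} not found (case-sensitive) in {directory}")

    # 5. No folder may hold more than 50,000 items.
    for dirpath, dirnames, filenames in os.walk(root):
        count = len(dirnames) + len(filenames)
        if count > MAX_ITEMS_PER_FOLDER:
            problems.append(f"{dirpath}: {count} items (limit {MAX_ITEMS_PER_FOLDER})")
    return problems
```

An empty list means none of these checks found a problem; it does not guarantee that the migration itself will succeed.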
Migration speed
Although Athento has performed migrations at speeds of 100,000,000 documents per day, this speed can be affected by many factors, such as the infrastructure or the instance configuration.
If you need to migrate at a higher speed, please contact soporte@athento.com.
The speed of 100,000,000 documents per day does not include the transfer time of the binaries, which largely depends on the bandwidth available for the SFTP transfer and on the size of the documents.
For reference, a migration performed via the REST API reaches approximately 1,000,000 documents per day.
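The figures above can be turned into a rough planning estimate. The Python sketch below is back-of-the-envelope arithmetic only: the 5,000,000-document volume, the 200 KB average document size and the 100 Mbit/s bandwidth are hypothetical values, the function names are illustrative, and real throughput depends on the factors listed above.

```python
BULK_DOCS_PER_DAY = 100_000_000  # upper figure quoted above, not a guarantee
REST_DOCS_PER_DAY = 1_000_000    # approximate REST API figure quoted above


def processing_hours(documents: int, docs_per_day: float) -> float:
    """Hours needed to process `documents` at a daily rate (binary upload excluded)."""
    return documents / docs_per_day * 24


def sftp_upload_hours(documents: int, avg_size_bytes: float, bandwidth_mbit_s: float) -> float:
    """Hours to upload the binaries, assuming a fully used link and no protocol overhead."""
    total_bits = documents * avg_size_bytes * 8
    return total_bits / (bandwidth_mbit_s * 1_000_000) / 3600


if __name__ == "__main__":
    n = 5_000_000  # hypothetical migration size
    print(f"bulk: {processing_hours(n, BULK_DOCS_PER_DAY):.1f} h")  # prints "bulk: 1.2 h"
    print(f"REST: {processing_hours(n, REST_DOCS_PER_DAY):.1f} h")  # prints "REST: 120.0 h"
    print(f"SFTP: {sftp_upload_hours(n, 200_000, 100):.1f} h")     # prints "SFTP: 22.2 h"
```

Under these assumptions, transferring the binaries takes much longer than the bulk processing itself, which is why the quoted speed is stated without the binary upload.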
Metadata migration
To migrate the metadata, run the following command after the documents have been migrated:
./manage.py bulk_migration_metadata --space_name=SpaceZZZ