Retraining updates an Asset with data from new documents, existing documents, or both, along with their annotations. As data distributions change and new patterns emerge, retraining helps the Asset adapt to variations in the document sets, maintain its predictive power, and continue to perform accurately over time.
Users need at least one of the following policies to retrain an Extractor Asset:
- Administrator Policy
- Creator Policy
This guide walks you through the following steps to retrain an Extractor Asset:
- Upload documents
- Select the extractor asset
- Initiate retrain
- Select documents
- Annotate
- Train
- Review results and validate
- Publish the asset
Step 1: Upload documents
If you wish to retrain the Extractor Asset with new documents, you must first upload them to the Platform. Otherwise, skip this step and retrain with your existing documents.
For more information about uploading new documents, see the Upload Documents page.
Step 2: Select the extractor asset
- Go to the Asset Studio module and select the Extractor Asset you wish to retrain.
Note: You can retrain only an Extractor Asset whose status is Published.
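The policy requirement and the Published status rule together amount to a simple eligibility check. The sketch below is illustrative only: the function name, the policy strings, and the status values are assumptions made for this example and are not a Platform API. The Platform enforces these rules itself.

```python
# Hypothetical names for illustration; not part of the Platform.
RETRAIN_POLICIES = {"Administrator Policy", "Creator Policy"}


def can_retrain(user_policies, asset_type, asset_status):
    """Return True when a user may retrain the given Asset.

    Mirrors the two documented rules: the user holds at least one of the
    required policies, and the Asset is an Extractor with status Published.
    """
    has_policy = bool(RETRAIN_POLICIES & set(user_policies))
    is_published_extractor = (
        asset_type == "Extractor" and asset_status == "Published"
    )
    return has_policy and is_published_extractor
```

For example, a user with only the Creator Policy can retrain a Published Extractor Asset, but not one that is still unpublished.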
Step 3: Initiate retrain
Once you select the Published Extractor Asset from the list, the Review Results page opens.
- On the Review Results page, click Retrain.
- In the Retrain dialog that appears, enter a Description, and then click Proceed retrain.
Step 4: Select documents
This step allows you to select the documents to retrain the Asset. You can either select the newly uploaded documents or continue with the previously chosen documents used for the Asset’s initial training, along with their respective annotations.
Things to know before selecting documents
Annotation tag: Annotation tags identify the documents and files that were previously selected for annotation. You can include or exclude any of these files.
Annotation Summary: The annotation summary lists all the fields associated with the selected Document type and the number of annotated instances for each.
Use document annotation: In the Annotation Summary, select the Use document annotation check box to keep the previous annotations on the documents. Otherwise, the Platform discards the previous annotations and lets you annotate the documents afresh.
Previously annotated files and folders: When you start selecting documents, you can also access the previously annotated files and folders, which carry annotation tags. By default, these files and folders are selected.
- To exclude the previously selected or annotated files and folders, clear the Select All check box.
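The effect of the Use document annotation check box can be pictured as follows. This is a minimal sketch under stated assumptions: the function name and the dictionary layout (document name mapped to a list of annotation labels) are hypothetical and do not describe how the Platform stores annotations.

```python
def starting_annotations(previous_annotations, use_document_annotation):
    """Return the annotations each document starts with during retraining.

    previous_annotations maps a document name to the list of annotations
    made during an earlier training run (hypothetical layout).
    """
    if use_document_annotation:
        # Check box selected: carry the earlier annotations forward.
        return {doc: list(items) for doc, items in previous_annotations.items()}
    # Check box cleared: earlier annotations are discarded, so every
    # document starts empty and must be annotated again.
    return {doc: [] for doc in previous_annotations}
```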
To select documents, follow these steps:
- On the Select Documents page, search for or select the document set.
- On the main page, select the files you wish to annotate.
- Use the check boxes next to the files to select them.
- Use the expand icon to view the files listed under a folder.
Note: Select at least 10 documents to proceed with training. For higher accuracy, we recommend selecting 25 or more documents.
- Click Proceed to continue with the annotation process.
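The document-count guidance can be expressed as a small check. The thresholds come from the note in this step; the function name and the returned labels are assumptions chosen for this sketch.

```python
MIN_DOCUMENTS = 10          # minimum needed to proceed with training
RECOMMENDED_DOCUMENTS = 25  # recommended volume for higher accuracy


def check_selection(selected_files):
    """Classify a selection by how many documents it contains."""
    count = len(selected_files)
    if count < MIN_DOCUMENTS:
        return "blocked"      # too few documents to train
    if count < RECOMMENDED_DOCUMENTS:
        return "allowed"      # training can proceed, accuracy may be lower
    return "recommended"      # meets the recommended volume
```

A selection of 12 files would be allowed but falls short of the recommended volume, so adding more documents before training is worth considering.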
Step 5: Annotate
Data annotation is the process of labeling data to show the outcome you want your machine learning model to predict.
You can only annotate the Asset with the existing fields, tables and sections. You cannot add any new fields or tables while retraining the Asset.
For more information on how to annotate fields, tables and sections, see Annotate fields, Annotate a table and Annotate Section.
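Because retraining is limited to the existing fields, tables, and sections, it can help to compare the labels you plan to annotate against those the Asset already defines. The sketch below assumes labels are plain strings; the function name and sample labels are hypothetical.

```python
def unknown_labels(existing_labels, annotation_labels):
    """Return labels that the Asset does not already define.

    Results keep first-seen order and contain no duplicates.
    """
    known = set(existing_labels)
    unknown = []
    for label in annotation_labels:
        if label not in known and label not in unknown:
            unknown.append(label)
    return unknown


# Hypothetical example: "Due Date" was not part of the original Asset,
# so it cannot be annotated during retraining.
extra = unknown_labels(["Invoice Number", "Total"], ["Total", "Due Date"])
```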
Step 6: Train
Once you complete the annotation, you can proceed to train the asset.
For more information on training an asset, see Train.
Step 7: Review results and validate
This step allows you to assess the Asset’s predictions, accuracy, and confidence score. You can also use the Validate feature to evaluate the Extractor Asset’s performance on a new set of documents.
For more information about reviewing results and validation, see Review results and validate.
Step 8: Publish the asset
If the desired accuracy has been achieved, you can proceed to publish the Asset. For more information on how to publish the Asset, see Publish the asset.
Note: To identify the retrained Asset, check its version. For example, if the initial Asset version is 1.0, the retrained Asset’s version will be 2.0. Both versions are available in the Asset Studio.
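The versioning example in the note can be sketched as below. It assumes that every retrain raises the major version by one and resets the minor part to 0; the guide confirms only the single 1.0 to 2.0 example, so treat the general rule and the function name as assumptions.

```python
def next_retrained_version(current_version):
    """Return the version expected after one retrain.

    Assumption: the major number increases by one per retrain.
    """
    major = current_version.split(".")[0]
    return f"{int(major) + 1}.0"
```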
Once the retrained Asset is published, you can download the API and its documentation. The API can be invoked independently or used within a specific use case. If you wish to consume this Asset via API, see Consume an Asset via API.
If you wish to consume multiple versions of an Asset, we recommend using URL aliases, which let you consume different versions through a single API. For more information, see URL aliases.
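Conceptually, an alias is a stable name that can be pointed at whichever version you want callers to use. The sketch below only illustrates that idea: the class, the alias name, and the example.com URLs are all hypothetical and do not reflect the Platform's actual URL format or alias configuration.

```python
class AliasRegistry:
    """Maps a stable alias to a version-specific endpoint (illustrative)."""

    def __init__(self):
        self._targets = {}

    def point(self, alias, versioned_url):
        """Point the alias at a version-specific URL, replacing any earlier target."""
        self._targets[alias] = versioned_url

    def resolve(self, alias):
        """Return the URL the alias currently points to."""
        return self._targets[alias]


registry = AliasRegistry()
# Hypothetical URLs: callers always use the alias, never the version.
registry.point("invoice-extractor", "https://example.com/assets/invoice-extractor/1.0")
# After the retrained version is published, repoint the alias.
registry.point("invoice-extractor", "https://example.com/assets/invoice-extractor/2.0")
```

The benefit of this pattern is that client code keeps calling one address while the version behind it changes.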
You can also consume this Asset in the Asset Monitor module. For more information, see Consume an Asset via Create Transaction.