Fine-tune an Extractor Asset

Fine-tuning is the process of adjusting and optimising a trained Asset before it is published. This involves further training an Asset using an additional document set to boost its accuracy and confidence score.

To evaluate whether an Asset needs fine-tuning, view the Accuracy Result page after Asset training is completed. It provides an overview of correctly and incorrectly predicted entity information, which helps you identify patterns or areas where fine-tuning can improve the Asset’s performance.

Note: Fine-tuning applies only to trained Assets that have not yet been published. To improve the performance of a published Asset, retrain it instead. For more information on retraining an Asset, see Retrain an Extractor Asset.

Users must have any one of the following policies to fine-tune an Extractor Asset:

Administrator Policy
Creator Policy

This guide will walk you through the steps on how to fine-tune an Extractor Asset.

Consider scenarios for fine-tuning
Upload documents
Initiate fine-tune
Select documents
Annotate and train
Review results and validate
Publish the asset

Step 1: Consider scenarios for fine-tuning

The decision to fine-tune an Asset depends on your objectives, which are often field-specific and document-specific.

Head to the Asset Studio page and select the trained Asset that you wish to fine-tune.
On the Accuracy Result page that appears, check the Asset’s overall accuracy, entity-level accuracy and confidence score.

Things to know

Document type: In the context of an Extractor Asset, a document type is the category or class that a document belongs to, for example “Invoice,” “Purchase Order,” “Receipt,” or “Contract.” Each of these represents a distinct category of documents.

Entities: Entities refer to the field and table information in a document.

Document Variation: Document variation refers to the different variations or instances within a specific document type. For example, various invoices could have different layouts, formats, or styles depending on factors like the vendor, company, or industry standards.

Overall Accuracy: The overall accuracy represents the percentage of correct predictions made by the Asset across all Entities.

Entity-level accuracy: Entity-level accuracy represents the percentage of correct predictions made by the Asset for an individual entity in the test document set.

Confidence score: The confidence score is a measure of how confident the Asset is in its predictions for Entity information extracted from the documents.
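The difference between overall accuracy and entity-level accuracy can be illustrated with a small worked example. This is a minimal sketch using made-up review data; the entity names, the sample values, and the function names are assumptions for illustration, and the platform may compute its figures differently.

```python
from collections import defaultdict

# Hypothetical review data: (entity name, prediction correct?, confidence score)
predictions = [
    ("invoice_number", True, 0.98),
    ("invoice_number", True, 0.95),
    ("invoice_date", True, 0.91),
    ("invoice_date", False, 0.55),
    ("total_amount", False, 0.62),
    ("total_amount", True, 0.88),
    ("total_amount", False, 0.47),
    ("vendor_name", True, 0.93),
]


def overall_accuracy(rows):
    """Percentage of correct predictions across all entities."""
    correct = sum(1 for _, ok, _ in rows if ok)
    return 100.0 * correct / len(rows)


def entity_level_accuracy(rows):
    """Percentage of correct predictions for each individual entity."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for name, ok, _ in rows:
        totals[name] += 1
        hits[name] += int(ok)
    return {name: 100.0 * hits[name] / totals[name] for name in totals}


def mean_confidence(rows):
    """Average confidence score per entity (one simple way to summarise it)."""
    scores = defaultdict(list)
    for name, _, conf in rows:
        scores[name].append(conf)
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}


print(f"Overall accuracy: {overall_accuracy(predictions):.1f}%")
for entity, acc in entity_level_accuracy(predictions).items():
    print(f"{entity}: {acc:.1f}% accurate, "
          f"mean confidence {mean_confidence(predictions)[entity]:.2f}")
```

With this sample data, overall accuracy is 62.5%, while total_amount is correct in only one of its three predictions. An overall figure can therefore hide a weak entity, which is why both measures are worth checking.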

Consider fine-tuning the Asset in the following scenarios:
- To improve the overall accuracy of the Asset: Consider fine-tuning the Asset when its overall accuracy is low.
- To improve the entity-level accuracy: Consider fine-tuning the Asset when the accuracy of certain entities is low.
- To improve the accuracy for specific document variations: Consider fine-tuning the Asset for specific document variations with low accuracy. For example, if you’re creating an Extractor Asset to extract entities from invoices, and you notice low accuracy or confidence scores for invoices from specific vendors or invoices in certain formats, then you can initiate fine-tuning.
- To improve the confidence score: Consider fine-tuning the Asset when the confidence score for certain entities or document variations is low.
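The scenarios above can be sketched as a simple triage check over figures copied from the results page. This is an illustrative sketch only: the function name, the input structure, and both thresholds (90% accuracy, 0.80 confidence) are assumptions, since what counts as "low" depends on your own objectives.

```python
def fine_tune_reasons(overall, by_entity, by_variation, confidence,
                      min_accuracy=90.0, min_confidence=0.80):
    """Return the fine-tuning scenarios that apply, as readable reasons.

    The thresholds are illustrative assumptions, not product defaults.
    """
    reasons = []
    # Scenario 1: low overall accuracy
    if overall < min_accuracy:
        reasons.append(f"overall accuracy is low ({overall:.1f}%)")
    # Scenario 2: low accuracy for certain entities
    for name, acc in sorted(by_entity.items()):
        if acc < min_accuracy:
            reasons.append(f"entity '{name}' accuracy is low ({acc:.1f}%)")
    # Scenario 3: low accuracy for specific document variations
    for name, acc in sorted(by_variation.items()):
        if acc < min_accuracy:
            reasons.append(
                f"document variation '{name}' accuracy is low ({acc:.1f}%)")
    # Scenario 4: low confidence for certain entities or variations
    for name, conf in sorted(confidence.items()):
        if conf < min_confidence:
            reasons.append(f"confidence for '{name}' is low ({conf:.2f})")
    return reasons


# Hypothetical figures for an invoice Extractor Asset
reasons = fine_tune_reasons(
    overall=93.0,
    by_entity={"invoice_number": 99.0, "total_amount": 78.0},
    by_variation={"Vendor A": 96.0, "Vendor B": 71.5},
    confidence={"invoice_number": 0.97, "total_amount": 0.64},
)
for reason in reasons:
    print(reason)
```

In this example the overall accuracy looks acceptable, but one entity and one vendor layout fall below the assumed thresholds, so a document set focused on those cases would be a reasonable choice for fine-tuning.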

Step 2: Upload documents

After identifying areas for improvement in the Asset, prepare the document sets required for fine-tuning. If you have already uploaded the documents to the Document Library, skip this step and proceed to Initiate fine-tune.

Otherwise, upload the required documents to the Document Library. For more information about uploading documents, see Upload documents.

Step 3: Initiate fine-tune

Note: Fine-tuning can reduce the accuracy of the Asset if it is performed with an unsuitable document set or incorrect annotations.

On the Accuracy Result page, click Fine-tune.
In the Proceed to fine-tune window that appears, click Proceed.

Step 4: Select documents

In the Document Sets pane, select or search for the document set.
In the right pane, select the documents to use for fine-tuning the Extractor Asset.

Note: Select at least 10 documents to proceed with fine-tuning. We recommend using 25 or more documents for better accuracy.

Click Proceed to annotate the documents.
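If you prepare the list of documents outside the platform before selecting them, the document-count guidance in the note above can be checked with a few lines of code. This is a minimal sketch; the function name, the return labels, and the file names are hypothetical, and only the two numbers (10 and 25) come from this guide.

```python
MIN_DOCUMENTS = 10          # minimum stated in this guide
RECOMMENDED_DOCUMENTS = 25  # recommended volume stated in this guide


def check_selection(document_names):
    """Classify a planned selection by its number of distinct documents."""
    count = len(set(document_names))  # ignore accidental duplicates
    if count < MIN_DOCUMENTS:
        return "too_few"
    if count < RECOMMENDED_DOCUMENTS:
        return "allowed"
    return "recommended"


# Hypothetical file names for a planned fine-tuning set
planned = [f"vendor_b_invoice_{i:02d}.pdf" for i in range(1, 13)]
print(len(planned), check_selection(planned))
```

Here 12 documents meet the minimum but fall short of the recommended volume, so adding more examples of the weak document variation would be worthwhile before proceeding.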

Step 5: Annotate and train

Data annotation is the process of labelling data to show the outcome you want your machine learning model to predict.

For more information on how to annotate fields, tables and sections, see Annotate field, Annotate a table and Annotate Section and Group.

Step 6: Review results and validate

This step allows you to access the Asset’s predictions, accuracy, and confidence score.

Additionally, you can utilise the Validate feature to evaluate the Extractor Asset’s performance on a new set of documents.

For more information on reviewing the results and validation, see Review results and validate.

Step 7: Publish the asset

If the desired accuracy has been achieved, you can proceed to Publish the Asset. For more information on how to publish the Asset, see Publish the asset.

Note: Once the fine-tuned Asset is published, you can download the API and its documentation. The API can be invoked independently or used within a specific Use case. If you wish to consume this Asset via API, see Consume an Asset via API.

If you wish to consume multiple versions of an Asset, we recommend using URL aliases, which let you consume its different versions via a single API. For more information, see URL aliases.

You can also consume this Asset in the Asset Monitor module. For more information, see Consume an Asset via Create Transaction.