Annotating a table is the process of adding labels to the elements of a table in a document. These labels are used to train an extractor asset to extract data from tables, improving the model’s ability to identify and extract the relevant information accurately. Clear labels give the model a better understanding of the data, which makes extraction more effective and reliable.
To annotate a table, you must have one of the following policies:
- Administrator Policy
- Creator Policy
- Annotator Policy
This guide walks you through the following steps to annotate a table.
Step 1: Know your document’s table format
Before you start annotating, identify the format of the tables in your document, because the annotation steps depend on it. The platform allows you to annotate the following types of tables:
- Table on a single page
- Table spanning multiple pages
Table on a single page
A table on a single page is fully contained within one page and does not continue onto another page.
Example: In the images below, the table is confined to a single page and does not extend to subsequent pages.
Table spanning multiple pages
A spanning table is a table that is too large to fit on a single page and continues onto one or more subsequent pages.
The table spanning feature lets you consolidate and process multi-page tables more efficiently. When enabled, it merges the parts of a table that span multiple pages, provided that they have the same number of columns.
Example: In the images below, the table starts on the first page and continues onto the subsequent pages.
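The merge rule described above can be illustrated with a minimal Python sketch. It is not the platform's implementation: the TableSegment class and the merge_spanning_segments function are hypothetical, and the sketch assumes each page's part of the table has already been read into rows of cell text. This section does not say what happens when the column counts differ, so the sketch simply raises an error in that case.

```python
from dataclasses import dataclass, field


@dataclass
class TableSegment:
    """Hypothetical model: the part of a table that appears on one page."""
    page: int
    rows: list[list[str]] = field(default_factory=list)

    @property
    def column_count(self) -> int:
        # Assumes every row on the page has the same number of cells.
        return len(self.rows[0]) if self.rows else 0


def merge_spanning_segments(segments: list[TableSegment]) -> list[list[str]]:
    """Merge page-by-page segments into a single table, in page order.

    Parts are merged only when they all have the same number of columns;
    raising on a mismatch is an assumption of this sketch.
    """
    if not segments:
        return []
    ordered = sorted(segments, key=lambda s: s.page)
    expected = ordered[0].column_count
    for seg in ordered:
        if seg.column_count != expected:
            raise ValueError(
                f"page {seg.page} has {seg.column_count} columns, expected {expected}"
            )
    merged: list[list[str]] = []
    for seg in ordered:
        merged.extend(seg.rows)
    return merged


if __name__ == "__main__":
    page1 = TableSegment(page=1, rows=[["Item", "Qty"], ["Bolt", "10"]])
    page2 = TableSegment(page=2, rows=[["Nut", "25"]])
    print(merge_spanning_segments([page2, page1]))
    # [['Item', 'Qty'], ['Bolt', '10'], ['Nut', '25']]
```

Sorting by page number first means the parts are stitched together in reading order even if they are supplied out of order.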
Step 2: Annotate a table
The platform supports the following types of table annotation:
- Single table annotation
- Table spanning annotation
Single table annotation
- On the Annotation screen, you can access the existing tables that were created in the document type for the extractor asset.
- In the right pane, select the table label that corresponds to the table you want to annotate on the document.
- On the document, locate the target table and draw a bounding box around the entire table.
Drawing a bounding box: Click and hold the mouse button at one corner of the table you want to annotate. Then, keeping the button pressed, drag the cursor to the opposite corner of the table. This creates a rectangle around the table, called a bounding box.
- Click the intersections of the rows and columns to create grid lines that define the table’s structure. You can resize or move the lines to capture the table’s layout and content accurately.
- Select the appropriate column headers from the drop-down menu.
- Click the Done & Read Table button displayed at the bottom of the page to capture and view the extracted table data.
- If the column headers and corresponding values are not extracted as intended, click the button to remove the annotation.
- In the right pane, click the button next to the table field to start annotating another table instance in the document.
- Repeat the same steps to annotate the tables in the other documents.
- Complete the annotation and click Train.
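The geometry behind the bounding box and grid lines in the steps above can be sketched in Python. This is a hypothetical model, not the platform's API: Box, box_from_drag, grid_cells, and rows_to_records are illustrative names, and coordinates are assumed to be page pixels with the origin at the top-left corner. The press and release points are normalized so the box is valid whichever direction you drag, the row and column lines split the box into cells, and each row's cell text is paired with the column headers selected from the drop-down menu.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical rectangle in page coordinates (origin top-left, y grows downward)."""
    left: float
    top: float
    right: float
    bottom: float


def box_from_drag(start: tuple[float, float], end: tuple[float, float]) -> Box:
    """Build a bounding box from the press point and the release point.

    The drag can go in any direction, so the corners are normalized.
    """
    (x1, y1), (x2, y2) = start, end
    return Box(min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def grid_cells(box: Box, row_lines: list[float], col_lines: list[float]) -> list[list[Box]]:
    """Split the box into cells using horizontal (row) and vertical (column) lines.

    Lines outside the box are ignored and duplicate lines are collapsed.
    """
    ys = [box.top] + sorted({y for y in row_lines if box.top < y < box.bottom}) + [box.bottom]
    xs = [box.left] + sorted({x for x in col_lines if box.left < x < box.right}) + [box.right]
    return [
        [Box(xs[c], ys[r], xs[c + 1], ys[r + 1]) for c in range(len(xs) - 1)]
        for r in range(len(ys) - 1)
    ]


def rows_to_records(headers: list[str], rows: list[list[str]]) -> list[dict[str, str]]:
    """Pair each row's cell text with the column header selected for that column."""
    return [dict(zip(headers, row)) for row in rows]


if __name__ == "__main__":
    box = box_from_drag((300, 400), (100, 100))  # dragged bottom-right to top-left
    cells = grid_cells(box, row_lines=[200, 300], col_lines=[200])
    print(box)                        # Box(left=100, top=100, right=300, bottom=400)
    print(len(cells), len(cells[0]))  # 3 2
    print(rows_to_records(["Item", "Qty"], [["Bolt", "10"], ["Nut", "25"]]))
```

Two horizontal lines inside the box produce three rows, and one vertical line produces two columns; moving or resizing a line in the editor corresponds to changing one value in row_lines or col_lines.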
Table spanning annotation
- On the Annotation screen, you can access the existing tables that were created in the document type for the extractor asset.
- In the right pane, select the table label that corresponds to the table you want to annotate on the document.
- On the first page where the table appears, locate the target table and draw a bounding box around the part of the table on that page.
Drawing a bounding box: Click and hold the mouse button at one corner of the table you want to annotate. Then, keeping the button pressed, drag the cursor to the opposite corner of the table. This creates a rectangle around the table, called a bounding box.
- Click on the intersections of the rows and columns to create grid lines that define the table’s structure. You can resize or move the lines to accurately capture the table’s layout.
- Select the appropriate column headers from the drop-down menu.
- Click the Done & Read Table button displayed at the bottom of the page to capture and view the extracted table data.
- Go to the next page where the table continues.
- On that page, locate the continuation of the table and draw a bounding box around it.
- Repeat the same steps for each subsequent page that the table continues onto.
- Once the table is annotated on all the pages it spans, click link table in the right pane. This links the parts of a table that span multiple pages, so that data is extracted coherently and accurately across the entire table.
- If the column headers and corresponding values are not extracted as intended, click the button to remove the annotation.
- In the right pane, click the button next to the table field to start annotating another table instance in the document.
- Complete the annotation and click Train.
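Linking the per-page annotations of a spanning table into one table can be sketched in Python. The PageAnnotation class and link_annotations function are hypothetical, and the requirement that every linked page uses the same column headers is an assumption made for this sketch, not a documented rule of the platform. Each resulting record keeps the page it came from under a _page key, so extracted values can be traced back to the document.

```python
from dataclasses import dataclass


@dataclass
class PageAnnotation:
    """Hypothetical model of one annotated part of a spanning table."""
    page: int
    headers: list[str]      # column headers chosen from the drop-down menu
    rows: list[list[str]]   # cell text read after "Done & Read Table"


def link_annotations(parts: list[PageAnnotation]) -> list[dict[str, str]]:
    """Link per-page annotations into one list of records, in page order."""
    if not parts:
        return []
    ordered = sorted(parts, key=lambda p: p.page)
    headers = ordered[0].headers
    records: list[dict[str, str]] = []
    for part in ordered:
        # Assumption of this sketch: linked parts must share the same headers.
        if part.headers != headers:
            raise ValueError(f"page {part.page}: headers {part.headers} != {headers}")
        for row in part.rows:
            record = dict(zip(headers, row))
            record["_page"] = str(part.page)  # keep provenance for each row
            records.append(record)
    return records


if __name__ == "__main__":
    linked = link_annotations([
        PageAnnotation(2, ["Item", "Qty"], [["Nut", "25"]]),
        PageAnnotation(1, ["Item", "Qty"], [["Bolt", "10"]]),
    ])
    print(linked)
    # [{'Item': 'Bolt', 'Qty': '10', '_page': '1'}, {'Item': 'Nut', 'Qty': '25', '_page': '2'}]
```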