This article provides a detailed description of the workflow for preparation and use of translated and transcribed text sets.
Introduction
Currently, Reveal Analytics ingests only OCR or Extracted text by default. If documents have already been ingested into Reveal Analytics and a client wants to use a translated or transcribed text set instead, the steps below provide general guidance.
Workflow Process Flow
Workflow Steps
1. Prepare Data
Search for the documents you want to translate or transcribe, place them in a work folder, and run a translation or transcription job.
Refer to the KB article below on how to run translation jobs.
Translate (Translation) (revealdata.com)
Refer to the KB article below on how to run transcription jobs.
Transcribe (Transcription) (revealdata.com)
2. Backup OCR Text Path (Optional)
- To preserve the original non-translated text, create a new transcription text set in Review Manager (for example: “Original OCR Text”).
Refer to the KB article below on how to create document text sets.
Create Document Text Sets (revealdata.com)
- Go to Fields > Manage Fields and choose the Field Profile to which you added the New OCR field. Select the New OCR field and change it to Updateable = Yes.
Refer to the KB article below on how to create an updatable field.
Manage Fields (Updatable) (revealdata.com)
- Add the OCR Path and New OCR Text Set fields to the Field Profile above.
- Go to the work folder with the applicable documents to focus only on the target documents.
- Use the Upload function and choose Include Fields; you should see the New OCR field as part of the choices from the Field Profile.
- Use the Copy Field option to select OCR Path, then continue to finish the update. This copies the OCR Path over to the new OCR Text Set.
Refer to the KB article below on how to bulk-tag documents.
Bulk Tag Document (Bulk Update) (revealdata.com)
3. Replace OCR Path
- Use the work folder containing the applicable documents.
- Add the OCR Path and Translation Path fields to a Field Profile.
- Go to Fields > Manage Fields and choose the Field Profile to which you added the OCR Path. Select the OCR Path field and change it to Updateable = Yes.
- Select the set of records you would like to update.
- Use the Upload function and choose fields; you should see the OCR Path as part of the choices from the Field Profile.
- Use the Copy Field option to select Raw Transcription or Translation.
This copies the translated or transcribed text path over to the OCR Path.
Refer to the KB article below on how to bulk-tag documents.
Bulk Tag Document (Bulk Update) (revealdata.com)
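The effect of the two Copy Field updates in Steps 2 and 3 can be pictured with a small Python sketch. This is only a conceptual model, not Reveal's API: the `copy_field` helper, the record layout, and the file paths are all hypothetical, and the backup field name reuses the "Original OCR Text" example from Step 2. It shows why the optional backup must run before the OCR Path is replaced.

```python
from typing import Dict, List

Record = Dict[str, str]


def copy_field(records: List[Record], source: str, target: str) -> None:
    """Copy the value of `source` into `target` for every record, in place.

    Hypothetical stand-in for the Copy Field option in a bulk update.
    """
    for record in records:
        record[target] = record[source]


# Hypothetical documents in the work folder; the paths are made up.
docs: List[Record] = [
    {"OCR Path": "text/ocr/0001.txt", "Translation Path": "text/translated/0001.txt"},
    {"OCR Path": "text/ocr/0002.txt", "Translation Path": "text/translated/0002.txt"},
]

# Step 2 (optional): preserve the original OCR path before overwriting it.
copy_field(docs, "OCR Path", "Original OCR Text")

# Step 3: point the OCR Path at the translated text.
copy_field(docs, "Translation Path", "OCR Path")

for doc in docs:
    print(doc["OCR Path"], "| backup:", doc["Original OCR Text"])
```

If the two calls were reversed, the backup field would receive the translated path and the original OCR path would be lost.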
4. Re-Index
- Using the work folder above, select Index and re-index the OCR Text Set.
Refer to the KB article below on indexing data:
How to index data (revealdata.com)
5. AI Sync Job
The next step is to verify the completion of the AI Sync Job. Generally, the AI Sync job will start within an hour after the index is complete.
Refer to the KB article below on the Analytics Sync Process to check the status of an AI Sync job.
AI Document Sync (Analytics Sync Process) (revealdata.com)
6. Run Full Process
Once the index and the AI Sync Job are completed, submit the classification by clicking Run Full Process on the classifier details page. This forces the classifier to run with the data built from the translated text set.
7. Confirm the Classifiers
- Confirm the Classifier has returned to a ready state and completed at least one additional training round.
- (Optional) Additional confirmation: Before submitting “Run Full Process,” export the existing features from the classifier to a CSV file. After submitting and confirming that the classifier has finished the new training round, compare the updated features with the exported CSV file and verify that new features based on the translated text exist.
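The optional feature comparison above can be scripted once both exports exist. The following Python sketch is a minimal illustration under stated assumptions: it assumes each export is a CSV with a header row and a column named `feature`, which is a hypothetical name (adjust `column` to match the actual export). The sample data is made up and held in memory so the example runs without any files.

```python
import csv
import io
from typing import Set, TextIO


def load_features(handle: TextIO, column: str = "feature") -> Set[str]:
    """Read one CSV export and return the set of values in `column`.

    The column name "feature" is an assumption, not a documented export format.
    """
    reader = csv.DictReader(handle)
    return {row[column].strip() for row in reader if row.get(column)}


# Made-up exports: one taken before Run Full Process, one after.
before_csv = "feature,weight\ncontract,0.8\nvertrag,0.5\n"
after_csv = "feature,weight\ncontract,0.9\nagreement,0.6\n"

before = load_features(io.StringIO(before_csv))
after = load_features(io.StringIO(after_csv))

# Features present only in the later export are candidates for
# features learned from the translated text.
added = sorted(after - before)
print(added)
```

With real exports, replace the `io.StringIO` objects with files opened via `open(path, newline="", encoding="utf-8")`. An empty `added` list would suggest the classifier has not yet picked up the translated text.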
8. Index Backed Up OCR Path (Optional)
If a backup was created for the original OCR Text Set in Workflow Step (2) above, you can index the original OCR text for searching.
- Using the work folder above, select Index and re-index the “Original OCR Text” text set created in Step (2) above.
- Once indexing is complete, confirm that the OCR Text Sets are available in the document viewer.
Last Updated 5/16/2024