This article provides both visual and written instructions for building AI models from CSV files.
Introduction
An AI model is a software program that has been trained on a set of data to perform specific tasks, such as recognizing certain patterns. AI models use decision-making algorithms to learn from the training data and apply that learning to achieve specific predefined objectives.
Reveal AI supports model building in various ways. To help speed up the model building process, users can use a keyword list, or export a CSV file from previous cases to build the initial model. This document provides step-by-step instructions on how to create models using these two methods.
Process Flow
Workflow Steps
Reveal supports importing models directly from a CSV file. The CSV file can be the one exported from a Brainspace portable model, or you can create one manually. If you are creating one manually, make sure the CSV file contains the following two columns:
- term
- weight
Note: The column names and terms need to be in lower case.
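The column requirements above can be checked before import. The following is a minimal sketch in Python, not part of Reveal: the function name `check_model_csv` is hypothetical, and it only covers plain keyword rows. It reports a missing `term` or `weight` column (a header such as `Term,Weight` counts as missing because column names must be lowercase), terms that are not lowercase, and weights that are not numbers.

```python
import csv
import io

REQUIRED_COLUMNS = ["term", "weight"]

def check_model_csv(text):
    """Return a list of problems found in the CSV text (empty if none found).

    Hypothetical helper for illustration; covers plain keyword rows only.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    # Column names must match exactly, which also enforces lowercase.
    for column in REQUIRED_COLUMNS:
        if column not in header:
            problems.append(f"missing column: {column}")
    if problems:
        return problems
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        term = row["term"] or ""
        if term != term.lower():
            problems.append(f"line {line_no}: term is not lowercase: {term}")
        try:
            float(row["weight"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: weight is not a number: {row['weight']}")
    return problems
```

An empty list means the sketch found nothing to fix; it does not guarantee that the import will succeed.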
1. Collect Terms and Weights
There are two ways to collect the materials that can be used to build an AI model.
- Use a self-generated keyword list. This could be a keyword list used by the review team to search for relevant documents.
Tip: Wildcard and proximity search terms are not supported, so convert them to plain keywords before using them in the CSV file.
- Export the features from an existing classifier in Reveal:
- Go to the Features tab in Classifier Details, click Download to download the CSV file:
Note: The downloaded CSV will contain extra columns and will need to be updated before we can use it directly. See steps below for more details.
| Raw Feature | Feature | Type | Model Weight |
| --- | --- | --- | --- |
| toni graham | toni graham | keyword | 0.05834 |
| mark broadfoot | mark broadfoot | keyword | 0.03369 |
| broadfoot | broadfoot | keyword | 0.03256 |
| mins | mins | keyword | 0.03174 |
| resume | resume | keyword | 0.0301 |
| sgrady | sgrady | document author | 0.02756 |
| cliff baxter | cliff baxter | keyword | 0.02646 |
| don | don | keyword | 0.0262 |
| phillip k.allen | phillip k.allen | keyword | 0.02536 |
| phillip k. | phillip k. | keyword | 0.02524 |
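One way to reduce a downloaded features file to the two required columns is sketched below in Python. Several points are assumptions rather than documented Reveal behavior: that the download's header uses exactly the column names shown in the table above, that the Feature column maps to `term` and Model Weight maps to `weight`, and that only rows of type keyword can be carried over as-is. Rows of other types, such as document author, are returned separately because they need the metadata term format and a field name that the download does not show. The function names are hypothetical.

```python
import csv
import io

def features_to_model_rows(exported_text):
    """Split exported feature rows into keyword rows and skipped rows.

    Assumes the header 'Raw Feature,Feature,Type,Model Weight'.
    """
    reader = csv.DictReader(io.StringIO(exported_text))
    kept, skipped = [], []
    for row in reader:
        feature = row["Feature"].strip()
        if row["Type"].strip().lower() == "keyword":
            # Terms must be lowercase; the weight text is kept unchanged.
            kept.append((feature.lower(), row["Model Weight"].strip()))
        else:
            skipped.append(feature)
    return kept, skipped

def write_model_csv(rows):
    """Render (term, weight) pairs as CSV text with the required header."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["term", "weight"])
    writer.writerows(rows)
    return out.getvalue()
```

For example, passing the toni graham and sgrady rows from the table keeps the first as `toni graham,0.05834` and lists `sgrady` as skipped, so it can be reviewed and added by hand as a metadata term if needed.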
2. Update and Fine-Tune
Once the CSV file is created, add or adjust terms and weights as desired. There are two types of terms: (1) keyword terms and (2) metadata features.
- Keyword: supports multiple words; for example, “natural gas” can be one term.
- Metadata: lists the metadata field name and value in the following format:
["field","value"]
For example, the following metadata term is for the metadata field “From”, which uses the field name SENDER:
["SENDER","kaminski <kaminski>"]
To view a list of possible metadata fields, see the table “If you want to search this field... use this name in the RQL Query” at the link below; the names in the second column can be used as field names:
https://www.revealdata.com/knowledgebase/reveal/reveal-query-language-basics
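A metadata term in the shape of the SENDER example above can be assembled programmatically. The sketch below uses Python's `json` module to produce the bracketed, double-quoted form; the helper name `metadata_term` is hypothetical, and how Reveal treats values that themselves contain quotation marks is not stated here, so the JSON-style escaping in that case is an assumption.

```python
import json

def metadata_term(field, value):
    """Build a bracketed metadata term such as ["SENDER","kaminski <kaminski>"].

    Hypothetical helper; compact separators avoid spaces after the comma.
    """
    return json.dumps([field, value], separators=(",", ":"), ensure_ascii=False)

term = metadata_term("SENDER", "kaminski <kaminski>")
```

Here `term` holds exactly the string shown in the SENDER example.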
We recommend the weights assigned to each term be in the range of -0.99 to 0.99, with at least one positive and one negative entry.
Here is an example:
term,weight
takata,0.5
airbag,0.9
recall,0.4
airbags,0.3
automaker,0.3
faulty,0.3
inflators,0.2
["SENDER","kaminski <kaminski>"]:0.8
window,-0.1
working,-0.1
yesterday,-0.4
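The weight recommendation can be checked with a few lines of Python. This is a minimal sketch, with a hypothetical function name, that treats the recommended range of -0.99 to 0.99 and the "at least one positive and one negative entry" guidance as warnings rather than hard rules. The sample data is the keyword rows from the example above; the metadata row is left out to keep the sketch simple.

```python
def check_weights(rows, low=-0.99, high=0.99):
    """Return warnings for (term, weight) pairs that ignore the recommendation."""
    warnings = []
    for term, weight in rows:
        if not (low <= weight <= high):
            warnings.append(f"{term}: weight {weight} outside [{low}, {high}]")
    # The recommendation asks for at least one positive and one negative entry.
    if not any(weight > 0 for _, weight in rows):
        warnings.append("no positive weight")
    if not any(weight < 0 for _, weight in rows):
        warnings.append("no negative weight")
    return warnings

example_rows = [
    ("takata", 0.5), ("airbag", 0.9), ("recall", 0.4), ("airbags", 0.3),
    ("automaker", 0.3), ("faulty", 0.3), ("inflators", 0.2),
    ("window", -0.1), ("working", -0.1), ("yesterday", -0.4),
]
example_warnings = check_weights(example_rows)
```

For the example rows the list of warnings is empty, since every weight is within range and both signs are present.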
3. Import into Model Library
The CSV file can be imported into the model library using the IMPORT MODEL button.
Browse to the CSV file and import it to the model library.
Once it is imported, click the Settings button and update the name and description.
The saved model can then be applied to a classifier in the same way other library models are.
4. Test and Validate
Follow the steps below to test and validate the new model:
- Create a new classifier in a new case.
- Go to Edit Classifier.
- Add the imported model to the classifier under AI Library Models.
- After selecting the model, either click Run Full Process at the bottom of the AI Library Models section, or go to Classifier Details and click Run Full Process.
- Confirm that scores are populated after the classifier finishes.
Last Updated 6/12/2024