Near Duplicate Diff Viewer

This article describes the features and options for comparing near duplicates in the Document Viewer.


As noted in the article on how to code documents in Reveal, Near Duplicates are documents that are substantially similar, generally those scoring 80 or higher for textual and linguistic resemblance. (This score is not a percentage, but a scale that can be read in much the same way. For a description of how documents are analyzed for duplicate content, see Duplicate Status Detection.) Near duplicate comparison is useful for evaluating changes in drafts, forwarded messages, spreadsheets and other iterative documents.

The Reveal Document Viewer groups related items in panes next to the main viewer: Family documents (such as emails and attachments), Near Duplicates, Duplicates and Email Threads, as well as any custom relationships defined for a project. The checkboxes in these panes (Select All at the top, plus one for each document hierarchy and each individual document) allow reviewers to extend coding choices for the current document to selected related documents, saving review time.

237 - 00 - Near Dupe Pane

Each related document pane is headed by the currently viewed document in bold font, followed by the type of relationship and the number of related documents in parentheses. The heading is followed by an option to Select All items in the pane. In the case of Near Duplicates, there is a Diff button to the right of Select All. Each Near Duplicate listed below is followed by a number in parentheses indicating its degree of similarity to the current (or first-encountered) near duplicate in the set. Any two of these documents may be selected and directly compared using the Diff button.

When you select the current document (a fixed default) and any of the listed near duplicates below and click Diff, the Diff viewer opens:

237 - 01 - Near Dupe Diff Viewer

In the above illustration, the current document is compared with a document rated at 86 in similarity. The Diff viewer, which can be resized by clicking and dragging, automatically tries to align as much similar text as possible for the comparison. Here, some of the email heading information (in yellow at right) has been added in the later draft, and grey spacer bars at left mark where those lines are absent from the prior draft.

The Diff viewer uses colors to help highlight the comparison:

  • Yellow – Text added in later draft.
  • Red – Text removed from prior draft.
  • Lavender – Unchanged text in lines where some text has been changed.
  • Orange – Text moved or repositioned between the two drafts, corresponding to spaced yellow word highlights in later draft.
  • Grey – Spacer for lines added or removed between the documents.
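The categories in this legend can be illustrated with a minimal sketch using Python's standard difflib module. This is not Reveal's comparison engine, whose algorithm is not described here; the function name classify_lines and the sample drafts are hypothetical, and the sketch only separates added, removed and unchanged lines (it does not detect moved text or word-level changes within a line).

```python
import difflib


def classify_lines(earlier, later):
    """Label each line as 'added', 'removed' or 'unchanged'.

    Hypothetical helper for illustration only; it is not part of Reveal.
    """
    matcher = difflib.SequenceMatcher(a=earlier, b=later, autojunk=False)
    labels = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Present in both drafts.
            labels += [("unchanged", line) for line in earlier[i1:i2]]
        elif tag == "delete":
            # Present only in the earlier draft (shown red in the legend).
            labels += [("removed", line) for line in earlier[i1:i2]]
        elif tag == "insert":
            # Present only in the later draft (shown yellow in the legend).
            labels += [("added", line) for line in later[j1:j2]]
        else:
            # "replace": treat as a removal followed by an addition.
            labels += [("removed", line) for line in earlier[i1:i2]]
            labels += [("added", line) for line in later[j1:j2]]
    return labels


# Sample drafts (made up): the later draft gains an email heading line.
earlier_draft = ["Hi team,", "Draft attached.", "Thanks"]
later_draft = ["From: A. Sender", "Hi team,", "Draft attached.", "Thanks"]

for label, line in classify_lines(earlier_draft, later_draft):
    print(f"{label:10} {line}")
```

Calling classify_lines(later_draft, earlier_draft) instead reverses the roles of the two drafts, so the heading line is reported as removed rather than added. This is comparable to viewing the same pair of documents with the panels in the opposite order.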

The Diff viewer panels may be swapped using the Swap button to reverse the positions of the earlier and later drafts, for example, to see more clearly what was removed from the later draft by using it as the benchmark. There is no direct print or export feature in the Diff viewer at this time.

To close the Diff viewer, use the Close button or the X in the upper right corner of the window.

 

Last Updated 5/29/2024