This article describes in detail how Reveal Processing handles hidden content in application file types.
Microsoft Word, Excel, and PowerPoint files, as well as Portable Document Format (PDF) files, carry metadata and other hidden content that can be useful in review. The table below lists the types of hidden content handled by Reveal Processing, followed by notes on each hidden content type.

Name | Word | Excel | PowerPoint | PDF |
---|---|---|---|---|
Audio and Video Paths | | | X | |
Author History | X | | | |
Color Obfuscated Text | X | X | X | |
Comments | X | X | X | |
Database Queries | X | X | | |
Embedded Objects | X | X | X | X |
Encryption | X | X | X | X |
Extreme Cells | | X | | |
Extreme Objects | | X | X | |
Fast Save Data | X | | X | X |
Hidden Cells | | X | | |
Hidden Slides | | | X | |
Hidden Text | X | | | |
Linked Objects | X | X | X | |
Macros and Code | X | X | X | X |
Meeting Minutes | | | X | |
Presentation Notes | | | X | |
Routing Slip | X | X | X | |
Sensitive Hyperlinks | X | X | X | X |
Sensitive INCLUDE Fields | X | | | |
Tracked Changes | X | X | | |
Versions | X | | | |
Weak Protections | X | X | | |

Audio and Video Paths
-
Description – Microsoft PowerPoint supports linking to audio and video files using the Insert > Movies and Sounds > Movie from File and Insert > Movies and Sounds > Sound from File commands. Use of this feature results in storing a potentially sensitive link to a local or network file path.
-
Risk – The storage of an external local or network file path caused by linking to audio and video files exposes an organization to multiple risks. The first risk is that sensitive information may be contained in the directory hierarchy exposed by the path. For example, the directory structure may use a taxonomy that includes information such as a client’s name or identifier. The second risk is that the path information can provide a view into the corporate network topology. This opens an organization to a network intrusion risk. While this risk is mitigated by proper network security, it remains a social engineering threat by providing confidential information to hackers attempting to infiltrate a corporate network. The social engineering risk is elevated when path information is combined with other sensitive data like valid user names, email addresses, and email subject lines.
Author History
-
Description – Up to the last 10 authors that saved the file are stored in an area of the file that is inaccessible using the Word application. In Word 97 and Word 2000 this information also contains the paths where the file was saved and may include sensitive user logon or network share information.
-
Risk – The saving of the author history within Microsoft Word files poses several risks including exposure of personal information, local or network paths, and an audit trail of previous revisions. Personal information will typically include the user names associated with the last 10 revisions of the file. Local or network paths will identify where each revision was saved, opening the risks associated with exposing file paths. The combination of user names and file paths provides an audit trail of previous revisions that may not be desirable. The risk associated with exposing this information often depends on the type of file being considered and the potential reviewers of the file. For example, files that may be targets of legal discovery and files that may be published to the web pose a higher risk than other files.
Color Obfuscated Text
-
Description – The font color of some file text closely matches the background color of the text, resulting in text that is not visible in the authoring application. This feature targets the more common ways to obfuscate text by setting the text color to match a solid background color and includes consideration for numerous cases where the background is inherited from underlying objects. Complex backgrounds that include underlying images, objects, shapes, and transparency may inadvertently generate false positives and false negatives.
-
Risk – Making a font color closely match the background color can result in certain text being obfuscated to casual readers of a file. This may occur accidentally or be used as a means to hide text at various points in the file life cycle and may result in the unintentional disclosure of information.
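
As an illustration of what "closely matches" can mean for a solid background, the sketch below compares two RGB colors by Euclidean distance. The function names and the `threshold` value are hypothetical and are not Reveal Processing settings; the actual comparison used by the product is not described in this article.

```python
def parse_hex_color(value: str) -> tuple[int, int, int]:
    """Convert 'RRGGBB' or '#RRGGBB' into an (r, g, b) tuple."""
    value = value.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected 6 hex digits, got {value!r}")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def is_color_obfuscated(font: str, background: str, threshold: float = 30.0) -> bool:
    """Flag text whose font color lies within `threshold` of its background.

    `threshold` is an assumed tuning value chosen for this example only.
    """
    fr, fg, fb = parse_hex_color(font)
    br, bg, bb = parse_hex_color(background)
    distance = ((fr - br) ** 2 + (fg - bg) ** 2 + (fb - bb) ** 2) ** 0.5
    return distance <= threshold
```

A check like this only covers the solid-background case; backgrounds built from images, shapes, or transparency would need the rendered background color, which is where false positives and false negatives tend to arise.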
Comments
-
Description – Microsoft Office supports adding user comments to a file through the 'Insert > Comment' command. Comments often contain private or sensitive information.
-
Risk – File comments may be used to expand upon or clarify visible content and pose low risk when used in this manner. However, comments are also often used for internal commentary and collaboration. In this form they can expose sensitive discussions, and if released may represent a leak of information that was not intended. The severity of the threat is highly dependent on the content of the comments.
Database Queries
-
Description – Microsoft Office supports powerful connectivity to databases that results in database connection and query information being stored in Office files. This information may include a path or URL to a database server, the database username, database password and SQL query strings, all of which can be highly sensitive information.
-
Risk – The use of database queries to bring external data into Excel is a powerful feature that comes with several serious security risks. Specifically, this feature creates the potential that unauthorized users will be able to independently query a sensitive database at will. To allow the query to be updated, whether user initiated or automatic, the file retains the database query parameters. This information may include a file path or URL reference to the database server, SQL query strings that identify the requested data, and the password required to access the database. A file path to the database server opens all of the security threats associated with exposing file paths. SQL query strings can be used to infer the structure of the database. Storing the database password in the Office file is a setting the user may choose when creating the query. This setting is often activated to avoid having to re-enter the password each time the data is updated. This information opens an organization to SQL injection attacks. Proper network security may prevent any external access to the database server but this provides little peace of mind in the event of a network security breach. Internal access, however, may represent an even greater threat since the recipients of the sensitive information are likely behind the firewall but possibly prohibited from accessing the database. Consider an example where the finance department distributes a spreadsheet that at face value simply includes a list of employees by department, but buried within the underlying query lies all the information required to access an employee database filled with confidential data. Extreme caution should be used when releasing spreadsheets that contain database queries.
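
To make the exposure concrete, the sketch below pulls the server, user name, and password out of a semicolon-delimited connection string of the kind a stored query may retain. The key list, the function name, and the sample values in the usage note are assumptions for illustration; real connection strings vary by driver, and values wrapped in braces that contain semicolons are not handled here.

```python
# Keys that commonly carry server or credential details (an assumed, non-exhaustive list).
SENSITIVE_KEYS = {"pwd", "password", "uid", "user id", "server", "data source"}


def sensitive_connection_fields(connection_string: str) -> dict[str, str]:
    """Return the sensitive key/value pairs found in a connection string."""
    found = {}
    for part in connection_string.split(";"):
        if "=" not in part:
            continue
        key, value = part.split("=", 1)
        key = key.strip().lower()
        value = value.strip()
        if key in SENSITIVE_KEYS and value:
            found[key] = value
    return found
```

For a hypothetical string such as `SERVER=hr-db01;UID=report;PWD=s3cret;DATABASE=Employees`, the function returns the server, user name, and password entries, which is exactly the information a recipient would need to run the query independently.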
Embedded Objects
-
Description – The Office embedded object feature (Insert > Object...) allows embedding an object into the file that is created and served by another application. The resulting object data may then contain any of the hidden and sensitive data issues found in the serving application. Adobe PDF files may include attached files through the embedded files feature of the PDF format. Files embedded in a PDF file are detected under this analysis setting.
-
Risk – Office applications leverage embeddings to work seamlessly with each other, and with other applications, to create compound files. Including a spreadsheet table in a Word file or a chart in a presentation is common and useful. For an embedding to be editable in its native application, the primary file includes a complete copy of the application data associated with the object. This data is in addition to the graphic rendition of the object that is used for display and printing. It is in this data that security risks can be found. Any security threat that has been identified in files created by an application can also manifest itself when that application serves an embedding. An additional security concern exists when using embeddings within files that have been encrypted using the Office security settings: embedded objects are not encrypted along with the primary file. For example, if an Excel chart is added to a Word file that is then encrypted using Word’s security settings, the chart and the entire supporting spreadsheet will be left unencrypted within the Word file. Scrubbing embeddings removes the ability to make further edits to the embedding while maintaining the most recent graphic rendition of the object. Adobe PDF files include a feature known as embedded files, which is detected under this setting. Files embedded within a PDF file carry an additional risk because they can be launched automatically via actions attached to form fields and other automated actions.
Encryption
-
Description – The file is encrypted and most analysis and scrubbing requests cannot be accomplished. This is distinguished from ScrubOptions.WeakProtection in that it cannot be easily circumvented short of brute force or dictionary based password attacks. However, using the Microsoft Office encryption feature (Tools > Options > Security > Password to open) does not encrypt the entire file, potentially leaving file properties and embeddings into Word and Excel unencrypted. Both Office and PDF files can be encrypted with a default password. Clean Content will test the default password and decrypt the file when used on PowerPoint and PDF files.
-
Risk – Encrypting files using the Microsoft Office security settings can provide a strong level of security against unauthorized access to files. However, this form of encryption does not always safeguard the entire content of the file. Specifically, file properties and embeddings can remain unencrypted, leaving the unsuspecting author vulnerable to unexpected information exposure. Additionally, issues with the Office encryption implementation have been published and reported to Microsoft. It can be expected that Microsoft will continue to address any holes in this area with patch releases to some versions of Office. It can also be expected that existing files and non-patched versions of Office will continue to propagate these problems. The security threat posed by partially encrypted and poorly encrypted files is based heavily on the file content and can range from low to very high.
Extreme Cells
-
Description – The Extreme Cells target indicates that ranges of spreadsheet cells within the file are located an extreme distance from other cell ranges. What counts as an extreme cell range is controlled by two settings: Extreme Cell Horizontal Gap Allowance and Extreme Cell Vertical Gap Allowance.
-
Risk – Extreme cell content may not be readily visible to casual readers of a file. This may occur accidentally or be used as a means to hide text at various points in the file life cycle and may result in the unintentional disclosure of information.
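
One way to picture the two gap allowances is the sketch below, which treats any occupied cell beyond the first oversized run of empty columns or rows as extreme. This is an assumed interpretation for illustration, not the documented algorithm; the function name and the default allowance values are hypothetical.

```python
def extreme_cells(cells, h_gap=50, v_gap=500):
    """Return occupied cells that sit beyond an oversized empty gap.

    cells: set of (row, col) coordinates that contain data (1-based).
    h_gap / v_gap: assumed stand-ins for the horizontal and vertical
    gap allowances, measured in empty columns / rows.
    """
    def cutoff(indices, allowance):
        ordered = sorted(set(indices))
        for prev, cur in zip(ordered, ordered[1:]):
            if cur - prev - 1 > allowance:
                return prev  # last index before the first oversized gap
        return ordered[-1] if ordered else 0

    max_row = cutoff((r for r, _ in cells), v_gap)
    max_col = cutoff((c for _, c in cells), h_gap)
    return {(r, c) for r, c in cells if r > max_row or c > max_col}
```

With data in columns 1 and 2 and a single value in column 200, the 197 empty columns in between exceed the assumed allowance of 50, so only the cell in column 200 is reported.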
Extreme Objects
-
Description – The Extreme Objects target identifies embedded, linked and graphic objects that have been positioned in such a way that a majority of the object may fall outside the reasonable viewing area when viewed or printed in the authoring application. This may include objects positioned outside the slide or speaker note frame in PowerPoint, and in an extreme cell range in Excel files. Extreme objects are reported but modifications can only be made upon author review in the authoring application.
-
Risk – Extreme objects may not be readily visible to casual readers of a file. This may occur accidentally or be used as a means to hide embeddings at various points in the file life cycle and may result in the unintentional disclosure of information. Objects embedded into Excel spreadsheets may be considered extreme if the object is bound to cells that are located in an extreme cell range as defined by the Extreme Cells target. Note that such an object will trigger both an Extreme Object and an Extreme Cell notification. Objects embedded into PowerPoint presentations may be considered extreme if 50% of the bounding rectangle of the embedding is positioned outside of the slide or speaker note frame.
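
The 50% rule for PowerPoint objects can be sketched as a rectangle-overlap calculation. The `Rect` type, the function names, and the choice to treat exactly 50% as extreme are assumptions made for this example; units are arbitrary as long as the object and the frame use the same ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height


def fraction_outside(obj: Rect, frame: Rect) -> float:
    """Fraction of the object's bounding rectangle lying outside the frame."""
    if obj.area == 0:
        return 0.0
    overlap_w = max(0.0, min(obj.left + obj.width, frame.left + frame.width)
                    - max(obj.left, frame.left))
    overlap_h = max(0.0, min(obj.top + obj.height, frame.top + frame.height)
                    - max(obj.top, frame.top))
    return 1.0 - (overlap_w * overlap_h) / obj.area


def is_extreme_object(obj: Rect, frame: Rect, threshold: float = 0.5) -> bool:
    """Assumed rule: extreme when at least `threshold` of the area is outside."""
    return fraction_outside(obj, frame) >= threshold
```

For a 720 by 540 frame, a 100 by 100 object placed at left 700 has only a 20-unit-wide strip inside the frame, so 80% of it lies outside and it would be flagged.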
Fast Save Data
-
Description – The fast save feature in Microsoft Word and PowerPoint is set using the Tools > Options > Save > Allow fast saves command. When fast save is activated, deleted text and data can remain in the file even though they are no longer visible or accessible from within the application. Adobe PDF files may also include earlier revisions of nearly any type of content through the Incremental Update feature of the file format.
-
Risk – The fast save feature of Microsoft Word and PowerPoint is designed to decrease the time required to save a file to disk. It does this by appending changes to the end of the existing file rather than completely rewriting the modified file. Unfortunately, this leaves deleted text and data in the file long after the user apparently removed it, which creates the risk of exposing the previous state of a file to recipients. A second risk is that this feature of Office can be used to transfer confidential information through files in a way that circumvents most content filtering technologies. The occurrence of this feature in Word files is low because the Fast Save setting has been turned off by default since the release of Office 2000, though upgrading Office in place may maintain the state of this setting. The risk remains a threat in existing, pre-Office 2000 Word files. This feature is still on by default as of the current release of Microsoft PowerPoint. As a result, it is common for PowerPoint files to include multiple prior versions. This is particularly concerning given how frequently pre-existing presentations are modified for a slightly different audience. Imagine the risk of distributing a sales presentation to one prospect that was given earlier to another prospect, knowing that the prior version is buried somewhere in the file. Adobe PDF files include a similar feature known as Incremental Updates that is detected under this setting due to its similarity to fast save. The fast save feature is enabled by default in Word 97, enabled by default in Word 2000 if it was upgraded from Word 97, and disabled by default in new installations of Word 2000 and above. It can be enabled by the user in all versions of Word. The fast save feature is enabled by default in all versions of PowerPoint and results in many versions of modified slides remaining in the file.
The incremental update feature of Adobe PDF may be implemented by PDF generation tools that make modifications to an existing PDF file.
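
Because each incremental update appends a new section that ends with its own `%%EOF` marker, counting those markers gives a rough signal that a PDF may hold earlier revisions. The sketch below is a simple heuristic with hypothetical function names, not how Reveal Processing detects the condition: linearized PDFs can contain two markers without any update, and the byte sequence could also appear inside stream data.

```python
def count_eof_markers(data: bytes) -> int:
    """Count end-of-file markers in the raw bytes of a PDF."""
    return data.count(b"%%EOF")


def has_incremental_updates(data: bytes) -> bool:
    """Heuristic: more than one %%EOF suggests appended revisions."""
    return count_eof_markers(data) > 1
```

A file flagged this way is a candidate for closer review rather than proof that sensitive earlier content is present.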
Hidden Cells
-
Description – Spreadsheet rows, columns, or worksheets that have been hidden. Hidden cells may contain sensitive data that requires user review prior to release. Hidden cells can be identified during analysis and can be made visible by setting the Unhide Hidden Cells setting. Hidden cells are not deleted or cleared when cleaned since they may be required to resolve references from visible cells.
-
Risk – It is common for spreadsheets to include entire columns, rows, or even sheets of data that are hidden from view. This is often done to prevent recipients from accessing sensitive information. The hidden data might be necessary to support a less sensitive calculation or chart. For example, a sheet of employee salaries may support a chart that shows relative salary expense by department. The salary data is sensitive but the chart is not. Unfortunately, hiding the cells does not safeguard the data, since recipients can simply unhide them. Using sheet protection with a password is a common approach to prevent recipients from accessing hidden cells. However, this safeguard is a weak form of protection because the feature does not encrypt the underlying hidden data and can be easily disabled by hacking a few bytes in the file. Workbook and file level security settings with passwords can be used to prevent modifications and encrypt the underlying data, thus providing stronger security. Consequently, hiding cells within unencrypted files should never be considered a secure method of preventing unauthorized access to those cells. Because hidden cells may support visible cell calculations, removing hidden cells requires modification by the user directly within the application.
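
To show how little protection hiding provides, the sketch below lists hidden worksheets by reading the workbook part of an Office Open XML (.xlsx) file with only the Python standard library. It is an illustration under stated assumptions: it covers sheet-level hiding in .xlsx files only, not hidden rows or columns and not the older binary .xls format, and the function name is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def hidden_sheets(xlsx_bytes: bytes) -> list[str]:
    """Return the names of sheets whose state is 'hidden' or 'veryHidden'."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    names = []
    for sheet in root.iter(f"{{{MAIN_NS}}}sheet"):
        # A missing state attribute means the sheet is visible.
        if sheet.get("state", "visible") in ("hidden", "veryHidden"):
            names.append(sheet.get("name"))
    return names
```

The sheet names and their hidden state are stored as plain attributes, so any recipient with basic tooling can find them without opening the file in Excel.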
Hidden Slides
-
Description – The PowerPoint hidden slide feature (Slide Show > Hide Slide) allows individual slides to be hidden during the slide show and printing of the presentation. Hidden slides may contain information that is not intended for general release.
-
Risk – Hidden slides are often used to tailor a presentation to a particular audience or to adjust a presentation to meet a required time allotment. In many cases, exposing the hidden slides does not represent any type of privacy or security concern. In some cases, however, the hidden slide may contain data not intended for the target audience, creating a risk of leaking sensitive information. Any presentation that contains hidden slides should be reviewed prior to distribution to determine whether the slide should be removed.
Hidden Text
-
Description – Text that has been intentionally hidden (Format > Font... > Font > Hidden) by the user may contain sensitive information that should be reviewed or removed before distributing the file.
-
Risk – The use of hidden text exposes the author to unintended information disclosure. Hidden text may be used for internal commentary, temporary display and print removal, or as a method of deleting text so that it can be retrieved later if desired. It is less common to find hidden text that provides intended useful content because this is usually done with comments. Releasing files that contain hidden text to third parties is considered a high security risk when not first reviewed by the author.
Linked Objects
-
Description – The Office linked object feature (Insert > Object...) allows linking to an external file that is managed and rendered by another application. These links can expose local and network path information.
-
Risk – Office applications enable the primary file to include references to external files that are then rendered directly into the primary file. Using this feature stores a file path or URL to the external file within the primary file. This is done to allow automatic updates to the primary file that incorporate changes to the linked file, and to allow direct authoring of the external file within the primary file framework. The existence of path information that supports this feature opens an organization to network intrusion and social engineering risks. Removing the link information can be done without affecting the most recent rendering of the linked object.
Macros and Code
-
Description – Microsoft Office includes support for Visual Basic, which can be used to create everything from simple macros to data entry forms to full-blown applications. Visual Basic can also be used to create macro viruses that travel with files. Adobe PDF files may contain code in the form of JavaScript.
-
Risk – The risk associated with macros and code being present within inbound files is a well-known virus threat. The risk associated with outbound files includes the unintended redistribution of viruses and the potential disclosure of sensitive information contained within an otherwise valid macro. Information disclosure can come in the form of user names, code comments, and potentially confidential approaches to programmatically accessing corporate resources. Macros and code are often used to support the file creation process but are not intended or desired in the final version of the file. In other examples, macros and code provide important and useful functions to the recipient, as might be the case with controls and forms. Determining the risk associated with releasing files that contain macros and code typically requires user review. Adobe PDF files may contain code in the form of JavaScript.
Meeting Minutes
-
Description – Meeting minutes can be attached to PowerPoint files with the PowerPoint Meeting Minder feature and are typically associated with an action item list. The action item list is included in the presentation as part of a slide or series of slides. The associated minutes are accessible only through the Meeting Minder user interface.
-
Risk – Meeting minutes may be unexpectedly released with a presentation because the minutes are not displayed as part of any slide but instead require manual review of the Meeting Minder minutes and may therefore be overlooked during review.
Presentation Notes
-
Description – The PowerPoint notes feature allows notes to be associated with each slide. Notes may contain general content or internal commentary that should be reviewed or removed prior to distributing a presentation.
-
Risk – Presentation notes, also referred to as speaker notes, are commonly used to record specific points the speaker would like to make during the presentation. In most cases these notes represent useful additional content that can be safely shared with any recipient of the presentation file. Oftentimes, however, these notes are written in a style that is targeted at the speaker alone and are not intended to be directly shared with the audience. In other cases, the notes are used to facilitate collaboration between multiple authors or reviewers working on the presentation. Distributing or publishing a presentation that includes speaker notes carries the risk of disclosing unintended or even confidential information.
Routing Slip
-
Description – The email routing feature of Microsoft Office (File > Send To > Routing Recipient) stores the email addresses and user names of recipients in the file.
-
Risk – Email routing slips are introduced into files that enable the file routing feature. Each routing slip may contain the email display name and email address of the originator and all recipients of the routed file. The routing slip can also contain the subject line, message body, and the date and time stamp of the routing email. This information will remain in the file after it has been routed and can expose an organization to the release of sensitive information. This exposure may be of particular concern with files that are a target of legal discovery and files that are made available to the public via electronic distribution or publication.
Sensitive Hyperlinks
-
Description – The Office hyperlink feature (Insert > Hyperlink) allows the creation of links to various locations. Two of the possibilities, fully qualified local paths and network paths, can provide unwanted insight into an organization's internal structure. Web links are not treated as sensitive.
-
Risk – Sensitive hyperlinks are hyperlinks to a resource located on a local or network drive. As such, they carry the risks associated with exposing path information. This includes the release of confidential network topology information and sensitive directory naming conventions. Releasing network resource names can subject an organization to network security risks through direct intrusion attempts and through social engineering attacks.
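
The distinction between sensitive and non-sensitive hyperlinks can be sketched as a simple classification of the link target. The function name and the exact patterns are assumptions for illustration: drive-letter paths and UNC paths follow the description above, while treating `file:` URLs as sensitive is an added assumption.

```python
import re

# Fully qualified local path such as C:\... or D:/...
_DRIVE_PATH = re.compile(r"^[A-Za-z]:[\\/]")


def is_sensitive_hyperlink(target: str) -> bool:
    """Return True for local or network file paths, False for web links."""
    target = target.strip()
    if target.lower().startswith("file:"):
        return True  # assumption: file URLs expose the same path details
    if target.startswith("\\\\"):
        return True  # UNC network path, e.g. \\server\share\file
    return bool(_DRIVE_PATH.match(target))
```

A link to a hypothetical share such as `\\fileserver\clients\acme\plan.doc` would be flagged, while `https://www.example.com/` would not.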
Sensitive INCLUDE Fields
-
Description – The Microsoft Word INCLUDE field feature provides non-OLE based linking to external files (Insert > Field > IncludeText and Insert > Field > IncludePicture). These fields may contain fully qualified local paths or network paths.
-
Risk – Sensitive INCLUDE fields carry the risk of exposing sensitive local and network file paths which can provide insight into an organization's internal network structure. The release of path information carries the risks of network intrusion, sensitive information exposure, and social engineering threats.
Tracked Changes
-
Description – The change tracking feature of Microsoft Office tracks insertions, deletions and formatting changes made to the file. Such changes contain deleted text and author and date information that may be unintentionally left in the file upon distribution.
-
Risk – Tracking changes in files is a powerful feature that enhances the collaboration process by providing valuable change history. It can be useful for individual authoring and indispensable when multiple authors and reviewers are involved. But a very high information disclosure risk comes with this power. Files often reach points in their lifecycle where tracked changes should either be accepted or rejected and a clean version of the file should be saved. This is required when it is no longer desirable to share the history of deletions and additions with the next group of recipients of the file. Many organizations have experienced the fallout associated with releasing a file with change tracking still enabled. The results can range from embarrassing to adversely affecting business, and, depending on the sensitivity of the content, can even be used to support evidence discovery for litigation.
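
To illustrate how much history a tracked file carries, the sketch below reads the insertion and deletion markup from the main part of an Office Open XML (.docx) file using only the Python standard library. It is an assumption-laden example rather than the product's method: it covers .docx files only, ignores formatting changes, and uses a hypothetical function name.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def tracked_changes(docx_bytes: bytes) -> list[tuple[str, str, str]]:
    """Return (kind, author, text) for each tracked insertion and deletion."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    changes = []
    # Inserted text lives in w:t elements, deleted text in w:delText elements.
    for kind, text_tag in (("ins", "t"), ("del", "delText")):
        for element in root.iter(f"{{{W_NS}}}{kind}"):
            text = "".join(t.text or "" for t in element.iter(f"{{{W_NS}}}{text_tag}"))
            changes.append((kind, element.get(f"{{{W_NS}}}author"), text))
    return changes
```

Each entry pairs an author name with the text that was added or removed, which is the deleted content and author information that can be left in the file upon distribution.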
Versions
-
Description – The versioning feature (File > Versions) in Microsoft Word allows multiple historical versions of a file to be saved within a single file. Versioning is useful during file creation but potentially sensitive once a file is released.
-
Risk – The version feature of Microsoft Word carries with it a high risk of unintended information disclosure. This feature allows the author to archive the current state of a file into the file so that it can be extracted at a later time if required. Users that rely upon this feature as a form of version control run the risk of accidentally releasing older versions of the file that are not intended to be viewed by the recipient. The severity of this threat is heavily dependent on the sensitivity of the file content.
Weak Protections
-
Description – Weak protections are features of an application that appear to provide a strong level of protection against specific user actions on the file but in fact can be easily removed from the file without access to a password. A protection is only considered weak if it requires a password to remove the protection. Protections that don't require passwords are considered simple but not weak since they don't imply any additional password based strength.
-
Risk – Weak protections carry the risk of leading the user to believe that controls placed on the file are safely protected when they are not. The weakness lies in the fact that, because the file is not encrypted, the protection can be easily disabled by hacking the file to overwrite or clear the protection commands. Since these features do not attempt to modify the viewing of a file, they don’t pose any direct information disclosure threats. However, if the protection is removed, the user will have access to more features that may indirectly expose additional information. An example of this risk occurs when assuming that a spreadsheet which includes sheet protection will effectively prevent recipients from examining hidden cells. Once sheet protection is removed, the user will be able to unhide the cells and expose potentially sensitive information. The Microsoft Word protection features (Tools > Options... > Security > Password to modify) and (Tools > Protect Document... > Password (optional)) are weak protections because they do not result in encrypting the file and are easily circumvented with minor changes to the underlying file. The Microsoft Excel 97 through 2003 protection features (Tools > Options... > Security > Password to modify) and (Tools > Protection > Protect Sheet... > Password to unprotect sheet) are weak protections for the same reason. The Microsoft Excel 2007 and above protection features (Save As > Tools > General Options... > Password to modify) and (Review > Protect Sheet... > Password to unprotect sheet) are likewise weak protections because they do not encrypt the file and are easily circumvented with minor changes to the underlying file.
Last Updated 9/08/2021