
An Inside View of Office Document Cache Exploitation

April 9th, 2020
Joakim Schicht

When we began our initial development of ODC Recon, we were aware that multiple versions of Microsoft Office documents could be extracted from FSD files found within Office Document Cache (on many kinds of devices, not just Windows workstations!) when OneDrive or SharePoint were in use. You can read more about Office Document Cache (ODC) and ODC Recon development here. As our development has continued, we are now aware of even more granular information related to particular edits, beyond the entire document versions which our customers have been using ODC Recon to extract from FSD files.

When sharing Microsoft Office documents from OneDrive or SharePoint (hereafter we will simply refer to OneDrive), there are mechanisms active that facilitate collaboration features such as real-time co-authoring. These mechanisms (we refer to them as the “collaboration service”) are also responsible for the storage of granular revision information in the ODC.

Granular Revision Information

So where can this granular revision information be found within the ODC? Assuming that desktop versions of Office were used to interact with a document shared from OneDrive, in two places:

  • Within the FSD file*1 associated with a OneDrive-shared document on each collaborator’s workstation

  • Within the ODC’s temporary collaboration data on each collaborator’s workstation

These two places are not always equal, and you will sometimes find granular revision information which does not exist within FSD files but only within temporary collaboration data.

* 1 For Excel documents, the granular revision information is actually found outside of the document versions inside the FSD file… but still within the FSD file. For PowerPoint documents, it is found within the document versions found inside the FSD file and thus within the document itself – think about the possibilities in terms of PowerPoint documents which have been copied to other locations from OneDrive. For Word, we will get back to you.

Before we get into the weeds, let’s establish why you might care about this more granular information related to particular edits even if you already use ODC Recon to extract entire document versions from FSD files within ODC. As a digital forensics practitioner, of course you would want to see particular edits to a document, especially if those revisions cannot be found in either a document in a forensic image (because it doesn’t “live” on the disk, it lives on OneDrive) or versions of it extracted from its associated FSD files in the forensic image… but there is more you should be aware of. As I mentioned above, this granular revision information can be found both within FSD files (in the case of Excel, apart from the document versions also contained within them) and within temporary collaboration data. The more granular revision information we are discussing here includes not only timestamps, but usernames (depending on the circumstances, collaborators other than the FSD file’s owner may be identified as “Guest User”) and user IDs – so if multiple users were collaborating on a document, and one of them replaced an important name in a document, you may be able to determine not only what exactly was replaced but when and by whom.

Once you think through the implications of what can be done not only with multiple document versions extracted from FSD files as ODC Recon has always done, but what can be done with the granular revision information that can be found within FSD files and temporary collaboration data, you should be having a “lean back in the chair” moment.

Please note that ODC Recon currently extracts granular revision information from FSD files. We will now expose you to our ongoing research and development related to the extraction of granular revision information from temporary collaboration data.

Granular Revision Information within Temporary Collaboration Data

Now, let’s work our way into the weeds in terms of granular revision information found within temporary collaboration data.

Office stores temporary data related to the collaboration service (i.e. temporary collaboration data) by creating a “DS_(X)” subfolder in the ODC for each particular document. This data is merged back into the document’s associated FSD at currently unknown intervals. Accordingly, the content of an FSD file associated with a particular document may or may not contain the latest revision data when that document was opened by multiple users.

More specifically, the folder path within the ODC that we are referring to is:

C:\Users\(Username)\AppData\Local\Microsoft\Office\16.0\OfficeFi

The DS_(X) naming convention is:

 DS_0 = Word
 DS_1 = Excel
 DS_3 = PowerPoint

Here is an actual example of a full path to a particular Excel document (in blue), and a particular revision of that document (in red), within the ODC’s temporary collaboration data:

C:\Users\Joakim\AppData\Local\Microsoft\Office\16.0\OfficeFileCache\DS_1\0\0\DCCDDJCVHSGBCKETCNHZDIDQB5CUFDDE\e_CCE3CTFAAUDAAIAQECHUEEAAG5AVDDD6

As you can see above, the subfolder for a particular document sits directly beneath the two nested folders named “0”, and both the document folder name and the revision file names appear to be obfuscated in some way.
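As a minimal sketch, the layout seen in the example path could be walked like this. It assumes that the DS_(X)\0\0\(document folder)\(revision file) structure holds in general, which a single example cannot confirm. The function name and the shape of the returned tuples are illustrative only and are not part of ODC Recon.

```python
from pathlib import Path

# Folder-to-application mapping as listed in the naming convention above.
DS_APPS = {"DS_0": "Word", "DS_1": "Excel", "DS_3": "PowerPoint"}


def list_collaboration_folders(cache_root):
    """Yield (application, document_folder, files) for each document folder.

    Assumes the layout <cache_root>/DS_x/0/0/<document folder>/<files>,
    which is what the single example path shows.
    """
    root = Path(cache_root)
    for ds_name, app in sorted(DS_APPS.items()):
        base = root / ds_name / "0" / "0"
        if not base.is_dir():
            continue  # this application has no temporary collaboration data
        for doc_dir in sorted(p for p in base.iterdir() if p.is_dir()):
            files = sorted(f.name for f in doc_dir.iterdir() if f.is_file())
            yield app, doc_dir.name, files
```

Pointing it at a copy of the OfficeFileCache folder exported from a forensic image would give one tuple per document folder, which is a convenient starting point for the per-file analysis described below.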

The format of ODC temporary collaboration data differs significantly between Excel, PowerPoint, and Word documents. Our latest R&D has been largely focused on extracting granular revision information from Excel documents, for two reasons: Excel’s temporary collaboration data appears to remain “cached” longer before it is merged into FSD files (making the data even more compelling to digital forensics practitioners), and its format is the most sane of the three. Here is a quick summary of where we are in terms of reversing these various formats of temporary collaboration data:

Word (DS_0)

The format of Word’s temporary collaboration data seems relatively complex compared to Excel and PowerPoint. It appears to involve interaction with multiple binary file formats. We are still working on how to best extract granular revision information from this data.

Excel (DS_1)

The format of Excel’s temporary collaboration data is relatively straightforward, as it uses temporary xml files from the original document. However (isn’t there always a “however” in digital forensics?), the files are stored in multiple ways and with obfuscated file names. The content of these files seems to include either compressed xml, uncompressed xml, or various unknown binary formats.
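A first triage pass over these files might look like the sketch below. It relies on an assumption the research above does not confirm: that “compressed” means a deflate-family stream (zlib, gzip, or raw deflate). Anything that neither looks like xml nor inflates to xml is reported as unknown. The function names are hypothetical.

```python
import zlib


def looks_like_xml(data):
    """Return True if the bytes start like an XML document."""
    # Skip an optional UTF-8 byte order mark and leading whitespace.
    text = data.lstrip(b"\xef\xbb\xbf").lstrip()
    return text.startswith(b"<")


def classify(data):
    """Classify file content as 'xml', 'compressed-xml', or 'unknown'.

    Assumption: compressed content is a deflate-family stream. The
    actual algorithm used by Office here is not stated in the article.
    """
    if looks_like_xml(data):
        return "xml"
    # wbits=47 auto-detects zlib/gzip headers; wbits=-15 is raw deflate.
    for wbits in (47, -15):
        try:
            inflated = zlib.decompressobj(wbits).decompress(data)
        except zlib.error:
            continue
        if looks_like_xml(inflated):
            return "compressed-xml"
    return "unknown"
```

Files that come back as unknown are the ones that still need manual reversing.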

The actual revisions are found within the temporary collaboration data file that effectively contains the revStreamX.xml. All revStreamX.xml files are indexed and referenced from revIndex.xml and revIndex.xml.rels. All of these xml files are normally stored within an Excel document’s FSD file, but separately from the actual document content. When temporary collaboration data files are created within DS_1, various files from the original document’s ooXML structure are created in addition to these rev*.xml files.

PowerPoint (DS_3)

The format of PowerPoint’s temporary collaboration data also uses temporary xml files, similar to Excel but with slight differences. Inside the pptx you can find ppt/revisionInfo.xml and ppt/changesInfos/changesInfo1.xml, which serve a purpose somewhat similar to that of the rev*.xml files. Within the DS_3 folder, PowerPoint stores various document files including the revisionInfo.xml and changesInfo.xml. These files are stored in much the same way as for Excel and seem to include either compressed xml, uncompressed xml, or various unknown binary formats.

It is not yet obvious which document a given folder belongs to. We have found that using the creator and created metadata from core.xml seems to be good for helping to identify the relevant document. The output from ODC Recon conveniently displays this metadata in the summary output.
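To illustrate the core.xml approach, here is a small sketch that pulls the creator and created values out of the text of a core.xml part. The two namespace URIs are the standard Dublin Core ones used by ooXML core properties. The function name is hypothetical, and the sketch assumes the core.xml content has already been recovered as uncompressed xml.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespaces used by ooXML core properties.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def creator_and_created(core_xml):
    """Return (creator, created) from the text of a core.xml part.

    Missing elements are returned as empty strings.
    """
    root = ET.fromstring(core_xml)
    creator = root.findtext("dc:creator", default="", namespaces=NS)
    created = root.findtext("dcterms:created", default="", namespaces=NS)
    return creator, created
```

Two sources whose creator and created values match are good candidates for being the same document, although a match on these two fields alone is a hint rather than proof.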

Extraction of Granular Revision Information within Temporary Collaboration Data

Now we will demonstrate the extraction of granular revision information from temporary collaboration data which is not present in a document’s FSD file.

Since we are still working on reversing Word’s temporary collaboration data format, and PowerPoint quickly merges temporary collaboration data into FSD files, we will perform this demonstration with an Excel document shared from OneDrive.

The demonstration will involve this specific situation in which we have found granular revision information missing from an Excel document’s FSD:

  • User A uses the desktop version of Excel to open a document stored on OneDrive

  • User A shares document with User B

  • User B opens document via online version of Excel (per email invitation from User A)

  • User B makes various edits to document

  • User A does not perform any edits, and after some time closes Excel

  • User A’s computer is forensically imaged

This is just one situation in which granular revision information is not merged into an FSD file. We are working on identifying more of these situations.

In this situation, the edits made by user B will not end up in user A’s FSD file, but they can be found within user A’s DS_1 (temporary collaboration data) folder!

We have built a small internal tool which parses the temporary collaboration data (DS_(X) folders) and extracts useful information, in order to identify where the granular revision information for particular documents is found… and to of course extract it. We may fold the functionality from this internal tool into ODC Recon if there is demand for it.

To begin this demonstration, I opened an Excel document both with the desktop (User A) and online (User B) versions of Excel and then simulated collaboration between two users. Using online Excel I added a worksheet (Sheet4) with some data as can be seen in Image 1.

Image 1 – User B

I did not make edits to the document with the desktop version of Excel, but as can be seen in Image 2, the worksheet (Sheet4) was created in the background while the document was open in the desktop version of Excel.

Image 2 – User A

Then I closed both applications in which the spreadsheet was open. I grabbed the FSD related to this document and ran an internal build of ODC Recon against it. Parsing went fine, and ODC Recon exported both entire document versions and “RevisionData” files which contain granular revision information:

test3_(2020-03-27T23-00-03Z)_0_.xlsx
test3_(2020-03-27T23-00-03Z)_1_.xlsx
test3_(2020-03-27T23-01-30Z)_xlsx_update0_RevisionData.zip
test3_(2020-03-27T23-01-30Z)_2_.xlsx
test3_(2020-03-27T23-01-20Z)_xlsx_update1_RevisionData.zip
test3_(2020-03-27T23-17-53Z)_3_.xlsx
test3_(2020-03-27T23-17-53Z)_4_.xlsx
test3_(2020-03-27T23-17-37Z)_xlsx_update2_RevisionData.zip

ODC Recon also reported on the granular revisions found within the “…RevisionData.zip” output files:

Unique revision edits found: 8
Revision 0: 2020-27-03T23:01:04.000 : author={"DisplayName":"joakim schicht","UserId":"94d4b574aac06f48","ProviderId":"Windows Live"}
Revision 1: 2020-27-03T23:01:14.000 : author={"DisplayName":"joakim schicht","UserId":"94d4b574aac06f48","ProviderId":"Windows Live"}
Revision 2: 2020-27-03T23:01:20.000 : author={"DisplayName":"joakim schicht","UserId":"94d4b574aac06f48","ProviderId":"Windows Live"}
Revision 3: 2020-27-03T23:16:42.000 : author={"DisplayName":"joakim schicht","UserId":"94d4b574aac06f48","ProviderId":"Windows Live"}
Revision 4: 2020-27-03T23:16:56.000 : author={"DisplayName":"joakim schicht","UserId":"94d4b574aac06f48","ProviderId":"Windows Live"}
Revision 5: 2020-27-03T23:17:02.000 : author={"DisplayName":"joakim schicht","UserId":"94d4b574aac06f48","ProviderId":"Windows Live"}
Revision 6: 2020-27-03T23:17:09.000 : author={"DisplayName":"joakim schicht","UserId":"94d4b574aac06f48","ProviderId":"Windows Live"}
Revision 7: 2020-27-03T23:17:37.000 : author={"DisplayName":"joakim schicht","UserId":"94d4b574aac06f48","ProviderId":"Windows Live"}

Document id: joakim - 2015-06-05T18:19:34Z | 13_ncr:1_{4ED293B0-872C-418A-901B-8DE71D84AD26} 

This output from ODC Recon helps us determine that the last granular revision found within the FSD is from 2020-27-03T23:17:37.000, and the most recent version of the spreadsheet exported is test3_(2020-03-27T23-17-53Z)_4_.xlsx. You will see why the “Document id” information will be useful soon.
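Picking the most recent exported version out of a larger export can be automated from the file names alone. The sketch below assumes the naming pattern seen in the listing above, (name)_((timestamp))_(number)_.(extension), and assumes that the trailing number is a version counter. Both are inferences from this one example, not a documented ODC Recon format.

```python
import re
from datetime import datetime

# Matches exported document versions such as "test3_(2020-03-27T23-17-53Z)_4_.xlsx".
EXPORT_RE = re.compile(
    r"^(?P<name>.+?)_\((?P<ts>[^)]+)\)_(?P<index>\d+)_\.(?P<ext>\w+)$"
)


def latest_version(filenames):
    """Return the exported version with the newest (timestamp, index)."""
    versions = []
    for fn in filenames:
        m = EXPORT_RE.match(fn)
        if m is None:
            continue  # e.g. the ..._RevisionData.zip files
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H-%M-%SZ")
        versions.append((ts, int(m["index"]), fn))
    return max(versions)[2] if versions else None
```

When two versions share a timestamp, as the last two in the listing do, the higher number wins.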

So I opened test3_(2020-03-27T23-17-53Z)_4_.xlsx, and we can see in Image 5 that the newly added worksheet is missing. I also looked into the granular revisions contained within the FSD’s revision data, and did not find any reference to the new worksheet.

Image 5 – User A

Ok, it is now time to look into the ODC’s temporary collaboration data within the DS_1 folder structure.

Within DS_1 I found the folder with the most recent timestamp, which would likely be associated with the spreadsheet in question. But as we will see, there is other output from ODC Recon that we can use to identify a more specific connection between a particular document and temporary collaboration data.

I opened up the folder

C:\Users\joakim\AppData\Local\Microsoft\Office\16.0\OfficeFileCache\DS_1\0\0\DCCDDJCVHSGBCKETCNHZDIDQB5CUFDDE

and verified there are a bunch of files with obfuscated file names. See Image 6.

Image 6 – User A

Next I ran our internal tool for parsing temporary collaboration data (DS_1 content) and got this output:

Unique revision edits found: 13
Revision 0: 2020-27-03T23:01:04.000 : author={"DisplayName":"joakim schicht","UserId":"94d4b574aac06f48","ProviderId":"Windows Live"}
Revision 1: 2020-27-03T23:01:14.000 : author={"DisplayName":"joakim schicht","UserId":"94d4b574aac06f48","ProviderId":"Windows Live"}
Revision 2: 2020-27-03T23:01:20.000 : author={"DisplayName":"joakim schicht","UserId":"94d4b574aac06f48","ProviderId":"Windows Live"}
Revision 3: 2020-27-03T23:16:42.000 : author={"DisplayName":"joakim schicht","UserId":"94d4b574aac06f48","ProviderId":"Windows Live"}
Revision 4: 2020-27-03T23:16:56.000 : author={"DisplayName":"joakim schicht","UserId":"94d4b574aac06f48","ProviderId":"Windows Live"}
Revision 5: 2020-27-03T23:17:02.000 : author={"DisplayName":"joakim schicht","UserId":"94d4b574aac06f48","ProviderId":"Windows Live"}
Revision 6: 2020-27-03T23:17:09.000 : author={"DisplayName":"joakim schicht","UserId":"94d4b574aac06f48","ProviderId":"Windows Live"}
Revision 7: 2020-27-03T23:17:37.000 : author={"DisplayName":"joakim schicht","UserId":"94d4b574aac06f48","ProviderId":"Windows Live"}
Revision 8: 2020-28-03T13:57:31.000 : author={"DisplayName":"Gjestebruker","UserId":"c404e30d92fcbbbd","ProviderId":"Windows Live"}
Revision 9: 2020-28-03T13:57:49.000 : author={"DisplayName":"Gjestebruker","UserId":"c404e30d92fcbbbd","ProviderId":"Windows Live"}
Revision 10: 2020-28-03T13:57:52.000 : author={"DisplayName":"Gjestebruker","UserId":"c404e30d92fcbbbd","ProviderId":"Windows Live"}
Revision 11: 2020-28-03T13:57:54.000 : author={"DisplayName":"Gjestebruker","UserId":"c404e30d92fcbbbd","ProviderId":"Windows Live"}
Revision 12: 2020-28-03T13:58:00.000 : author={"DisplayName":"Gjestebruker","UserId":"c404e30d92fcbbbd","ProviderId":"Windows Live"}

Document id: joakim - 2015-06-05T18:19:34Z | 13_ncr:1_{4ED293B0-872C-418A-901B-8DE71D84AD26}


Now, it is good to verify that we are actually in the correct DS_1 folder relating to our document in question. Compare the “Document id” values above.

Then we see that granular revisions 8 through 12 are missing from the FSD. We can see the changes were performed by user “Gjestebruker” (Norwegian for “guest user”), which seems to be how such metadata is stored when a document is edited by a “remote” user; for example, from the perspective of User A’s ODC, the remote user would be User B. An interesting thing to note here: if User B had not been logged into OneDrive when the same edit was made, the DisplayName would have been recorded as “Guest User” rather than “Gjestebruker” and the UserId value would be missing. In other words, if User B was logged into his OneDrive, the DisplayName recorded in User A’s ODC would be localized in terms of language and the UserId value would be present. If User B was not logged into his OneDrive while performing the same edit, the DisplayName recorded in User A’s ODC would be in English and the UserId value would be missing.
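The comparison between the two reports can also be done programmatically. This sketch parses the “Revision N: ...” lines shown in both outputs and returns the entries that appear only in the temporary collaboration data. Note that the report prints timestamps as year-day-month. The function names are hypothetical, and the line format is taken from the two listings above rather than from any documented specification.

```python
import json
import re
from datetime import datetime

LINE_RE = re.compile(r"^Revision \d+: (\S+) : author=(\{.*\})\s*$")


def parse_revisions(report):
    """Return a list of (timestamp, display_name, user_id) from report text."""
    revisions = []
    for line in report.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # header, blank, or "Document id" lines
        # The report prints year-day-month, e.g. 2020-27-03T23:17:37.000.
        ts = datetime.strptime(m.group(1), "%Y-%d-%mT%H:%M:%S.%f")
        author = json.loads(m.group(2))
        revisions.append((ts, author.get("DisplayName"), author.get("UserId")))
    return revisions


def missing_from_fsd(fsd_report, ds_report):
    """Revisions present in the DS_(X) report but absent from the FSD report."""
    seen = set(parse_revisions(fsd_report))
    return [r for r in parse_revisions(ds_report) if r not in seen]
```

Matching on timestamp plus author rather than on the printed revision number avoids depending on how each report happens to number its entries.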

As explained earlier, we need to look into the rev*.xml files to figure out exactly what revisions 8 through 12 were. At this point we don’t really know which revStreamX.xml to look at, so we need to open all 7 files and see which of them contain revisions 8 through 12. We will use the timestamps (2020-28-03T13:57:31.000 – 2020-28-03T13:58:00.000) to identify them. We can disregard revIndex.xml and revIndex.xml.rels because the filenames they reference are unknown to us. After a quick investigation of the 7 files, we have found the relevant revision entries. It seems to be the case that revisions which have already merged back into the FSD are stored in a group rather than individually. The unmerged revisions are stored individually, one per file. We thus have revisions 8 through 12 in 5 distinct files.
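Rather than opening each file by hand, the timestamp search can be sketched as follows. It scans already-decoded xml text for xrr start tags and keeps those whose time attribute falls inside a window. A regular expression is used instead of an xml parser so the scan tolerates partial or oddly escaped content. The function name and the dictionary input are assumptions made for illustration.

```python
import re
from datetime import datetime

# Captures the rev and time attributes from an <xrr ...> start tag.
XRR_RE = re.compile(r'<xrr\b[^>]*?\brev="(\d+)"[^>]*?\btime="([^"]+)"')


def revisions_in_window(files, start, end):
    """Map file name -> [(rev, time)] for xrr entries with start <= time <= end.

    `files` maps a file name to its decoded xml text.
    """
    hits = {}
    for name, text in files.items():
        for rev, stamp in XRR_RE.findall(text):
            when = datetime.fromisoformat(stamp)
            if start <= when <= end:
                hits.setdefault(name, []).append((int(rev), when))
    return hits
```

Files holding a group of already-merged revisions simply produce no hits for the window, which leaves only the individually stored, unmerged revisions.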

Here are the granular revisions that we have been so interested in, sorted in correct order based on time (please focus on what I have placed in bold):

<xrr rev="10" uid="{A1FF25BA-5CC5-45D6-9809-4CF545BFE1C2}" sh="{F159A474-B67C-4119-BB22-426224213613}" genVer="45" genBld="4503600461351149" author="{"DisplayName":"Gjestebruker","UserId":"c404e30d92fcbbbd","ProviderId":"Windows Live"}" time="2020-03-28T13:57:31">
      <sht op="insert" name="Sheet1" idNew="3"/>
</xrr>

<xrr rev="11" uid="{3D449444-F482-4837-A0FA-C7C95BE723A5}" sh="{F159A474-B67C-4119-BB22-426224213613}" genVer="45" genBld="4503600461351149" author="{"DisplayName":"Gjestebruker","UserId":"c404e30d92fcbbbd","ProviderId":"Windows Live"}" time="2020-03-28T13:57:49">
      <sht op="rename" name="Sheet4"/>
</xrr>

<xrr rev="12" uid="{E5E6A63C-3844-43A2-93C6-B7D16E5AAB73}" sh="{F159A474-B67C-4119-BB22-426224213613}" genVer="45" genBld="4503600461351149" author="{"DisplayName":"Gjestebruker","UserId":"c404e30d92fcbbbd","ProviderId":"Windows Live"}" time="2020-03-28T13:57:52">
      <c r="B2">
            <c><v>1</v></c>
      </c>
</xrr>

<xrr rev="13" uid="{2803D0FC-FAD1-4272-B3E8-3DD3A2E235C8}" sh="{F159A474-B67C-4119-BB22-426224213613}" genVer="45" genBld="4503600461351149" author="{"DisplayName":"Gjestebruker","UserId":"c404e30d92fcbbbd","ProviderId":"Windows Live"}" time="2020-03-28T13:57:54">
      <c r="B3">
            <c><v>2</v></c>
      </c>
</xrr>

<xrr rev="14" uid="{2ED94318-D312-41C8-8809-98364B7F1FC9}" sh="{F159A474-B67C-4119-BB22-426224213613}" ctx="copy" genVer="45" genBld="4503600461351149" author="{"DisplayName":"Gjestebruker","UserId":"c404e30d92fcbbbd","ProviderId":"Windows Live"}" time="2020-03-28T13:58:00">
      <c r="B4:B42">
            <c><v>3</v></c>
            <c><v>4</v></c>
            <c><v>5</v></c>
            <c><v>6</v></c>
            <c><v>7</v></c>
            <c><v>8</v></c>
            <c><v>9</v></c>
            <c><v>10</v></c>
            <c><v>11</v></c>
            <c><v>12</v></c>
            <c><v>13</v></c>
            <c><v>14</v></c>
            <c><v>15</v></c>
            <c><v>16</v></c>
            <c><v>17</v></c>
            <c><v>18</v></c>
            <c><v>19</v></c>
            <c><v>20</v></c>
            <c><v>21</v></c>
            <c><v>22</v></c>
            <c><v>23</v></c>
            <c><v>24</v></c>
            <c><v>25</v></c>
            <c><v>26</v></c>
            <c><v>27</v></c>
            <c><v>28</v></c>
            <c><v>29</v></c>
            <c><v>30</v></c>
            <c><v>31</v></c>
            <c><v>32</v></c>
            <c><v>33</v></c>
            <c><v>34</v></c>
            <c><v>35</v></c>
            <c><v>36</v></c>
            <c><v>37</v></c>
            <c><v>38</v></c>
            <c><v>39</v></c>
            <c><v>40</v></c>
            <c><v>41</v></c>
      </c>
</xrr>

These revisions match the metadata we reviewed earlier, but now we know exactly what the revisions were:

2020-03-28T13:57:31 Inserted a new worksheet, Sheet1.
2020-03-28T13:57:49 Renamed the worksheet to Sheet4.
2020-03-28T13:57:52 Put the value “1” into cell B2.
2020-03-28T13:57:54 Put the value “2” into cell B3.
2020-03-28T13:58:00 Filled cells B4 through B42 with the values “3” through “41”.
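A summary like the one above could be generated with a small sketch along these lines. It only knows the two element types seen in this demonstration (sht and c), and it assumes each xrr element is well-formed xml on disk, meaning the quotes inside the author attribute are escaped rather than literal as displayed in the listing. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def describe(xrr_xml):
    """Return a one-line description of a single xrr revision element."""
    xrr = ET.fromstring(xrr_xml)
    when = xrr.get("time")
    for child in xrr:
        kind = local(child.tag)
        if kind == "sht":
            # Worksheet-level operation, e.g. insert or rename.
            return f'{when} sheet {child.get("op")}: {child.get("name")}'
        if kind == "c":
            # Cell edit: collect every <v> value under the range element.
            values = [v.text for v in child.iter() if local(v.tag) == "v"]
            return f'{when} cells {child.get("r")} <- {", ".join(values)}'
    return f"{when} (unrecognized revision type)"
```

Other revision types surely exist in Excel’s format, and the fallback line makes them visible instead of silently dropping them.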

You have now seen in this demonstration how we managed to find granular revisions to an Excel document within the ODC’s temporary collaboration data on User A’s workstation, which were not available in an active document (because the Excel document existed on OneDrive, not User A’s workstation) or the FSD file (associated with the Excel document on OneDrive) on User A’s workstation. We can also confirm that the recovered granular revision information is consistent with what we saw earlier.

It is worth noting that the next time the Excel document is opened by User A with the desktop version of Excel, this granular revision information will be merged into the FSD file on User A’s workstation.

Conclusion

I hope you have enjoyed this inside view of Office Document Cache exploitation. As you may already suspect, we are just scratching the surface here in terms of ODC… there is much more to come!

Published in: ODC Recon