The e-discovery gnomes jump into the Mueller Report and give a masterclass in investigating documents … and then the Report gets “updated”

The official PDF of the Mueller report was updated yesterday in a subtle but important way

23 April 2019 (Brussels, Belgium) – Yes, I spent some time last week and read it. And since everybody talks about “bias” these days, let me get my bias out of the way before I continue.

To me the Mueller report shows that bad guys who play dirty, like Trump, always win. For the Trump presidency, exposed in all its ugliness in the Mueller report, is predicated on a willingness to shred the rules and norms that sustain liberal democracy. And it relies for its success on the unwillingness of liberal democracy’s guardians to do the same. Donald Trump runs his White House like a Mafia boss. Mueller’s report is littered with examples that read more like the behavior of a Mafioso than a commander-in-chief.

There is a fundamental mismatch here: Trump cutting every corner, trampling on every ethical guideline, while Mueller and those like him primly weigh up the legal niceties and nuances. They are thumbing through the rulebook of the monastery while in front of them a mafia don creates havoc. This is the authoritarian populists’ great strength, and not only in the US: they break all the rules, banking on the fact that their opponents will stick to them and be weaker as a result. As I have written time and time again, it is the perennial villain’s advantage: they play dirty, knowing you’ll play nice. Mueller was deeply conservative in his legal approach. Trump is lucky he didn’t face his own Ken Starr.

And yet Mueller has not failed. He has handed Congress a revolver along with a full clip of ammunition. But he has given the Democratic-controlled House a politically unappealing dilemma: consume their agenda with impeachment, or continue talking about jobs or healthcare or the future.

The e-discovery gnomes jump in

The first version of the Mueller Report was completely unsearchable, and the resulting firestorm of protest across social media most likely led the DOJ to issue a searchable update (see link to updated version below).

Those mavens at Logikcull were actually the first with a searchable version so you could find keywords and key players, make annotations, tag the most revelatory information, etc. For the Logikcull story on the Report click here.

And then Martin Nikel, “Legal Technology Sensei Master” at Deloitte Switzerland, took the Mueller Report and showed you just what kind of analysis you could do by using some common techniques available to all for analysing the content and structure of a document. His big take-away:

The main message though – if you are producing a sensitive document or communication, remember that there are people waiting to dismantle and analyse it in ever smarter and quicker ways.

By breaking apart into images, pages, paragraphs, sentences, entities and sentiments etc, I can instantly identify anomalies and prioritise where I need to focus without any time wasted in attempting to read or understand the content.

You can read Martin’s full analysis by clicking here.
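To give a feel for the kind of breakdown described in that quote, here is a minimal Python sketch. It is not Martin's method or tooling. It assumes the text has already been extracted and that pages are separated by form-feed characters, and the helper names (`page_stats`, `flag_sparse_pages`) are hypothetical. It counts words and sentences per page and flags pages that are unusually sparse, which in a redacted report is one crude hint of where to look first.

```python
import re
from statistics import mean, pstdev


def split_pages(text):
    # Assumption: pages are separated by form-feed characters,
    # as many PDF-to-text tools emit them.
    return text.split("\f")


def split_sentences(page):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", page.strip())
    return [p for p in parts if p]


def page_stats(text):
    # One record per page: page number, word count, sentence count.
    stats = []
    for number, page in enumerate(split_pages(text), start=1):
        stats.append({
            "page": number,
            "words": len(page.split()),
            "sentences": len(split_sentences(page)),
        })
    return stats


def flag_sparse_pages(stats, threshold=1.0):
    # Flag pages whose word count sits more than `threshold` standard
    # deviations below the mean (a crude anomaly signal, nothing more).
    counts = [s["words"] for s in stats]
    avg, spread = mean(counts), pstdev(counts)
    if spread == 0:
        return []
    return [s["page"] for s in stats if (avg - s["words"]) / spread > threshold]
```

Running `flag_sparse_pages(page_stats(text))` on extracted text returns the page numbers worth a first look. A real analysis would add entity and sentiment extraction on top, which needs dedicated NLP tooling.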

The official PDF of the Mueller report has been updated in a subtle but important way

Yesterday, the US Justice department very quietly uploaded a new version of the Mueller report. We were notified by CounselBot, a bot that tweets changes to the DOJ’s Special Counsel’s Office web page.

It is one of hundreds of Twitter bots we follow to stay alert to changes across the many ecosystems we track: legal, technology, e-discovery, cyber, etc.
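I don't know how CounselBot is built, but the general pattern behind a change-watching bot is simple enough to sketch. The Python below is an assumption-laden illustration, not the bot's actual code: it keeps a SHA-256 fingerprint of each file seen on a page and reports anything added, changed or removed on the next visit. The function names and file names are hypothetical, and fetching the page and posting the tweet are left out.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    # A SHA-256 digest changes if even one byte of the file changes.
    return hashlib.sha256(content).hexdigest()


def detect_changes(previous: dict, current: dict) -> list:
    """Compare stored fingerprints with freshly downloaded content.

    previous: file name -> fingerprint recorded on the last visit
    current:  file name -> raw bytes downloaded on this visit
    """
    changes = []
    for name, content in sorted(current.items()):
        digest = fingerprint(content)
        if name not in previous:
            changes.append(f"added: {name}")
        elif previous[name] != digest:
            changes.append(f"changed: {name}")
    for name in sorted(set(previous) - set(current)):
        changes.append(f"removed: {name}")
    return changes


# Hypothetical example: the same file name, but new bytes behind it.
last_visit = {"report.pdf": fingerprint(b"image-only scan")}
this_visit = {"report.pdf": b"scan plus text layer"}
print(detect_changes(last_visit, this_visit))
```

This is why a quiet swap of a file under the same name and the same link still gets caught: the name stays put, the bytes do not.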

You wouldn’t notice any difference just by looking at the report. What’s new is a layer of data that makes it possible, finally, to access the underlying text of the document.

When the Mueller report came out on April 18, it was essentially a giant file of images (at 140 MB, the file was more than 300 times larger than an ebook of Crime and Punishment). The Justice department appears to have scanned a paper copy of the report using a Ricoh MP C6502 Color Laser Multifunction Printer, something that Martin Nikel discussed in his blog post, referenced above. That’s why the text is blurry, you can see the edges of some pages, and there’s a fuzzy yellow line through the middle of the entire report.

The very Ricoh MP C6502 that scanned the Mueller report

The decision immediately elicited groans from people trying to search the report for juicy details. A giant file of images has no text to search. It was also condemned by the group involved in setting technical specifications for the portable document format: “This deliberate and unnecessary act made the document substantially harder for anyone and everyone to use, forever,” wrote Duff Johnson, executive director of the PDF Association, in a delightful review of the file’s nerdiest details. News organizations and Mueller fanatics (and Logikcull, as noted above) quickly addressed this problem by running the PDF through a process known as optical character recognition (OCR) to add searchable text to the document.

So, to recap: the Mueller report was written on a computer, then printed out on paper, scanned back into digital images, and finally regenerated into text using software.

As you might imagine, this round trip from digital to paper and back is going to generate some flaws. Redactions in the report confused most OCR software, rendering some of the text illegible, as Johnson pointed out in a follow-up post.

More seriously, the Justice department’s image-only PDF also seemed to violate the US government’s own guidelines for making documents accessible to all readers. If PDFs don’t come with a layer of text and other metadata, “persons with disabilities who utilize assistive technology such as screen readers or speech-to-text tools may find it difficult or impossible to access essential or critical information,” explains the federal agency in charge of such things.

The website for Robert Mueller’s special counsel investigation acknowledged this shortcoming – “The Department recognizes that these documents may not yet be in an accessible format” – and offered to send a text file of the report to people who would have trouble reading it. It’s not clear if anyone requested such a file or received one. In any event, that offer was removed from the special counsel’s website when the new version of the PDF was uploaded.

But as several techies have pointed out, there are still a lot of errors in the accessibility tags. The software the Justice department used to OCR the report, Adobe Acrobat, generated some jumbled or incomplete text in the new PDF, especially around large redactions and photos. And many invisible markers intended to make the document more accessible were applied incorrectly. The file is also still a 140 MB set of images, albeit now with text underlying it. A never-scanned, native-text PDF of the report would likely be less than 5 MB.
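To put those file sizes side by side, here is a quick back-of-the-envelope calculation in Python. It uses only the figures quoted in this post (140 MB, "more than 300 times" an ebook, "less than 5 MB" for a native PDF), so the results are rough bounds, not measurements.

```python
# Figures quoted in this post; everything derived from them is approximate.
SCANNED_MB = 140       # size of the scanned Mueller report PDF
EBOOK_RATIO = 300      # the scan is "more than 300 times larger" than the ebook
NATIVE_MB_UPPER = 5    # a native-text PDF "would likely be less than 5 MB"

# If the scan is more than 300x the ebook, the ebook is under this size.
implied_ebook_mb = SCANNED_MB / EBOOK_RATIO

# If a native PDF is under 5 MB, the scan is at least this many times larger.
min_bloat_factor = SCANNED_MB / NATIVE_MB_UPPER

print(f"Implied ebook size: under {implied_ebook_mb:.2f} MB")
print(f"Scan vs native PDF: at least {min_bloat_factor:.0f}x larger")
```

In other words, printing and rescanning made the file at least 28 times bigger than it needed to be, and adding the text layer did nothing to shrink it.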

But in an interview yesterday, Johnson pointed out that one of the most important PDFs in American history still contained a few mysteries. It remains a scanned file. Why is it scanned at all? One possibility is that when it was received from Mueller, it was on paper. That’s weird, if true. Why wouldn’t Mueller send over a digital file? Why would the DOJ not say, “Excuse me, could you send over a PDF?”

We’ll just need to wait for the movie.
