Life in a networked data panopticon. New analytical tools pop up every day.

The new public commons is shaping up not as web pages (Wikipedia aside) but as web data. And the analytics out there are astounding.

 

19 July 2022 – Part of this will be a short trip down memory lane for many of my readers of a “certain age”, and I am in there with you.

It used to be that if you were a technophile you were also very possibly, to some degree, a conspiracist. The two were adjacent; some of the foundational texts of nerddom are basically conspiracies repackaged as fiction or parody – The Illuminatus! Trilogy, or The Book of the SubGenius might come to mind.

When the internet showed up, it became the place to go for the good stuff. It could teach you about the cabals that run the world, or about how ghosts are just time travelers. Many of us on this listserv were early adopters and we took to it like ducks to (tainted) water, armed with the names of FTP sites written in a notebook. It was just the stuff for a powerless person (although mostly powerless adolescents) looking for order. The alternative – that I was a normal person instead of a suppressed genius – was unthinkable.

We had all been members of the animated world of bulletin-board systems which were operated by hobbyists and entrepreneurs to host conversations and the exchange of files between personal computer users in the same geographic area. These systems came into being in 1978 but disappeared quickly after the advent of the internet. Anybody remember FidoNet? [sigh]

As the internet grew, becoming less about conspiracy and oddity and more about commerce, we did too. We put childish things away, began to build a daily paper (well, homepage), and in general came to believe that the world was run not by forces intent on evil chaos but instead by a network of goofballs acting out of a variety of motives, mostly commercial, mostly greedy. Malevolent? Sometimes, a little. Satanic? Nah.

Still, a drop of conspiracist ink tinted all of our perceptions, at least mine. I assumed that the people who seemed to run the world – in my age group it started with guys like Stewart Brand and Douglas Engelbart, and then later the guys who did run the world like Bill Gates, and then Jeff Bezos and Elon Musk and The Zuck – just had more knowledge of its secrets. “They have access to more information”, I’d think. They know what the companies they run and invest in are working on, they can see reports, they can buy mounds of raw data and hire teams of consultants to synthesize it into recommendations.

But it turns out that the pioneers they were guided by were the same ones I was reading: James Beniger, Thomas Kuhn, Marshall McLuhan, Nicholas Negroponte, Neil Postman and Alvin Toffler. And their hobbies were normal rich-people hobbies. And they did (somewhat) normal things. Does anybody remember the launch video for Windows 95? It was Jay Leno driving a car that looked like a computer mouse. Yeah, powerful people had/have a lot of data – but it is hard to guess what hidden knowledge they might possess.

And now, in fact, it feels to me as if a large portion of humanity has entered the age of no secrets. Regular people now do “open source intelligence,” trawling YouTube footage of Ukraine war zones, triangulating that with Google Maps, comparing notes on Reddit – all to define exactly what happened. I have a whole OSINT (Open-Source Intelligence) team doing this on a daily basis.

And if you’re meeting someone for coffee you certainly search for their name – you’ll slip right into their LinkedIn profile or their property records, and you’ll have to remember not to bring up the price of their house when you sit down with them. Back in the day I used to download big Freedom of Information Act PDFs and poke around inside leaked databases for interesting nuggets. Remember using Camelot and Carousel before Adobe re-branded? You needed to do this stuff (and you still do, really) to write an authoritative post. But who can keep up with the pace of these releases now? Fortunately I have a media team that does this for me. My God, there are whole hard drives’ worth of data – so much data that we brand it: the Paradise Papers (1.4 terabytes), the Panama Papers (2.6 TB), the Pandora Papers (2.9 TB). And recently – did anyone notice aside from Wikipedia? – the massive Swiss bank data spill affecting tens of thousands of banking clients across the globe. And all that information is out there on the Dark Web, for sale.

These databases are massive. Analysing such a volume of data isn’t a job for Excel or any existing eDiscovery or database management programs. It requires sophisticated machine learning and graph toolkits, and it has driven the adoption of the likes of Python, Neo4j, Linkurious, Fonduer and scikit-learn. It is why the fastest-growing category in the enterprise software market is database analysis. Graph databases (such as Neo4j) are my favorites, excelling at spotting data relationships at scale. Instead of breaking up data artificially into separate tables, graph databases store entities and the links between them directly, which more closely mimics the way humans think about information. Once that data model is coded in a scalable architecture, a graph database is matchless at mining connections in huge and complex datasets.
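To make "mining connections" concrete, here is a minimal sketch in plain Python. It is not Neo4j and not the tooling used on any of the leaks; it only illustrates the idea of storing records as nodes and relationships and then asking how two names are linked. The people, companies, relationship labels and function names are all hypothetical, invented for the example.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (node_a, relation, node_b) triples."""
    graph = defaultdict(set)
    for a, relation, b in edges:
        graph[a].add((relation, b))
        graph[b].add((relation, a))
    return graph


def shortest_connection(graph, start, goal):
    """Breadth-first search: shortest chain of nodes linking start to goal, or None."""
    if start == goal:
        return [start]
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        # sorted() only makes the traversal order deterministic
        for _relation, neighbour in sorted(graph.get(path[-1], ())):
            if neighbour in seen:
                continue
            if neighbour == goal:
                return path + [neighbour]
            seen.add(neighbour)
            queue.append(path + [neighbour])
    return None


# Hypothetical records, invented for illustration only.
EDGES = [
    ("Person A", "DIRECTOR_OF", "Shell Co 1"),
    ("Shell Co 1", "REGISTERED_AT", "Address X"),
    ("Shell Co 2", "REGISTERED_AT", "Address X"),
    ("Person B", "SHAREHOLDER_OF", "Shell Co 2"),
    ("Person C", "DIRECTOR_OF", "Shell Co 3"),
]

graph = build_graph(EDGES)
print(shortest_connection(graph, "Person A", "Person B"))
# ['Person A', 'Shell Co 1', 'Address X', 'Shell Co 2', 'Person B']
```

Person A and Person B never appear in the same record, yet the shared registered address links them in four hops. A graph database answers the same kind of question with a path query, at a scale where a Python dictionary would not cope.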

I’ll have more below about even newer toolkits but for my detailed history on how all of this database analytics rolled out and how it works click here.

As I noted several years ago when I was writing about the massive SolarWinds cyber attack pulled off by the Russians, at some level you can look at the entirety of modern telecommunications as a system for creating – and then losing control of – secrets. DMs, group chats, video footage of our collective noses being picked on the elevator. In the future, more will get hacked, more will converge, more systems will arise to find patterns in other systems, to recognize the still images, to interpret the video, to interpret both the static and the dynamic. AI is pretty powerful this way: It can’t think, but it can tattle. Europe seems ready to try and regulate it all (in a totally inept manner, as I have written), while the U.S., when it comes to privacy, is trapped somewhere between fundraising and grandstanding. Having given up on substance, U.S. regulators simply cannot abandon theatre. Meanwhile, China just runs a wire directly from your computer to the government. So much more efficient.

But does life in a networked data panopticon have to be grim, so dystopian, all the time? No. I subscribe to a wonderful mailing list called Data Is Plural, which regularly sends out new sources of health outcomes, voting records, bird sightings, and so forth. It’s the 5th newsletter I open almost every day (Data Is Plural is not issued every day) after I scan my 4 news compendiums, those 4 compendiums being part of a proprietary context API news crawl my CTO created based on my interests.

To release a data set is just so simple these days – and even an optimistic act. Have you seen Microsoft’s Planetary Computer? They have the whole world in there, more maps than you thought possible. Tree cover. Soil type. Or try Wikidata.org and ask for a list of all the famous dogs, or cities with populations above a million. There’s even a newer data format, Zarr, that stores large arrays as compressed chunks in cloud storage, so geographic and other array data can be read straight into Python for analysis. Or try Datasette, which turns your database into a website, lickety-split. Or try Synthesise – which also uses Python – or Elastic; with these latter two kits you can do the most amazing search and analytics on almost any static or dynamic database. Needless to say, many of these options are making the rounds of our more astute enterprise search and litigation tech clients because “the usual suspects” cannot do any of what these tools can do. Most legacy search still grinds under the old school dinosaurs of dtSearch and Lucene or similar 17th century tech. But to be fair, they are not required to do sophisticated search so those tools are sufficient.
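The Wikidata request mentioned above (cities with populations above a million) can be sketched in a few lines of Python. This is one hedged way to phrase it, not the only one: it assumes the public SPARQL endpoint at query.wikidata.org and the Wikidata identifiers Q515 (city), P31 (instance of), P279 (subclass of) and P1082 (population). The function names and the User-Agent string are placeholders of mine.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def big_cities_query(min_population=1_000_000):
    """Return a SPARQL query for cities whose recorded population exceeds the threshold."""
    return f"""
SELECT ?city ?cityLabel ?population WHERE {{
  ?city wdt:P31/wdt:P279* wd:Q515 ;
        wdt:P1082 ?population .
  FILTER(?population > {int(min_population)})
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY DESC(?population)
""".strip()


def query_url(sparql):
    """Encode the query as a GET request URL asking for JSON results."""
    params = urllib.parse.urlencode({"query": sparql, "format": "json"})
    return WIKIDATA_ENDPOINT + "?" + params


def fetch_rows(sparql, user_agent="example-script/0.1 (placeholder contact)"):
    """Run the query over the network and return (label, population) pairs."""
    request = urllib.request.Request(query_url(sparql), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=60) as response:
        payload = json.load(response)
    return [
        (row["cityLabel"]["value"], int(float(row["population"]["value"])))
        for row in payload["results"]["bindings"]
    ]


# Building the query needs no network access; only fetch_rows() does.
print(big_cities_query())
```

Calling `fetch_rows(big_cities_query())` performs the actual request, so it needs a network connection and a real contact string in the User-Agent. The query can also be pasted as-is into the Wikidata Query Service web page.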

Fun note: ransomware gangs are using some of this new tech to create searchable databases. The hackers advertise that their stolen data has been fully indexed and that the search feature includes support for finding information by filename or by content available in documents and images. We’ve tried it. It works. So the hackers’ search service makes it easier for cybercriminals to find passwords, addresses, social security numbers, credit card info or other confidential information and cross-check with other databases – with relative ease. Well, Big Tech always harps on creating “frictionless tech” so they should be proud. Get an air-gapped computer and a Tor browser and you can (safely) try it yourself.
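Nothing about "search by filename or by content" is exotic. It is ordinary full-text indexing, the same technique behind any document search box. A minimal sketch of the idea, an inverted index kept in memory, follows. The class, its method names and the sample files are hypothetical, and a real system would add text extraction from documents and images, ranking and storage on disk.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return TOKEN.findall(text.lower())


class InvertedIndex:
    """Map each term to the set of files whose name or content contains it."""

    def __init__(self):
        self.by_name = defaultdict(set)
        self.by_content = defaultdict(set)

    def add(self, filename, content):
        for term in tokenize(filename):
            self.by_name[term].add(filename)
        for term in tokenize(content):
            self.by_content[term].add(filename)

    def search(self, query, field="content"):
        """Return the sorted files that contain every term of the query."""
        postings = self.by_content if field == "content" else self.by_name
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(postings.get(term, set()) for term in terms))
        return sorted(hits)


# Hypothetical sample files, invented for illustration only.
index = InvertedIndex()
index.add("q3_budget.txt", "Quarterly budget forecast for the finance team")
index.add("meeting_notes.txt", "Notes from the budget meeting")
index.add("readme.md", "Project overview")

print(index.search("budget"))                # ['meeting_notes.txt', 'q3_budget.txt']
print(index.search("budget meeting"))        # ['meeting_notes.txt']
print(index.search("budget", field="name"))  # ['q3_budget.txt']
```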

The traditional line between client and server is blurring. Ok, it’s abstract stuff, but the upshot is that it’s getting easier and easier to put data out there, to give people something to grow on. The new public commons is shaping up not as web pages (Wikipedia aside) but as web data.

When you sit down to process the world, you face a choice. You can become absorbed in the powers-that-be, the systems-that-be and decide to interpret through the wild interconnected networks of “the usual suspects”. I’ve done that. And you can also still use a lot of “usual suspect” static software and look at the big bold names who run your industry, who run the government, and see how many of them serve on each other’s boards. Or do simple text retrieval. I’ve done that too. Still do. It informs my work.

But you can also look at just how much of the entire world, the real world, the true world is now available to anyone with a reasonable network connection and a desire to really understand – using a plethora of new and growing technologies and processes. If you have kids to raise (or even grandkids and great-grandkids) you owe it to yourself to show them that the “usual suspects” really suck for knowing how the world works. Instead of worrying about other people’s power, or staying in the “this-is-the-way-we-have-always-done-things” zombie trance, think of what you’ll download, what you can download. And hopefully you can do something better with it.

Regular readers know my mantra. Information services now sit within complex media ecologies, and networked platforms and infrastructures create complex interdependencies and path dependencies. The power dynamic has changed. Because data has become the crucial part of our infrastructure, enabling all commercial and social interactions. Rather than just tell us about the world, data acts in the world. Because data is both representation and infrastructure, sign and system. It cannot be restricted, it cannot be regulated. As the brilliant media theorist Wendy Chun puts it, data “puts in place the world it discovers”.

We live in a massively intermediated, platform-based data environment, with endless network effects, commercial layers, inference data points, and new paths to analysis. So take advantage of what that offers you and get the hell out of “The Matrix”. Go out there and get some real knowledge.

And read read read read. I realize that many of my readers are “commerce monkeys, commerce machines” (not my turn of phrase – provided by a long time reader) – with barely enough time to read and write and produce for your jobs. You barely have time to scan and parse social media to keep up-to-date. But I just want to throw this out to you should you have the time, especially on your summer break if you get one, take one.

To be an informed citizen is a daunting task. To try and understand the digital technologies associated with Silicon Valley – social media platforms, big data, mobile technology and artificial intelligence that are increasingly dominating economic, political and social life – has been an even more daunting task that brought me to interview scores of advertising mavens, data scientists, data engineers, psychologists, etc. Plus reading reams of white papers and books tracking the evolving thinking and development of this technology. I needed to dust off some classic tomes that have been sitting in my library for years. Books by James Beniger, Jacques Ellul, Marshall McLuhan, Neil Postman, Alvin Toffler, etc., etc. All of them so prescient in where technology would lead us, their predictions spot on to where we are today.

Because when you read Beniger, Ellul, McLuhan, Postman and Toffler, you see that this big “Information Society” we think is so new is not so much the result of any recent social change (information has always been key to every society) as of increases, begun more than a century ago, in the speed of material processing. Microprocessor and computer technologies, contrary to currently fashionable opinion, are not new forces only recently unleashed upon an unprepared society, but merely the latest installments in a continuing development. It is the material effect of computational power that has put us in a tizzy.

Of course, you will realize straight away that if you are not careful you can find yourself going through a mental miasma with all the overwhelming tech out there. In the past, news that reached you from afar was old news. Now, with instantaneous transmission, all news is contemporary. You live in the present, surrounded by present time, and will be launched into the present. Whereas not so long ago, the present was an island surrounded by the pasts that deepened with distance. Or to put it another way, before the advent of electronic communication, the regulation of information was partly a function of our being bodies in place. Immediacy was structured by place rather than time.

And so the knowledge presumed of the informed citizen expands in scope and detail, and it is often wholly divorced from their everyday experience. But for all of us, the Russian invasion of Ukraine (the first major physical war between developed countries in the connected age) means we have front row seats to watch the post-Cold War era get dismantled. It will be a staggering milestone with deep, deep ramifications. As the tragedy continues to unfold, we will see it manifest in the digital systems that surround us.
