OpenAI’s faith in “scaling laws” of a general kind remains strong. But there is more to it.

10 February 2026 – As Altman put it in a blog post in February last year:
The intelligence of a model roughly equals the log of the resources used to train and run it. It appears that you can spend arbitrary amounts of money and get continuous and predictable gains; the scaling laws that predict this are accurate over many orders of magnitude.
The “laws” may of course break down – they are empirical generalizations, not laws of physics – but they are worth taking seriously because “arbitrary amounts of money” are indeed being shelled out on AI infrastructure. In August, researchers at Morgan Stanley estimated that $2.9 trillion will be spent globally on data centres between 2025 and 2028, while Citigroup has estimated total global AI investment of $7.8 trillion between 2025 and 2030. (For comparison, the US defense budget is currently around $1 trillion per year.)
One little word, though, in Altman’s blog post should give us pause: “log”. A logarithmic function, at least of the kind that is relevant here, is characterized by diminishing returns. The more resources you put in, the better the results, but the rate of improvement steadily diminishes. One reason to keep going regardless may well be the widespread sense that AI is a “winner takes all” business, and so having the best models (by even a small margin) will bring disproportionate rewards, perhaps even de facto monopoly.
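What a logarithmic relationship implies is easy to see with a toy calculation. The sketch below is purely illustrative – the formula and units are invented, not OpenAI’s actual scaling law – but it shows the key property: every tenfold increase in spending buys the same fixed improvement.

```python
import math

# Toy illustration (not OpenAI's actual formula) of the claim that a
# model's capability scales with the log of the resources spent on it.
# Units of "capability" are invented for illustration only.
def capability(spend_dollars: float) -> float:
    return math.log10(spend_dollars)

# Each tenfold increase in spend adds the same fixed increment:
for spend in (1e8, 1e9, 1e10, 1e11):
    print(f"${spend:>15,.0f} -> capability {capability(spend):.2f}")
```

Going from $100 million to $1 billion buys exactly as much improvement as going from $10 billion to $100 billion: gains that are predictable, but ever more expensive to obtain.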
Another motivation seems to be the hope that small improvements in performance will suddenly give rise to a dramatic qualitative change: the emergence of “artificial general intelligence” or maybe even “superintelligence”.
However, for most in the AI ecosystem, the availability of human-created data on which to train AI systems is already a big constraint. Compute is growing. But the data is not growing, because we have but one internet. Data is the fossil fuel of AI. It was created somehow, and now we use it, and we’ve achieved peak data, and there’ll be no more.
These are not nuanced claims, but the specialists I have spoken to don’t dismiss them out of hand. It’s true, one of them told me, that soon “we’ll be data constrained”, but this also raises the prospect that AI systems may themselves be able to create reliable new data to replenish humanity’s over-exploited reservoirs. What we don’t know is whether we will cross the threshold at which AI can effectively produce novel training data. If we do, then we are going to have superintelligence – or at least be very close.
One major researcher, Lonneke van der Plas, who is a specialist in natural language processing, implicitly warns of the risk of training AI models on computer-generated data. Among the languages on which she works is Maltese, which has only around half a million native speakers. Much digitally available Maltese, she says, is low-quality machine translation. In consequence, if you go all out for scale in developing a model of Maltese “you get a much worse system than if you carefully select the data” and exclude the reams of poor-quality text.
AI’s scale doesn’t matter just to specialists. The rest of us are being taken on a ride along the logarithmic curve too. The graphics chips and data centres on which Altman’s “arbitrary amounts” are being spent require huge quantities of electricity to power them. Some of this is coming from renewable sources, but much of it involves burning natural gas or sometimes even coal. Just one of the many new gas-fired power plants being constructed in the US to meet the growing demands of data centres is on the site of an old coal-fired power station near Homer City, Pennsylvania. When it is up and running it will generate 4.4 gigawatts, just a little more than the peak winter electricity demand for the whole of Scotland.
The International Energy Agency reckons that if the current global expansion of data centres continues, the CO2 emissions for which they are responsible, currently around 200 million tonnes per year, will be about 60 per cent higher by 2030. In a rational world, new AI data centres would be built only where ample renewable electricity is available to power them.
But in a reckless race along the diminishing-returns curve, whatever fuel is immediately available will tend to get used. In the US, that still mostly means natural gas; in China, it’s coal. Investors, many of whom are uneasy about the trillions of dollars being spent on AI infrastructure, oscillate between the fear of missing out on continuing gains in AI stocks and the fear that AI is a bubble. No one can say with certainty if, or when, the bubble will burst. If it does, it will be a financial market trauma.
But from the viewpoint of the planet it might be better sooner rather than later.