Data is Infrastructure
Nations should be rushing to build it
The most absurd memories are often the clearest: I’m sitting in my college dorm room in Abu Dhabi, desert sun streaming through the window, aircon quietly humming in the background. On my laptop is a CBC article with the headline “Is data the next oil?”
I remember this moment because it was the first time I encountered this idea, which I have since seen in many new forms.
I remember this moment because of the absurdity of reading this headline in one of the world’s richest petrostates: a place whose development will never be mirrored through data wealth. The data giants of today may resemble the oil titans of yesterday - powering the world’s most valuable corporations as well as every crucial good and service we use - but data lacks both scarcity and intrinsic value. Unlike oil, it can be copied at will. In other words, data will never be a commodity on par with oil.
Because data is not a commodity.
Data is infrastructure.
An iron spine
Railways also have no inherent value.
A million miles of track with no useful destination is just expensive metal. Laid in the right places, however, railways won wars, powered economic revolutions, and created nations. No one today doubts their historic importance.
Similar claims of importance are made about AI. I’ve been working in the space long enough to hear every government from Vienna to Vietnam announcing the foundations of an “AI ecosystem”, with themselves at the centre of the next industrial revolution.
But there’s a problem: they’re building the wrong things. Most governments focus on vague concepts like “forums”, “ecosystems”, and “designated zones” instead of concrete AI precursors.
As it happens, the precursors for intelligence are three deceptively simple ingredients: algorithms, compute and data¹. If AI algorithms are increasingly being open sourced, and compute is owned by a select few corporations, that leaves data as the final arena to fight over.
When this fact is accepted, the implications for geopolitics become grave. We may indeed be headed for a world where strategic databases play the nation-defining role that railways once did. Just as Canada’s and Russia’s transcontinental railways opened up their countries to millions of entrepreneurs, immigrants and investors, so too could the right datasets give a nation a distinct AI advantage, attracting talent and economic prosperity. The nations that build out public data infrastructure will be able to manufacture better products, raise living standards, and strengthen militaries. Those that fail to grasp the opportunity will be left behind.
Go big and stay home
In the race to be the next AI superpower, governments tend to splash money haphazardly: physical buildings, admin-heavy programs, and direct subsidies. This spending often follows the model of universities, institutions that have become increasingly outdated with declining enrolment, skyrocketing fees, and cumbersome bureaucracies. The best entrepreneurs, by contrast, are intrinsically driven to build ambitious things: they don’t need a glossy building, official designation, or a step-by-step program. While they’ll take direct subsidies, the results are a bit like bribing someone to be your friend: fleeting and expensive. Give an entrepreneur an advantage that they won’t find anywhere else, however, and they won’t leave. Ever.
Living costs, the weather, agreeable politics, fun events, easy publicity, and nice restaurants are all relative non-factors. The presence of unique, highly valuable data, by contrast, is one of the few differentiating factors a country can provide.
The flywheel effects of homegrown innovation are massive. Databases attract founders, who create new companies, which in turn provide the capital and talent that seed the next generation of founders². Every government promotes these effects, but few achieve them.
Meanwhile the timing has never been better. AI behemoths have already gobbled up most of the free data available. Progress from this point on will be about obtaining and cleaning new sources of data that are private, obscure, or messy. Governments could have a unique role in carrying out this difficult but diffusely beneficial work.
“Open data sets are public goods: they benefit many researchers, but researchers have little incentive to create them themselves” - Eric Schmidt
Nationally available databases are also equitable. Much like highways allow all drivers, regardless of income level, to travel on quality infrastructure together, databases can benefit everyone from the biggest corporations to the teenager in a basement, from researchers to start-up founders. The alternative is less appetizing: tech giants use their financial heft to lock down the most important sources of data to their benefit. A world of private roads and tolls. A world of a few winners, clustered in a single city.
Specificity is all you need
So what would this infrastructure look like? To start with, an enormous lake of data is not enough. If governments want to create something that is world-class, they need to be focused on what they collect and how it is structured. Trying to build databases to power general models like Llama or GPT is a waste of time - instead governments can try to serve particular industries in which they already have (or are trying to build) an advantage. Some examples:
Material science. AI for material discovery has been building momentum, and looks likely to play a major role in the manufacturing and energy industries. A lack of data, however, continues to hold back progress, especially as it often requires expensive compute or physical experiments to create “ground truth”. Some open databases have emerged, for example the Open Catalyst Project datasets backed by Meta. The vast majority of use cases, however, have very little data to draw on. Governments could leverage a unique materials database to create competitive advantage in aerospace (alloys), semiconductors (precursor chemicals), carbon capture (sorbents), or other industries.
Health. AI for drug discovery is a strong example here, but there are numerous other possibilities from preventative care to bioengineering. Governments in countries with public healthcare have been particularly weak at making data private, secure, and accessible for research and innovation, so there is a big opportunity just in making basic improvements. Given the growth in the health industry, this is a fertile space to build competitive advantage.
Urban services. Data from municipal services can be combined to help local construction, mobility, or tourism companies. Singapore has been an early leader in this space with its “Smart Nation” initiative, providing a platform where numerous government departments can share geospatial data.
Who shall pass?
Data access is another piece of the puzzle. When a government builds a new highway or hospital, rival nations can’t make a carbon copy out of thin air. A database, by contrast, can be copied the moment it is opened. Open databases may well be part of the solution³, but my intuition is that they would not deliver the economic returns to justify public entities footing the bill. There will thus need to be an ongoing conversation about access rights. One way the databases could be publicly available while encouraging infrastructure-style benefits would be to restrict the training of models on the dataset to organizations that are based in the region that built it. Users of the database could furthermore be required to contribute data back as a condition of use, thus creating self-sustaining growth.
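To make the idea concrete, here is a minimal sketch of how such an access rule might be written down. Everything in it is an assumption for illustration: the names `Organization`, `AccessPolicy` and `access_level`, the contribution threshold, and my reading that anyone may read the data while training rights require both a local base and a contribution back. A real agency would certainly need something more nuanced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Organization:
    name: str
    region: str               # where the organization is based
    records_contributed: int  # data it has given back so far


@dataclass(frozen=True)
class AccessPolicy:
    home_region: str       # region that funded the database
    min_contribution: int  # hypothetical "give back" threshold


def access_level(org: Organization, policy: AccessPolicy) -> str:
    """Return 'train' or 'read' under the two illustrative rules."""
    is_local = org.region == policy.home_region
    has_contributed = org.records_contributed >= policy.min_contribution
    # Training rights need both conditions; everyone else can still read.
    if is_local and has_contributed:
        return "train"
    return "read"


# Example: a local startup that has contributed enough data may train models.
policy = AccessPolicy(home_region="CA", min_contribution=100)
startup = Organization(name="Example Labs", region="CA", records_contributed=150)
print(access_level(startup, policy))
```

The point of the sketch is only that the rule can be simple and legible: two conditions, checked the same way for a tech giant and a teenager in a basement.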
Decisions around how to build such a database, however, shouldn’t be handled by an existing government bureaucracy - the temptation to give out access for political gain is too strong. Public organizations are also ill equipped to deal with the technical and organizational needs of high-tech entities. Any new public entity will therefore need to be built from the ground up, as able to ingest data from ponderous, bureaucratic organizations as to disperse it to nimble, inexperienced startups. An arm’s-length agency with the freedom to make clear-sighted decisions in the public interest is probably the best option (pension funds may be an instructive example). This agency would need to identify the data that would drive the greatest impact, as well as supervise its collection. Some of this data will need to be coaxed from private industry players; other data will need to be purchased or created. In other words, this will be a gargantuan task, requiring sustained commitment from the public.
Like building a highway system or an international airport, a truly transformative national database won’t be easy. But those nations which are able to persevere will reap the greatest benefits. In the end, national databases could become a new kind of public good, a source of pride as well as economic advantage. My personal hope is that once the initial economic impact has been felt, national databases would become fully open, thus extending the benefits to under-resourced countries as well.
Striking it rich (slowly)
There will be no oil equivalent in the coming century, and that is probably a good thing. Oil propped up autocracies, fuelled wars, and rewarded luck of geography as opposed to human ingenuity. The coming century will be won by those who build with ambition and conduct themselves with rigour. Research universities, complex compute and, yes, natural resources are part of that future.
But those who recognize databases as a national project, and carry out the difficult work of making them available, will earn better-deserved riches than oil ever provided.
1. Yes, this is a simplification, but one which can help us focus on what actually matters.
2. Fairchild Semiconductor being the canonical example in the Bay Area.
3. I believe we should lean towards open-sourcing wherever possible, for the record. But there are always times when it is not the solution.


