Tesla’s former director of AI, Andrej Karpathy, gave a talk in June at Y Combinator’s AI Start-up School event that’s been getting a lot of views. In it he says that, over the last 70 years or so, software has transitioned through three distinct phases, two of them very recent: Software 1.0, where developers write explicit code; Software 2.0, where neural networks are trained on data; and Software 3.0, characterised by large language models (LLMs), in which computers can be programmed using natural language prompts.
This is an oversimplification, of course, and misses out punch-card programming and everything that came before it, but it captures something essential: the shift from writing formal, compiled code to programming in English (and other natural languages) that has happened in just the last couple of years. That shift is making programming accessible to a far broader audience than ever before, creating vast new opportunities for business and application development.
Karpathy himself is an enthusiastic proponent of Software 3.0; he coined the term “vibe coding”, which the industry has adopted as a label for getting an LLM to write code for you by telling it what you want to build, instead of coding it manually yourself. But he also talks about a second-order effect of this evolution of software: the LLM itself becoming a kind of operating system.
If an LLM can create instruction sets for people on the basis of natural language prompts, it can do the same for other computers or LLMs… which means that computers can now ask other computers to do things for them in much more elaborate ways than was previously possible with classic APIs. This fact is at the heart of the current explosion of interest around so-called agentic systems, which is shorthand for giving LLMs control over other pieces of software – which can include entire computers and robotic systems – so that they can plan and execute complex tasks with real world impacts with varying degrees of autonomy.
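The agentic pattern described above can be sketched in a few lines. This is a deliberately toy illustration, not any vendor’s API: the “LLM” is stubbed as a rule-based function, and the tool names and plan format are invented for the example.

```python
# A minimal, hypothetical sketch of an agentic loop: an "LLM" (stubbed here
# as a rule-based function) repeatedly chooses a tool to call until it
# decides the task is done. In a real system the stub would be a model call
# returning structured tool-use instructions.

def stub_llm(task, history):
    """Stand-in for an LLM: returns the next action as (tool, argument)."""
    if not history:
        return ("search", task)            # first step: gather information
    if len(history) == 1:
        return ("summarise", history[-1])  # second step: condense findings
    return ("finish", history[-1])         # then stop and report

TOOLS = {
    "search": lambda query: f"results for '{query}'",
    "summarise": lambda text: f"summary of [{text}]",
}

def run_agent(task):
    history = []
    while True:
        tool, arg = stub_llm(task, history)
        if tool == "finish":
            return arg
        history.append(TOOLS[tool](arg))   # execute the chosen tool

print(run_agent("market trends"))
```

The point of the loop is that the model, not the programmer, decides which tool runs next – which is exactly what makes both the planning power and the governance questions discussed below so significant.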
This has three big implications for business. The first, and currently the most hotly discussed, as it appears to be the one with the most immediate impact on human beings, is on jobs and productivity. Exactly what AI is going to do to productivity, and which jobs it is most likely to affect (or replace), is a matter of much debate, and not one that we’re going to dive into here, other than to remark that it’s clear some changes are going to happen.
The second is on AI governance. If you’re going to ask an LLM to write code for you, or conduct research on the internet, or buy things for you, or write and send some emails, or (more controversially) drive your car or diagnose your illness or hire your staff or act as your therapist, then you’d really better make sure you have some robust processes in place to monitor what it’s doing and provide some quality control.
And the third impact is on data. For decades, businesses have relied upon data infrastructures like business intelligence tools and charting packages built on top of relational databases filled with such information as operational metrics, sales figures and customer behaviour, to provide observational snapshots, trend analyses and quarterly reports.
AI is in the process of changing this approach by offering much deeper reasoning and more dynamic inference, which in turn enables more informed – and potentially autonomous – real-time decision-making; by providing, in other words, better answers not only to the question of what just happened in an organisation, but also why it happened, what’s likely to happen next and what actions should be taken – and then potentially taking, or at least initiating, some of those actions.
This more dynamic reasoning requires more than access to data; it demands access to meaning and context. To capture this kind of data and make it available to AIs so that they can make better, more appropriate decisions when acting autonomously, a kind of data structure called the knowledge graph is becoming increasingly ubiquitous.
Unlike traditional relational databases that store data in tables, knowledge graphs organise information as a collection of nodes or vertices, each one an entity – for example a concept, product, event, person – along with a series of connectors, or edges, describing the relationships between them. In a movie graph, for example, you might have a node for Tom Hanks and a node for Forrest Gump, and a one-way edge of the kind “acted in” running from the former to the latter.
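The movie example can be made concrete with a toy graph. This sketch assumes a minimal edge-list representation – each edge a (subject, relation, object) triple – which is one simple way to model a knowledge graph; production systems use dedicated graph databases instead.

```python
# A toy knowledge graph as a list of (subject, relation, object) triples.
# The entities and relations mirror the movie example in the text; the
# representation is illustrative, not a real graph-database schema.

edges = [
    ("Tom Hanks", "acted in", "Forrest Gump"),
    ("Robert Zemeckis", "directed", "Forrest Gump"),
    ("Forrest Gump", "released in", "1994"),
]

def neighbours(node):
    """Return (relation, target) pairs for edges leaving the given node."""
    return [(rel, obj) for subj, rel, obj in edges if subj == node]

print(neighbours("Tom Hanks"))  # → [('acted in', 'Forrest Gump')]
```

Because the relationships are stored explicitly rather than implied by table joins, a model (or a person) can traverse them directly – which is the structural property the next paragraph describes.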
It sounds simple, and in essence it is, but it is also extremely effective: knowledge graphs capture semantic and contextual understanding and metadata in their actual structure and readily expose it to AI models in a format that aids inference and reasoning. And you already use one every time you run a Google search: the vast knowledge graph that Google has spent years building is what lies behind the search engine’s instantaneous effectiveness at presenting highly contextualised information – and it is also what enables Gemini to draw relevant information together so quickly for its AI search summaries.
While it’s not true that knowledge graphs do everything better than traditional relational or non-relational databases (they don’t), in the right circumstances they do have major advantages. If you want to start integrating AI vertically into your business, so that it can become the kind of operating system Karpathy describes and genuinely transform your workflows – as opposed to just being the kind of horizontal tweak you get by giving staff access to tools like ChatGPT, Claude and Copilot – then graphs are almost certain to become part of your digital toolkit. It is part of a process described, in another term coined by Karpathy, as “context engineering”.
While the more familiar process of “prompt engineering” concerns the subtle art of asking an AI the right questions, context engineering, as Karpathy describes it, goes further: “in every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step. Science because doing this right involves task descriptions and explanations, few shot examples, RAG, related (possibly multimodal) data, tools, state and history, compacting... Too little or of the wrong form and the LLM doesn't have the right context for optimal performance. Too much or too irrelevant and the LLM costs might go up and performance might come down. Doing this well is highly non-trivial.”
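One small slice of what Karpathy describes can be sketched in code: choosing which pieces of retrieved information fit into a fixed context window. The relevance scores and token counting below are deliberately naive placeholders (a real system would use a proper tokeniser and a retrieval model), but the budget trade-off is the real one.

```python
# A hedged sketch of one context-engineering concern: greedily packing the
# most relevant snippets into a fixed token budget. count_tokens and the
# relevance scores are crude stand-ins for a real tokeniser and retriever.

def count_tokens(text):
    return len(text.split())  # crude proxy: one word ~ one token

def build_context(task, candidates, budget):
    """Add the highest-scoring snippets that still fit within the budget."""
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    context, used = [task], count_tokens(task)
    for c in ranked:
        cost = count_tokens(c["text"])
        if used + cost <= budget:
            context.append(c["text"])
            used += cost
    return "\n".join(context)

snippets = [
    {"text": "Q3 sales rose 12 percent", "score": 0.9},
    {"text": "office plants were watered", "score": 0.1},
    {"text": "churn fell after the pricing change", "score": 0.7},
]
print(build_context("Summarise this quarter's performance", snippets, budget=12))
```

Even this toy version shows the tension Karpathy points to: tighten the budget and relevant material is dropped; loosen it and irrelevant material slips in, raising cost and potentially degrading answers.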
This change, then, is a profound one. For an AI to offer cogent predictions and analyses, and to suggest (or take) actions based on them, the quality of the data in the graph it draws upon is crucial. Keeping that data current, accurate and secure becomes one of the most important functions in the business, and this is of course the function of data governance (yes, we finally get to the bit where I get to tell you why what Securys does is useful and important!).
In this new world, effective data governance shifts from being a compliance add-on (and too often an unwelcome afterthought) to an active function whose mission is to ensure that data remains fresh, accurate and secure at all stages of its lifecycle – and is deleted when it is no longer required or is out of date. It blends seamlessly, therefore, with AI governance, as data audit, system architecture, model deployment and AI monitoring are now all parts of the same whole.
This is neatly described by Gartner in its recently published paper on the AI trust, risk and security management (AI TRiSM) market, as four layers of technical capabilities that support enterprise policies for all AI use-cases. These form a classic pyramid, with data protection and AI governance among its key strata.
It is this development that really does transform the nature of the data in an organisation from being a risk into being an asset. We’re living through this transition right now, and if you’d like to find out more about how Securys can help you take advantage of it, feel free to drop us a line.