The Rise of Agentic AI: Transforming Business Operations

Hello, and welcome to the AI Engineering podcast,

your guide to the fast moving world of building scalable and maintainable

AI systems.

Your host is Tobias Macy, and today, I'm interviewing Ben Wilde about the impact of agentic AI on business operations and SaaS as we know it. So, Ben, can you start by introducing yourself?

Sure thing. So Ben Wild. I'm the head of, innovation here at Georgian based out of Toronto, originally from New Zealand, so apologies for the accent. And, I should say probably before we get started that,

as seeing as I do work for an investment firm that please don't take anything I say as an investment advice or really advice of any kind. So, yeah, important that you don't look for me for the next part.

And do you remember how you first got started working in the space of ML and AI? Yeah. I mean, my my professional journey into ML has really been through my work here at Georgian. But my very first experience with AI would was back at university

in the nineteen nineties, if I'm, to date myself and, you know, playing around with neural nets and expert systems,

actually in Prolog.

So very vintage AI and obviously nothing like we have today.

But from there, my professional journey was predominantly through data management.

I originally started my career in, you know, UNIX, network, sysadmin, etcetera. Ended up working, in the database industry,

for Informix and then, IBM software group, and then found my way into venture capital in the late two thousands, around 02/2008, '2 thousand '9.

And, you know, our original thesis here at Georgian was this idea that software companies were gonna start,

putting more analytical capability into their applications.

We called that applied analytics.

We'd seen a lot of tooling develop around this, but we hadn't really seen the application of it in software applications. So that was our initial

thesis, and that's evolved over the years. You know, through the twenty tens,

it became more oriented towards data science and machine learning. You know, we worked with a lot of our portfolio companies on building out data science teams and creating specialist models and working on things like differential privacy.

We've always been a very technical

team here at the company. And then,

obviously, now much more in the

generalist AI capabilities, the the, if you like, democratization

of AI that we've seen with language models and now large reasoning models. And so that sort of gets me to where I am today and, you know, spending a lot of time on

Agintic AI, working with

our portfolio companies,

and helping them predominantly around product strategy

related to that.

Innovation is one of those words that means lots of different things to lots of different people and sometimes even different things to the same person. And so in the context

of being the head of innovation

at an investment firm, particularly with a focus on AI and the systems and capabilities that that enables. I'm curious what that

translates to in terms of where you spend your time and focus.

Yeah. It's it's a great question. It's a pretty nebulous term. But for me, a big chunk of my role is actually

learning, especially at the moment with everything changing so much. You know, I've been in this, in the tech industry for about thirty years now,

and I've never seen, you know, the rate of change in terms of technology

that we're seeing at the moment. And so I think my rate of learning is probably higher now than it's ever been. Luckily, we have some pretty amazing tools now for doing that, which is, you know, some of these AI tools. So that's a big chunk of my time is

trying to

not even stay ahead, but stay current and then try and maybe think a little bit about where things are going. And then I spend my time, I apply that in working with our portfolio companies, so helping them with that as well. They're all in the thick of this,

and, doing it on their own, but they also can't be everywhere at once. So helping them with that. And then, involved in our pipeline work as well. So looking at, new investments and helping our team figure out where different categories of software might be hitting because of all the changes in in the industry. So that's that's pretty much what I do.

And then digging into

the context of AI and in particular,

agentic AI, that's another one of those terms that has started to mean a lot of different things where it might just mean, hey. I have an AI that can call a couple of functions, or it might mean I have something that's automating absolutely everything, and there's no human in the loop. And for the purpose of this conversation, I'm wondering if you can just give your definition of what you see as constituting agentic AI.

Yeah. Sure. I think the first thing I would say is my perspective,

in my view,

agentic AI is a pretty big tent. So I would view Agentic AI as a continuum that could include all those things.

So, you know, you talked about the simple function call, you know, treating a language model or a large reasoning model as a as a function. You know, there can be reasoning and some level of weak agency that happens even within that function call. Right?

The model figuring out

what you know, if you're asking it to extract entities, for example. So, you know, tell me all the people and the organizations in this piece of text. There's some level of weak agency around figuring that out, and these models will reason about that text and give you an answer

right through to, you know, stronger agency where

you're giving the model a task and it's figuring out everything to do with the task.

It's, you know, interpreting it, obviously. It's thinking about the subtasks. It's thinking about the tools it needs to use, and then potentially

acting out using other tools to make changes in the digital or the physical world. And so that's a that's a pretty big continuum, but I would include

potentially

all of that in the concept of agents.

And then I think just, you know, the key attributes are some of the things I talked about. So, you know, that, you know, an agent should have the ability to

interpret a task in all the available context and data. So it's the first thing. It's gotta know, you know, what's being asked of it to understand it. Although the word understanding there is means different things to different people, but, you know, the appearance of understanding the task, it needs to be able to plan and reason about that. So, again, those are, in some areas, technical terms like planning and and what some would argue what these models are doing is not exactly planning, but let's just use the, colloquial

use of the word there. So plan and reason about those tasks and subtasks, and then carry out the plan, as I said, somewhat autonomously. And this is where you get into this idea of a, I think, a spectrum of autonomy from that sort of idea of weak agencies through strong agency,

and I'm not really personally hard and fast on that. I think it evolves over time.

Yeah. I think that to your point of planning and reasoning being slightly overloaded terms and subject to misinterpretation,

I also like to think of it in terms of compiler instructions where

you feed a

computer language program into a compiler.

It parses it. It figures out what are the actual sequences of steps for translating that into the machine instructions. I think that that's a fairly

analogous space to what a lot of these language models are doing, where it's parsing the human language, it's figuring out, okay, what is the actual sequence of instructions for me to execute,

and then what are the actual steps to take based on those instructions?

Yeah. I I think it's a fair analogy, although I I would say that there's obviously a lot more determinism when we're talking about compilers, right,

and a lot more predictability,

than than we get with these models. And, you know, that introduces

its own complexities,

for organizations that are using these technologies.

And we can, you know, we can talk more about that, but that's, you know, one of the I think one of the one of the key challenges is around that,

if you like, the reliability,

that you get from these systems, and that's from an engineering perspective what we have to deal with.

Absolutely.

And then in the context of business systems

and software as a service, there have been a lot of

epochs of investment in automation

at different levels and for different purposes.

I think one of the more recent

versions of that that has seen a lot of investment, particularly

in enterprise use cases or maybe industrial use cases, is the idea of robotic process automation

where you have,

a means of either training or instructing

a software system what steps to take across multiple different

operating

environments

where maybe that's, a web environment

or interacting with different

APIs or control

systems. Obviously, Agentic

AI is aiming at some overlap with that, but I'm wondering if you can just talk through

some of the substantive differences that are introduced by the capabilities of these agents and these language and reasoning models

in juxtaposition with things like RPA and other maybe more deterministic automation systems?

Yeah. It's a it's a great question. I, one of my colleagues here, our head of engineering, Nahim at Georgian,

likes to talk about the, lasagna

of software.

And I think that's true with technologies like RPA as well. They don't necessarily

go away as new technologies,

appear, but they disappear into the stack disappear into the lasagna somewhat.

And I think we're seeing that,

that there's still a place for

using these deterministic technologies.

And in in many ways, RPA

is just one manifestation

of writing out software ahead of time to do a specific thing. You've also got, companies that do a lot of web scraping. Right? And they build components that scrape a particular page, and you could absolutely do that with a language model and some do. But if you've got a very high volume

of activity that you need to do, you know, you're scraping the web, for example, then it can become cost prohibitive to to use, at least today, language models do that. There's a lot more computation

required.

But what you are getting is more flexibility and and resilience in some ways. You can deal with, you know, the edge cases or the changing page. But if you don't need to, if, you know, if you're going to the Department of Treasury and you're pulling down the OFAC list,

that doesn't change very often. In fact, that's probably one of the criticisms of the federal government at times is that it doesn't change fast enough. But, like, that page, if it's not changing all the time, it's probably much

better done with an RPA like process, which just goes and does it. It's, you know, without the chance of randomness and and without,

the compute overhead. Because, of course, one of the challenges with using

a nondeterministic

model like a language model to do that is that you do it a hundred or a thousand times, it's gonna get it wrong in some percentage of the time, that task. And so well, you don't need to. You don't necessarily

want to use that. So it's my worldview is probably that you will end up seeing agents using,

technologies like RPA,

as tools to do certain things. And then other times, you'll

use technologies like the computer use,

agent that's come out,

especially if there's a lot more edge cases or if it's something that changes all the time. So it's kind of a horses for courses thing and, you know, could be true that RPA has been overused and will be used less or in more niche ways, but I don't think it disappears entirely.

And the other aspect of bringing

agentic capabilities into the mix is that

because of the fact that there is a lot of nondeterminism

and potential for mistakes to be made, particularly if you're dealing with models that are

prone to hallucinations or you don't have enough contextual information to feed it to prevent them or enough monitoring to identify and guard against them, is that it introduces a a certain amount of risk

to the system, to the business if there are mistakes that are made where you have that gradation that you mentioned of weak to strong models and the the amount of involvement of a human in the loop. And I'm wondering how you see that

factored into the overall calculus of businesses that are determining whether and where and how to apply these agentic capabilities.

Yeah. I I think it's a really good question, and it's going to be an ongoing

thing for a while. So

my worldview on this is that

we're quite a way away from having systems that are reliable enough to not have humans in the loop for

much of how we use these, technologies.

And so and specifically, I'm talking about, you know, language models and reasoning models. There's been great improvement in the reliability of these things,

and and we've seen improvement

in

you know, there's been good studies showing improvement

in error rates due to test time compute. So using reasoning and and more computation

at inference time,

that improves things.

But then those

studies have also shown, you know, there's a kind of a diminishing return there and and and sort of a limit to how good it could be gotten to. And fundamentally, that seems to point to, you know, hallucinations

are a feature, not a bug,

of these models.

And so it might and, you know, it they give us the create you know, they seem to be linked to creativity and things in the model. And so we probably have to

live with it, at least for this current generation of,

you know,

the transformer based models as they are today. Now they can evolve and there might be ways, you know, seeing work trying to put more structure into these models to improve

the reasoning, reliability, and things. But that's all research phase at the moment, and it might take one of those research breakthroughs to get us to the point where we can have, you know, fully autonomous

systems that have sufficient reliability to be used in a very wide

variety of enterprise use cases. I tend to focus more on the enterprise side than the consumer side. It's where the bar is probably a bit higher, but I think it probably applies to both. And so I I actually think there's a lot of, opportunity around thinking about how we

interface,

humans with these agents and,

you know, help humans quickly

evaluate and interpret it what interpret what they've been given by the agent. And I think that makes this all work really well. Let me give you an example. So in my own work, I'm a innovation team of one, but I have lots of Agenta helpers. Right? So I'm pretty

a pretty,

big user of You.com, which is one of our portfolio companies. They have a research agent that just put out an early access called

Ari. I'm a big user of, OpenAI's deep research as well. And the way I use these things is I'm when I'm writing, I often use them to find citations for claims that I wanna make in, in my writing. I we're an investment firm. We have to be able to back up what we say.

And so,

I use these tools to find,

citations that can can help me. It's very quick for me to be able to figure out if it's right or not. Right? I can go read it. I can go read the piece quickly, scan it, and I can see, yes, it actually is. It's relevant. That's one example, one extreme. On the other, if you're getting it to write a piece of work for you that you don't understand,

then you haven't previously understood,

then you're gonna spend a lot of time trying to figure out and reading through everything in detail to figure out if it's actually giving you garbage or not. And so the effort to go and figure out is the reasoning correct

could be higher than just doing it yourself. And so to the extent that we can build tools that makes that easier, that presents that information in ways that humans can more quickly understand if the agent is doing the right thing or not, I think that helps disseminate this

capability

beyond where it's getting a lot of traction at the moment, which is around, you know, things like web research and obviously coding as well, where it's actually also quite easy to figure out if the agent is right or not because the code either works or it doesn't to some extent. So I, you know, just to sort of go back and say, I think humans will be in the loop because of the reliability challenges. I think those are inherent to the technology. And I think we have to figure out ways of

matching up agents and humans

in ways that are more efficient because the more work that the agent does for us, the more effort is gonna be in figuring out if they've done the right thing or not. So that's that seems to me to be an area of opportunity.

So I think that what you're saying about the

determinism

of being able to verify the output of agents helps to build confidence in knowing whether and how to employ one of these agents where if you're working in a space that has

a lot more uncertainty in terms of the validation step of what the agents are producing, it will

increase the amount of confidence building and the amount of manual effort that's required to even set up one of these agents and incorporate that into a workflow. So the automation benefits are maybe not as high. You don't get as much return on investment. And I'm wondering what you see as the

stepping stones on the path to organizations

having the technical

confidence and competence to

select what are the workflows,

what is the technology,

and how much there is that's actually available off the shelf versus having to do custom building for actually incorporating agents into the business systems, particularly when you're dealing with customer facing situations?

Yeah. It's, it's a great question. Look. Something that we've learned from engagements over the past few years with our portfolio

language models in general of AI, and in many ways, agentic AI is the next step on their pathway, is that a narrower scope

tends to give better results.

So

not trying to be too broad, trying to avoid magical thinking and sort of overload the thing you're trying to do with too many things doing that doesn't go so well. So

the LLMs and LRMs tend to be better when they're used for specific tasks and the broader it is, the more opportunity there is a failure. And so, I think we're already seeing some of this with one of the design patterns coming out where you have specialist agents working with each other that have a narrow set of capabilities and can cooperate as opposed to having

one agent that can do it all. Now there's

bound to be pros and cons for different use cases, but there is a lot of activity around that specialization type of a problem. So I think it makes quite a lot of sense to teams to begin by adding an AI in small, controlled ways. You might deal with existing products or processes,

you might have some of that weak agency within that, you might use

more like an agentic workflow where the steps are predefined

but the individual nodes

along the way

is some amount of it. So it's probably starting more at the weak agency level than the strong agency level and then really thinking, you know, as I was saying earlier about how do you then interface the human with it so they can not necessarily understand how

the output is

arrived at because, you know, we still have a tentative understanding of how these models work, but at least be able to go through the reasoning and and look at that and see if it makes sense. That helps build trust and helps build comfort. I'll give you just one quick example. I was at a investor conference last year in Chicago, Bridgewater were presenting

the macro fund, and they presented

a research agent they built that, you know, you could ask a question about, say, tariffs, and it would tell you about the potential effects

based on economic theory on, say, currency. And their approach was to

take the reasoning that was outputted from the model, was to pass through it, basically build a graph of all the concepts that were in the reasoning, present that to the human so they could see the linkages that the model had, the rationale. And then they'd also augment that with whatever economic data they could provide, so then they could show how much of the graph was validated by data,

But then the human could see the whole thing and instead of having to read through the whole reasoning, you know, potentially pages and pages of it, they could just look at the the graph and and see it for themselves. Now there's obvious challenges around this because you're using a language model to create the graph. So you're, you know, so there's potential for hallucination there as well. But it was, I thought, an interesting and pretty novel way of thinking about how to better present the output from the algorithm, and it would, you know, help build trust.

And then

the other aspect of bringing agents into the operating fabric of a of an organization

is the actual

alignment

across all of the different

business units beyond just the technical capacity to deploy the agent because everybody needs to understand

what is the impact on the business, what is the actual strategy, what are the goals that you're trying to achieve through employment of that technology.

And I'm wondering how you're seeing organizations

address that problem of organizational

alignment and educating everyone on the

realities

of the technology

and its capabilities

as well as its shortcomings.

Yeah. That's a great question. There's a couple of parts to this. I mean, one is obviously that support top down leadership. I think everyone saw Toby Luque's

memo from

Shopify in the last couple of weeks talking about, you know, his

expectations on how AI should be used at Shopify. I think that sort of leadership can help. But Toby is also a very technical founder, and I think one of the challenges

right now about agentic adoption, at least for engineering organizations,

is the sort of expectation

gap between

executives

and

the sort of the engineering realities of what can be done and how do you manage that. And what I mean by that is I don't think

teams are having to convince

leadership that this is a good idea. I think there's a lot of discussion at the board level on down. It's like, okay, what are we doing to recognize? How are we driving more engineering efficiency? What are we you know, how are we using agents, etcetera, etcetera?

And I think one of the challenges that technical folks have to work on is helping executives,

in a lot of cases,

see that there's, you know, just because,

you know, deep research, for example, OpenAI, very accessible, you can use that for some, you know, pretty wide range of use cases, and it can produce, you know, what probably feels like magical results at times. But there's a big gap between

that sort of use case and putting it into, you know, like a financial crimes and compliance system.

And so managing those expectations

and making sure the organization is thinking about, to your earlier point, you know, the stepping stones to build towards that and not trying to leap towards

the end state, I think is really important because

there's a lot of open engineering questions around

these technologies, there's a lot of investment that's required. We might be using,

agents

to get efficiency in terms of writing code, but also using agents,

and in particular building agents for other people is likely, my view, to significantly increase your investment required around QA validation, right? So, and it's much more complicated than previous waves of software development

because of all the things we've been talking about around nondeterminism

where you can, you know, run the agent a hundred times and

maybe 90 times it works and 10 times it doesn't. And so you you need to, you need to invest to be able to test for that and manage that. So those are some of the issues, I think. And then, you know, generally,

education and

overcoming

people's

hesitancy towards this technology. I think forget the source, you can put it in the show notes, but the survey, I think it was last year, four out of five Americans

concerned about, AI taking their jobs. And so, you know, that's a reality for applying this technology

in organizations is that, you know, people are concerned about what does this mean, and they might not see it the way that a technologist like you and I would see it, which is, you know, this is an amazing force multiplier,

this gives me superpowers, you know, for other people it's just more of a concern, hey,

am I going to lose my job? So I think education and socialization is important and also just the expectation setting, whether it's expectations of executives about what can be done from an engineering perspective or it's expectations about workers about what, you know, what this means for their role, in the organization. I think those are some of the things that get get overlooked. Probably as technologists

helping shepherd this this new wave of technology through, we, you know, we need to pay some attention to.

And then another element

of challenge

in terms of being able to employ

agentic capabilities

is that

up until now, there has been a lot of technical investment required to actually pull together all of the pieces that you need to even make it happen, where one of the

biggest challenges is ensuring that you have a high enough level of data maturity to be able to feed the information to the agent that it needs to be able to populate context and make informed decisions

about the tasks that it's being given.

And over the past,

what seems like, two weeks, maybe it's been close to a month now, there has been a lot of activity and noise around the model context protocol introduced by Anthropic. Google has made an announcement as far as its

protocol for different agents to be able

to interact with each other. And so there has been some investment at that protocol and system level to ease the cost of integration and the challenge of integration with the all of the different systems that you might want the agent to interact with. And I'm wondering what you see as the

near to medium term impact on the

capabilities of agents and the ease of implementation

as more of these ecosystem investments come to the fore?

Yeah. It's, it's it's really important. I again, I'll share this in the show notes. We

recently did our second

AI adoption survey, and this is a survey that that Gorgen does in

partnership with Nutanix. It's a research company. We did one last year. We've just done one in the first half of this year focused on the Gentec AI. We surveyed the second one, six hundred R and D and

go to market leaders, so half and half. 400 of those were in the tech industry that you're building and selling software, basically, and a couple of hundred were enterprise.

And for the R and D leaders, so the 300

R and D leaders that responded, one of the top three issues that was holding them back from scaling Agenstic AI across the organization was a lack of standardization

around some of these frameworks. So frameworks and standards to help. Basically, a bit of a cry for help saying, Look, there's just so much going on, and

what do we build to, and we

need to commit code and do things, but it's changing. And so, I think in that context,

things like the model context protocol, which came out in October, you mentioned from Tropic, but it's really in the last few weeks. It's got some momentum,

especially with OpenAI

coming on board in the last month. I think earlier in the year, Microsoft said they were going support it as well. So it's got some momentum, and now with the agent to agent protocol from Google, that would be interesting to see where that goes. It could really help as well. So I think those things are really useful and important for building out the ecosystem, but they don't also solve all your problems. Right? Because MCP is in many ways,

you know, it's a a layer that you're gonna stick on top of your REST API and it allows you to be a bit more selective about what you expose to these agents. So you can take specific subparts of your REST API and then expose it and make it available. And it's super helpful, but it describes

part of the problem. It helps you describe part of the problem, but it doesn't solve all the problems. Right? So it doesn't address like REST, didn't address

security and authentication. So you still gotta have all that. And in fact, delegated

authority and delegated authentication and security for agents is potentially,

in my view, a lot more complicated than for humans. Right? So, you know, you and I can

use applications and credentials and things, and that's fairly well understood that when you give it to an agent to go and act on your behalf to access some system of record, so some SaaS application that is accessible via MCP, perhaps, and you know, that's a system of record like a CRM, you know, you're gonna have to give or credentials will have to be provided to that agent, which may map back to a human, etcetera. So it's I think things like MCP and HOA are are very helpful, but but there's a lot more work to be done. So it's a step in the right direction. It's also interesting to think about the different sorts of, I think, design patterns that they support and there's,

I talked earlier about this idea of using

simpler specialist agents that, you know, maybe know how to do one or two things, and maybe you have a collection of these sitting on the CRM and you might have a collection of these sitting on top of a accounting system and each of them has particular role, and, you know, a to a could help potentially have all these things communicate with each other, and they may be using the model context protocol within themselves to talk to these systems.

So that's one pattern. But within that pattern, if you're talking to those agents, then you're kind of leaving it up to those agents to get it right. So

we talked earlier about the fact that reasoning reliability is an issue and it will probably vary by model. And so if you're, you know, using the agent from your CRM vendor and that's how you access what you want in your CRM, you don't have a lot of control over

how well that's reasoning or how well it's doing the thing you think it's doing. So what you might want to do instead is, you know, another pattern would be to have more

a broader agent that has the ability via, again, MCP to access both the accounting system and the CRM system. But that comes with its own complexities. Right? Because it's gonna be more code to maintain and everything, but you have more control. So

I think these

standards and these approaches are super helpful, but there's still quite a bit of work to figure out what the right design pattern is for a given situation.

And then there's also, you know, these are new standards, so they're not going to be complete and that they don't address all aspects of building these agents, so like security, which I mentioned. So there's still quite a lot of stuff to be figured out. But to the extent that they help sort of dampen down the complexity because

there's some standardization about how these systems can talk to each other, I think they're really useful potentially.

And as you're describing

some of

the aspects of MCP and how it's just a different means of accessing restful interfaces or APIs, it also brought to mind a lot of the hype that came out with the initial introduction of GraphQL

where front end engineers said, oh, this is gonna change everything. I can just do everything with GraphQL, and my life will be so much better. And then it got thrown onto the hands of the people who are responsible for the back ends, and they said, this is awful. Now I have to do 10 times more work to be able to enable this API interface. And it has its use cases, and there are people who are using it, but it wasn't the earth shattering replacement of restful interfaces

that was proposed upon its initial inception. And I think that there's probably a a similar parallel to the model context protocol where it will have its use cases, it will have its niche, but it's not going to be a blanket replacement for the interfaces that currently exist.

Yeah. I think you're probably right. And, of course, GraphQL was going to replace everything on data management as well. Right? But SQL seems to kinda be cool again for some reason. I've been a SQL person for the last, you know, twenty five plus years and, you know, there's something about the simplicity of it that is appealing. But then again, you know, I think there's pockets of where GraphQL is being used pretty extensively. Again, it's kind of the it's back to the lasagna theory of software, right? These things that come in. There's a there's a lot of excitement. It's the top layer, and then something else comes along. And it it fades a little bit into the background, but it doesn't go away, and it adds to the the flavor of your software stack.

Absolutely.

I I like that, that metaphor. I'm definitely gonna have to adopt that. And then the other

aspect

of agentic capabilities,

its role in business, but more specifically, its role in products is the impact that it can have on the ways that we interact with these software systems and the ways that

these products and capabilities are delivered where

since in the nineties, the way that you got software was you bought a set of floppies or a CD ROM, and then you brought it home and you put it in your computer.

And maybe you would get an update if you happen to have an Internet access.

And then as

the Internet and the web

grew in its availability and accessibility and speed,

we got rid of CD ROMs and shipping physical media because it was too expensive, too costly, too much, of a logistical overhead.

And, also,

selling subscription based services was much more lucrative for the businesses that were producing the software, and so we moved into the era of SaaS where

you just paid for access to software that was delivered over the Internet, whether it's through a web browser or a thick client or a thin client on your desktop. How do you see

this new era

of agent based and language model based interfaces

either augmenting or replacing or disrupting that marketplace of SaaS as a delivery mechanism?

Yeah. I mean, it's a great question. It's it's a a debate that was kicked off back in December in earnest by Mandela from Microsoft saying that, you know,

SaaS is dead. And I certainly don't subscribe to that perspective. What I do think is happening is a couple of things. One is, as we were saying earlier, I think humans are going to be involved. Humans will be in the loop for the foreseeable future, potentially a technology breakthrough away from reliable autonomy using these technologies. And so we're going to have to have user interfaces, and the user interfaces are going to have new and more sophisticated

and different ways from how they are today, orientated

more and more probably around

interpreting Agintek work. And, you know, we all become

supervisors of agents, right, which is a whole other discussion about how how much fun that job's going to be. But that could be where things get where we are doing less of the grunt work and more of the interpretation and more of the supervision.

So, there's a change to what the software interface needs to do potentially, in my view, but there's still probably the requirement

to log in constantly.

Quite likely our Slacks are going to be full of messages from worker agents asking us to review things, and we'll go into a system and we'll look at it. So I don't think I think the nature of what we do in

these SaaS applications might change, but I don't think that regular daily interaction

necessarily changes. But I do think that, and we're seeing this already, is that the pricing model needs to change and it will be more usage based, which is of course how cloud is priced, it's

how IBM has priced its software and systems since the 1960s.

Like usage based pricing is pretty common, it just hasn't been common for SaaS application.

So I do think there will be an increasing amount of that happening, and it's really just necessary

from a gross margin perspective because

you've got to, you know, one of the things about

a genteq software is

the costs of answering the questions being given can be somewhat open ended.

And, you know, I think you've probably seen this if you've used any of these deep research products, is, you know,

sometimes it takes five minutes, but the other day I had a forty minute prompt. So that prompt went off for forty minutes

and it ran on I don't know how many servers at OpenAI

to answer my question. And so, that's a pretty wide range of potential costs. And from a venture capital perspective, you know, when you're looking at companies and and thinking about how to process these things, it's obviously pretty imperative that you can align

the pricing model somewhat with the cost model. So I think we will see changes

around that.

Which also brings up a lot of challenges

in terms of the impact

on the

appeal of these systems where, to your point, cloud has

a lot of pass through costs where you're paying for usage, but your usage of those systems is typically going to be more predictable and follow your typical patterns of business,

where if you have a certain you know, using ecommerce as an example, if you have a certain season that's busier, you're going to be paying more for server costs and infrastructure, but you're also going to be bringing in more money as a result.

Whereas the nature of these generative models is

much more

unpredictable

as far as the overall cost of interaction,

and you can't as easily do a per user cost metric so to be able to then deliver it at a flat fee to your end users and

average out across all of your customers

what your profit margin is. And so as you said, all of these language models as a service, OpenAI, Anthropic, Google, they're saying, okay. We're going to charge you a usage based, but then it becomes challenging for the consumer to decide, oh, can I actually even afford this? Because I don't even know what the overall cost is going to be. And so if that then propagates through to more of the SaaS subscription services where it's, okay, you can

pay for our service, but now if you're going to use AI, it's going to be some variable price that you have no

understanding

of what that's actually going to be, how that might disrupt the

customer model that we've been developing in these SaaS more predictable,

load cases,

and how that might change the

types of people that are being attracted to these services.

Yeah. I I think

that that's a super important point, and I think CFOs to financial officers really

get nervous around this stuff. You know, just from a user organization, say like an enterprise that's

trying

to have to budget or

the use of a tool, for example, a nagentic tool. I remember back in the in the nineties, I was working for a software company at the time called Informix,

which used to compete with Oracle.

We lost that

and ended up being acquired by IBM.

But at the time, Oracle introduced,

around that time, they introduced some

more nuanced pricing. So back then, we used to most database software companies charged based on CPU. So if you had four CPUs or 16 CPUs in the server, you paid more. And, Oracle introduced more nuance to that, which was more around, like, the amount of compute capacity that the h CPU had. It just basically became a more complicated formula,

but real not that complicated, but it was really difficult for people to calculate, to figure out. It wasn't simple and easy. It was hard to budget for or harder to budget for, harder to understand,

and that wasn't even really usage based. So when you think about, you know, I think people will innovate around pricing, you know, my view is that there will be innovation around pricing for these agentic systems,

you know, there's probably opportunities

to

ask users if they want to offload it and run it during

quieter periods of the business day,

there's

opportunities to maybe bid out work and have it just run slower on some a100 somewhere, and it doesn't matter if it takes two days to get back to you, I don't know. I mean, for some use cases that will work. So build this innovation

to try and

smooth out the cost, reduce reduce the costs, but it's likely, I think, in my view, to be still quite complicated and hard to understand.

And any attempt to simplify it is going to make, you know, one party or the other is probably going to lose on that. So, for example, with the deep research stuff from OpenAI, they give you you pay the $200 a month, they give you 120 queries. But each of those 120

queries can range, as I was saying earlier, from like, it could be like two minutes or five minutes up to forty minutes. My weekend experiment over Easter is going to be seeing if I can put an open ended prompter that actually gets over an hour. I think it's going to be my new challenge, to see how long I can run a query for. But that's a real planning problem for OpenAI, right? So but, you know, they'll figure it out over time, but there is complexity around this. And so I do think we will see more usage model pricing,

but there's a lot of work to be done to figure figure out ways to actually price and pass on those usage costs. I I don't think it's particularly straightforward.

I think too that it will also

change the calculus of

what products these agentic capabilities get incorporated into because for more generic consumer facing systems,

you don't want to have that level of unpredictability

in your operating costs. And so maybe they get incorporated, but it's limited to you get five uses a month of this particular feature or whatever that cap ends up being so that you know that you have a certain threshold beyond which they're not going to be incurring expenses on your operational side of the things.

Absolutely. And then, I'll just point out another sort of complexity here as well with the the current, you know, limiting the number of queries is you can actually

it incentivizes the user

to pack more into each query. So it's not based on minutes.

So it's actually I'm incentivized

to not

look for one or two citations at a time, I should be giving it an entire document of citations, and that's the same cost to me, but it could take hours for the the reasoning model to find all those things. So I do think that this sort of brings us on to the topic, though, there's more that these agents would be able to do than

is going to be allowable by cost, by regulation, and other issues. Right?

So, you know, that's another part of what, you know, as engineers, as technologists, we have to figure out is, is the business context is important because just because you can do it with a language model, it doesn't mean it makes sense. A tactical example on

that is, you know, I think Microsoft just released a thing, you know, that they can now that was showing Doom, I think it was Microsoft,

was,

the level one

of the old first person shooter Doom that came out when in the nineties, and I used to play a lot. It's able to be, generously produced out of a a model. And they

it seems like a very expensive way to me to to play a game. I've also seen other experiments where people are generating

HTML user interfaces out of language models dynamically on the fly, but it doesn't make sense necessarily from a cost perspective. And if the interface doesn't change all the time, why would you do it? So I do think there are a lot of things, a lot of capabilities that these models have,

which are technically capable to do something, but don't necessarily make cost sense. And then on the compliance side, you know, I would expect

there will continue to be for quite some time a number of situations, you know, whether you're talking about finance,

medical, or aerospace,

where there's specific regulations that you could use an agent to do this thing, but from a liability and responsibility perspective it has to be, you know, a specially qualified human that doesn't. So I think these are some of the things that as, you know, technologists,

even more so than with maybe previous waves of technology, we have to have some sort of appreciation and consideration for.

Absolutely. And I think to the other trend that we've seen in cloud, to go back to that example, is that the cost per compute has gone down as the availability and operational scale has gone up, And we're already starting to see some of that in the language model and reasoning model space where

the compute requirements

go down while the capabilities go up because you're able to pack more functionality into a smaller number of parameters as well as the underlying inference

engines have become more efficient, able to run across a broader range of hardware. So you're not completely vendor locked into whatever NVIDIA happens to cook up next. You're able to run it on CPUs or TPUs or other other types of compute.

And I'm curious what you're seeing

as a general trend in that space of the areas of investment

at the software and hardware layer to be able to

allow for the proliferation

of these generative models as a more

core component of the underlying compute substrate, the technical ecosystem?

Yeah. I mean, a general observation would be that there's been an enormous amount of edge capital money going into

ASIC, other,

non GPU

or non general purpose GPU. I would expect that will come through

as more choice in the market over time. There's also

efforts

by Chris Lettner and the team at Modular and other people doing similar things as well, where they're trying to remove what they would probably describe as the stranglehold of CUDA, which is the of course the programming interface that NVIDIA has, which gives them a huge amount of advantage in the market. It's been around for time, it's very powerful,

you can, you know, write lots of

optimizations with it. It was in use, you know, for special effects clusters

at places, you know, like Weta Digital and others, well before it was being used for machine learning, and it's given them a huge amount of advantage in terms of getting the best performance out of models. So there's quite a bit of interest and activity in trying to make it easier to move models between different architectures,

even just between different GPU architectures, but to your point as well, moving some workloads onto CPUs as well. And so, and that goes back to what I was saying earlier, you know, part of the the pricing and the cost issue could be, you know, as it gets easier to move an agent to a, it might be slower platform, people will probably do that for certain types of use cases and workloads, and

it will help. But there is a lot of activity, but it's still nascent. It's still just, I think, it's not at the point yet where I think it's making a material difference to people's choices. But I also think the other, to your point about reducing cost overall, we're certainly here to be seeing that. I mean, at least my reading of things is that cost to serve is going down pretty rapidly. You see that in a lot of the graphs, the charts and graphs. But at the same time, we're just getting started with building these technologies into software applications. Going back to the survey I was talking about earlier, 91%

of R and D respondents

had plans to adopt Genetec AI, and 45%

of them had begun some sort of implementation. But it's a lower number that's actually got things in production. And then if you look at things that are in production and then how far they've rolled out, there's a lot of work to be done. So I think we're only just probably touching the surface in terms of the demands. And so it remains to be seen, you know, whether what happens on the cost

side.

Another factor that comes into play when we're dealing with anything that is this far up the hype cycle and this noisy on the hype cycle is that every organization

that isn't adopting agentic AI has that fear of missing out of, oh, if I don't deploy an agent right now, then I'm going to miss out, and everybody else is going to

leapfrog me. But then there's also a lot of value in not being a first mover in the space because you let everybody else work out all of the kinks and do the investment of making it easier to implement and operate. And I'm curious what you're seeing as an overall trend of organizations

battling that balance of fear of missing out and having to be a first mover versus waiting for other people to pave the way for you so that it's easier to adopt if and when you decide that it's actually useful and actually mature enough to implement?

Great question. I mean,

what I would say my view would be that you've got to be playing with these things to learn how they work, to understand

what they can and cannot do. So you might find lower risk, non customer facing, non product related opportunities. So there could be internal efficiency opportunities

to really get a sense of how these technologies work, and then doing experiments but not necessarily going all in and trying to build something like

an agent with very strong agency.

So going back to what we're saying earlier, looking for those

narrow opportunities where you can exert more control, so maybe agentic workflows with agency within individual steps, but not an overall

all singing or dancing agent, I think those are the, in my view, probably the right things to be doing. I don't think it makes sense for organizations

to sit it out.

There are some use cases which,

in my view, provably

useful. In particular, around coding is perhaps the most obvious. And then the research and the use of language models is a front end

to search engines. The front end to information

in general is, becoming,

you know, that's really, I think, very quickly become a bit of a killer use case. And you know, we've gone from worrying about false citation

in these models because they weren't, you know, they were coming up with things that didn't exist, to now most of these models

are now able to give you reference, you know, point to the source information so you can validate it for yourself. And I think that's been a bit of a game changer in terms of the usefulness. So there's a there's a lot of, I think, opportunity in any organization to start to experiment. And I think sitting it out is is probably, for most people, a bad idea. But at the same time, I you know, blindly going all in on some of the more complicated, sophisticated

use cases is also probably not advisable, although, you know, I'm not here to give advice. It's just my opinion.

And in your experience of working in this space, working with companies who are implementing these agentic capabilities and incorporating them into their business processes or products,

what are some of the most interesting or innovative or unexpected ways that you've seen that capacity applied? Yeah. I mean,

ways that you've seen that capacity applied?

Yeah. I mean, I think probably one of the more interesting things I've seen recently is this isn't necessarily

that different from how technologies have been used before, but I found it really interesting, was interesting was one of our companies using language models.

Instead of taking the approach like an MCP type approach where you try and predetermine

all the different parts of your REST API that are going to be useful and then bringing those forward as well defined API calls by MCP. What they're doing is they're taking

the text to SQL

concept

and instead

it's text to REST call. So dynamically

being able to have their agent figure out what part of their API wants to use at a given time. And based on what I was hearing from our R and D team, internally,

we've even seen pretty stellar improvements in recent model releases at doing this. So it's been surprising to me how good the models are at, given

the description of the task, actually figuring out not quite constructing the tool on the fly, but figuring out what part of the API it needs to use. So I think that's a very interesting and potentially powerful use, but, of course, very similar to people that have been using these. And then the other is just from my own perspective, just as I was saying earlier, how useful these tools, whether it's from a like a you.com

or an AI or how

Google's in on the game as well now with their deep research product, how good these tools are getting at doing research for you on the web, which when you're a knowledge worker and a learner, like I was saying at the beginning of this, you know, it's a big part of my job is to learn. That's a major productivity

improvement,

surprisingly so.

And in your experience of working in this space and helping companies to work at that leading edge of these capabilities, what are some of the most interesting or unexpected or challenging lessons that you've learned personally?

I think the biggest challenge

that I see is

something that we've all experienced, which is, you know, these tools can be

one moment the most magical thing you've used and the next moment the stupidest thing. And the big challenge is, I mean, that's kind of okay when you're sitting in a chat environment,

you know, you're the human in the loop and you're using it, you can stare at it and things, but that's a surprisingly hard problem to overcome

in an engineering environment. And I think it's particularly hard because, and there is research that backs us up from the medical

field, which shows that we're less forgiving

of AI than we are of humans. And I think that that's a surprisingly difficult challenge. I think we're going to be a surprisingly difficult challenge to overcome. You know, we see this to some extent with self driving cars, right, where, if you take the statistics we provided at face value from various companies,

they appear to be measurably

better in a number of situations in terms of accident rate than humans, then something happens

and all hell breaks loose.

And rightly so, there's a lot of questions around that, around who's responsible and liability and all that sort of thing. But I think that challenge is kinda

be surprisingly difficult to solve for engineering teams that are trying to build products that just occasionally do random things and do the random wrong things.

And for businesses that are considering

the application of agent and capabilities

at some layer of their business, what are the cases where you would advise against it and they should just do the standard deterministic

build your own software approach or just buy an agent off the shelf and don't try to build it in house.

Yeah. I mean, the in the second case, I,

I tend to recommend to people either in the my personal life or in my business life is go and look at what else is out there. Go and use one of these tools to go and figure out what else is out there before you start building it because there is just so much activity and there's a lot of smart people out there building cool things. And so I think, in my opinion, you should definitely go and look at that. As to when Agintiq might be the wrong choice, I think, again, it comes back to a question of how much agency and automation you're considering for the situation.

It's not really specific to Agenetec, it's also applies to LLMs. But if you've got scenarios where there is no tolerance for error,

then that's probably going to be an area that this isn't going to work well. But also, you might find there's ways,

certain situations, certain industries,

certain use cases, where there's a mandated way of doing something, or it's just very predictable how it needs to be done. And so that is possibly or probably inefficient to use these models to do that. You can just use good old procedural code and get it done. There'd be other situations where it looks like something that an agent could do, but you're lacking the digital infrastructure. Go back to what you were saying earlier about APIs and, you know, in some cases, not having them available to agents. I mean, that could be a big problem in certain situations. And then there could also be situations where, I don't know what your experience has been like to last, but if the problem is very open ended or very novel,

you're trying to find a very novel solution. The models can be okay at that, but they're not

really close to what a human would do with, you know, like coming up with a nuanced,

not generic business strategy for a company. Like, you can get some cool ideas, but you really probably need a human involved. And so again, it goes back to the agency thing. So it's like, it's not necessarily true

that agentic approach is wrong. It might just be the level of human involvement that you have to tweak for a given use case.

And as you continue

to invest in this space, work with companies who are adopting and adapting these agentic systems to their use cases? What are some of the ongoing

developments and investments

in the ecosystem of Agenstic capabilities that you are particularly keeping an eye on?

Well, the infrastructure underneath all of us, so that's that's interesting. There's a lot of, a lot of building blocks that are required to bring this stuff out of research and engineering and into the real world, so that's that's interesting. The the standards that we talked about, I think, is a really important area.

I'm personally tracking, you know, we decided to see where things like MCP and eight twenty eight go because they help create ecosystems, and I think ecosystems are good for everyone. I'm also following closely what's happening in the software development lifecycle,

where I think the team here is the R and D team is somewhat evenly split between flying and Cursor, and it's fascinating to see where those tools are going, becoming more and more gintech. But also, you know, we're experimenting with tools like Factory and others that are doing more of the back end

SDLC work. And so I think that's

particularly interesting. And then, you know, we've talked about it a bit, my personal interest

is around reliability and reasoning and following where that's going, because that to me feels like one of the larger unlocks. There's a lot of things that need to be figured out.

As I was saying from our survey, people are talking about lack of standards. There are other things that come up in that survey around cost. Obviously, privacy keeps on coming up, but for both the R and D respondents and the better market respondents, you know, when they were asked what their number one issue was for, you know, holding them back from wider adoption, it it comes back to reliability. And, you know, while we don't define reliability in the survey, I'm very confident to say it's not like they're worried about the APIs going down from OpenAI or Anthropic. Reliability in this context

is generally well understood to be related to hallucinations and the one deterministic nature of these models. So I think advances around that. I'm fascinated by the neurosymbolic

approaches, you know, everything that's old and classical AI is new again. Maybe there's a world where my Prolog programming schools could be partnered up with a language model, but I I doubt it. But, you know, all these different techniques were trying to improve

the performance of language models by pairing it with other things, you know, whether it's graphs or formal planners and all these sorts of things, I find interesting as well.

Are there any other aspects

of the applications

of Agentech AI to business and product use cases that we didn't discuss yet that you'd like to cover before we close out the show?

I just think maybe just talk briefly about a couple of the gaps as well. Like, it's I mean, I talked about the technologies that we're tracking. I do think that there are some important gaps that I guess they would fall into that category as well, which I didn't mention. But these are some of the gaps that I think we need to see from an engineering perspective to help with development and adoption: things around

more investment and reliability and evaluation tools.

So there's obviously eval frameworks and things. I think there's a lot more

than is probably to come around that just to help with testing and validation and making that easier and easier. I think we touched on this briefly there's this whole

challenge and opportunity area around security and access control, and I've seen some discussion recently online about, you know, what startups are working on trying to think about identity management and authentication and things for agents, and there are companies working on that, which seems interesting. And then I think just more and more around probably observability and monitoring as well. So I think, you know, just from an engineering perspective, it's super exciting that, you know, this stuff is happening, but the, you know, the continued,

you know, it's going to take some time to build these things out. And so in some ways, we sort of have to temper our expectations and temper the expectations of of leadership as to what could be achieved because, you know, we're kind of assembling this car while we're trying to drive it. Right? There's a lot of bits that still need to be built out. Absolutely. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing and the, insights that you're providing. I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gaps in the tooling technology or human training that's available for AI systems today.

Yeah. I think it's it's I think really, very much is what I said. It's around

if you were talking from a, an engineering and, like, developer bringing this technology to market, I do think there is some real challenges around

security and authentication

as one. And I think also, as I was talking about earlier,

the human factors piece in UX,

and there's

a need to

rethink how we design user interfaces to work with these technologies. And I don't think it's not just an incremental, trivial fix. I think it requires probably going back to best principles about how we think about communicating information to humans efficiently and then

to make these systems work.

Alright. Well, thank you very much for taking the time today to join me and share your

insights and thoughts on this overall space of agentic AI and its application

to business use cases and product capabilities. It's a very fascinating space and, obviously, one that's very fast moving. So I appreciate you taking the time to share your thoughts on that, and I hope you enjoy the rest of your day. My pleasure. Thank you for having me.

Thank you for listening. And don't forget to check out our other shows, the Data Engineering Podcast, which covers the latest in modern data management,

and podcast.init,

which covers the Python language, its community, and the innovative ways it is being used. You can visit the site at the machinelearningpodcast.com

to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hoststhemachinelearningpodcast

dot com with your story. To help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.

AI Engineering Podcast