In this episode product and engineering leader Preeti Shukla explores how and when to add agentic capabilities to SaaS platforms. She digs into the operational realities that AI agents must meet inside multi-tenant software: latency, cost control, data privacy, tenant isolation, RBAC, and auditability. Preeti outlines practical frameworks for selecting models and providers, when to self-host, and how to route capabilities across frontier and cheaper models. She discusses graduated autonomy, starting with internal adoption and low-risk use cases before moving to customer-facing features, and why many successful deployments keep a human-in-the-loop. She also covers evaluation and observability as core engineering disciplines - layered evals, golden datasets, LLM-as-a-judge, path/behavior monitoring, and runtime vs. offline checks - to achieve reliability in nondeterministic systems.
Announcements
- Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems
- When ML teams try to run complex workflows through traditional orchestration tools, they hit walls. Cash App discovered this with their fraud detection models - they needed flexible compute, isolated environments, and seamless data exchange between workflows, but their existing tools couldn't deliver. That's why Cash App relies on Prefect. Now their ML workflows run on whatever infrastructure each model needs across Google Cloud, AWS, and Databricks. Custom packages stay isolated. Model outputs flow seamlessly between workflows. Companies like Whoop and 1Password also trust Prefect for their critical workflows. But Prefect didn't stop there. They just launched FastMCP - production-ready infrastructure for AI tools. You get Prefect's orchestration plus instant OAuth, serverless scaling, and blazing-fast Python execution. Deploy your AI tools once, connect to Claude, Cursor, or any MCP client. No more building auth flows or managing servers. Prefect orchestrates your ML pipeline. FastMCP handles your AI tool infrastructure. See what Prefect and FastMCP can do for your AI workflows at aiengineeringpodcast.com/prefect today.
- Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most - building intelligent systems. Write Python code for your business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML/AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch. Build end-to-end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward-thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at aiengineeringpodcast.com/bruin, and for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud.
- Your host is Tobias Macey and today I'm interviewing Preeti Shukla about the process for identifying whether and how to add agentic capabilities to your SaaS
Interview
- Introduction
- How did you get involved in machine learning?
- Can you start by describing how a SaaS context changes the requirements around the business and technical considerations of an AI agent?
- Software-as-a-service is a very broad category that includes everything from simple website builders to complex data platforms. How does the scale and complexity of the service change the equation for ROI potential of agentic elements?
- How does it change the implementation and validation complexity?
- One of the biggest challenges with introducing generative AI and LLMs in a business use case is the unpredictable cost associated with it. What are some of the strategies that you have found effective in estimating, monitoring, and controlling costs to avoid being upside-down on the ROI equation?
- Another challenge of operationalizing an agentic workload is the risk of confident mistakes. What are the tactics that you recommend for building confidence in agent capabilities while mitigating potential harms?
- A corollary to the unpredictability of agent architectures is that they have a large number of variables. What are the evaluation strategies or toolchains that you find most useful to maintain confidence as the system evolves?
- SaaS platforms benefit from unit economics at scale and often rely on multi-tenant architectures. What are the security controls and identity/attribution mechanisms that are critical for allowing agents to operate across tenant boundaries?
- What are the most interesting, innovative, or unexpected ways that you have seen SaaS products adopt agentic patterns?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on bringing agentic workflows to SaaS products?
- When is an agent the wrong choice?
- What are your predictions for the role of agents in the future of SaaS products?
Contact Info
Parting Question
- From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
Links
- SaaS == Software as a Service
- Multi-Tenancy
- Few-shot Learning
- LLM as a Judge
- RAG == Retrieval Augmented Generation
- MCP == Model Context Protocol
- Lovable
The intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0
Hello, and welcome to the AI Engineering Podcast, your guide to the fast moving world of building scalable and maintainable AI systems. When ML teams try to run complex workflows through traditional orchestration tools, they hit walls. Cash App discovered this with their fraud detection models. They needed flexible compute, isolated environments, and seamless data exchange between workflows, but their existing tools couldn't deliver. That's why Cash App relies on Prefect. Now their ML workflows run on whatever infrastructure each model needs across Google Cloud, AWS, and Databricks. Custom packages stay isolated.
Model outputs flow seamlessly between workflows. Companies like Whoop and 1Password also trust Prefect for their critical workflows, but Prefect didn't stop there. They just launched FastMCP, production ready infrastructure for AI tools. You get Prefect's orchestration plus instant OAuth, serverless scaling, and blazing fast Python execution. Deploy your AI tools once. Connect to Claude, Cursor, or any MCP client. No more building auth flows or managing servers. Prefect orchestrates your ML pipeline. FastMCP handles your AI tool infrastructure. See what Prefect and FastMCP can do for your AI workflows at aiengineeringpodcast.com/prefect today.
[00:01:28] Tobias Macey:
Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most, building intelligent systems. Write Python code for your business logic and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML and AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch.
Build end to end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at aiengineeringpodcast.com/bruin. And for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud. Your host is Tobias Macey, and today I'm interviewing Preeti Shukla about the process for identifying whether and how to add agentic capabilities to your SaaS platform. So, Preeti, can you start by introducing yourself?
[00:02:46] Preeti Shukla:
Absolutely, Tobias. Thank you so much for having me today. It's my pleasure. A little bit about myself: I'm a product and engineering leader with twenty years building and scaling B2B and enterprise-grade products, mostly SaaS products. I have led products and teams across three major transitions of the industry. I started my career in on-premise enterprise software, moved on to SaaS products, and then more recently I've been working on GenAI and agentic products, navigating teams and products through this very transitional time of our industry. And that's the lens I bring to this conversation today.
[00:03:11] Tobias Macey:
And in that context, what was your introduction into this overall space of AI, and just some of the background that you bring to your perspective on it?
[00:03:23] Preeti Shukla:
Absolutely. So I got a little exposure to classical machine learning, you know, when I was an engineering leader at Workday. We were building financial applications and working on credit scoring for the customers. But this was mostly the pre-AI era, around 2015. I would say that I really immersed myself in AI when I did my own startup, which was an AI-powered two-sided platform, and that's where we actually built a recommendation engine. So I think that was my first AI-native experience when it comes to building products. And after that, I have had opportunities to lead various traditional SaaS, enterprise, and B2B companies into mostly GenAI and agentic products.
[00:04:05] Tobias Macey:
And so in this context, SaaS platforms have really defined maybe the past fifteen years of Internet-focused business. And I'm wondering if you can talk to some of the ways that the introduction of agentic capabilities and AI-powered features changes the requirements around the technical considerations.
[00:04:32] Preeti Shukla:
Yeah. Absolutely. I mean, as most of us understand, SaaS businesses operate under certain constraints and expectations. SaaS fundamentally is a delivery model: you basically deliver the same software in a repetitive fashion at scale to multiple customers. And certain SaaS artifacts and metrics like service level agreements, pricing tiers, MRR, and churn become really important, and that's what differentiates SaaS businesses from traditional software. An AI agent thus has to operate within those bounds. For example, if an agent is displaying high-latency behavior, that will directly and negatively impact the SLAs that the SaaS provider has committed to its customers. Similarly, if an agent is incurring variable compute on a fixed subscription, that will again negatively impact the unit economics of the SaaS itself.
So I would say, just to sum that up, considering both business and technical considerations: latency, cost, customer satisfaction, data privacy, tenant isolation. These are some of the core traits that an AI agent has to deliver on, at a very high level, in order to be instrumental as a core feature of SaaS business products.
[00:05:55] Tobias Macey:
If I can characterize the overall space of SaaS, which is a very broad range, I would say that some of the key elements are economies of scale and the ability to rapidly deploy changes without having to wait for the distribution of physical media - the hallmark of the nineties and early two thousands, where you had to wait until you got a new CD to install the latest version of whatever the system is - and being subscription based rather than charging a flat fee for the purchase of a physical medium. And given that these businesses focus a lot on the unit economics of delivering that software, how does the introduction of agentic capabilities and agentic use cases change those ROI calculations, particularly given the highly variable nature of a lot of these generative AI systems?
[00:06:50] Preeti Shukla:
Absolutely. That's a great question. And I'm gonna start by saying that at the intersection of AI agents and SaaS, we are seeing very interesting pricing already. You know, SaaS was long known for seat-based pricing that gave SaaS providers very predictable revenue. What has happened of late is we are seeing variations on that pricing: outcome-based pricing, credit-based pricing. These are really the result of AI agent introduction, or, I would say, a transformation of traditional SaaS pricing. And, you know, it is necessary. If we think from a business ROI perspective, and if I put it very simply, the total cost of all the features and/or workflows that a SaaS provider is providing to a customer should not exceed the subscription price that the customer is paying to the SaaS provider.
And it was, you know, relatively simpler to price that model and to get that predictable revenue on a monthly and an annual basis. But now what has happened is the point I was alluding to before: there is a lot of mixing in of variable compute. So if your workflow really involves model calls, which they almost always do, now we are talking about variable compute, and we have to be very careful when we do cost estimation to ensure that agents are not outcosting the unit economics itself.
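The budget check Preeti describes can be made concrete with a small sketch. All names, prices, and thresholds below are illustrative assumptions, not a reference implementation: it accumulates variable model-call spend per tenant and flags when that spend threatens the margin on a fixed subscription.

```python
from dataclasses import dataclass, field

@dataclass
class TenantCostGuard:
    """Track variable model-call spend against a fixed subscription price."""
    subscription_price: float   # fixed monthly revenue from this tenant
    margin_floor: float = 0.3   # stop spending when gross margin would drop below 30%
    spent: float = field(default=0.0)

    @property
    def budget(self) -> float:
        # The most we can spend on model calls while keeping the margin floor.
        return self.subscription_price * (1 - self.margin_floor)

    def record_call(self, input_tokens: int, output_tokens: int,
                    in_price: float, out_price: float) -> bool:
        """Record one model call; return False if the tenant is now over budget."""
        cost = input_tokens / 1000 * in_price + output_tokens / 1000 * out_price
        self.spent += cost
        return self.spent <= self.budget

guard = TenantCostGuard(subscription_price=100.0)
# Hypothetical per-1K-token prices; real provider pricing varies.
ok = guard.record_call(input_tokens=50_000, output_tokens=10_000,
                       in_price=0.003, out_price=0.015)
print(f"spent=${guard.spent:.2f}, within budget: {ok}")
```

In a real system this check would run per tenant inside the agent loop, so a single heavy workflow cannot silently invert the unit economics of a subscription tier.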
[00:08:23] Tobias Macey:
Another factor of relying on model providers specifically is that it brings with it a certain amount of platform risk where you're depending on the capabilities of that system to deliver on its functionality for you to be able to serve your customers. And so particularly large and technically forward companies will maybe opt to self host some of those models or for purposes of being able to bring it in for fine tuning and drive some of those costs down or improve the predictability of cost. And I'm wondering what you're seeing as some of the tactical elements of how companies are thinking about that question of which models should I be using, which providers should I be relying on, and how do I develop some of that confidence in the capabilities of that model, particularly if you're using one of the hosted providers such as Anthropic, OpenAI, Google, where the actual model might change under the covers without you knowing it because they're not necessarily going to advertise every incremental update?
[00:09:30] Preeti Shukla:
I guess there are two ways I look at it. So there are two frameworks that can be applied. The most widespread one is segmenting the market based on what kind of SaaS businesses they are, and that roughly correlates to the kind of use cases they have. So I'm talking in terms of B2C SaaS, B2B SaaS, enterprise SaaS. And that roughly correlates to B2C SaaS having simpler workflows, mostly GenAI-dominated. And, of course, there are exceptions. But in this easier-to-understand framing, we really correlate B2B and enterprise SaaS with more complexity.
So B2C SaaS is at the lower end of the spectrum, more GenAI-dominated, where frontier models do great. Frontier models will continue to do great on those. There is not really a lot of requirement to fine tune there; a well-written production prompt with few-shot learning, many-shot learning, those kinds of things still help. But when we are talking about B2B SaaS and enterprise SaaS, that's where complexity comes in. You are talking about multi-agent, complicated orchestration, more token utilization that really skyrockets the cost, governance and compliance. And these complex workflows and scenarios really call for more sophisticated strategies on which models to use and whether to self-host, and usually it is use-case driven. If your workflow requires anything with respect to natural language processing, frontier models still do great, but there are techniques to use cheaper models for classification or for labeling. So there is capability routing based on the capability of the model itself. There are other techniques like RAG, such that your prompt doesn't become too bloated; you can ground your agentic behavior in enterprise documents. So I think those are some of the strategies.
And observability and evaluation strategies, they continue. I guess we are going to hear more and more about them. I'm already hearing a lot in tech circles and conversations that I'm part of at big conferences; everybody is talking about evaluations and guardrails. Irrespective of the workflow complexity, for any production-grade software, observability platforms are playing a big, instrumental role in helping providers navigate these challenges. And real quick, you know, I said there are two frameworks. So the other framework that I will quickly talk about is, irrespective of the market segmentation, what is the complexity of the automation itself? And we can think about rule-based automation, the simple automation.
And then the next is intelligent automation. So now give your automation some brains, you know: give that LLM some memory, some tools, and that is going to deliver a lot of value over rule-based automation. And then really the third frontier is fully autonomous agents, which, as all of us know, are yet to be seen deployed widely in production. So really the sweet spot is B2B SaaS, medium-complexity intelligent automation, which most production agents are capable of.
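The capability routing mentioned above is often little more than a lookup from task type to the cheapest model tier that handles it well. A rough, provider-agnostic sketch, where the tier names, task categories, and fallback choice are all placeholder assumptions rather than recommendations:

```python
# Route each task type to the cheapest model tier that handles it well.
# Tier names and task categories are illustrative, not a standard taxonomy.
ROUTES = {
    "classification": "small-cheap-model",   # labeling, intent detection
    "extraction":     "small-cheap-model",
    "summarization":  "mid-tier-model",
    "multi_step_reasoning": "frontier-model",
    "code_generation":      "frontier-model",
}

def route(task_type: str, default: str = "frontier-model") -> str:
    """Pick a model tier for a task; fall back to the strongest tier when unsure."""
    return ROUTES.get(task_type, default)

print(route("classification"))   # small-cheap-model
print(route("unknown_task"))     # frontier-model (safe fallback)
```

The design choice worth noting is the fallback: routing unknown tasks to the strongest tier trades cost for reliability, which matches the SLA-first framing of SaaS discussed earlier.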
[00:12:50] Tobias Macey:
And there's also a wide degree of variability in terms of the presentation of those agents, where some of them might be a chat-focused interaction, where I, as a customer of the SaaS platform, go in and maybe there's some text input or document upload capability that will then trigger a workflow, versus an agentic capability that is entirely invisible to me as a customer but facilitates the operation of the platform and maybe powers some of the actual feature set that I am interacting with, though I don't know that it's necessarily an agentic loop running in the background to perform those activities. And I'm wondering how you're seeing SaaS systems determine what are the integration points and exposure points that are most useful for agentic use cases, and whether and when to expose that to the customer versus keeping it internal, and some of the trade-offs of those decisions as far as the appearance of being sophisticated and capable versus preventing the customer from getting stuck in the heightened expectations of, oh, well, they have an AI, so that means they can do anything.
[00:14:07] Preeti Shukla:
Absolutely. I would say that the most successful transformations I've seen are those that start with internal adoption first and then move on to making these customer-facing features, you know, as we serve them in a SaaS bundle. But I think companies that embrace AI culture first also adopt AI first, and that kind of brings everybody on board: leadership, sponsors, teams, leaders. And when they have really worked with AI agents especially, they very well know the limitations, you know. And I think that helps them empathize about when we will be ready to really make it into a customer-facing feature.
So I think that's really the first step that I've seen: first internal adoption happens, and then customer adoption happens. And even when it comes to customer adoption, or internal adoption for that matter, there are some low-hanging fruits, you know, as we were talking about. Again, going back to the complexity: for a use case on the simpler, B2C side of the spectrum, like a chatbot, maybe one agent is good enough, because it's just one workflow that you need and it's a simple workflow. One agent with good enough capabilities, with some simple memory management and tool access like Gmail, Google Calendar, a notetaker, is good enough.
But when it comes to more complex use cases, that's where my recommendation is that it's good to start with graduated autonomy when it comes to rolling out agentic AI features. Use cases that are well known and documented, use cases that have actually proven ROI in production - those are the ones we start with, be it internal or be it SaaS. And they are usually in level one and two automation, like intelligent automations. And then gradually move on to the more complex use cases where you might need more R&D focus. I'll also take this moment to address something that I hear a lot: what is the perfect use case for an AI agent, and when should you not use one? Because we are kind of discussing that. So I think even before we get into what the use cases of AI agents are, we need to do a binary check on whether we need AI agents for that or not. Only when your workflow is really multi-step, repetitive, needs autonomy, and has a clear goal does an AI agent consideration make sense.
There are some use cases where your classical ML will continue to do just fine. If you're talking about prediction or time series forecasting, classical ML will continue to do well there. If you're talking about GenAI natural language processing, like extraction or summarization, with no complex workflow beyond that, of course GenAI capabilities continue to work fine. And then really, when it comes to multi-step, repetitive flows where you need autonomy, that's where you really consider AI agents. And once you establish that, then there are various decisions on what models to use and which use cases to pursue.
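The binary check Preeti describes can be written down as a simple gate: only consider an agent when the workflow is multi-step, repetitive, needs autonomy, and has a clear goal. The criteria are hers; the encoding, the labels, and the fallback categories below are my assumptions for illustration:

```python
from typing import NamedTuple

class Workflow(NamedTuple):
    multi_step: bool
    repetitive: bool
    needs_autonomy: bool
    clear_goal: bool

def recommend_approach(w: Workflow) -> str:
    """Apply the binary check before reaching for an agent."""
    if all(w):                          # every criterion must hold
        return "ai_agent"
    if w.multi_step:
        return "genai_or_classical_ml"  # e.g. summarization, forecasting
    return "rule_based_automation"

print(recommend_approach(Workflow(True, True, True, True)))    # ai_agent
print(recommend_approach(Workflow(False, True, False, True)))  # rule_based_automation
```

The point of a gate like this is less the code than the discipline: it forces the "do we even need an agent?" question to be answered explicitly before any model selection happens.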
[00:17:33] Tobias Macey:
For companies that are incorporating some of these AI-focused use cases, particularly with these generative AI platforms, what are some of the ways that you are seeing them address that confidence building? You mentioned they might start as an internal proof of concept, but what are the signals that let them know that this is ready to actually put in front of customers and expose and advertise around, and some of the ways that they need to change their operational maturity around things like testing and evaluation, beyond just the ability to run a software system? Because these AI platforms are so dramatically different from traditional software, where I can run a web application and keep it stable and scalable.
Now I need to make sure that it's actually able to sustain the load of nonhuman entities interacting with it.
[00:18:32] Preeti Shukla:
Yes. Absolutely. So the challenge is really that AI agents in a SaaS context are expected to display deterministic behavior while they are, you know, nondeterministic. And there are a few things that a team has to do groundwork on. Will this SaaS flow respect the latency? Will this flow respect the unit economics, and, most importantly, the technical aspects of reliability and predictability? And there are certain strategies, as we talk about evaluation. What I'm seeing is that teams are really experimenting with a layered evaluation strategy approach. So it's not just one score, or just one test that you embed in your CI/CD pipeline.
There is whole evaluation engineering. And, to be more specific about that, there are deterministic checkpoints as part of evaluation at the base level: schema validation, policy validation. Really the cheap ones, low-cost tests. And then moving on to more advanced outcome-based tests: golden datasets. These are standard datasets that will help catch any regressions, because models change and workflows change, but these are standards that will always ensure that you are getting the same expected output over a period of time. One more strategy I've seen, when fuzzy logic is involved, is LLM-as-a-judge. So let's say you have a chatbot, and you have to monitor its performance, or decide when it is safe to roll out to customers. And I'm assuming that all of this is done in a pilot first. You have a POC first, but then you have a pilot with one or two beta tenants before you really roll it out to production. So when it's still in the pilot phase, where you have access to some real tenanted data, these are the evaluations and checks. These LLM judges are really your confidence scoring.
And when your confidence score drops below a certain percentage - like, it's not 80% anymore - that's when you bring a human in the loop. So I guess there are various layered evaluation strategies. There are also techniques that talk about running offline evaluations in the CI/CD pipeline, but also more runtime evaluations while the customer is using production. And again, observability platforms play a big role. So I would say really the decision comes down to: when is your AI pilot implementation passing the cutoffs or benchmarks that you have set via these evals and guardrails?
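The layered strategy described above - cheap deterministic checks first, then golden-dataset regression, then LLM-as-a-judge with a human-in-the-loop threshold - could be sketched like this. The 80% cutoff, the schema shape, and the toy data are illustrative assumptions; the judge itself would be a real model call in practice:

```python
import json

def schema_check(raw_output: str, required_keys: set) -> bool:
    """Layer 1: cheap deterministic checkpoint - valid JSON with expected fields."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    return required_keys.issubset(data)

def golden_regression(outputs: dict, golden: dict) -> float:
    """Layer 2: fraction of golden-dataset cases whose output still matches."""
    hits = sum(outputs.get(k) == v for k, v in golden.items())
    return hits / len(golden)

def needs_human_review(judge_confidence: float, threshold: float = 0.8) -> bool:
    """Layer 3: route to a human when the LLM judge's confidence drops."""
    return judge_confidence < threshold

# Example run with toy data:
raw = '{"answer": "ok", "sources": []}'
print(schema_check(raw, {"answer", "sources"}))   # True
print(golden_regression({"q1": "a", "q2": "b"},
                        {"q1": "a", "q2": "x"}))  # 0.5
print(needs_human_review(0.72))                   # True
```

The ordering matters for cost: schema checks run on every output, golden-dataset regressions run in CI/CD on each model or workflow change, and judge calls (the expensive layer) gate the riskiest paths at runtime.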
[00:21:16] Tobias Macey:
And build versus buy is the perennial challenge in technology. I'm curious what your thoughts are on that equation for teams that are planning on implementing an agentic capability in their SaaS platform. Should they just buy an off the shelf system that handles a lot of the RAG and evaluation and LLM serving and load balancing around that so that they can just plug in their system and get some of the out of the box behaviors, or, because it is maybe a business critical capability, is it worthwhile having your own team own the end to end capability, and maybe what are the gradations along that continuum?
[00:21:59] Preeti Shukla:
Yeah. Absolutely. I think the debate has been there for a long time, even pre-AI, you know: whether you build or you buy a product. And I kind of agree with how you framed your question, which kind of had the answer in it as well. It all comes down to business capability. If this is something that is truly tied to the ROI of your SaaS business, that is a core part of your business or a core workflow of your business, that's where you should build it. Whereas for something which is infrastructure related, something which is peripheral in nature, like a vector database, there is so much work happening; there are cloud providers that continue to offer newer AI infrastructure that will help SaaS software improve its capabilities over a period of time. So I think that's the general guidance that still holds true in my opinion: it comes down to what the business capability is. Now, I will also say that there are some exceptions to it. We are talking about enterprises that work with federal data or government data and have to operate under more stringent compliance regulations, data residency requirements, or they have a military contract. So they have to operate under certain constraints that might make building a better choice for them. But for the rest of us, that guidance on build versus buy still holds true.
[00:23:30] Tobias Macey:
And then the other piece of this too is that AI systems are very data hungry, and that requires a separate set of technical and operational maturity for being able to manage a lot of those data flows. And many businesses, particularly if they're on the smaller end of the spectrum, aren't necessarily going to invest in headcount for data engineers and AI engineers in addition to the software engineers that are focused on the core product delivery. And I'm wondering how you're seeing teams address some of that increased operational maturity requirement to be able to provide all of the data and context to these AI systems to make sure that they're able to operate effectively and have appropriate grounding for the operations that they're providing?
[00:24:25] Preeti Shukla:
Absolutely. So data is the real moat in the AI era. I think we have learned that lesson. And this also comes down to which company is going to end up with a successful AI product. As they say, garbage in, garbage out. So if teams don't have access to the right, sanitized, accurate data, that means they have to do work on the data pipeline first, because AI doesn't really fix workflows that are already broken, or workflows that don't operate with good data. So I think that's definitely there. I'm gonna first talk about startups, because that might be more challenging for startups who are new AI comers. They might have access to data; in some cases, they don't have access to data. So either they're operating as an AI infra startup, or they're operating as a SaaS application where they are using existing frontier models for their use cases.
So I would say that the way AI infra providers enabled startups during the cloud era, they will continue to do that. But again, it comes down to individual use cases. Take a very small startup that really doesn't need to own AI infra and can operate off, let's say, data provided by Apple Watches - say they are a consumer app. They have to abide by data privacy laws, but I think they can still work within this big ecosystem where there are enablers, like AI infra providers or the big four companies that have access to the data. And then the second type we are talking about is companies that are themselves an AI data platform company. That's where they have to apply more rigor in making sure that they have the right resources to sanitize the data and the right team structure around that data before they can really incorporate it into an AI product.
[00:26:39] Tobias Macey:
And then, as a corollary to that data requirement, there's the challenge of what I've seen phrased as the confident idiot, where the AI will very certainly give you an answer that is unequivocally wrong. But if it passes it off with enough confidence, then people will accept it as true, and that can lead to a lot of problems. What are some of the ways that you're seeing teams try to address that challenge and identify some of these confident falsehoods and steer the AIs in the appropriate direction? And because that is a never-ending battle, what are some of the systems that they need to have in place to be able to identify and correct those decision-making steps where the AI might lead you down the wrong path?
[00:27:33] Preeti Shukla:
Absolutely. And I would say those are the most dangerous ones: the silent failures, or the most confident failures, that just catch teams off guard. What I've seen working is that there is definitely an implementation part of it, a validation part of it, and then monitoring. So if you really break it down into implementation versus validation versus monitoring, then there are certain tried-and-tested strategies that work. I think we can start with the basics, like a good, well-written, production-grade prompt. At the base level, that helps ensure that your AI solution doesn't hallucinate; breaking down the prompt into structures still works. I think that's still a very effective strategy.
What I encourage my engineers to do, and I do it myself, is go read production-level prompts. And I think it needs some training, because sometimes the ask appears very simple, you know, like: summarize this data. But when you really think through that, and when you start observing agentic behavior or LLM behavior, you will start to see a pattern in the mistakes it makes, and then you might want to give it a structure: summarize this in four sections; talk about an introduction, then talk about a hook, and then end with the conclusion. So I highly encourage that. It's a low-hanging fruit, and it still works effectively.
Second, I would say, is more for B2B SaaS and enterprise-grade software: RAG is number one. I still believe it to be the single most effective technique when it comes to grounding AI output in truth. So you can feed in your enterprise policy documents, your HR policy documents, customer success material. It could also be something very specific to your domain, like a medical chatbot or a legal chatbot; they can definitely be given loads of information. So I think these two or three strategies really help minimize hallucination. Now, we all know that's not enough at this point in time, because AI solutions are still improving, and that's where validation comes into the picture, and all the evaluation strategies that I was sharing with you before play a crucial role.
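A toy sketch of the RAG grounding pattern just described: retrieve the most relevant policy snippets, then force the model to answer only from them. Real systems use embeddings and a vector store; the keyword-overlap scoring below is a deliberately simple stand-in:

```python
# Toy retrieval: rank documents by word overlap with the query.
# Production RAG would use an embedding model and a vector database.
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def grounded_prompt(query: str, documents: list[str]) -> str:
    """Assemble a prompt that grounds the answer in retrieved context."""
    context = "\n".join(retrieve(query, documents))
    return (
        "Answer using ONLY the context below. If the answer is not in the "
        f"context, say you don't know.\n\nContext:\n{context}\n\nQuestion: {query}"
    )
```

The "answer only from context, else say you don't know" instruction is what ties the generation back to enterprise truth rather than the model's priors.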
And as I was saying, there are various sophisticated ways of grading an output. You can use LLM-as-a-judge. You can use golden datasets. You can also do path evaluation. I mention this one specifically because you brought up the confident ones. The confident ones, and the trickiest ones to capture, are the ones that slowly inflate latency. Those agents or AI solutions are doing multiple retries, and they might end up giving you the right answer, but you would only see that in certain production cases. So path-level evaluation is where you really see not just the answer, but the behavior of the AI agent.
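A compact sketch of the layered evaluation idea, combining a golden dataset with an LLM-as-a-judge step. The `judge` function here is a deterministic stub (substring match against the reference); in a real harness it would call a grading model:

```python
# Layered eval harness: run each golden case through the system under test,
# then grade the answer. `judge` is stubbed deterministically for illustration;
# production versions prompt a separate LLM to grade against the reference.
def judge(question: str, answer: str, reference: str) -> bool:
    # Stand-in for an LLM grader: pass if the reference appears in the answer.
    return reference.lower() in answer.lower()

def run_evals(golden: list[dict], answer_fn) -> float:
    """Return the pass rate of `answer_fn` over a golden dataset."""
    passed = 0
    for case in golden:
        answer = answer_fn(case["question"])
        if judge(case["question"], answer, case["reference"]):
            passed += 1
    return passed / len(golden)
```

The same loop can feed a CI gate, so a model or prompt change that drops the pass rate blocks the deploy.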
So I think that is also something I've found very effective. And then the third part is monitoring, where observability platforms come in. If we're talking in the context of SaaS, it's really cost control per tenant and monitoring per tenant. I would also say that stepwise budgeting, or stepwise monitoring, matters because you're not just looking at the whole; you also want to look at each of the parts that make up that whole.
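The per-tenant cost control with stepwise budgeting described above can be sketched as a small gate that every workflow step passes through. The budget numbers are invented for illustration:

```python
# Per-tenant budget gate: each workflow step is charged individually, with
# both a per-step cap (stepwise budgeting) and a tenant-level cap (the whole).
# Dollar amounts are illustrative; real values are tuned per plan/tenant.
class TenantBudget:
    def __init__(self, cap_usd: float, step_cap_usd: float):
        self.cap = cap_usd            # total spend allowed for this tenant
        self.step_cap = step_cap_usd  # spend allowed for any single step
        self.spent = 0.0

    def charge(self, step_cost_usd: float) -> bool:
        """Return True and record the spend if the step fits both budgets."""
        if step_cost_usd > self.step_cap:
            return False  # a single step is trying to spend too much
        if self.spent + step_cost_usd > self.cap:
            return False  # tenant-level cap would be exceeded
        self.spent += step_cost_usd
        return True
```

An agent runtime would call `charge` before each model invocation and halt (or degrade to a cheaper model) when it returns `False`.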
[00:31:06] Tobias Macey:
And I think that's another element of the decision of whether you use a managed, API-only service for the model versus running it yourself: the ability to expose and act upon the log-probability scores of the outputs, to be able to see, oh, well, the model phrased this very confidently, but the log probability is actually fairly low, so I'm going to take a second pass to evaluate this output. Whereas if it's an API response that hides them, you lose that insight into the inner workings of the model and its confidence.
And I'm wondering how you're seeing teams think about that as part of the overall equation of where and how to run these models and how to hook into some of those confidence metrics that the model itself generates?
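The confidence check Tobias describes can be sketched as a small gate over per-token log probabilities, which self-hosted models (and some APIs) expose. The 0.6 threshold is an illustrative assumption; in practice it is tuned per task:

```python
# Confidence gate over token log-probabilities: a fluent-sounding answer with
# low average token probability gets flagged for a second evaluation pass.
import math

def mean_confidence(token_logprobs: list[float]) -> float:
    """Geometric-mean token probability: exp of the average logprob."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def needs_second_pass(token_logprobs: list[float], threshold: float = 0.6) -> bool:
    """True when the model's average token confidence falls below threshold."""
    return mean_confidence(token_logprobs) < threshold
```

Logprobs near zero mean near-certain tokens; strongly negative values mean the model was effectively guessing even if the prose reads confidently.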
[00:31:54] Preeti Shukla:
Absolutely. So one thing that I find more effective is monitoring at the model level, but also at the workflow level, because what you're monitoring at the model level is just one part of the workflow. So, with this model, say 5.1 was giving me this particular summary, and this is the prompt that worked with 5.1, but after 5.2, the same prompt doesn't work. Nothing has changed, but still things have changed drastically; those kinds of errors. So when we monitor at the model level and write evaluations, we basically write them at the unit level. I think of model-level validation as unit tests and workflow-level tests as more end-to-end system or integration tests. That's a very rough analogy, but definitely monitor the model's output and behavior at an individual step level, and then also see it holistically: this model output was supposed to return me a summary such that the next agent or tool could pass it to this next API.
You know? So I guess that's the whole flow where evals can really help you understand what part of the equation broke, why, and what kind of effect model changes have. Do we need to version our models? So I think that is one more effective strategy; teams are becoming more and more aware that a model upgrade may or may not help. By default, upgrading tomorrow might not be a good strategy until you have really solid tests. So definitely run the model changes in a sandbox environment first, using things like canary deployments to slowly graduate the new model into your existing codebase, just to do this in a very controlled and very graduated fashion.
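One common way to implement that graduated rollout is deterministic bucketing: a hash of the tenant ID routes a small, stable fraction of traffic to the candidate model so its eval results can be compared against the incumbent before promotion. The percentage and model names below are illustrative:

```python
# Graduated model rollout: hash each tenant into a stable bucket 0-99 and
# send only `canary_pct` percent of tenants to the new model. Deterministic
# hashing keeps a tenant on the same model across requests, which makes
# side-by-side eval comparisons meaningful.
import hashlib

def route_model(tenant_id: str, canary_pct: int = 5) -> str:
    bucket = int(hashlib.sha256(tenant_id.encode()).hexdigest(), 16) % 100
    return "model-new" if bucket < canary_pct else "model-stable"
```

Raising `canary_pct` in steps (5, 25, 100) as evals stay green is the controlled graduation described above.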
[00:33:54] Tobias Macey:
Going back to the multi-tenant nature of these SaaS platforms, that also brings up the question of attribution and identity management for these autonomous actors. And I'm wondering how you're seeing the platform owners of these SaaS systems think about the types of additional security controls and identity frameworks that they need to have in place to allow these agents to operate on behalf of the various customers, and how to manage their isolation within tenant boundaries, or how to think about whether and when they can operate across tenant boundaries, given that one of the core promises of these SaaS systems is that each tenant will only ever be able to access their own resources and never be able to span across them.
[00:34:44] Preeti Shukla:
Yes. Absolutely. I think there are some existing techniques AI agents have to abide by, and some newer challenges that come with AI agents themselves. So when I think about existing techniques, I would say that tenant isolation is something that has to continue to be upheld by a successful AI agent. And the tenant isolation we are talking about runs through the entire workflow of the AI agent. Think in terms of memory: there should be tenant isolation at the memory level, at the tool level, at the database level, at the vector database level. An agent should always be able to track which tenant it is taking an action for, because this is also needed for audit purposes.
In most SaaS businesses, you need to be able to see who ran what, which tenant ran what in their isolated environment, and what role the agent had, because an agent doesn't have an identity of its own; it runs for a tenant and for a user. Another technique is RBAC, role-based access control, which also continues to be highly relevant in the context of AI agents. An AI agent should only be able to run the feature set or responsibilities that the role it has been given allows, and nothing outside of that. So at a high level I see tenant scoping, at the middle layer role scoping, and really at the bottom layer user scoping. That framework still holds true. Some of the newer challenges with AI agents are those of interoperability, and here I would think of MCP, because an agent is calling tools, and we really want to make sure there are enough guardrails in place so that an AI agent doesn't accidentally get access to cross-tenant information while it's calling tools. Tool boundaries are something to really watch out for, and that's why guardrails become very effective: who is the agent calling, what tools are they calling, and are they still maintaining the scope.
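The tenant-scoping and role-scoping layers described above can be sketched as a single guardrail that every tool call passes through. The role names, tool names, and allowlist shape are invented for illustration:

```python
# Tool-call guardrail combining tenant isolation with RBAC scoping. Every
# call carries the tenant and role it acts for; cross-tenant access and
# tools outside the role's allowlist are rejected before the tool runs.
ROLE_TOOLS = {
    "viewer": {"read_record"},
    "editor": {"read_record", "update_record"},
}

def allow_tool_call(agent_tenant: str, resource_tenant: str,
                    role: str, tool: str) -> bool:
    if agent_tenant != resource_tenant:
        return False  # tenant isolation: never cross the tenant boundary
    return tool in ROLE_TOOLS.get(role, set())  # RBAC: role-scoped tools only
```

Logging every decision from a gate like this (tenant, role, tool, allowed or denied) also yields the audit trail the conversation calls for.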
[00:37:05] Tobias Macey:
And in your work of building these SaaS platforms, incorporating agentic capabilities, and working with other companies that are going through that same journey, what are some of the most interesting or innovative or unexpected ways that you've seen these agentic capabilities incorporated into SaaS products?
[00:37:23] Preeti Shukla:
Absolutely. I mean, I think it goes without saying that the most innovative ones I've seen are not the most autonomous ones. I'm not seeing that teams are really deriving value by making them truly autonomous. There is almost always a human-in-the-loop layer for critical business use cases. There is confidence scoring. Those things are there for sure. One of the interesting implementations I've seen is where the entire multi-agent framework is deployed on an MCP server, and that multi-agent solution has become client agnostic.
And that backend is served via various frontends: the product has its own frontend, but the backend can also be served via a standard MCP client like Claude. That was very interesting to me, because we have always talked about decoupling UI from the backend, and I think AI agents, especially the MCP layer, are facilitating that in very interesting ways. Using a standard MCP client like Claude and others, you can basically interact. This was a chatbot for a salon-based product, and with the standard Claude UI you can actually talk to the AI agent, which is a master-orchestrated multi-agent system, and ask when the salon is opening, when it is closing, and all sorts of things, which I thought was very cool and very effective. Another thing I've seen, and I'm sure these are not as unique, but I feel they're very effective, is agents as code. To really make a data analyst's life easy, agents generate code on the fly using the pandas library and Python, and then you can analyze data on the fly. The caveat with these agents is that because it's nondeterministic code, it is not good for shipping to production unless you really harness it with evals and guardrails, but it makes for a very useful use case when it comes to just analyzing data. So these agents work through code, but they are nonetheless very valuable. And then one last implementation I would talk about: we know that B2B SaaS and enterprises are mostly experimenting with, and also deploying to some extent, multi-agent systems, but these are very complex and usually high-latency, high-cost solutions.
The more intelligent an AI agent is, the more costly and high-latency the behavior it shows. So one workaround I've seen is teams using agents as tools. They have just one planner agent which has access to multiple tools, but one of the tools, or maybe multiple tools, are themselves agents. This way you avoid the complexity of a multi-agent system, yet you can capture the value of a multi-agent system. So I think that was really interesting and cool.
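The agents-as-tools pattern can be sketched as a single planner dispatching into a tool registry where one entry is itself a wrapped agent. The stub functions and keyword routing below are stand-ins; a real planner would ask an LLM to choose the tool:

```python
# Agents-as-tools: one planner, a flat tool registry, and one "tool" that is
# actually a sub-agent. The planner's routing is a trivial keyword stand-in
# for an LLM tool-selection call; the tools are deterministic stubs.
def simple_tool(query: str) -> str:
    return f"looked up: {query}"

def research_agent(query: str) -> str:
    # Stand-in for a full sub-agent (its own prompt loop, tools, memory).
    return f"research summary for: {query}"

TOOLS = {"lookup": simple_tool, "research": research_agent}

def planner(query: str) -> str:
    """Pick a tool and dispatch; the sub-agent looks like any other tool."""
    tool = "research" if "why" in query.lower() else "lookup"
    return TOOLS[tool](query)
```

Because the sub-agent sits behind the same tool interface, the system keeps single-agent orchestration complexity while still capturing multi-agent value, as described above.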
[00:40:33] Tobias Macey:
And in your experience of working in this space and helping to bring agentic features to SaaS platforms, what are some of the most interesting or unexpected or challenging lessons that you've learned personally?
[00:40:45] Preeti Shukla:
I think model changes fail silently. This is something most of us have felt the pain of. We want to upgrade to new models because new models come with great capabilities, especially frontier models, but it is still very brittle, and one small model tweak can break things. This is the scenario I shared with you where nothing changes, but everything changes for the worse, because your workflow is still the same, but you went from 5 to 5.1, 5.1 to 5.2, and your prompt is not working anymore; your RAG instructions need to be fine-tuned more. So that keeps coming up. And really the strategy is relying heavily on evals and also coming up with a policy on when to deploy newer models. I think there is a whole CI/CD paradigm around it. So I guess that's something that I found interesting.
I would also say that one thing that has taken even great teams aback is the silent retries that I mentioned in one of our previous discussions. This is the behavior where the output looks correct; it's not in the functionality of the agent that you are able to detect the problem, but in the behavior of the agent. And by that I mean: did it really take the shortest path to answering your question, or did it go through a lot of retries, a lot of back and forth? Path-level monitoring of an agent is, I feel, the trickiest, because in some small pilot use cases you won't catch it; your evals are just looking at functionally correct data, and it is giving you that. But then it comes down to production, where suddenly you see unexplained latency. And I think that's why path-based evaluations are really the most advanced and sophisticated, and can prevent such surprises in production.
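Path-level monitoring of the kind just described can be sketched as a check over an agent's trace (the ordered list of step names) rather than over its final answer. The step names and thresholds are illustrative assumptions:

```python
# Path-level monitor: inspect the agent's execution trace, count retries,
# and flag runs whose path length or retry count exceeds a baseline, even
# when the final answer is functionally correct. Thresholds are illustrative.
def path_report(trace: list[str], max_steps: int = 6,
                max_retries: int = 1) -> dict:
    retries = sum(1 for step in trace if step == "retry")
    return {
        "steps": len(trace),
        "retries": retries,
        "flagged": len(trace) > max_steps or retries > max_retries,
    }
```

Feeding these reports into the observability platform surfaces the slow latency creep from silent retries before it shows up as unexplained production latency.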
[00:42:45] Tobias Macey:
And we touched on this a few times, but what are the situations where you would strongly advise against the adoption of agentic behaviors for a SaaS platform?
[00:42:56] Preeti Shukla:
Yeah. I mean, as we were talking about, I will stress that AI agents are nondeterministic, yet in a SaaS context they are expected to display deterministic behavior, and that needs a lot of harnessing, a lot of discipline around it. It is doable, and teams are successfully doing it, but I don't think we should overengineer. As we were talking about, there are some very standard use cases where GenAI, predictive AI, your classical ML will continue to make a lot of sense, and they will completely deliver on the expected ROI. For AI agents, there really has to be a use case involving multistep, repetitive work, and they should be very action oriented.
If you want to explore something, or you want a creative output, that's where GenAI may be a better solution. So those still continue to be my guidelines. Before we start on any agentic adoption, we should really map our AI ecosystem and see for ourselves: we have this complex AI ecosystem where this will be solved by classical ML, this will be solved by GenAI, and these are the opportunities where we can adopt AI agents.
[00:44:14] Tobias Macey:
And as the capabilities of these models continue to improve, and as we gain more established patterns and practices around how to build agentic systems and manage them effectively in production contexts, what are some of the predictions that you have as far as the role that these agentic systems are going to play in the future of SaaS products? Or are SaaS products merely going to fade away in favor of agentic-only capabilities?
[00:44:44] Preeti Shukla:
Yeah. I'm aware that this is a hotly debated topic on LinkedIn, and various LinkedIn influencers have opined on this, and there's a spectrum of different schools of thought. One school of thought is that SaaS is dead, and another school of thought is that, no, SaaS is here to stay. And really the third school of thought, which is my own, is that in the short-term future, and I'm talking about the next two or three years, we are going to see SaaS with AI agents strongly embedded, so both GenAI and agentic AI strongly embedded in the workflows. I think that's how I see the industry unfolding. And the reason is simple: the reason it's very hard to completely take over SaaS is that the model of SaaS is that you have to deliver predictability and reliability in a certain repetitive fashion. And AI being nondeterministic makes it very hard; even now, with all the infrastructure, all the support, and all the wisdom we have gained navigating these products for a few years, it is not at a place where it can really take over the deterministic form of SaaS. So the way I see it is a very strong hybrid model, and first and foremost we will see an impact on the pricing model itself.
And we kind of talked about that, but one of the disruptions I see, which is already happening, is that SaaS pricing has become more complex and more layered. If I look at products like Lovable or similar products, there is SaaS seat pricing plus credit usage as well. I think that will further come down to outcome-based pricing, because credit usage is not driving a lot of customer satisfaction: you have used your credits, but you had to tweak the prompt so many times that you lost them all. So that's really one disruption I see. The second disruption I see is AI agents playing a significant role in customer onboarding, which is one of the core flows. I see a lot of agentic onboarding, and I'm really excited about that, because traditional SaaS onboarding was very template heavy.
You install the app, they ask you a bunch of questions, you have to toggle yes, no, yes, no. I really feel that with agentic AI and general AI capabilities, we are talking about a seamless, let's say, voice onboarding or a text onboarding, where AI is intelligent enough to extract your configuration, and you don't have to go through that ten-, twenty-, thirty-step onboarding that we are so used to in the SaaS world. So I feel, in a nutshell, that AI, and AI agents in particular, are going to empower SaaS by taking away workflow complexities, making UI more visually appealing, less template heavy, less configuration heavy, and with an interesting mix of pricing models that we are already seeing.
[00:47:48] Tobias Macey:
Are there any other aspects of bringing agentic capabilities into SaaS systems, the overall economy of SaaS and the impacts of AI on that, or just your own experience of working in this space, that we didn't discuss yet that you'd like to cover before we close out the show?
[00:48:05] Preeti Shukla:
I mean, I think we have talked about most of it. Thanks for the great and thoughtful questions you have asked me, but one thing I would like to say is that there are still some gaps in the industry, and there are two or three things that would help our industry as a whole make AI adoption easier and more successful. It's not a good place to be when leading research studies are publishing that ninety-five percent of GenAI pilots fail, or that by 2027 fifty percent of agentic AI pilots will fail. There are ways to make agents work in complex production environments, but it needs a lot of plumbing, a lot of discipline, and all the strategies we have talked about. And I'm hoping that more and more platforms will come along that productize this approach; it doesn't have to live in the minds of a few leaders or a few researchers who have figured this out. I'm already seeing a lot of the job made easier by observability platforms, where you can track costs and latency, but estimation still continues to be a challenge. We talked a little about how to estimate, because that's the first step, and if there are more products that make it easier for leaders to estimate and to ensure that a project is going to return the ROI the business expects, that will go a long way. So that is definitely one thing. I would also say that interoperability between agents is still being figured out. There are new protocols; A2A is not new anymore, and a payment protocol has come along as well. But a lot of work still needs to be done, and it will happen, because things are so new that we are still living in a relatively nascent world when it comes to the interoperability that is going to enable enterprise-level success of agentic solutions in production.
So those are definitely two things. And I guess one last thing is scalability, because if we're talking about production and enterprise scale, we have to think about what kind of AI infra exists. One thing that is becoming very apparent is that our regular infra tools for scaling and autoscaling are not very friendly toward AI-native solutions. You can of course use a Lambda to scale; you can of course containerize your AI agents in Docker and deploy them on EC2 with autoscaling. But AI agents need a lot of memory, and in most situations we see timeouts. So one more improvement I'm hoping to see in the AI infra industry is the ability to have AI agents scale more seamlessly. Those are the top two or three things on my mind that still need to be solved for.
Tobias Macey:
Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gaps in the tooling, technology, or human training that's available for AI systems today.
Preeti Shukla:
Absolutely. So I think I kind of talked about the gaps and challenges in the previous discussion, but human training, yes. This also goes back to why I feel SaaS cannot be completely overtaken by agents: there is a real lack of resources, a real lack of talent, when it comes to leadership or even engineers who are navigating this transition. And it is ironic that there have been so many AI-native startups in the past two years, but there is not enough talent to supply them, because there are limited people with a research background.
And if you're really looking for industry experts on this, I think we need to do more training. I know there are trainings provided by DeepLearning.AI, Coursera, Udemy, Maven. I'm a big part of those communities as well, and I definitely keep myself updated, but I'm hoping that training becomes more democratized, because that's what we need. Most of the startups I talk to are trying to figure it out; they have a prototype, but they don't have the confidence to take it to production. And that shows me there's a real gap in production-grade skills. I'm hoping there will be more and more courses, and more and more leaders out there mentoring would-be leaders and engineers who can really navigate the shift.
[00:52:43] Tobias Macey:
Alright. Well, thank you very much for taking the time today to join me and share the work that you've been doing on this challenge of bringing agentic capabilities into SaaS platforms and the economics and technical challenges associated with that. I appreciate all the time and energy you're putting into helping other companies address that challenge and modernize at the rapid pace that things are changing. So thank you again for that, and I hope you enjoy the rest of your day.
[00:53:11] Preeti Shukla:
Thank you, Tobias, so much, and that was such a great conversation.
[00:53:17] Tobias Macey:
Thank you for listening. Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management, and Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@aiengineeringpodcast.com with your story.
Introductions: Tobias Macey interviews Preeti Shukla
Preeti's background: from enterprise SaaS to agentic products
How AI agents change SaaS requirements: latency, cost, privacy
ROI and pricing models for agentic capabilities
Choosing models and providers: risk, routing, and RAG
Automation spectrum: rules, intelligent automation, autonomy
Where agents live: UX surface vs. invisible backend
Graduated autonomy and when not to use agents
From POC to customer rollout: evals, pilots, guardrails
Build vs. buy for agentic SaaS capabilities
Data readiness and pipelines for AI in SaaS
Combating confident errors: prompts, RAG, validation
Model confidence, monitoring, and versioning strategy
Multitenancy, RBAC, and agent identity controls
Innovative patterns: MCP, agents-as-code, agents-as-tools
Lessons learned: silent model failures and path latency
When to avoid agents: pick the right AI for the job
Future of SaaS: hybrid products with embedded agents
Industry gaps: productized plumbing, interoperability, scaling
Biggest gaps today: tooling and human training
Closing remarks and credits