In this episode of the AI Engineering Podcast Lucas Thelosen and Drew Gillson talk about Orion, their agentic analytics platform that delivers proactive, push-based insights to business users through asynchronous thinking with rich organizational context. Lucas and Drew share their approach to building trustworthy analysis by grounding in semantic layers, fact tables, and quality-assurance loops, as well as their focus on accuracy through parallel test-time compute and evolving from probabilistic steps to deterministic tools. They discuss the importance of context engineering, multi-agent orchestration, and security boundaries for enterprise deployments, and share lessons learned on consistency, tool design, user change management, and the emerging role of "AI manager" as a career path. The conversation highlights the future of AI knowledge workers collaborating across organizations and tools while simplifying UIs and raising the bar on actionable, trustworthy analytics.
Announcements
- Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems
- When ML teams try to run complex workflows through traditional orchestration tools, they hit walls. Cash App discovered this with their fraud detection models - they needed flexible compute, isolated environments, and seamless data exchange between workflows, but their existing tools couldn't deliver. That's why Cash App relies on Prefect. Now their ML workflows run on whatever infrastructure each model needs across Google Cloud, AWS, and Databricks. Custom packages stay isolated. Model outputs flow seamlessly between workflows. Companies like Whoop and 1Password also trust Prefect for their critical workflows. But Prefect didn't stop there. They just launched FastMCP - production-ready infrastructure for AI tools. You get Prefect's orchestration plus instant OAuth, serverless scaling, and blazing-fast Python execution. Deploy your AI tools once, connect to Claude, Cursor, or any MCP client. No more building auth flows or managing servers. Prefect orchestrates your ML pipeline. FastMCP handles your AI tool infrastructure. See what Prefect and FastMCP can do for your AI workflows at aiengineeringpodcast.com/prefect today.
- Your host is Tobias Macey and today I'm interviewing Lucas Thelosen and Drew Gillson about their experiences building an agentic analytics platform and the challenges of ensuring accuracy to build trust
- Introduction
- How did you get involved in machine learning?
- Can you describe what Orion is and the story behind it?
- Business analytics is a field that requires a high degree of accuracy and detail because of the potential for substantial impact on the business (positive and negative). These are areas that generative AI has struggled with achieving consistently. What was your process for building confidence in your ability to achieve that threshold before committing to the path you are on now?
- There are numerous ways that generative AI can be incorporated into the process of designing, building, and delivering analytical insights. How would you characterize the different strategies that data teams and vendors have approached that problem?
- What do you see as the organizational benefits of moving to a push-based model for analytics?
- Can you describe the system architecture of Orion?
- Agentic design patterns are still in the early days of being developed and proven out. Can you give a breakdown of the approach that you are using?
- How do you think about the responsibility boundaries, communication paths, temporal patterns, etc. across the different agents?
- Tool use is a key component of agentic architectures. What is your process for identifying, developing, validating, and securing the tools that you provide to your agents?
- What are the boundaries and extension points that you see when building agentic systems?
- What are the opportunities for using protocols such as A2A for managing agentic hand-offs?
- What is your process for managing the experimentation loop for changes to your models, data, prompts, etc. as you iterate on your product?
- What are some of the ways that you are using the agents that power your system to identify and act on opportunities for self-improvement?
- What are the most interesting, innovative, or unexpected ways that you have seen Orion used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Orion?
- When is an agentic approach the wrong choice?
- What do you have planned for the future of Orion?
Parting Question
- From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
- Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
- Gravity
- Orion
- Site Reliability Engineering
- Anthropic Claude Sonnet 4.5
- A2A (Agent2Agent) Protocol
- Simon Willison
- Behavioral Science
- Grounded Theory
- LLM as a Judge
- RLHF == Reinforcement Learning from Human Feedback
Hello, and welcome to the AI Engineering podcast, your guide to the fast moving world of building scalable and maintainable AI systems. When ML teams try to run complex workflows through traditional orchestration tools, they hit walls. Cash App discovered this with their fraud detection models. They needed flexible compute, isolated environments, and seamless data exchange between workflows, but their existing tools couldn't deliver. That's why Cash App relies on Prefect. Now their ML workflows run on whatever infrastructure each model needs across Google Cloud, AWS, and Databricks. Custom packages stay isolated.
Model outputs flow seamlessly between workflows. Companies like Whoop and 1Password also trust Prefect for their critical workflows, but Prefect didn't stop there. They just launched FastMCP, production ready infrastructure for AI tools. You get Prefect's orchestration plus instant OAuth, serverless scaling, and blazing fast Python execution. Deploy your AI tools once. Connect to Claude, Cursor, or any MCP client. No more building auth flows or managing servers. Prefect orchestrates your ML pipeline. FastMCP handles your AI tool infrastructure.
See what Prefect and FastMCP can do for your AI workflows at aiengineeringpodcast.com/prefect today.
[00:01:28] Tobias Macey:
Your host is Tobias Macey, and today I'm interviewing Lucas Thelosen and Drew Gillson about their experiences building an agentic analytics platform and the challenges of ensuring accuracy to build trust. So, Lucas, can you start by introducing yourself?
[00:01:41] Lucas Thelosen:
Absolutely. Yeah. Thanks for having us here. I'm Lucas, the CEO of Gravity, and we build Orion, an AI analyst. Drew and I both came from the Looker days. Looker is a BI tool that Google acquired in 2020, and then we got to spend some years at Google where we both branched out beyond Looker. Drew went over to the AI team; I focused on data. Nonetheless, we're both still data analysts at heart. I'm originally from Germany, as you can maybe tell by my accent, but twenty years now in Colorado, where I raised my four daughters.
[00:02:13] Tobias Macey:
And I'm just gonna take a side note to highlight the fact that the selection of your company name allows you to come off as very important sounding. You say, I'm in charge of Gravity.
[00:02:25] Lucas Thelosen:
Yeah. Yeah. Exactly. Well, let's stick to the very, very basics of physics here. Yep. Fundamental. A little side tangent, but when we can figure out gravity, like, truly figure it out, which we haven't really done since Newton, then we really have made a major advancement in intelligence. And so that's the long story of how I thought about gravity and AI.
[00:02:48] Tobias Macey:
So AGI just stands for artificial gravity intelligence.
[00:02:52] Lucas Thelosen:
There you go.
[00:02:53] Tobias Macey:
Alright. And, Drew, how about yourself?
[00:02:56] Drew Gillson:
Yeah. So I'm a Canadian, actually. I live in Calgary, Canada. And as Lucas said, we met as customers of Looker, so a long time ago. We've been working together now for over a decade, and I have always been involved in data and AI for as long as I can remember. I was trying to do automatic optimizations of AdWords campaigns probably as long ago as 2005 with various techniques, and ended up running a business in the outdoor apparel category, so online retail, for a period of time. And the common thread through it all, of course, is data. So as Lucas said, I spent a few years at Google, most recently on the cloud AI team. And now Lucas and I have been operating Gravity for eighteen months.
[00:03:42] Tobias Macey:
And so going back to you, Lucas, do you remember how you first got started working in the ML and AI space?
[00:03:49] Lucas Thelosen:
Yeah. So this is actually a real boiler room kind of story. I was hired as a data analyst. Right? I'd done the whole financial forecasting track in my background, and I got hired as an analyst at this small company, where the CEO was like, hey, we're in the middle of the mortgage crisis, but I have a connection to Countrywide, and I think I can get them as a customer. Countrywide was in bad, bad trouble back then. They got acquired by Bank of America shortly after I joined. So we had them as a customer, and the CEO had no clue how to do anything. Like, he was a, you know, really corporate SVP at a big fintech company before this. Long story short, we had to figure out how to help people stay in their home so they don't foreclose on their mortgage. And in essence, when you think about it, right, it's just a big if statement: if this, then that. And so I went and wrote down, you know, what kind of variables would all go into what kind of mortgage modification someone would qualify for, and built this massive if tree. And then, whenever the underwriter decided for or against that mortgage recommendation, I took that learning back into the loop, like, you know, then I need to adjust my if statements. And in essence, that's the very foundation of machine learning, with a human in the loop. And we ended up getting Bank of America as a customer and processing hundreds of thousands of mortgages a month through that system. We had three people whose only job it was to suck up the staples from the floor, because we had 300 people taking staples out of the paperwork so it could be scanned in for the machine to read it and then, going through my algorithm, process the modification. So that was 2010, and, you know, I've always had data and AI together since then. I built out an optimization engine for Uber when they were going international, for their marketing.
And then I got to work with some of the larger retailers on supply chain optimization. So a lot of optimization scenarios after that, but it all started in that, you know, little tiny company in 2010 during the mortgage crisis.
[00:05:51] Tobias Macey:
And Drew, how did you first get started working in ML and AI?
[00:05:55] Drew Gillson:
Yeah. So for me, it was actually a hobby project. When deep learning started to become very popular and talked about, so this was about 2016, 2017, I had always liked tinkering with different technologies, and I am really one of the most applied researchers that I think you'll find out there. There's almost nothing that I, you know, felt too intimidated to play with. And so when computer vision models started to become, let's just say, much more interesting than they had been, I thought, well, I gotta learn about this stuff because clearly it's important. And so I did two things. I started a meetup and I started writing a little program. And what that turned into was actually a convolutional neural network that could recognize fridge magnets. I was preparing to have my first child, and so there's a couple threads I'll weave together here. But I thought, how cool would it be if we could have technology that could teach kids to spell and read, because now computers can see. And so, you know, is it possible truly for somebody like me to make that real? And actually, I was able to. So I trained my first deep neural network in 2017.
And so it was capable. It ran on my laptop. I can't remember how many layers it had. I remember training the thing: it took quite a long time, and my computer got very hot. But it worked. So you could put fridge magnets down on the table, and it would prompt you, you know, spell cat, spell dog, spell fish, that kind of thing. And then you'd move the letters around, and it would recognize it, and then it would give you points. So I wrote that in Python and trained the network and then talked about it at this meetup that I organized, because I was just honestly fascinated by it. Right? I thought this is just gonna change everything, that this is as accessible as it is. And then the meetup grew and grew and grew, so I held 34 events, one a month, over a couple of years, right up until COVID. When I stopped running the group, just because of COVID, we had over 4,000 members, and we'd had at least two and usually three machine learning focused presentations every month here in Calgary in Canada, where I live. There are lots of very interesting energy optimization problems here in Calgary because the energy industry is big and you wanna be able to extract essentially as much energy as possible in an efficient way. And so there's a lot of very smart people working on those kinds of problems. So I just got to learn through osmosis as I hosted these people who are far smarter than me talking about the models that they'd trained and the technology and the techniques that they were applying. And I can remember some moments, like when the Attention Is All You Need paper came out, I think it was in 2017.
And we talked about that paper, and then, of course, it seems like ancient history now. But, yeah, very fascinating. So these days, I'm still just as applied. You know, I'm not a PhD artificial intelligence researcher, but I do have enough hands on experience, now in both product and engineering, and through my career at Google over the last couple years, I feel like I've got enough hands on exposure to a lot of different techniques and, let's just say, different forms of AI that I just feel very lucky to have been given that opportunity even though I wasn't professionally trained for it. So I guess all that to say, I've just been playing with this stuff for a long time and have seen the potential for a while now.
[00:09:03] Tobias Macey:
And so digging now into Orion, which is the product that the two of you are investing your time, energy, and, currently, financial resources into. I'll get a little bit of an overview about it from you, but I'll also point out for folks listening to this episode that you were both on an episode of the Data Engineering Podcast where we did a more thorough look into Orion from the data analytics perspective, and I'll add a link in the show notes for that. But for the purposes of this conversation, if you could just give a quick overview about what it is and some of the story behind how it came to be, and in particular, how you developed the confidence that you would be able to bring a high enough degree of rigor and accuracy to the problem of data analytics using agentic techniques.
[00:09:50] Lucas Thelosen:
Oh, thank you. That's a couple of things to unpack there. You know, why did we quit our jobs at Google and do this? Drew and I are both growing a lot more gray hair than we had before. We're having more fun, though. So the first thing that was really clear to us working at Looker, you know, we ran the consulting practice for a long time there, being on the ground with our customers, is that you have data people that really understand the data, and then you have business people that understand the business they're functioning in within the organization. But there's a big gap between the two. Right? And it's always a challenge of, like, well, what data do we have access to? Well, I don't know that, and so I don't know the questions I can ask. And then you try to solve this by having more analysts and having them closer to the teams that they're actually working with. So, honestly, when ChatGPT came out, I felt like I finally had that piece that I always wanted to have, where we can now take all these analytical best practices, right, how do you do a cohort analysis, how do you do a root cause analysis, how do you do anomaly detection, and put that together with the understanding of your business, understanding of your role in the organization, and come up with the questions to ask for you. So Orion is very different from other tools out there, actually. We make it so it can think asynchronously. It can think at night. It can think during the day about you and your role and the data in your organization. And being aware all the time of all the data available, and being aware of all the users, all the different groups, departments, and use cases, creates a really beautiful new world, where as an analyst, I always wanted to be in all the meetings. Right? I wanted to be with the business, but they don't invite you all the time. And then when they do invite you, you have this big project going on right now, and you can't come. And that is no longer the challenge. Like, you can even access meeting notes and all this additional context that you have now that wasn't really there two or three years ago. And having this rich context for Orion, to take an understanding of you, your organization, and your department, and then all the understanding of your data, is an incredibly interesting opportunity.
Now one of the challenges, right, and you asked this earlier, like, you know, how do you handle accuracy? In the enterprise, we have to be 100% accurate. And so as we left Google and decided to build this, we're like, the first thing we have to do is absolute accuracy. And I know absolute is a relative term. I have to put a footnote there, and we have to dive deep on that one. But in order to get that accuracy, we have to connect Orion to a ground source of truth. You know, a set of fact tables; Looker has a semantic layer that we use; other tools, you know, like dbt, are available. Things that are there to create a set of golden records, so to speak, that Orion's quality assurance agents can actually use to refer back to and check if the data is accurate, if the insight is accurate, and actually run the loop a couple times. I mean, I think the other beauty of being asynchronous a lot of the time is that Orion can create this, what I call deep analytics body of research. So, you know, let's say you're an account manager and you have customer meetings coming up all day long. Orion can actually think during the night about all your upcoming meetings and the numbers you need to pull for them so you're prepared for those meetings. Make sure they're accurate. Make sure they're thought through.
Ask why a couple times. Like, why did this number change? What is the root cause of that? Put that all together for you and then send it to your email inbox, into Slack, right, wherever you want it to be. Now you come in and it has already asked all these questions that usually, you know, you would have to block out a bunch of time, and have a full understanding of your data, to actually be able to ask.
[00:13:25] Tobias Macey:
As far as the application of these generative models to the problem space of data analytics and business intelligence, there have been numerous approaches, with some of the earliest being the idea of talk-to-your-data, or the text-to-SQL approach, which was the very naive implementation from the very first set of these generative models that came into the mainstream with things like ChatGPT or the Gemini or Claude models. And, obviously, there is some utility in that, but there is also a lot more work necessary to make it reliable and production grade and reusable.
And as we went through the rapid evolution of the overall AI ecosystem, there came this pattern of agentic use cases: give the model some tools and run it in a loop, and magic will happen. And I'm just wondering if you can talk to some of the ways that you would characterize the different strategies that you're seeing data teams and, in particular, different vendors take as far as being able to bring these generative models to bear on the problem of generating insights and enhancing, or maybe even just realizing, the self-serve promise of data and business analytics?
[00:14:48] Drew Gillson:
Yeah. I'll take this. Lucas and I have a whole lot to say on this topic. I think, you know, let's just go back to basics. Right? The frontier models that are available today can recite from their context with very, very high accuracy. That wasn't the case even a short while ago. But what it means is that if you know that you have the facts in the context window, the LLM will, almost 100% of the time, use those facts. Now it's not perfect, because they are probabilistic and not deterministic. But if you have, as Lucas said, a view of the business that's grounded in a semantic model that you could give to business users and be sure that they would get the right answers, or perhaps a set of fact tables that has been modeled to do the same thing, it is certain that the model, when asked a question that can be answered with that data, won't make it up. Right? I can confidently say that in October 2025.
Now, of course, it's not nearly that simple, and there's lots of things that have to happen in order to make sure that the system is working as designed. But I think we've come a long way from the naive approach of: let's just point the model at a whole bunch of, you know, schema information and metadata, get it to write freewheeling SQL, run it against the database, and get the answer back, with really no idea how to qualify it or ground it against prior responses. So those are, like, some of the things that we can do. Right? If we know what the trend has been over time, we can look at, say, the next number that comes out. Maybe that's daily sales data. And the models are also good enough reasoners now that if something is highly anomalous, like, if the trend is not making sense, as Lucas said, we can sort of double click on that and go, okay, let's unwind that. Let's go back to the source, back to the fact tables, and see: did that number actually come from that fact table? Did it actually come from the semantic model? And so a lot of that reasoning can happen in the model itself. And in cases where we can't reason, we can just do things multiple times. Now, that's a little bit naive. It's a brute force approach. But you see it in all of the benchmarks today. Right? That's parallel test-time compute. And so what that means is, if the model can produce the same number to your question, say your question had a single scalar value as the answer, if we do it, say, three times or four times in parallel and it always gets to the same number, well, it's almost certainly true that that number was coming out of the semantic model or out of the fact table in the database, and the model didn't make it up, because the alternative is just so highly unlikely. I think that gets me to another point, which is, I think, a sticking point in human analysis just as much as in AI analysis. The absolute value of the numbers, I think, is less important than directional changes and trends. Right? So a lot of the time, if you were to mask the absolute values in a boardroom and have a discussion about the trend, I think we'd get a lot farther, and you'd probably make just as good, and in some cases maybe even better, decisions. Because sometimes when people get fixated on values, the conversation derails and you don't end up talking about the important thing, which is, well, okay, sales from this region or this channel doubled. Why is that? Or maybe not doubled, but some sort of substantial increase. And the models are really good at reasoning through that sort of stuff too. Right? So I think that we're really trying to direct the analytical conversations into: what is it that we could do differently? What are the actions that I can take based on what the model has observed? And although, of course, we strive for 100% accuracy, and if the semantic model supports it for the use case that's in scope, we'll do that, I think the way in which we apply gen AI isn't so much to, you know, just produce the absolute numbers that people can then argue about and debate what they mean or what to do about it, but also to come up with the next set of questions to ask and perhaps the next set of challenges that could be considered given the directional changes that have been seen in the numbers. Like, all the next kind of second order and third order things that should happen, these models are, like, super creative.
They're extremely creative. They're a lot more creative than some of the people that I know. And so I guess that's answering your question in the most direct way that I can. The models are super creative, and that's where we want to apply them. Does that make sense?
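For listeners who want to ground Drew's parallel test-time compute point, here is a minimal sketch of the self-consistency idea, assuming a hypothetical `run_analysis` helper standing in for one full agent trajectory; Orion's actual implementation is not shown in the episode:

```python
import asyncio
from collections import Counter

async def run_analysis(question: str) -> str:
    """Hypothetical stand-in for one full agent trajectory that answers
    `question` against the semantic layer and returns a scalar result."""
    return "42"  # a real implementation would call the model and its tools

async def self_consistent_answer(question: str, n: int = 3) -> str | None:
    # Run the same question through n independent trajectories in parallel.
    results = await asyncio.gather(*(run_analysis(question) for _ in range(n)))
    value, votes = Counter(results).most_common(1)[0]
    # Accept only if every trajectory converged on the same value;
    # otherwise flag the question for deeper (or human) review.
    return value if votes == n else None

print(asyncio.run(self_consistent_answer("What were sales yesterday?")))
```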
[00:19:04] Lucas Thelosen:
Yeah. I love the connection of, like, tying what you found to the actions you should be taking or thinking about. Right? Back in the days of dashboards, you know, I don't know how much longer we'll have them, but I always wanted to have an action button right next to the numbers. Right? Because if I can think that far into the business and why this dashboard is needed, then I would truly understand the problem we're trying to solve. You shouldn't really have dashboards that are just nice to have. You know? Think of your own car. Right? I drive a newer car now, and the total mile count is no longer there. Why? Because it's not really that important. The total miles your car has driven should not be front and center in front of you every day, because it is not relevant to your driving decisions. It's not relevant to whether you drive faster or slower. Right? Miles per hour is much more important for that. So similarly, in the context of your business: what metrics do you need to see, and then what actions are you going to take because of what you're seeing? And the AI, you know, Orion, is actually really great at suggesting what you should maybe wanna think about based on the root cause that it found, this number moved up or down, right, like the example of sales doubling on a given channel, maybe a branch of your company or something like that. Now what should you do about that? What can you do about that? Right? The same is true if something goes down, if a number goes down. And I think when you look at the usage patterns that we saw in the age of dashboards, people tend to focus on the red numbers. They focus on anything going down, and why it went down. Well, if it goes up, then let's ignore it. It's green. We're not gonna look at that. It's really cool. The AI doesn't care if it's up or down. It's still interesting. But if something jumps up by 10%, it will investigate why it went up by 10%, and what we could do with that information, and what you as an employee in this company should be doing with that information. Whereas, you know, a lot of the past was, let's focus on the red numbers, why that happened, and try to avoid that going forward. And so any kind of analysis I've seen coming out of that tends to be very one-sided, you know, on how to avoid potential problems. But, yeah, I love the idea of, like, let's connect the insights we found to actual recommendations and actions that we should take today.
[00:21:17] Drew Gillson:
And so, yeah, just to kinda get back to the accuracy and trust topic, though: it's not possible today to misrepresent a dashboard or a metric so badly that, you know, if a number is going up, the model says otherwise. No model out there is gonna look at a dashboard with a very steep increase in, say, any given KPI and then tell you the number's going down. That just doesn't happen. Right? These models are good enough now that that's not a concern. And so if we can trust the general intuition, let's call it, of the model for what's changing, I would far rather have the 10 creative ideas about what the implications might be for that change and then allow the humans to sort of double click and chase those down and have a conversation about it. The risk of having a model grossly misinterpret something and come to, say, completely the opposite conclusion is low, so I would way rather have those 10 creative questions, I guess, is what I'm trying to say. Because the absolute values are not that important, whether the number of, say, bananas we sold yesterday was 170 or 173.
It doesn't matter to me. That's the point that I think I was trying to make, but probably not so clearly. I think the important part is that we've, you know, got a significant number of new banana customers from Europe. Let's go find even more of them.
[00:22:44] Tobias Macey:
And that also brings up the interesting changes in behavior that can be generated by virtue of removing the necessary action of taking the initiative to go visit a dashboard versus having some agentic system that is able to deliver analysis and insights to me as they are generated. But then that also potentially brings in the concern over bias depending on how that agentic loop is being defined and who is controlling it and how the human factors into that workflow. And I'm just wondering if you can talk to some of the ways that you're thinking about that push based analytical delivery versus making somebody go and seek out those analyses and some of the ways that that contributes to influencing the organizational behaviors and the ways that the organization interacts with the data while maintaining agency and control.
[00:23:41] Lucas Thelosen:
Yeah. Being able to push what Orion found to where the business user already is, is really important to me, because I think there are so many tools out there right now where we ask people to go to it and now do another thing, right, where, you know, you have all these saved passwords and everything. We should make it as easy as possible for the business user to use data and then make decisions. So let's push it to where they already are. If that's in your email, if that's in Slack, if that's in a different system, right, let's do it there and push the insights there proactively. So, you know, Orion thought about you. It thought about the data, and it determined that this is actually important, without, you know, sending you any kind of spam or flooding your inbox. It determines this is important and highlights that to you. That way, there's another barrier removed, because when you look at the usage patterns of business intelligence tools or data warehouses, there are only very few people that actually come back and dive deeper. The vast majority of the business is not looking into it. Right? You have a couple power users, and that's about it. So can we bring it to the person? Then I think the other step that we miss a lot of times is what I would call data literacy.
So you see a certain graph. You see, you know, something going up or down. There's actually only a small percentage of people that can relate that to their day to day job. But an AI like Orion can be quite well trained on: this is this person's job, this is the background of their role in the organization, and it can bridge that. And it can say, this is how this number relates to you, and this is maybe what you wanna do about it. And that gap of data literacy is now something we can finally bridge. Previously, right, everybody should have had a personal, you know, data analyst, or maybe a chief of staff, who tells them, hey, happy Monday morning. This is what I saw from last week. This is what I think we should be focusing on today, this morning at 8AM on Monday. And that's what we can now deliver here, which is a really exciting opportunity. Right? Because previously, you would have this maybe at a company like Google or Amazon, where there were these data analysts that did this for you and presented it to you. But most organizations are not able to staff up to that level and have that level of sophistication. So being able to bring this to a wide variety of companies is really exciting.
[00:26:09] Drew Gillson:
And I think when we look at disciplines that are really well developed and mature in their use of data, site reliability engineering always comes to mind for me. Those people aren't looking at dashboards. Come on. Right? They have really sophisticated alerting that makes sure that they're informed before something bad happens. And that's what makes it possible for us to have the incredible technology that we do at Google and elsewhere. It's not somebody staring at a dashboard. Right? So I think when you take a step back and realize that, that sets the pattern for what we should all be striving for, which is: let's just get the machines to help us understand what's important and to notify us. And then, if the machine understands your action space, maybe even make suggestions for what to do about it. Because none of these major services would stay up if we asked their SREs to log in and check a dashboard every morning.
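Drew's SRE analogy boils down to threshold-based push alerts. A minimal sketch, assuming a hypothetical Slack-style `webhook_url`; this is not Orion's actual delivery mechanism:

```python
import json
import urllib.request

def maybe_push_insight(metric: str, today: float, baseline: float,
                       webhook_url: str, threshold: float = 0.10) -> None:
    """Push-based delivery: interrupt the user only when a change is
    material, instead of waiting for them to open a dashboard."""
    change = (today - baseline) / baseline
    if abs(change) < threshold:
        return  # nothing worth notifying anyone about
    direction = "up" if change > 0 else "down"
    payload = {"text": (f"{metric} is {direction} {abs(change):.0%} vs. "
                        f"baseline ({today:,.0f} vs. {baseline:,.0f}).")}
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# e.g. maybe_push_insight("Channel sales", 2_100, 1_000, "https://hooks.example/abc")
```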
[00:27:05] Tobias Macey:
Digging now into the architecture and system design of what you're building at Orion, the term of the day is context engineering, as far as how to actually feed the right information at the right time into these AI models. That is obviously a very key element of agentic application architectures. But before we get too much into that, there's also just the broader question of the rapidly evolving space of agentic design patterns and how to think about how to actually build one of these systems that will operate continuously and reliably.
And in particular, some of the numbers that get thrown around as far as the compounding error rates that can occur when you have multiple LLMs in sequence that each have certain levels of uncertainty, and then that uncertainty can compound, and how to mitigate that. I'm just wondering if you can talk to how you are thinking about and how you approach that overall agentic application architecture, and maybe some of the ways that you are continuing to iterate on and fine tune that as new insights and patterns evolve.
[00:28:13] Drew Gillson:
Yeah. There's no doubt that there are some tricky problems that we've had to solve, and I'm sure many others have struggled with as well. I think the multi-turn quality benchmarks are sorely lacking for some of these newer models, and I hope that's a space that will continue to receive more attention. As our approach has evolved, we've actually gone a little bit the other way. So in the beginning, I suppose, let me back up, we had a fairly modular architecture. We had a lot of different agents with different system instructions for different jobs, and the orchestration between them was fairly hardwired, because we broke down the process of analysis into a series of steps, and we decided which jobs need to be done and in which order. And as you sort of alluded to, it can become really, really tricky, because deciding which information to propagate and pass where is non-trivial. And I've been really pleased that, as the models have evolved, we've had to do less of that. So without, you know, discussing the architecture in too much detail, I will say that the models are powerful enough now, and the context windows are big enough, that you don't have to be as discrete in your orchestration of the problem.
And in the past, where we might have had, like, a decomposition step where we make a plan and then the agents sort of attempt to execute that plan, if you get, say, a dozen steps in and something goes wrong, it could be quite brittle. So now we can actually have an agent adjust the plan as it goes; it can keep track of the plan and it can potentially make changes to it. And the planning horizon is much better, and we're able to see that the models stay on task for longer. I think Anthropic just announced that their new Sonnet 4.5 checkpoint was able to stay on task for thirty hours of unattended programming, which is incredible. And so we're sort of taking advantage of those new abilities to simplify, in some respects, the initial architecture that we came up with, because the models are now able to do more complex tasks with less orchestration. At the beginning, you mentioned tool calling in a loop. I mean, at the end of the day, that's all an agent is. So that's certainly the core of our system. Maybe to share just a little bit more detail, we have, you know, many different loops. Right? If you have a quick question, then as we reflect on the question and consider how it could be answered, if we decide that it is safe to answer quickly, we probably won't do very many steps. If it's something more complicated, we will start a batch operation, and we might do multiple processes in parallel to ensure that we arrive at the same answer, because that trajectory is going to be way longer. It's far more complicated, and there's far more risk that the agents could sort of drift or probabilistically just wander around. But if you can get the same answer a couple times from multiple long trajectories with that agent, to go back to the accuracy question, it's pretty likely that that is the answer, because you had multiple trajectories that all sort of converged on it. Maybe the last thing I'll say, and we can unpack any of this, is that the analysis process for us really looks like three steps: What's the data that we need? What do we wanna do with it? And then how are we gonna communicate the results to the user? Each one of those tasks is fairly discrete. The last one is probably the most deterministic.
The users kind of let us know: well, I have no time and I prefer bullet points, or maybe someone else likes their information differently, or they're a visual person. So we'll do a final translation at the end of the findings that turns them into something that's more appropriate or applicable for the person that we're delivering them to. And so each one of those steps has its own set of agents with its own orchestration and its own guardrails. And I'll pause there.
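Drew's three-step breakdown (what data do we need, what do we do with it, how do we communicate it) could be sketched as a pipeline of discrete stages. All function names here are hypothetical stand-ins, not Orion's actual internals:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    question: str
    analysis: str

def gather_data(question: str) -> dict:
    # Hypothetical stage 1: fetch grounded rows from the semantic layer.
    return {"rows": []}

def analyze(question: str, data: dict) -> str:
    # Hypothetical stage 2: the agent tool-calling loop over the data.
    return f"(analysis of {question!r} over {len(data['rows'])} rows)"

def present(finding: Finding, user_pref: str) -> str:
    # Stage 3 is the most deterministic: format per the user's preference.
    if user_pref == "bullets":
        return "\n".join(f"- {line}" for line in finding.analysis.splitlines())
    return finding.analysis

def run(question: str, user_pref: str = "prose") -> str:
    data = gather_data(question)
    return present(Finding(question, analyze(question, data)), user_pref)

print(run("Why did channel sales double?", user_pref="bullets"))
```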
[00:32:09] Lucas Thelosen:
One of the things I wanted to add is on the context side. Right? Like, similar to probably any other engineering work out there, the first thing we thought about is: how can we make sure Orion quickly understands your data, and from that data derives who you are as an organization and how you think about things? What has become clear over the last year or so is that we also need to add the context about the organization itself. And this is something that hasn't been documented as much. Some of our customers have given us their onboarding materials for new analysts, for example. Super helpful. Or we usually have, like, a kickoff meeting with the customer where we actually take the transcript and then give it to Orion as context. Like, how do you talk about your data? How do you talk about your organization? What do you call the salesperson? How do you define a month or quarter or a time frame? Right? All these different things, all this institutional knowledge that we never really have done a good job of documenting. Right? Some have, like, a Confluence page or so that we can use. But it's a really interesting problem, or it's not really a problem, it's a huge opportunity. Right? Because we're documenting it and we're putting it into Orion, and then Orion can actually explain those things back to the next analyst that gets hired. We have an example where a customer hired an analyst, and they just asked Orion to explain it all.
It's, like, really interesting. Right? We thought the main problem we have to solve is understanding the data, and I think that's very true still. But there's also just, like, getting quickly up to speed on being a new employee in a company.
[00:33:45] Tobias Macey:
And as far as the agentic workflows, there's also, now bringing us back around to context engineering, the challenge, especially in the analytics space, of bringing all of the right information into the sort of access pool for the agent, where that is one of the perennial problems of business analytics in the first place: getting all of your data into the right place and in the right shape. And that has led to decades of various data warehousing patterns and consultancies whose whole business it is to tell you how to structure your data. And I'm just wondering if you can talk to some of the ways that you're thinking about the opportunities for these generative AI and agentic systems to be able to either reduce the barrier to entry or assist in the process of leveling up your data, as well as maybe some of the requirements on the organizational side to invest in that in the first place to even empower these agentic workflows.
[00:34:47] Lucas Thelosen:
So quite a bunch of our prospects are asking us, like, what is the foundation we should set, no matter if we go with Orion or not? You know, what can we do today to be ready for any AI tool? And there are, of course, options if you've already invested in data catalogs and a semantic layer. Like, you know, we're working with Looker. We're working with dbt. We're working with a bunch of different BI tools and databases. There are so many options out there. Right? I think most AI tools will be somewhat agnostic as to where they can pull the context from. The lowest common denominator that I get to is, like, put some descriptions on the columns in your database. Right? Have your database tables and columns named appropriately. Right?
You can have a set of fact tables there that you wanna attach AI use cases to. As much as you can make it humanly readable, you know, that will make it very easy for any AI to work with it. So you don't necessarily have to go all the way and buy the fanciest, you know, data catalog and data modeling tool. Like, you can set a foundation right there in your database, because that's where, at the end of the day, the data will be pulled from by any tool.
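One way to act on Lucas's lowest-common-denominator advice is to script the column descriptions themselves. A sketch with invented table and column names:

```python
# Hypothetical table and column names, purely for illustration.
COLUMN_DESCRIPTIONS = {
    "orders.order_ts": "UTC timestamp when the order was placed",
    "orders.net_revenue_usd": "Revenue after discounts and refunds, in USD",
    "orders.channel": "Acquisition channel: web, mobile, or partner",
}

def comment_ddl(descriptions: dict[str, str]) -> list[str]:
    """Emit ANSI-style COMMENT statements so any AI tool reading the
    catalog sees human-readable context right next to the schema."""
    stmts = []
    for col, desc in descriptions.items():
        safe = desc.replace("'", "''")  # escape single quotes for SQL
        stmts.append(f"COMMENT ON COLUMN {col} IS '{safe}';")
    return stmts

for stmt in comment_ddl(COLUMN_DESCRIPTIONS):
    print(stmt)
```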
[00:35:52] Drew Gillson:
I think making sure that you understand how long a piece of information is good for has never been more important. I mean, it's always been important. Right? In any organization, we have documents, you know, the essentially long tail graveyard of documents that were written once, were slightly relevant maybe that week, and absolutely irrelevant every week, you know, thereafter. That is something that we really need to consider and account for. Because if we pull in all of the facts that a system like ours has gathered and don't treat them appropriately, like, with some temporal weighting, then we're probably gonna get confused. Right? And so I think another thing that an organization can do, and this is just good knowledge management, is really just to put an expiry on things. Expire your knowledge. Right? If you have, say, oh, I don't know, a way that you classified your customers, and the model that did that was trained, you know, three or four years ago, unless somebody has looked at the way that worked, it might not make sense to use the scores that happen to be written into a column in your data warehouse by that classification model that nobody understands or can still vouch for the results of. Right? So it's garbage in, garbage out, of course. But really, I think what makes it easier to take the garbage out is to make it clear that that column was, say, for a particular event that happened in 2022, and then just clean it up. Clean it up mercilessly.
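Drew's point about temporal weighting amounts to attaching a time-to-live to each fact before it reaches the context window. An illustrative sketch; the facts and TTLs are invented:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Fact:
    text: str
    recorded_at: datetime
    ttl: timedelta  # how long this fact should be trusted

def fresh_facts(facts: list[Fact], now: datetime) -> list[Fact]:
    """Drop expired knowledge before it ever reaches the context window."""
    return [f for f in facts if f.recorded_at + f.ttl > now]

facts = [
    Fact("Q3 sales target is $2M",
         datetime(2025, 7, 1, tzinfo=timezone.utc), timedelta(days=92)),
    Fact("Customer scores come from a model trained in 2022",
         datetime(2022, 1, 1, tzinfo=timezone.utc), timedelta(days=365)),
]
now = datetime(2025, 9, 1, tzinfo=timezone.utc)
print([f.text for f in fresh_facts(facts, now)])  # only the Q3 target survives
```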
[00:37:20] Tobias Macey:
With agentic systems, obviously, there is the piece of it that is under your control, under your guidance, within the bounds of Orion as a platform. But also, because you're offering this as a service, there are certain boundaries that you need to establish with the organization as far as access to their data. But, also, maybe they're building their own agentic systems that they want to be able to use in concert with the platform that you're building. And I'm wondering if you can talk to how you're thinking about the distinction between the hard boundaries of what is only ever going to run within your system and the boundary layers that should be more porous and can be used to enable some cooperation between those agentic systems using some of these new protocols that are coming out, such as A2A.
[00:38:13] Drew Gillson:
Do I take a crack at it, Lucas? I've got some thoughts. Or, whatever, if you wanna start. Oh, I'm always ready to voice an opinion, so let me take it. I think this one is hard, Tobias. I think that's why we're hesitating a little bit. Because as a proxy for you in an organization, an agent might be able to, in most circumstances, sort of observe the boundaries that would be normal for you and your role. But if we've suddenly got proxies for every single employee in an organization interacting with virtual proxies in another organization, I mean, this is very uncharted territory. And some of the protocols that we're building on top of don't even have the concept of identity or even basic security measures. So it's definitely fraught.
It's interesting, and I'm looking forward to helping figure out how to make it all work.
[00:39:04] Lucas Thelosen:
Yeah. I mean, I think there's gonna be a wild west, and maybe it really is kind of a wild west of different agents doing different things. I do think, though, like, when I think of our tool, I know it's often embedded into existing workflows in organizations. Orion is focused on deep analysis. That's at the very core of what it does. You know, if you add it to something like an agent ecosystem, that's the main function it will fulfill in that. And then there are some interesting other functions other systems can fulfill. So you have tools that are focused on knowledge retrieval across your organization's data, or email, documents, and whatnot. That's a very interesting marriage of agents. Right? You have, on the one hand, the knowledge retrieval across, you know, that kind of context and business communication, so to speak, and then you have Orion there for data analysis. It can actually enrich that, right, and say, okay, here's, you know, back to your personal chief of staff every Monday morning, or, actually, every single morning. Right? As you come into work, you get this: this is what's happening today. This is what you need to know about past meetings you had, and here's the data to support your arguments or any kind of clarification you needed to do based on follow-up items. That's a really interesting world, and, you know, I've built that out for myself here. It's running. Every time I have a meeting coming up, it tells me what the last conversation I had was, what my follow-up items were, and, you know, the data I need to support it. Really, really interesting.
So if you think of what the core of Orion is and then how to build with it: you have this deep analytics engine that is Orion, which checks its work, makes sure it's accurate, makes sure it's relevant and actionable to the end user. But you can add to it your own IP. Right? Maybe there's a certain way you like to forecast things. Maybe, if you go into the underwriting space, there's a certain algorithm you use to determine what someone qualifies for, which policy, you know, in the insurance space, someone qualifies for. You can go into all kinds of different scenarios where you have the uniqueness of your business that you can add to Orion's flow as a custom agent. And then, you know, 80% of that deep analytics is Orion running and making sure it's working, but you've added your unique component to it, either in the agent-to-agent space by adding external agents, or by adding a custom agent into Orion's workflow.
[00:41:24] Drew Gillson:
I'm gonna answer this in a slightly different way that I think is practical and very apropos right now, and that's from a security perspective. So it is true that we have some really capable harnesses for the models that we have today, like Claude Desktop as an example. And, actually, even ChatGPT now has an experimental MCP mode. But if I was a technology leader or data leader in a large organization, there's absolutely no way that I would ever allow our private enterprise data to be connected to one of those harnesses. Because then you get into a situation where you have the three things that Simon Willison has talked about, like the third rail that you just don't touch. Because you suddenly have a model that has access to private data, but it also has access to the Internet. So, again, by definition, it has access to untrusted content that could give it instructions to go do something that you maybe didn't intend. And because it's hooked up to a bunch of different things, it has the ability to communicate externally. So if I was looking to deploy a solution to help people get more value from their data with AI, I would, at this point in time, not be comfortable having it live on their computer, coexisting with all of the other things that could be connected into Claude Desktop or whatever tool you happen to use, without the central governance and security that something like what we're building brings, which is a very different story.
We have access to your private data. That's true. But we don't browse the internet. There's no web search agent that could suddenly ingest some sort of poisonous instruction to exfiltrate data and post it to some endpoint that, you know, is not within your control. Of course, you don't want that. And we don't communicate externally except in cases where you've explicitly allowed that, to a webhook destination that you've configured. And so because all of that happens in the product, in our managed environment, we can be much more sure that we're using tools safely and without inadvertently allowing boundaries to be crossed. I mean, this sort of tension between self-service and central governance is always gonna exist, but it's now more important than ever. So I guess I just bring that up because, you know, there are very real concerns about some of this technology. If you have Claude Desktop hooked up to 30 different MCP servers and one of them happens to be your corporate data warehouse, that's bad. Right? Let's create a safer way to interact with that private data that doesn't have the other connections that are essentially ungoverned and could, with clever malicious prompting, be caused to do any number of things, like actually move that data and do something with it that it's not supposed to. Tobias, I'm not sure if that's kind of where you were going, but I think it's important enough that I did want to address it head on. Doing this type of stuff in an unmanaged desktop environment is probably not a good idea. I think it needs to be managed centrally and controlled by a central team.
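The "third rail" Drew cites is Simon Willison's lethal trifecta: private data, untrusted content, and external communication combined in one agent. A sketch of a policy check that refuses that combination; the capability names are hypothetical:

```python
# Simon Willison's "lethal trifecta": an agent that combines private data,
# untrusted content, and external communication can be prompt-injected
# into exfiltrating data. Capability names here are hypothetical.
TRIFECTA = {"private_data", "untrusted_content", "external_comms"}

def check_agent_policy(capabilities: set[str]) -> None:
    """Refuse to launch an agent whose tool set completes the trifecta."""
    if TRIFECTA <= capabilities:
        raise PermissionError(
            "Agent combines private data, untrusted input, and an external "
            "channel; a poisoned document could instruct it to leak data."
        )

# A warehouse-connected analyst with no web access and only allow-listed
# webhooks passes:
check_agent_policy({"private_data"})
# The same agent plus web search plus arbitrary HTTP would be rejected:
# check_agent_policy({"private_data", "untrusted_content", "external_comms"})
```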
[00:44:21] Tobias Macey:
No. That's definitely a very valid and important point to bring up. And, yeah, I was largely just thinking about the potential for the exchange of information or exchange of control between different agentic systems that are controlled by different entities, and some of the possibilities, but also, as you pointed out, the risks that are inherent in that. And I think now, moving on from there, the other piece of building any sort of generative AI powered system, particularly in an agentic context, is the evolutionary aspect: how do you make changes and feel confident that the changes that you made didn't either introduce some completely unpredicted behaviors or degrade the capabilities of the overall system, especially as you get into these more complicated, multi-agent, highly orchestrated systems? And how do you manage some of the experimentation loop, the observability elements, the prompt management, etcetera, as you continue to build and evolve a live, in-production system?
[00:45:27] Drew Gillson:
Well, we've been lucky enough to have had a lady who helps us with this who's been professionally trained as an entomologist, believe it or not. You need somebody who can take a very scientific approach to classifying behavior, whether that's anthropology or entomology or computer science. But frankly, in this large language model world, it's much more likely to be something from the behavioral sciences, or from maybe what would be considered less hard science, so that you can take a step back and use things like grounded theory to look through agent trajectories, pick out the behaviors that you wanna label, determine what to call them, and then how to detect them at scale. Right? So we've had some good help there. I think it requires an extremely diligent person who loves to read conversations between agents and, again, develop those classification hierarchies.
It needs to be done automatically, at scale, using the taxonomies that you develop. There have been a couple of papers about this, taxonomies of agent failures, but we learned early on that we had to build on that to determine how our system both succeeds and fails in production. But, yeah, it's extremely fastidious work. Right? You have to be laser focused on it, and everything changes underneath your feet all the time. It's like trying to build a house on a sandy foundation. When the models change, the behavior of the environment might change. The data might change. Right? Data, by definition, is changing, because you're receiving new data every single day. And I think that in order to be confident in the behavior of a system like ours, unless you have the ability to do, essentially, reflection about what your changes are doing and how you're perturbing the system, it would just be impossible. Right? So we collect a lot of data ourselves about what is going on, and we look at it continuously.
And if we didn't do that, we'd really be flying blind.
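A rough sketch of what classifying trajectories against a failure taxonomy at scale could look like. The labels and the `llm` callable are hypothetical; real labels would come out of the grounded-theory work Drew describes:

```python
# Hypothetical failure taxonomy; real labels would emerge from reading
# trajectories, in the grounded-theory style described above.
FAILURE_TAXONOMY = [
    "tool_misuse",           # wrong tool, or malformed arguments
    "plan_drift",            # wandered away from the analytical objective
    "ungrounded_claim",      # a number not traceable to the fact tables
    "premature_conclusion",  # answered before gathering enough evidence
]

def classify_trajectory(trajectory: str, llm) -> list[str]:
    """Label one agent conversation; `llm` is any callable that takes a
    prompt string and returns text (the model client is an assumption)."""
    prompt = (
        "Label this agent trajectory with zero or more of these failure "
        f"modes: {', '.join(FAILURE_TAXONOMY)}.\n"
        "Reply with a comma-separated list, or 'none'.\n\n" + trajectory
    )
    reply = llm(prompt).strip().lower()
    return [] if reply == "none" else [
        label for label in FAILURE_TAXONOMY if label in reply
    ]
```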
[00:47:23] Tobias Macey:
And in terms of the overall technique, I'm wondering if you're finding utility in things such as LLM as a judge, or if it's largely a unit test style, or just experimentation management in the vein of MLOps, which, in the world before LLMs, was the way of dealing with a lot of these probabilistic systems. I'm just curious if there are any particular technologies or patterns that you found to be most effective.
[00:47:49] Drew Gillson:
Yeah. No. Like, all of the above, both online and offline. So we'll do retrospective looks at the way the system is behaving between releases or between checkpoints. But we'll also do online reflections about what has just happened, because the models are good enough that you can do that. So in various parts of the application, we'll actually feed the conversation as it's progressing into an LLM as a judge, or an auto rater. We tend to call them autoraters. And we ask: how's this going? Is it going well? Is the instruction adherence good? If not, where is it going wrong? And what notes would you pass to the agents who you've observed to be doing things that are not optimal, so that we can do it better next time? Depending on the outputs and the scores of processes like that, we might start something over. We might actually modify the system prompt for some of those interactions to say, your colleague observed that this particular step didn't go very well, but if you were to do it this way next time, it would go better. That's a simplification, but what that causes is a real-time, dynamic self-improvement process. And then if it works and the autorater agrees, we can modify the instructions permanently from that point on, which sometimes we'll do. But depending on the use case or the analytical objective for the customer, it might be possible to just retry at various points. So that's pretty fascinating stuff. It's also one of the harder things to have observability into because, of course, we wanna minimize it. We don't want that to happen very often. But we certainly appreciate that, when it happens, it's most of the time going to produce a much more positive outcome. And that's one of the coolest parts about the capabilities of these models, that they can actually learn. If that's not intelligence and learning, I'm not sure what is.
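A rough sketch of that autorater loop, assuming placeholder `rate` and `run_step` helpers and an illustrative score threshold; the production system is certainly more involved:

```python
# Sketch: an online "autorater" pass that scores instruction adherence and,
# on a low score, feeds the judge's note back into the system prompt and
# retries. Thresholds and field names are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Rating:
    score: float          # 0.0 (poor) .. 1.0 (perfect adherence)
    note_to_agent: str    # coaching note the judge wants passed along

def rate(conversation: str) -> Rating:
    """Placeholder for a judge-model call returning a structured rating."""
    raise NotImplementedError

def run_step(system_prompt: str) -> str:
    """Placeholder for executing one agent step and returning its transcript."""
    raise NotImplementedError

def step_with_autorater(system_prompt: str, threshold: float = 0.8,
                        max_retries: int = 2) -> str:
    transcript = run_step(system_prompt)
    for _ in range(max_retries):
        rating = rate(transcript)
        if rating.score >= threshold:
            return transcript
        # Mirror the "your colleague observed..." pattern described above:
        # append the judge's note and start the step over.
        system_prompt += f"\n\nNote from a reviewer: {rating.note_to_agent}"
        transcript = run_step(system_prompt)
    return transcript
```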
[00:49:47] Tobias Macey:
And to that point as well, I'm wondering how you're seeing opportunities for using the exhaust, if you will, of the agentic processes to feed into a self-improvement loop, or to help identify opportunities for enhancement or evolution of the system. Some of the ways you might have them automate things like filing bug tickets and then feed that to another agentic development system to feed back into the production environment. And just some of the ways that you're thinking about the autonomy of the system as a whole, in the context of it building and improving upon itself, and the necessary oversight of any of those self-improving behaviors.
[00:50:33] Drew Gillson:
Lucas, should we take the opportunity here to talk about creating deterministic tools? Because I think that's a really important step in our road map. This is something we've only just started. But the best way to improve a probabilistic system, as far as my team has concluded anyway, is to make as many parts of it deterministic as possible. And so when we create a tool that does a particular job for a particular type of analysis, it makes a whole lot of sense to use that tool again. We're starting to look for every opportunity to do that. The self-improvement in that case becomes very permanent, because we've used the capabilities of the model to explore the latent space and try a bunch of different things, and then maybe the output was a Python script to do a certain kind of analysis. If the customer then goes, yes, that's exactly what I wanted to do, I want that, well, then we shouldn't make the model generate it again. Right? We should run exactly that again, which is going to produce much better outcomes. And they're going to be the same every time. And so in that way, we've used some of the best capabilities of the probabilistic large language models to get to a place where we can then replay.
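That promote-and-replay pattern could be sketched like this, assuming a customer-approved script and a simple in-memory registry. This is illustrative only; replaying model-generated code in production would also need sandboxing and review:

```python
# Sketch: "hardening" a model-generated analysis script into a deterministic,
# replayable tool once the customer confirms it does the right thing.
import hashlib

TOOL_REGISTRY: dict[str, str] = {}  # tool_id -> exact source code

def promote_to_tool(generated_source: str) -> str:
    """Freeze a confirmed script; from now on we replay this exact code."""
    tool_id = hashlib.sha256(generated_source.encode()).hexdigest()[:12]
    TOOL_REGISTRY[tool_id] = generated_source
    return tool_id

def run_tool(tool_id: str, inputs: dict) -> dict:
    # Replaying the stored source gives the same behavior every time,
    # instead of asking the model to regenerate (and subtly vary) it.
    namespace: dict = {}
    exec(TOOL_REGISTRY[tool_id], namespace)  # previously-approved code only
    return namespace["analyze"](inputs)      # convention: script defines analyze()

script = "def analyze(inputs):\n    return {'total': sum(inputs['values'])}\n"
tid = promote_to_tool(script)
print(run_tool(tid, {"values": [1, 2, 3]}))  # {'total': 6}
```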
[00:51:43] Lucas Thelosen:
I think there's the interesting addition of how you keep content and context fresh. Right? We talked about this earlier, where we set an expiration date on content, in essence. And there's Orion's ability to say: I'm seeing a certain pattern of questions among the business users, or of what type of analysis I'm being asked to do right now; let me take that and be proactive and suggest things in that direction. On the flip side, you also have: well, this is not being done that much anymore, or the feedback I got from three months ago is not really that relevant anymore; maybe I need to archive that. So we're talking about, okay, every 30 days, if there has been no interaction, wrap it up; it's not relevant content anymore. Making these calls and understanding which context is relevant where, that's a really interesting opportunity, because with most old-school tools that was always a problem. Right? You go into your Google Drive folder and it's just a clutter of old content, and you're trying to find the one that is important, or you just go to the most recent ones. Same for any kind of dashboarding tool. We're working with customers that have a thousand dashboards. When you look at which ones are being used, it's 30. Thirty is about all that you need, but nobody ever goes through archiving them. But now you have dynamic tools.
I don't wanna say sentient, but it's really interesting. We're working with tools that are adaptive and that learn from you. I think the one piece that is really important to all our customers is that those learnings stay within their system, that it is based on them. What we do for our product is set the right defaults, like how many days it should be until Orion summarizes something and maybe archives some things. Those are the things we learn from our customers. But in terms of making Orion better, it's very important that the AI, the larger AI behind everything, is never trained on you; it stays within your environment. That said, it's really amazing how a tool can become better and better over time and adapt to you in a very fluent motion.
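A minimal sketch of that 30-day context-hygiene pass, with a data model that is an illustrative assumption rather than Orion's actual schema:

```python
# Sketch: archive content items with no recent interaction. The 30-day
# window matches the cadence discussed above; the data model is assumed.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ContextItem:
    name: str
    last_interaction: datetime
    archived: bool = False

def archive_stale(items: list[ContextItem], max_age_days: int = 30) -> list[str]:
    """Mark items untouched for `max_age_days` as archived; return their names."""
    cutoff = datetime.now() - timedelta(days=max_age_days)
    archived = []
    for item in items:
        if not item.archived and item.last_interaction < cutoff:
            item.archived = True
            archived.append(item.name)
    return archived

items = [
    ContextItem("q3-churn-analysis", datetime.now() - timedelta(days=45)),
    ContextItem("weekly-revenue-recap", datetime.now() - timedelta(days=2)),
]
print(archive_stale(items))  # ['q3-churn-analysis']
```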
[00:53:54] Tobias Macey:
And that point about tool use is also important, because the interaction with those tools is how the agent actually makes things happen in the world. That also requires a certain amount of planning and evaluation to decide what actions the model needs to be able to take. How do I expose those actions in a way that isn't going to overwhelm its context, where I don't want to have 10,000 tools and make it decide, because it's not going to be able to make that decision appropriately any better than a human would? And just thinking through how to build and integrate those tools, but also evaluate them for when they're no longer fit for purpose, particularly as either the model capability evolves or the focus of the system changes.
[00:54:44] Drew Gillson:
Yeah. I think there are some lessons here that I would share. Recently, we've started to build more complex tools. Fewer, more complicated tools is definitely a better approach than a bunch of simple step one, step two, step three type tools, even though it makes them harder to evaluate. As the model capabilities have improved, we see better outcomes with more complicated tools, and I think that's something that's been talked about more in the last couple of months. And, again, Anthropic's done a really great job in their developer relations, teaching folks in the industry how to build great tools. The evaluation part is always tricky because, no matter how closely we approximate the real world or our customers' real environments, sometimes things do surprise us, which is why we spend so much time poring over trajectories. When you understand what the model wants to try to do, you can get ahead of it and create tools that will be available for that moment when it looks for them. And the only way to get a sense for that is to really just spend a tremendous amount of time watching the agents work. What is it that they expect to be available to them to do their job, because of the way that they've been trained? What is the general sequence that they like to do things in? And that's something that almost every time we see a new checkpoint come out, we have to reevaluate.
Some models are very verbose in their planning. Right? And they'll plan differently. We've seen a little bit of whiplash, I suppose, as models have changed in the last year around that. But, yeah, it's really a dynamic, empirical thing. The tools that we have developed over time exist because we've really just watched how the models that we use work, and then taken steps to short circuit or make their work easier by observing. And so the tools are really fairly straightforward. Tools to extract the data, tools to perform certain kinds of analysis. That might start out as bespoke scripts, and then those scripts might get hardened into tools that you can pass inputs of a specific shape into and get outputs back out. And then tools for the formatting step, like tools related to visualization or something. So we're not yet getting into the territory where we have completely custom tools that would allow for actions to happen in somebody's systems of record, or that sort of thing. I think we'd certainly like to go there when we have the confidence, with our customer of course, that that would be a responsible thing to do. But that's going to be very interesting, when we start to build custom tools that are not only ergonomic to use for analysis, but also to do a full end-to-end job in an organization, like an operational job. And I think that's the next frontier.
If we could use, say, a tool registry inside a large enterprise, and the enterprise builds tools that their internal applications and their humans depend on, then it probably makes sense to register those tools and make them available for AI agents as well. And I think that's the future that we're building for, because that's going to unlock the most value, right? These custom tools that can do something that actually makes a difference in a business.
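A bare-bones version of such a tool registry might look like the following sketch; the field names and validation approach are assumptions for illustration:

```python
# Sketch: a minimal enterprise tool registry that both internal apps and AI
# agents could discover tools from. Inputs are validated against a declared
# schema before the tool touches anything.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str
    input_schema: dict           # param name -> expected Python type
    fn: Callable[..., dict]

REGISTRY: dict[str, Tool] = {}

def register(tool: Tool) -> None:
    REGISTRY[tool.name] = tool

def invoke(name: str, **kwargs) -> dict:
    tool = REGISTRY[name]
    # Validate before execution, so a confused agent can't pass malformed
    # arguments into a system of record.
    for param, expected in tool.input_schema.items():
        if not isinstance(kwargs.get(param), expected):
            raise TypeError(f"{name}: '{param}' must be {expected.__name__}")
    return tool.fn(**kwargs)

register(Tool(
    name="refund_status",
    description="Look up the status of a refund by order id.",
    input_schema={"order_id": str},
    fn=lambda order_id: {"order_id": order_id, "status": "processed"},
))
print(invoke("refund_status", order_id="A-1001"))
```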
[00:58:01] Tobias Macey:
And as you have been building and iterating on Orion and engaging with your customers to understand how they're applying it to their problems, what are some of the most interesting or innovative or unexpected ways that you've seen it used?
[00:58:14] Drew Gillson:
Yeah, I'll take that. Well, we certainly didn't expect our customers to immediately do so much qualitative analysis. That was surprising for me. Of everything that surprised me this year, that was the thing I should have been a little earlier to spot. We spent so much time designing and building a product that could do quant analysis really, really well. If it had numbers in it, it was gonna work. And then, of course, no sooner did we launch and get customers using it than everybody tried to get it to do things like: read this corpus of 10,000 reviews and tell me what I should learn and do differently from that. And that, of course, requires a slightly different architecture. I think the most interesting part about the architecture we eventually came up with to do that type of job is that you have the models themselves invoking models, essentially as a batch operation. Because you can't put 10,000 reviews into a model and have it produce any sort of accurate summary. This is much more like traditional NLP.
Let's do sentiment classification on each one of those rows, and then let's look at that in aggregate. And so we had to scramble to build tools that made that possible. It's like AI-ception: the AIs had to themselves make completions against large models for us to do that sort of use case. So we can do it now, but I think that was the thing that was most surprising and unexpected for me. We had just been so focused on numeracy that we didn't think about the linguistic parts of analysis, and all the qualitative analysis that people might wanna do as business analysts, not necessarily as data analysts.
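The batch pattern described, classify each row and then aggregate, might be sketched like this, with `classify_sentiment` standing in for a per-row model call rather than Orion's actual pipeline:

```python
# Sketch: "models invoking models" for qualitative data. Rather than
# stuffing 10,000 reviews into one context window, classify each row,
# then report on the aggregate.
from collections import Counter

def classify_sentiment(review: str) -> str:
    """Placeholder for a single-row model call returning one label,
    e.g. 'positive' | 'neutral' | 'negative'."""
    raise NotImplementedError

def summarize_reviews(reviews: list[str]) -> dict[str, float]:
    counts = Counter(classify_sentiment(r) for r in reviews)
    total = sum(counts.values())
    # The aggregate, not any single completion, is what the analyst reports on.
    return {label: count / total for label, count in counts.items()}
```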
[00:59:59] Tobias Macey:
And in your experience of building this business and diving headfirst into this agentic space and helping to explore its boundaries and capabilities, what are some of the most interesting or unexpected or challenging lessons that you've each learned in the process?
[01:00:15] Lucas Thelosen:
Okay, the most challenging lessons we have learned. It's probably the lesson everybody learns: consistency is very, very difficult to achieve. And when you build an enterprise tool, consistency is very important. In some ways, AI is kind of the wrong tool for consistency. If you want consistency, you probably should just, as Drew was saying, do it the old-fashioned way. And so now we're having Orion do some of that, where, once it has found a pattern that is really important and needs to be repeated many times, Orion will write the code and execute the code, so it's the same, and there's consistency every single time, with the narrative, of course, being added on top with generative AI. But the core of the execution of this workflow is no longer probabilistic.
It's now deterministic. There's another really interesting observation from my end, which is that everybody's very hesitant at first: can it really do this? This sounds like magic. Right? And then we kick off a proof of concept, and people get quite excited. All of a sudden it becomes: we want AI to do everything. From being a skeptic to: I want this to do everything. Can it also do this? Can it also do that? And we have to start saying, that is not necessarily what AI should be used for. I love your enthusiasm, right? But having turned someone from a skeptic into believing AI is the silver bullet for everything, we have to then walk them back and say: within this box, it can operate quite well, but not these use cases over here. We had one customer with what was very clearly an operational task: a certain thing comes in, it needs to be checked against the database, and a result comes out. You write that in a normal script. You shouldn't have a deep analytics engine doing any of this. So I think we wanna be careful, as we talk about all these AI tools, that it's not: I don't care what the question is, the answer is AI. That is not the right approach. We have to be clear on what it's good for and the areas you shouldn't necessarily use it for. But it's so interesting to see skeptics turn around into: now I wanna use it for everything.
And then we have to be a bit more realistic about what it should be used for.
[01:02:36] Drew Gillson:
Yeah, I agree completely. It is pretty magical, though. I think that's the interesting part. We don't actually know, and I mean we as an industry, what these models can do. And it's pretty cool to see them do things that you never imagined them doing when you put them in an application like ours and give them the appropriate tools and guardrails. That makes for some delightful and surprising moments, and I think we're only gonna see more of that. It certainly makes it tough to build, right, when you have an unknown unknown. And because of the way these models are trained, the reinforcement learning from human feedback makes it very likely that, no matter what you ask, the model is going to attempt to satisfy you. So I think, to Lucas's point, that sometimes causes folks to not understand clearly what the capability boundaries are. And that, for us, is difficult to control. Many humbling lessons this year, but that's certainly one of them: ensuring that we have enough guardrails to make the product aware of its limitations and capabilities, and then coaching our users to use it for the things it will do a great job at, and redirecting for some of the things it's not the right tool for. And I think that's an area where much research needs to be done; there are a lot of unanswered questions related to that tendency for these models to believe they can do anything you ask them to. It's very, very interesting to contain that behavior.
[01:04:17] Tobias Macey:
And to that point of AI being this magical experience, but also not the answer to every question, what are the situations where you would advise against going down the road of building an agentic application?
[01:04:34] Lucas Thelosen:
I mean, I think if you want the same pattern every single time, right, you could take an approach like we are doing, where, once something is locked in, you rewrite some of the code to be deterministic and then have the language part be focused on just the storytelling around it. That might be a good approach. I think also anything where latency is super important. If split seconds are important, don't put an LLM there. It has different cycles of responding and thinking, and that's how it was built and is supposed to work. So, yeah, latency and repeatability, I think, are probably the two factors.
Personally, I think anything around companionship, probably not, but that's my personal opinion, in some ways. I mean, it does a really good job at it too, so I guess there's a whole debate to be had about that.
[01:05:27] Drew Gillson:
I think there's maybe one more anti-pattern, and it's about the shape of the inputs, or the problems. If you have truly unique assignments every time and there's no opportunity for you to iterate and refine the behavior, that's probably not a great use case. If, for instance, you analyze a completely different schema every single time you sit down, which for some types of companies is true, survey responses being a good example, it can be harder to know that the product has done a good job if you're doing a completely different job every time. So there's a fine balance there, right? You want to be familiar enough with how you would do it yourself that you can be there to guide and assist and observe. I think it's critical to have that human involvement, at least for a number of years to come, before we delegate entirely. And it's certainly riskier to delegate if it's a brand new job with different characteristics every time the agent is working on your behalf.
[01:06:34] Tobias Macey:
Are there any other aspects of the work that you're doing on Orion or this overall space of agentic systems that we didn't discuss yet that you'd like to cover before we close out the show?
[01:06:45] Lucas Thelosen:
Well, I think there's a really interesting opportunity here in this space. In the past, we had databases with solutions built on top of them for knowledge workers. Now that we have more and more of these AI knowledge workers that work with you, do we still need these user interfaces that people didn't really appreciate to begin with? I'm thinking of some of your old-school ERP systems, or CRM systems, or, in our case, some of the dashboarding tools that are out there. They're just hard to use. We used them because we had to use them. But now, as we're building these quite advanced AI knowledge workers, they don't necessarily need those tools. They can go straight to the database. They can read it. They can take the actions with you. And so I think there's a really fascinating future here in the next couple of years, where these AI knowledge workers become a big part of our day-to-day life.
[01:07:43] Tobias Macey:
Alright. Well, for anybody who wants to get in touch with the both of you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspectives on what you see as being the biggest gaps in the tooling technology or human training that's available for AI systems today.
[01:08:00] Drew Gillson:
Well, the change management, as Lucas alluded to, to make it easier for people to understand what is a good use case or what is a good task, is substantial. It's extremely hard. I mean, imagine if you're leading an organization with tens of thousands or more people who are all expected to use AI now in their jobs. Without taking the time to educate, you're probably setting yourself up for a pretty painful transition. I don't believe that understanding is there yet, at least not broadly. We've got to remember, we're in this little bubble. We're technologists. The people listening to the show are technologists. They've been steeped in this since 2022.
And it's hard to get perspective. The people in the health organization here in the province where I live, they have no idea. So I think, at every opportunity, we need to step back and think: how would I explain this to my mom? And how would I make sure that this person, whoever it is, can use the technology to make their life better at work or personally, in a safe and responsible way? I think it's alarming that we now see these articles about people, Lucas alluded to it, forming bonds with AI companions that really aren't much of a companion at all and might get people into trouble. And I think the parallel in the workplace might be that if you forget that you ultimately have the accountability to do the best possible thing for you in your role, and you sort of delegate or abdicate your responsibility, that's bad. That's gonna lead to bad outcomes for you and for your organization. And so I think there's a lot of training that has to happen.
[01:09:26] Lucas Thelosen:
Yeah. Maybe, if you don't mind, Drew, I wanna just add to this. The single biggest career advice I'm giving anyone out there right now is: learn to become an AI manager. Be empowered by these AI tools; be the manager of them. You orchestrate how they work and how they work together, which task can be delegated to which tool, to really make you shine in the organization. Because the AI train has left the station, so I would jump on it and own it: okay, I'm gonna be the manager of our AI analyst, Orion, or I'm gonna be the manager of our knowledge management system, or whatever it might be. I think that sets you up for a super interesting future. I'm just thinking, Drew, of my journey of starting and building this, where the experience we have gained working on this for the last year and a half is incredible. And then you compare it to people that didn't jump on the wagon, per se. This is a very real technological change. It's not gonna disrupt everything within a day or two; this will happen over time. But I think right now is an amazing time to learn how this can elevate your career.
[01:10:48] Tobias Macey:
Absolutely. I wholeheartedly agree with becoming the manager of AI. It's no longer the case that you can just be the heads-down independent contributor cranking out code or cranking out reports. You need to understand how to apply these tools and level yourself up to a higher degree of abstraction than you used to work at.
[01:11:12] Lucas Thelosen:
A hundred percent. Yeah.
[01:11:14] Tobias Macey:
Well, thank you both very much for taking the time today to join me and share the work that you're doing on Orion and the ways that you're thinking about building these agentic systems, particularly in the context of business analytics and the need for a high degree of transparency and accuracy. I appreciate all of the time and effort you're putting into that, and helping to blaze the trail for others to follow. I hope you both enjoy the rest of your day, and thank you again for coming. Thank you so much for having us. Thank you very much. Thank you for listening. Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management, and podcast.init covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@AIengineeringpodcast.com with your story.
Welcome and guest intro: agentic analytics and trust
Backgrounds: Lucas and Drew's paths into AI and data
Early ML stories: mortgages, optimization, and computer vision play
Introducing Orion: asynchronous AI analyst and accuracy mandate
From text-to-SQL to grounded agents: strategies for reliable insights
From dashboards to actions: trends, root causes, and recommendations
Push vs pull analytics: behavior change, bias, and literacy
Agentic architecture: context engineering and compounding errors
Readiness and data foundations: semantics, freshness, and expiry
Boundaries and security: agent ecosystems, governance, and risk
Observability and evolution: taxonomy, autoraters, and testing
Self-improvement loops: deterministic tools and context hygiene
Tooling design: fewer complex tools, registries, and future actions
Customer use cases: qualitative analysis at scale surprises
Lessons learned: consistency, scope, and capability boundaries
When not to use agents: latency, repeatability, and novelty
Beyond UIs: AI knowledge workers and direct data interaction
Big gaps: change management and becoming an AI manager
Closing thoughts and sign-off