Summary
In this episode of the AI Engineering Podcast Jamie De Guerre, founding SVP of product at Together.ai, explores the role of open models in the AI economy. As a veteran of the AI industry, including his time leading product marketing for AI and machine learning at Apple, Jamie shares insights on the challenges and opportunities of operating open models at speed and scale. He delves into the importance of open source in AI, the evolution of the open model ecosystem, and how Together.ai's AI acceleration cloud is contributing to this movement with a focus on performance and efficiency.
Announcements
- Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems
- Your host is Tobias Macey and today I'm interviewing Jamie de Guerre about the role of open models in the AI economy and how to operate them at speed and at scale
- Introduction
- How did you get involved in machine learning?
- Can you describe what Together AI is and the story behind it?
- What are the key goals of the company?
- The initial rounds of open models were largely driven by massive tech companies. How would you characterize the current state of the ecosystem that is driving the creation and evolution of open models?
- There was also a lot of argument about what "open source" and "open" means in the context of ML/AI models, and the different variations of licenses being attached to them (e.g. the Meta license for Llama models). What is the current state of the language used and understanding of the restrictions/freedoms afforded?
- What are the phases of organizational/technical evolution from initial use of open models through fine-tuning, to custom model development?
- Can you outline the technical challenges companies face when trying to train or run inference on large open models themselves?
- What factors should a company consider when deciding whether to fine-tune an existing open model versus attempting to train a specialized one from scratch?
- While Transformers dominate the LLM landscape, there's ongoing research into alternative architectures. Are you seeing significant interest or adoption of non-Transformer architectures for specific use cases?
- When might those other architectures be a better choice?
- While open models offer tremendous advantages like transparency, control, and cost-effectiveness, are there scenarios where relying solely on them might be disadvantageous?
- When might proprietary models or a hybrid approach still be the better choice for a specific problem?
- Building and scaling AI infrastructure is notoriously complex. What are the most significant technical or strategic challenges you've encountered at Together AI while enabling scalable access to open models for your users?
- What are the most interesting, innovative, or unexpected ways that you have seen open models/the TogetherAI platform used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on powering AI model training and inference?
- Where do you see the open model space heading in the next 1-2 years? Any specific trends or breakthroughs you anticipate?
Parting Question
- From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
- Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
- Together AI
- Fine Tuning
- Post-Training
- Salesforce Research
- Mistral
- Agentforce
- Llama Models
- RLHF == Reinforcement Learning from Human Feedback
- RLVR == Reinforcement Learning from Verifiable Rewards
- Test Time Compute
- Hugging Face
- RAG == Retrieval Augmented Generation
- Google Gemma
- Llama 4 Maverick
- Prompt Engineering
- vLLM
- SGLang
- Hazy Research lab
- State Space Models
- Hyena Model
- Mamba Architecture
- Diffusion Model Architecture
- Stable Diffusion
- Black Forest Labs Flux Model
- Nvidia Blackwell
- PyTorch
- Rust
- DeepSeek R1
- GGUF
- Pika Text To Video
[00:00:05] Tobias Macey:
Hello, and welcome to the AI Engineering podcast, your guide to the fast moving world of building scalable and maintainable AI systems. Your host is Tobias Macey, and today I'm interviewing Jamie De Guerre about the role of open models in the AI economy and how to operate them at speed and at scale. So, Jamie, can you start by introducing yourself?
[00:00:29] Jamie De Guerre:
Sure. Hi, Tobias. Thank you so much for having me on the podcast today. My name is Jamie De Guerre. I was founding SVP of product at Together dot ai, where I've been for about two and a half years now. I joined right around the founding of the company. And prior to Together, I spent ten years working in startups in sort of similar roles, leading product management, marketing, and field technical services. And then the second startup I was at was acquired by Apple, and I spent nine years at Apple, first leading the program management team for search inside of the engineering organization, and then leading product marketing for AI and machine learning across Apple for about five years before I came to Together.
[00:01:10] Tobias Macey:
And do you remember how you first got started working in the ML and AI space and why you've spent so much of your career there?
[00:01:17] Jamie De Guerre:
Yeah. You know, I first started working on machine learning in earnest when we worked on search. We'd done some experimentation with that before then at Cloudmark when we were doing email and SMS security applications. We were using some machine learning techniques in sort of our research, but we didn't actually use them much in production. But then at Topsy, we built a social search platform, and we used machine learning quite extensively for things like search relevance. And after Topsy was acquired by Apple, that continued and expanded. We used machine learning for lots of different parts of the search stack on the back end. And the search technology quickly expanded to be used across other applications at Apple, like helping with Siri and improving the understanding of your requests in Siri. And as I continued to work at Apple across AI and machine learning in other areas, I worked on features like predicting which app you wanted next on your iPhone based on your routines, and many others.
[00:02:21] Tobias Macey:
So bringing us now to your current position, your current focus, I'm wondering if you can just give a bit of an overview about what it is that you're building at Together dot ai and some of the story behind how it got started and the problems that you're trying to solve there.
[00:02:35] Jamie De Guerre:
Yeah. Absolutely. So at Together dot ai, we're really providing a full AI cloud stack, and we do that with a focus on performance and efficiency to help customers achieve the best cost performance, whether they're building a model, customizing a model through fine tuning or post training, or running a model for their production applications for inference. And so we call that the AI acceleration cloud because of our focus on speed and efficiency. And we do this for developers. We now have over 550,000 AI developers using our platform for these types of applications.
We also do this for foundation model labs. So not the, you know, OpenAI and Anthropic, but sort of a next tier down from that. We service a lot of leading foundation model labs, whether it be Salesforce Research, companies like Mistral, and many others. And we're starting to also now do this for large enterprises. As they reach scale with running generative AI in production, our performance and efficiency becomes a huge advantage. And so we have customers like Zoom, which uses our platform to train models that analyze conversations in a Zoom call and let an AI assistant chat about the history of those calls, and companies like Salesforce, which powers large parts of Agentforce using our platform. As for how we got started, we were started in June of twenty twenty two.
This was after GPT-3 had launched, but before ChatGPT had come out. And we have a very deep research heritage in our founding, and we found that a lot of the AI community was really exasperated with the fact that there was finally massive improvement happening in the quality of AI models, but suddenly it was being done in a closed way. It wasn't done in the open with sharing the techniques. OpenAI wasn't publishing how they had built GPT-3. And this was really frustrating for the AI community, because everyone had always, over the last fifty years of AI improvements, done so out in the open so that researchers could improve on past research. And the sort of feeling in a lot of the tech world, and the message, was that, you know, we would have one big AI model, and it would become artificial general intelligence. Only one model would ever do that, and that was the end. You know, that was the way it was gonna be. And we just thought this was such a fallacy. We thought that there would be many models from many organizations.
And importantly, if we could help to spur the open source movement around AI, a lot of those would be open source. And when you look at major technology platforms over the last, you know, twenty years, open source becomes a huge part of those. You know, the Internet runs on open source. All of the servers run on open source Linux, on open source web servers, on open source databases, and other things. And we thought that that would be really important for this new technology movement of AI to also have foundations in openness, so that we can understand how these models behave, how to build applications that deeply understand the technology under them and build for that appropriately, and, you know, so that society as well understands that. So that was what we felt the future should be, and we thought we could have a small part in, like, helping to shape that and spur that as a movement. And if you imagine that future where there's lots of great models from lots of sources, many of them are open source, to some extent, the models become a little bit more of a commodity. Obviously, you know, great closed source models will still have huge success, but organizations will have the option to choose great open source models as well. And if the models are becoming a little bit more of a commodity, where does the value accrue? And we think that because it is so much more expensive to build and run a generative AI application compared to building and running a database and web server application, a lot of that value will actually accrue to the infrastructure.
And so if we can be the best at making it more efficient to use the infrastructure and providing faster performance out of the same infrastructure, then a lot of that value could potentially accrue to us. And so that's really the kernel of how we started TogetherAI with this focus on providing an AI acceleration cloud that provides the best cost and performance for just about any model. And we now have, you know, over 200 leading open source models available through the platform.
[00:07:06] Tobias Macey:
On that note of open source and openness in the model ecosystem, there's been a lot of debate about what that means semantically, what it means from a legal and licensing perspective. And before we get too far into that, there's also the element of who is producing these open models, where the initial releases of open models were largely a reaction to the OpenAIs and Anthropics, where they have these frontier models that are proprietary and closed. You can only access them via the API. There's no real insight into how they're built, how they're operating, and you just have to rely on their infrastructure if you wanted to build on top of it, which carries with it a lot of platform risk. And so the initial set of models that were released as open source, again, delaying a little bit on the semantics of that argument, those were mostly coming from the big tech companies. So the Llama models from Meta were some of the initial ones. There were the Mistral models that came out as a response there as well. And then also in recent months, Google has been releasing some fairly popular open models. And I'm wondering, from that initial phase of two years ago when we were really starting to get these open models released and starting to gain some adoption, how has the ecosystem shifted, and what do you see as the distribution of who is producing those models compared to which ones are actually being used and leveraged?
[00:08:40] Jamie De Guerre:
Yeah. No. That's a great question. You know, I think what is so fascinating about what's happening today in the AI industry is there's innovation at many different levels. And traditionally, like, the way we thought about things, the way that the messaging came out of these big labs, was that all of the value of creating this intelligence came from massive pre training. And so pre training is really taking, like, all of the Internet's data or all of the private data in the world that you can collect and building a very, very large model, a large base model, from nothing, starting from random weights. And it is expensive. You know, the messaging from a lot of the big labs was that, you know, it costs billions of dollars to build one of these models.
And to some extent, you know, that's true of this pre training process. It really is very expensive. Probably not billions of dollars today, but it's certainly not something that just anyone can do. You need to have really large AI compute factories to be doing that kind of training. But then a lot of gains started to happen from post training techniques. Taking an already built pre trained model, a base model, fine tuning it, doing reinforcement learning from human feedback, reinforcement learning from verifiable outcomes, which is a newer technique, and lots of other techniques in the whole post training process to modify this big base model to do new things or to improve it in some way. And this achieved tremendous gains. It wasn't what the huge organizations wanted to focus on telling you was the way they were achieving those gains, because it's actually very, very inexpensive and easy to repeat and iterate on in lots of different ways. And then more recently, we're starting to use techniques like inference time compute or test time compute to get these models to do more reasoning and sort of think while they're outputting an answer. And this, again, doesn't really require large training processes to get a model to start to behave that way. And so I think what you've seen in the open source community is that there are a number of very well funded, quite large organizations that are investing in building really, really high quality and, you know, large pretrained models. This includes Meta with the Llama models. This includes large organizations like Google and Microsoft.
It also includes not for profit organizations that are well funded and helping to do this, and, increasingly, a number of startups, like Mistral, which was, you know, releasing state of the art models with a team of 30 people through real innovation in doing this really effectively. And so they create these large pre trained models just like the big labs do. But what's different is once that pre trained model gets released, the open community rapidly improves them through other techniques. And so you get literally thousands of different versions of the Llama models created by the open community and shared openly through platforms like Hugging Face.
And each one starts to be better in different ways, and each one enables the next one to learn from their learnings. And this has enabled both higher quality in new ways and new techniques that have enabled organizations to do this more efficiently and easily, which has been a huge advancement that comes from that open source movement that the big labs are now copying, things like LoRA fine tunes and other things.
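As a concrete illustration of the LoRA-style fine tunes mentioned above, the open source peft library wraps a base model with small trainable low-rank adapters along these lines. This is a minimal sketch, not anything discussed in the episode; the gpt2 base model and the hyperparameters are only placeholders.

```python
# pip install transformers peft
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Small stand-in base model; a real fine-tune would use a larger open model.
base = AutoModelForCausalLM.from_pretrained("gpt2")

# LoRA freezes the base weights and trains low-rank adapter matrices instead,
# which is what makes these fine-tunes cheap to produce and share.
config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"],
                    lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only a tiny fraction of weights are trainable
```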
[00:12:08] Tobias Macey:
Delaying again the question about the semantics of open and open source, I think it's worth digging into your point of the proliferation of these different fine tuned models and distillations. The number of models has been growing faster and faster, which also leads to a lot of confusion and challenge as far as which model do I use for problem x, and the challenge of figuring out, based on what you're trying to achieve, how do I even start to sift through these different models and understand the implications of how they're going to perform on my particular problem set, versus, oh, there are too many options, I'm just going to go with one of the big model providers and use that, or maybe I start with the big model and do my own distillation, which then compounds the problem further. And I'm curious how you're seeing people address that challenge of the paradox of choice that we're faced with now.
[00:13:01] Jamie De Guerre:
Yeah. I think that that is super challenging, and it speaks to, you know, the biggest challenge in general of working in this space, which is just the pace of change. Things are moving so quickly with new models, new techniques, that it's very, very difficult to keep up and know exactly which techniques to use or which models to use or other things. And I think that one of the key areas we're gonna see improve in the coming couple of years, maybe the coming several months, is that increasingly the providers are gonna provide an AI system, an AI platform, instead of just a model. You already are seeing this today with OpenAI, for example.
Their latest model releases are models that are an AI system built to use data that they pull in through RAG, built to use tools as part of the tool chain of every request and, you know, compile code and look at the output of the code or access a mathematics calculator or other things. And I think that this trend will continue and will start to move to a point where you obfuscate out which model gets used. The AI system will actually choose the model for the task, both at a request level and on a level of helping an organization. Let's say, for example, that you are an insurance company, and you've got a model that is intended to interact with customers around insurance claims, and you've done some work to fine tune one model. Let's say you started with Google's Gemma model. You fine tuned that. That model will get used in production, but it will get monitored and evaluated in an automatic way on an ongoing basis. And as new models come out, let's say Meta just released the Maverick Llama 4 model, that fine tuning data will automatically be applied to the Maverick model. You'll have a new candidate model that is now available in this production AI system. A subset of requests will be sent to the new fine tuned model in sort of a shadow mode and evaluated. And then if it's achieving higher accuracy, either through prompting the administrator or developer running this system to approve starting to use it based on the evals, or eventually automatically, the AI system will actually switch the models. And so you'll start to have this system that is constantly creating and evaluating new candidate models for you based on your fine tuning data, based on your production traffic, based on new models that are coming up from the community, and just greenlighting the one that's best for the task, acting as a total platform and system as opposed to a specific model.
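To make the shadow-mode evaluation loop described above concrete, here is a minimal sketch of the control flow. It is an illustration only: the model names, the call_model and score helpers, the traffic fraction, and the promotion threshold are all hypothetical, not part of any published Together.ai API.

```python
import random

# Hypothetical stand-ins for a real inference client and a real eval metric.
def call_model(name: str, prompt: str) -> str:
    return f"[{name}] response to: {prompt}"

def score(prompt: str, completion: str) -> float:
    return random.random()  # placeholder for an actual accuracy/eval score

PRODUCTION_MODEL = "gemma-finetune-v1"            # placeholder names
CANDIDATE_MODEL = "llama-4-maverick-finetune-v1"
SHADOW_FRACTION = 0.05                            # mirror 5% of traffic
PROMOTE_THRESHOLD = 0.02                          # promote on a clear win

prod_scores, cand_scores = [], []

def handle_request(prompt: str) -> str:
    # The production model always serves the user-facing response.
    response = call_model(PRODUCTION_MODEL, prompt)
    prod_scores.append(score(prompt, response))

    # A slice of traffic is also sent to the candidate in shadow mode;
    # its output is scored but never returned to the user.
    if random.random() < SHADOW_FRACTION:
        shadow = call_model(CANDIDATE_MODEL, prompt)
        cand_scores.append(score(prompt, shadow))
    return response

def should_promote() -> bool:
    # Greenlight the candidate once enough shadow traffic shows a gain.
    if len(cand_scores) < 500:
        return False
    gain = (sum(cand_scores) / len(cand_scores)
            - sum(prod_scores) / len(prod_scores))
    return gain > PROMOTE_THRESHOLD
```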
[00:15:33] Tobias Macey:
And now exploring that complexity of the terminology being used around these open models, as far as the first set of them were very freely called open source. And then there was a lot of pushback saying, no, it's not actually open source, because all we have is the model and the weights. And maybe we have the code, maybe we don't. Most of them, you don't have the data. And so now there's been enough iteration that there is actually an OSI approved definition of open source in the context of these AI systems. And, again, there's a variety of levels of compliance with that as far as the models that are out there. I'm wondering how you're seeing the current language used around the idea of openness and the generalized understanding of what those different terms mean in the context of what types of rights and freedoms are being granted to the users of those models.
[00:16:26] Jamie De Guerre:
Yeah. I think it's a great point. I think that there's a lot of variety here, and, you know, this is not new to the open source community. The open source community has grappled with this since the very beginning in terms of, you know, different licenses, whether it was the MIT license or an Apache license or licenses that were not as legally permissive for different use cases. And there's a new layer of complexity with this when it comes to the AI models, as you're pointing out, because the model could be open weights, meaning that the weights are available for anyone to use with a certain license attached to it. And you can download them and you can use them on your own system. But the source code of how the model was created might not be available. And so those are sort of referred to as open weights models, but not open source, technically speaking. The other aspect is, like you mentioned, the data that the model is trained on. You know, a fully open source model release should really include something that you can reproduce.
And to reproduce it, you would need the code that was used to train it. You would need the resulting weights that were outputted, but you'd also need the data that was used to train the model. And so that's sort of an open data model. I think it's helpful that we create these distinctions. I think that it is helpful that different organizations do this at different levels. You know, some organizations do all three, which really helps the research community learn even more from the release of what's being put out. But I don't think that we should think about this in terms of, you know, if an organization doesn't release the source and does not release the data, then it's not a truly good open release or it's not helpful. That's definitely not the case. Just having an open weights model is tremendously helpful to the community, and it enables so much of an ecosystem to occur around that. Meta's Llama models are just open weights models. Their source code is not released. The training data is not released. But they still enable a tremendous ecosystem to build on top of those models and use those models. And so I think that all of these levels are valuable, and it's great for there to be a variety from the open community.
[00:18:34] Tobias Macey:
From the perspective of organizations who are investing in the adoption and use of AI for their business systems, whether that's internal or customer facing, what do you see as the typical phases of evolution from the initial set of, hey, I've played with OpenAI, this seems great. Oh, wait, I don't wanna take on the platform risk of some proprietary company deciding that they wanna yank out some feature from under me, so now I'm gonna go to these open models, up through to, hey, I've built this self evolving AI system. It's doing what you were saying before as far as the automatic fine tuning, automatic candidate generation, automatic shadowing. Obviously, there are a lot of steps and a lot of layers of complexity on that path.
And then even eventually to the point of, hey, I've got enough of my own source data, I'm going to build my own foundation model from scratch. And I'm wondering how you're seeing organizations tackle that evolution of complexity and sophistication on that path.
[00:19:34] Jamie De Guerre:
I think that there's kind of two vectors of this. One is on achieving model quality, and the other is on dealing with operational scale and performance and hosting. On achieving model quality, I think what you mentioned is very typical, where an organization will start with a basic prototype using a model like OpenAI or Anthropic's Claude. Very easy to get started. These models tend to be of the highest quality out of the box, and you can quickly prototype to see if the application makes sense and if AI can be leveraged for the task.
Once you see that it can be, a lot of these organizations run into challenges. One of the challenges is, as you mentioned, being beholden to a single vendor and single platform, and it could change under you. Another is the cost. You know, if you're starting with the biggest model, it's often the highest cost. Another is the performance and scalability and reliability. It's one thing to prototype for, you know, a thousand employees. It's another thing to roll it out to millions of consumers. And so all of these reasons often lead them to saying, okay, I want to invest in tuning an open source model to my application and be able to operate this with more control and ownership at higher scale and lower cost. And so they switch to open source models. And I think on the accuracy vector, that usually starts just with prompt engineering, in a similar way to how they probably started with the closed source model. Then the next phase is typically adding data to the prompt through RAG, retrieval augmented generation, so that you can imbue knowledge from your internal enterprise into the context that the model uses to respond to a request.
And in many cases, that is all you need to get a successful application. In many cases, organizations go further to do a combination of RAG and post training or fine tuning of the model, which can help to adjust the behavior of the model or also imbue some knowledge into the model at the time of the training. And we usually see, for the most advanced deployments, that using a combination of both is the most successful. I think that the next stage we'll start to see more and more is sort of a regular pipeline for that constant improvement. Today, that's kind of done through human methodologies of, like, research teams having a set of sprints where they retrain and experiment. But increasingly, we'll see the AI platforms help customers do that more and more easily. I think really quickly, the other vector is around the operational performance and scalability. And I think often we see organizations, when they first start to switch to open source, they'll self host on maybe their existing cloud provider, like AWS EC2 instances or something along those lines, using an open source inference system like, say, vLLM or SGLang, and operating it on their own. And this is a great way to get started using these open source models. But as they start to deploy into production, that's a typical time when they come to us and say, like, this was tougher to operate at scale than we expected, there's lots of issues that are challenging that we haven't developed approaches for yet, and the overall cost is really, really high. And by switching to a platform like Together.ai, we take care of all that operational burden for them and are usually able to give them better efficiency, where we reduce the amount of GPU compute needed by half or a third or sometimes even more, reducing that cost and improving that total cost of ownership.
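As a concrete example of the self-hosting starting point mentioned here, vLLM's offline API looks roughly like the sketch below. The model name is just an example, and the actual hardware requirements depend on the model you pick.

```python
# pip install vllm  (requires a CUDA-capable GPU)
from vllm import LLM, SamplingParams

# Example open model; any Hugging Face model supported by vLLM can be used.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize why open models matter."], params)

for out in outputs:
    # Each result carries the prompt and one or more generated completions.
    print(out.outputs[0].text)
```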
[00:23:04] Tobias Macey:
And as far as that transition point from, I have moved from the proprietary model, I'm now using open models, I've got my RAG workflow in place, and everything is working, to then saying, okay, now I'm going to take the next step into actually fine tuning my own models or even progressing further to building your own foundation model. What are some of the heuristics that you're seeing organizations apply as far as whether and when they actually want to take that next step beyond just being consumers of the existing off the shelf models and into actually investing in their own capacity and capability to generate new models, either from base models or from whole cloth?
[00:23:47] Jamie De Guerre:
Yeah. That's a good question. I mean, I think first is kind of an organization level, strategic level heuristic, not a technical one, which is sort of how strategically important is our investment in generative AI. And for a lot of organizations, this is a huge strategic imperative. You know, an insurance company may view it as existential that they become the leader at leveraging generative AI most effectively to help automate the process of making the best predictions on whether or not to insure someone, or the cost to insure someone, or a location to insure in or not, and other things. Because if they don't become the best at that quickly, their competitors will, and they will be at a tremendous competitive disadvantage and sort of lose the market. You know, that's kind of stated in an extreme way, but I think that that is a reality for some of the impact that generative AI is gonna have on so many industries.
And so for this sort of first heuristic, that question becomes, is this critically important to our organization, that we strategically invest in generative AI and start to create the muscle to be great at leveraging generative AI amongst our internal organization? And if it is that strategically important, they want to get good at doing that process. They want to own the result of their investment in generative AI and control it. They want to be able to repeat that process quickly when new models come out. And often that then leads to wanting to fine tune and post train a model, so that they own the resulting model and control it, and so that they have the muscle, the team, the staff, etcetera, to be able to repeat that process. I think the second heuristic is a technical one, which comes down to what type of change in model behavior we are trying to create. If the change is solely kind of information based or knowledge based, like this model on its own doesn't know about our products or our internal process documentation, then RAG is very well suited, obviously. You can pull that information in, as long as that information quantity is small enough that it can fit into the context of a request and you have a good enough search capability to find the right documents. You can pull that in before the model actually processes the request and imbue that knowledge at request time. But if the amount of that knowledge goes beyond what you can put into context, or you're trying to teach the model a new reasoning technique, or get the model to understand broadly how your whole industry functions or something of that nature. Let's say it's weather prediction, or, you know, insurance prediction could be a good one. Like, that's not just, you need to have the weather data for the past two years in this location. You actually want it to understand the impact of weather changes on insurability.
That's kind of imbuing new reasoning capability. Or you're just trying to change how the model behaves in terms of how it communicates and what it outputs. Sometimes that can be done with simple prompt engineering, but sometimes post training does it more effectively. And so this heuristic of what type of change you're trying to create in the model behavior becomes the other main reason for when to decide whether or not to do post training.
[00:27:08] Tobias Macey:
Another major avenue of conversation that's happening in both the open and proprietary space, although obviously more transparently in the open models, is the type of architecture that's being employed for the building of these models, particularly when it comes to inference time efficiency, where the transformers paper was what catapulted us into the current generation of generative AI and is the predominant architecture of most of the models that are in this space right now. But there has been a lot of conversation in recent months about alternative architectures that are better for either training compute or being able to run on other types of hardware that doesn't necessarily lock you into the NVIDIA ecosystem, as well as the efficiencies at inference time, or the ability to have a smaller number of weights that produce a more outsized capability compared to similarly weighted transformer models.
And I'm curious how you're seeing that start to take shape in terms of which models are being built, which models are being used, particularly for cases where efficiency of compute and cost are paramount for a given use case.
[00:28:28] Jamie De Guerre:
Yeah. I love this question, and it gets me excited, because I think this is a core area of research contribution that Together dot ai has made. One of our founders, Chris Ré, leads a leading lab at Stanford called Hazy Research, and a lot of these new techniques have come out of his lab at Hazy Research and some of his team. And I think it's such early days. You know, the transformer has achieved incredible things with the ability to do these huge models at scale and achieve tremendous quality in terms of the accuracy that they're able to achieve on tasks.
But I think it is early days, and we're gonna see more and more efficient architectures for models over time that get used to improve the compute time characteristics at either training or inference while achieving those same quality levels. One of the areas of research has been around using a different type of model architecture called state space models for these generative AI models. State space models weren't traditionally used for generative AI models, but there are techniques to use them. You know, transformers have a quadratic performance characteristic. State space models have a sub quadratic, near linear performance characteristic, which is a dramatic difference in the compute that is needed to do things like attention. And so there's model techniques like Hyena and Mamba that use state space models to enable much better performance characteristics and much longer context, and, we've shown, they've been able to achieve roughly on par quality and accuracy in the end model output. And increasingly, some of the big models are starting to use these techniques. You don't have to necessarily use that for the entire model. You can use a bit of a hybrid architecture, where you have parts of what the model does and its behavior using transformers and parts that use a Hyena or Mamba or state space model technique. Another technique that's come up more recently is actually from one of the advisors of Together dot ai, who has now started a new startup for using diffusion based model architectures for large language models. So text to image generation models like Stable Diffusion and Black Forest Labs' Flux models and other things are typically using diffusion based models. But what this research has shown is you can use a diffusion based technique to create an LLM, and it achieves dramatically better performance characteristics. And so I don't know what the future model architecture will be, but I suspect it won't be a traditional transformer. And we'll have new architectures that enable us to train more efficiently and run these models at inference time more efficiently, which is really exciting, because it then further lowers the cost and makes this technology even more accessible.
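To illustrate the scaling difference mentioned here: self-attention compares every token with every other token, so its cost grows quadratically with sequence length, while a linear state space layer carries a fixed-size state through a single pass over the sequence. The toy recurrence below ignores the gating and discretization tricks that Hyena and Mamba actually rely on; it is only meant to show the shape of the computation.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Toy linear state space layer: h_t = A h_{t-1} + B x_t, y_t = C h_t.

    One pass over the sequence, so the cost grows linearly with length,
    versus the all-pairs token comparison in self-attention.
    """
    seq_len, _ = x.shape
    h = np.zeros(A.shape[0])
    ys = []
    for t in range(seq_len):
        h = A @ h + B @ x[t]   # update the fixed-size hidden state
        ys.append(C @ h)       # read out this step's output
    return np.stack(ys)

# Tiny example with random parameters (illustration only).
rng = np.random.default_rng(0)
d_state, d_in, d_out, seq_len = 16, 8, 8, 128
y = ssm_scan(rng.normal(size=(seq_len, d_in)),
             0.1 * rng.normal(size=(d_state, d_state)),
             rng.normal(size=(d_state, d_in)),
             rng.normal(size=(d_out, d_state)))
print(y.shape)  # (128, 8)
```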
[00:31:15] Tobias Macey:
And then moving on to the complexities of inference time compute, obviously that's your whole business, and you have a lot of domain knowledge around that. You also mentioned that as part of the evolution of organizations moving from proprietary to open models, they will typically try to run their own inference using something like vLLM or SGLang or one of the other inference engines that are available. And I'm curious, what are some of the sharp corners, some of the challenges that organizations run into trying to self host inference time compute, and some of the reasons that they decide to offload that onto a platform like Together.ai?
[00:31:57] Jamie De Guerre:
Yeah. This is kind of one of those things where there's, like, a really clear tactical tip of the iceberg that's above the water level that you can see and expect, and then there's this massive rest of the iceberg that is under the surface and a little bit more fuzzy to describe. But above the surface, I think that these models, particularly when it's a large model, are very costly to run. They require significant amounts of, you know, high end GPU compute, and that is quite expensive to run and operate. You can see that even from the start when you're starting to build one of these applications. The second thing that's above the water level is that load can be unpredictable for these, as in any Internet scale service. And there's lots of knowledge in dealing with, you know, dynamic load in the operations space of these Internet applications. But you go from needing to deal with varying load characteristics on a CPU application, where the CPUs are commodities and incredibly cheap to elastically scale, to doing so on a GPU level application, where you can't really get on demand GPUs added to your cluster because on demand is always sold out, and there's no elastic compute for GPUs typically in these clouds. And it's a lot more challenging now to deal with these load characteristics and how you plan for the peaks. If your peak is eight times your normal load, are you paying for eight times the compute 100% of the time, which is awful and tremendously inefficient and expensive? And if you don't have other jobs to be done on those GPUs, how do you get more efficient use of that compute? So these are really hard and challenging things, but they're kind of expected and predictable as you start to think about some of this and as you start to run into these issues. I think what makes it so much more difficult is the things that are a little less expected, which is the pace of change, fundamentally, because the techniques to run these models, the techniques to run them efficiently, are changing so rapidly.
You know, there's a new model coming out every month, and so you're operating one model in production, and then your team that's doing the post training and such says, oh, we've switched. We've switched from this, you know, 8 billion parameter model; now a 23 billion parameter model from another company, with a completely different model architecture, is achieving way better gains in the application. And the user engagement went through the roof. There's four times the user requests when we experiment with this other model. So now we wanna roll this out to production. It's like, oh my goodness. Well, this is much more expensive and difficult to run. It doesn't work with the inference engine that we had, because it doesn't support this new architecture that this 23 billion parameter model has. And you said that the engagement went up by four times with users, so you want me to have four times the capacity on this model that's, you know, three times the size? That just explodes the complexity and the challenge. And then other techniques come in, like disaggregated serving, where, you know, the two main phases of inference are prefill and decoding, where you're loading in the input of the request and mapping it to the weights of the model during prefill, and then you're generating the response during the decode phase. And traditionally, this was all done on, like, a single node. But today, that's done on separate clusters that you scale independently with different numbers of GPUs.
And developing the system to be able to do that is challenging from a development perspective. And then from an operational perspective, it brings even more complexity as well. And so this is all, like, everything I just mentioned probably could have happened in the last, like, six months maybe. And so it moves very, very, very quickly. And so it's not only being able to plan for operation at a point in time of technology. It's planning for operation at this dynamically moving pace of AI and the AI space. And I think that that is the, you know, the rest of the iceberg that's under the water that makes it so much more challenging than you would expect when you kinda go in eyes open thinking about the expected challenges.
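For readers unfamiliar with the prefill/decode split described above, the sketch below makes it visible with a small Hugging Face model: prefill is one compute-heavy pass over the whole prompt that builds the KV cache, and decode reuses that cache to emit one token at a time. The gpt2 model is only a stand-in; real disaggregated serving runs these two phases on separate GPU pools.

```python
# pip install transformers torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # small example model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt_ids = tok("Open models are", return_tensors="pt").input_ids

with torch.no_grad():
    # Prefill: one pass over the whole prompt, producing the KV cache.
    out = model(prompt_ids, use_cache=True)
    past = out.past_key_values
    next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)

    generated = [next_id]
    # Decode: one token at a time, reusing and extending the cache.
    for _ in range(20):
        out = model(next_id, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
        generated.append(next_id)

print(tok.decode(torch.cat(generated, dim=-1)[0]))
```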
[00:36:14] Tobias Macey:
And given the experience of the Together AI team and the requirements around being able to fulfill those use cases and scale those operational characteristics across multiple different users, with an even higher degree of variability in terms of usage and model selection, what are some of the most significant technical and strategic investments and challenges that you've had to overcome, particularly in the past six to twelve months, given the rate of change that we're dealing with?
[00:36:45] Jamie De Guerre:
Yeah. And I think that's a great lead-in, because I do think that the biggest challenge is this pace of change. And I mentioned some of the aspects of that pace of change in terms of new models and new techniques for the inference engines. There's other parts of that as well, though, like new infrastructure, the new versions of GPUs coming out from NVIDIA and potentially others, with new architectures and different sizes of memory that, you know, require totally new optimization and techniques for how you operate, new storage infrastructure with, you know, GPUDirect Storage being available and other things, and new techniques for how to leverage that storage across an inference cluster. The networking is changing rapidly.
And so I think that, you know, for me, as I'm not the engineering leader or research leader, I'm on the product side, there's an organizational challenge of how to build and prepare and have process for how we are setting ourselves up to be able to move the fastest. One of the things we've built recently that is actually interesting is, you know, we try and build kernels that provide the fastest performance. So, you know, an attention kernel that can do the attention calculations faster than NVIDIA's kernel, for example. But we've really been building this into our organization to not only build something that's faster, but build the ability to build it faster. So we wanna be able to make new kernels very quickly. And an example of this recently was, NVIDIA's been working on Blackwell for at least a year or more.
And internally at NVIDIA, they've obviously had access to the plans for Blackwell, instruction sets, and whatever other things there might be. We got our first Blackwell chips, and within a week of access to those, we released a new kernel that was outperforming the main kernel from NVIDIA for the attention calculation, not by a huge margin, by, like, 2% or 3%, but it was done in a week. And this is because we've developed this harness, this platform for building new kernels quickly. And so I think a huge part of what we've focused on at Together AI is not just the ability to make things fast, but the ability to adjust and move to something new really quickly, in our human processes and our harnesses and techniques that we have for doing it. Our inference engine is the same. You know, the inference engine from some other organizations is written in native code in C and C++, and it ekes out the most performance in many respects. But when a new architecture comes out for a totally new model, it takes the organization, like, a month or more to support it. We need to have support for a new model in production, like, day one when it comes out, usually within hours of when it comes out. So we built our inference engine mostly in PyTorch and some components in Rust, and built it to be able to move to new architectures really, really quickly when they come out. We had DeepSeek R1 running in production in twenty four hours, for example. And so this is a key part of, I would say, the challenge that we've really had to focus on at Together AI.
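For a sense of what a "faster attention kernel" is measured against, here is a naive PyTorch attention next to the built-in fused scaled_dot_product_attention; a hand-written kernel of the kind described here would be validated and benchmarked against references like these. The shapes are arbitrary examples.

```python
import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # softmax(QK^T / sqrt(d)) V, materializing the full seq x seq score
    # matrix -- exactly the memory traffic that fused kernels avoid.
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5
    return torch.softmax(scores, dim=-1) @ v

# Arbitrary example shape: (batch, heads, seq_len, head_dim)
q, k, v = (torch.randn(1, 8, 1024, 64) for _ in range(3))

ref = naive_attention(q, k, v)
fused = F.scaled_dot_product_attention(q, k, v)  # dispatches to a fused kernel

# The two paths should agree numerically; speed is what custom kernels compete on.
print(torch.allclose(ref, fused, atol=1e-4))
```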
[00:39:52] Tobias Macey:
The model architecture question is another interesting one too, because you see from all of these various tool providers, oh, hey, here's my new release, and the release notes say, hey, we added support for Google Gemma, or, you know, hey, we added support for DeepSeek R1. And then also looking at some of the desktop tools, so LM Studio is one that I use fairly regularly, and it says, oh, well, here's the GGUF encoded model, or, you know, an Ollama model that needs to be a certain distillation or a certain quantization to be able to run. And I'm curious how you're also seeing that impact the complexity of model selection and runtime selection, particularly for nonproduction use cases of, hey, I just wanna be able to use a local model to be able to chat with my documents on my laptop, or I wanna be able to deploy a model to a Kubernetes environment so that I can use that as a private copilot or something like that.
What are some of the main aspects of that model architecture that teams should be aware of and ways that they should get that familiarity to be able to understand what are the distinctions when they're doing that model selection?
[00:41:02] Jamie De Guerre:
Yeah. I think this speaks to just all the layers of the complexity, that it's not as simple as, you know, is it a transformer or is it a different model architecture? Even within something like transformers, there's a huge variation in the techniques that can be used and the architecture that the model might be using. And then beyond just the core model architecture, there's techniques that are applied to the model, like quantization, that make it more efficient to run but might have an impact on the quality of the model and the accuracy that you receive. There's techniques like distillation and many others. And, you know, I think that today, keeping on top of all of those is pretty necessary to build these applications well at scale.
And doing that yourself, you know, as an engineer or an engineering leader or even a researcher, is very challenging. So starting to work with an organization that can help you and support you on that, an organization that makes that their full focus and purpose in life and has built teams around it, can be immensely helpful. I think also that's why, again, I really feel that a lot of the evolution we're gonna see in the next year or more will be in these platforms becoming more and more of a system that will reduce how much of that complexity you have to deal with as an individual developer or researcher, so that it supports these variations and automatically helps you to optimize within them without having to go quite as deep into understanding and building for each of them.
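As one concrete example of the quantized formats that came up in the question, a GGUF model can be run locally with llama-cpp-python along these lines. The file path is a placeholder for whatever quantized build you have downloaded; the quality trade-off depends on the quantization level chosen.

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Placeholder path to a 4-bit (Q4_K_M) GGUF build of an open model.
llm = Llama(model_path="./models/example-8b-instruct.Q4_K_M.gguf", n_ctx=4096)

result = llm("Q: What does quantization trade away?\nA:",
             max_tokens=128, stop=["Q:"])
print(result["choices"][0]["text"])
```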
[00:42:50] Tobias Macey:
And in your experience of building and growing the Together.ai platform and working with your community of customers and the open model community, what are some of the most interesting or innovative or unexpected ways that you've seen either open models or Together.ai or that combination applied?
[00:43:11] Jamie De Guerre:
Great question. Yeah. I think that's one of the things that's so exciting about working at this level of the stack: the applications are kind of endless with this technology. So we have companies that are building biotech models for new drug research on our platform. We have health sciences applications using existing models to analyze ultrasounds or x rays through post training and fine tuning, achieving these incredible accuracy rates that are helping hospitals and doctors get dramatic efficiency and quality gains in their work. We have companies building text to video models like Pika, where you get these incredible consumer applications and really fun, silly kind of videos out of a simple prompt really quickly that are really engaging. And then I think finally, one of the biggest surprises, which now feels not surprising at all, but, you know, two years ago we wouldn't have expected it, is coding. You know, these models have just been tremendous at coding.
We have multiple companies that are building integrated development environments for customers that use our platform for running a lot of the inference for these coding applications. And the productivity that we're getting to for our engineers and the amount that these models can take care of for coding is just tremendous. And so all of these have been, you know, surprising and shocking over the last two years.
[00:44:39] Tobias Macey:
And in your own personal experience of working in this space, trying to stay up to date with everything that's happening, understand what direction to take the product, and how to manage the messaging around it to your potential customers, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:44:59] Jamie De Guerre:
Yeah. Great question. I think that one that I would say is, you know, no change is durable. You can be so excited about one improvement that's gonna be made or one model that's gonna be released that's gonna be the next great thing. But you have to expect that it's going to be so quickly competed with by something that is going to be even better, even if it's from, like, an open community and not another actual company or whatever else. But just start to change your frame of mind: it's really not about building one solid point in time release that is going to, you know, last for a year or something until the next release.
I think it has to be much more of a living, constant evolution with tons of these improvements happening, and you have to be very fluid in adjusting to what's happening from the industry and others in this space, to be nimble and adjust to that in your roadmap and your plans. And I think this is the same for organizations adopting generative AI for their own use. Don't think of what you're building as, you're gonna build a model that is tuned to your needs and achieving the accuracy you want, and you're done for the next year once it's in production for your customers. Really think of it as you're building a muscle. You're building this ability to be on the latest, constantly evolving and iterating. And, you know, you should be getting to the point where new iterations of your model are coming up very regularly, whatever that regularity is. Maybe it's a month, maybe it's a week, maybe it's an hour eventually, on an automated harness that's just getting evaled. I think that that is much more the way that you have to think about this space, because it's still early days, and there's constant innovation and improvement coming from all sides.
[00:46:53] Tobias Macey:
And are there any other aspects of the work that you're doing at Together.ai, the overall space of open models, inference time compute, the challenges of building and fine tuning these foundation models, or just anything else about the work that you're doing that we didn't discuss yet that you'd like to cover before we close out the show?
[00:47:19] Jamie De Guerre:
You know, I think the last thing I would just say is we're tremendously excited for what's happening in the open community. And the pace at which open source or open weights models have caught up to the biggest and best closed models is much faster than I thought it would happen. And we founded our company on the thesis that this would happen, so I was one of the most optimistic people that thought this would happen, but it's much faster than I thought. It has shocked us how quickly new models have come out that are really achieving the same quality as the leading closed source labs. And, you know, one of those releases, and it's shocking, right, is the DeepSeek R1 release. It was a tremendous gain above and beyond what had ever been achieved in an open source model, achieving the same as o1 or o3 in many respects on a lot of the accuracy measures. And within three months, you know, two other organizations released models at the same level as open source models as well. And so you have this democratization that's happening of the ability to provide these models and leverage them in new ways in the open community, which I think is a great direction for AI, for us to better understand these systems and be able to leverage them more deeply and know how to invest in applications for them in our organizations.
And it has shocked me, and it's really exciting, how quickly it's moving.
[00:48:35] Tobias Macey:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you and the rest of the Together AI team are doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gaps in the tooling, technology, or human training that's available for AI systems today.
[00:48:55] Jamie De Guerre:
The biggest gaps. I think that the biggest gaps are, you know, it goes back to a lot of what we talked about today. But so much of the way we work today is still being thought about in terms of building for one model and optimizing for one model. And with the pace of improvements to models and new iterations of models being so rapid, I think that is a gap that's gonna need to shift, where better tooling comes out to help you be able to constantly evaluate different versions of models and automatically tune multiple models, and get help from the AI system with figuring out which combination of models achieves the best outcome for your application. I think that is the next sort of iteration that's needed to make building and developing these applications on AI much more robust and easier to manage.
[00:49:51] Tobias Macey:
Alright. Well, thank you very much for taking the time today to join me and share the work that you're doing at Together.ai and your thoughts and experiences in the open model ecosystem. It's definitely a very interesting and, obviously, very fast moving space, so I appreciate you taking the time to share your thoughts and opinions and expertise and help all of us figure out a little bit more about what we should be doing. So thank you for that, and I hope you enjoy the rest of your day.
[00:50:16] Jamie De Guerre:
Absolutely, Tobias. Thank you so much. Really great questions, and I really enjoyed this. Thank you.
[00:50:25] Tobias Macey:
Thank you for listening. And don't forget to check out our other shows, the Data Engineering Podcast, which covers the latest in modern data management, and Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used. You can visit the site at themachinelearningpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@themachinelearningpodcast.com with your story. To help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Hello, and welcome to the AI Engineering podcast, your guide to the fast-moving world of building scalable and maintainable AI systems. Your host is Tobias Macey, and today I'm interviewing Jamie De Guerre about the role of open models in the AI economy and how to operate them at speed and at scale. So, Jamie, can you start by introducing yourself?
[00:00:29] Jamie De Guerre:
Sure. Hi, Tobias. Thank you so much for having me on the podcast today. My name is Jamie De Guerre. I was founding SVP of product at Together.ai, where I've been for about two and a half years now. I joined right around the founding of the company. And prior to Together, I spent ten years working in startups in sort of similar roles, leading product management, marketing, and field technical services. The second startup I was at was acquired by Apple, and I spent nine years at Apple, first leading the program management team for search inside of the engineering organization, and then leading product marketing for AI and machine learning across Apple for about five years before I came to Together.
[00:01:10] Tobias Macey:
And do you remember how you first got started working in the ML and AI space and why you've spent so much of your career there?
[00:01:17] Jamie De Guerre:
Yeah. You know, I first started working on machine learning in earnest when we worked on search. We'd done some experimentation with that before then at Cloudmark when we were doing email and SMS security applications. We were using some machine learning techniques in sort of our research, but we didn't actually use them much in production. But then at Topsy, we built a social search platform, and we used machine learning quite extensively for things like search relevance. And after Topsy was acquired by Apple, that continued and expanded. We used machine learning for lots of different parts of the search stack on the back end. And the search technology quickly expanded to be used across other applications at Apple, like helping with Siri and improving the understanding of your requests in Siri. And as I continued to work at Apple across AI and machine learning in other areas, I worked on features like predicting which app you wanted next on your iPhone based on your routines, and many others.
[00:02:21] Tobias Macey:
So bringing us now to your current position, your current focus, I'm wondering if you can just give a bit of an overview about what it is that you're building at Together.ai and some of the story behind how it got started and the problems that you're trying to solve there.
[00:02:35] Jamie De Guerre:
Yeah. Absolutely. So at Together.ai, we're really providing a full AI cloud stack, and we do that with a focus on performance and efficiency to help customers achieve the best cost performance, whether they're building a model, customizing a model through fine tuning or post training, or running a model for their production applications for inference. And so we call that the AI acceleration cloud because of our focus on speed and efficiency. And we do this for developers. We now have over 550,000 AI developers using our platform for these types of applications.
We also do this for foundation model labs. So not the, you know, OpenAI and Anthropic tier, but sort of a next tier down from that. We service a lot of leading foundation model labs, whether it be Salesforce Research, companies like Mistral, and many others. And we're starting to also now do this for large enterprises. As they reach scale with running generative AI in production, our performance and efficiency becomes a huge advantage. And so we have customers like Zoom, which uses our platform to train models that analyze conversations in a Zoom call and enable an AI assistant to chat about the history of those conversations, and companies like Salesforce, which powers large parts of Agentforce using our platform. And as for how we got started, we were started in June of twenty twenty two.
This was after GPT-3 had launched, but before ChatGPT had come out. And we have a very deep research heritage in our founding, and we found that a lot of the AI community was really exasperated with the fact that there was finally massive improvement happening in the quality of AI models, but suddenly it was being done in a closed way. It wasn't done in the open with sharing the techniques. OpenAI wasn't publishing how they had built GPT-3. And this was really frustrating for the AI community because everyone had always, over the last fifty years of AI improvements, done so out in the open so that researchers could improve on past research. And the sort of feeling in a lot of the tech world, and the message, was that, you know, we would have one big AI model, and it would become artificial general intelligence. Only one model would ever do that, and that was the end. You know, that was the way it was gonna be. And we just thought this was such a fallacy. We thought that there would be many models from many organizations.
And importantly, if we could help to spur the open source movement around AI, a lot of those would be open source. And when you look at major technology platforms over the last, you know, twenty years, open source becomes a huge part of those. You know, the Internet runs on open source. All of the servers run on open source Linux, on open source web servers, on open source databases, and other things. And we thought that it would be really important for this new technology movement of AI to also have foundations in openness so that we can understand how these models behave, how to build applications that deeply understand the technology under them and build for that appropriately, and, you know, so that society as well understands that. So that was what we felt the future should be, and we thought we could have a small part in, like, helping to shape that and spur that as a movement. And if you imagine that future where there's lots of great models from lots of sources, and many of them are open source, to some extent the models become a little bit more of a commodity. Obviously, you know, great closed source models will still have huge success, but organizations will have the option to choose great open source models as well. And if the models are becoming a little bit more of a commodity, where does the value accrue? And we think that because it is so much more expensive to build and run a generative AI application compared to building and running a database and web server application, a lot of that value will actually accrue to the infrastructure.
And so if we can be the best at making it more efficient to use the infrastructure and providing faster performance out of the same infrastructure, then a lot of that value could potentially accrue to us. And so that's really the kernel of how we started TogetherAI with this focus on providing an AI acceleration cloud that provides the best cost and performance for just about any model. And we now have, you know, over 200 leading open source models available through the platform.
[00:07:06] Tobias Macey:
On that note of open source and openness in the model ecosystem, there's been a lot of debate about what that means semantically, what it means from a legal and licensing perspective. And before we get too far into that, there's also the element of who is producing these open models, where the initial releases of open models were largely a reaction to the OpenAIs and Anthropics, where they have these frontier models that are proprietary and closed. You can only access them via the API. There's no real insight into how they're built, how they're operating, and you just have to rely on their infrastructure if you want to build on top of it, which carries with it a lot of platform risk. And so the initial set of models that were released as open source, again, delaying a little bit on the semantics of that argument, those were mostly coming from the big tech companies. So the Llama models from Meta were some of the initial ones. There were the Mistral models that came out as a response there as well. And then also in recent months, Google has been releasing some fairly popular open models. And I'm wondering, from that initial phase of two years ago when we were really starting to get these open models released and starting to gain some adoption, how has the ecosystem shifted, and what do you see as the distribution of who is producing those models compared to which ones are actually being used and leveraged?
[00:08:40] Jamie De Guerre:
Yeah. No. That's a great question. You know, I think what is so fascinating about what's happening today in the AI industry is there's innovation at many different levels. And traditionally, like, the way we thought about things, the way that the messaging came out of these big labs, was that all of the value of creating this intelligence came from massive pre training. And so pre training is really taking, like, all of the Internet's data or all of the private data in the world that you can collect and building a very, very large model, a large base model, from nothing, starting from random weights. And it is expensive. You know, the messaging from a lot of the big labs was that, you know, it costs billions of dollars to build one of these models.
And to some extent, you know, that's true of this pre training process. It really is very expensive. Probably not billions of dollars today, but it's certainly not something that just anyone can do. You need to have really large AI compute factories to be doing that kind of training. But then a lot of gains started to happen from post training techniques. Taking an already built pre trained model, a base model, and fine tuning it, doing reinforcement learning from human feedback, reinforcement learning from verifiable outcomes, which is a newer technique, and lots of other techniques in the whole post training process to modify this big base model to do new things or to improve it in some way. And this achieved tremendous gains. It wasn't what the huge organizations wanted to focus on telling you is the way they're achieving those gains, because it's actually very, very inexpensive and easy to repeat and iterate in lots of different ways. And then more recently, we're starting to use techniques like inference time compute or test time compute to get these models to do more reasoning and sort of think while they're outputting an answer. And this, again, doesn't really require large training processes to get a model to start to behave that way. And so I think what you've seen in the open source community is that there are a number of very well funded, quite large organizations that are investing in building really, really high quality and, you know, large pretrained models. This includes Meta with the Llama models. This includes large organizations like Google and Microsoft.
It also includes not-for-profit organizations that are well funded and are helping to do this, and increasingly a number of startups, startups like Mistral, which was, you know, releasing state of the art models with a team of 30 people through real innovation at doing this really effectively. And so they create these large pre trained models just like the big labs do. But what's different is once that pre trained model gets released, the open community rapidly improves them through other techniques. And so you get literally thousands of different versions of the Llama models created by the open community and shared openly through platforms like Hugging Face.
And each one starts to be better in different ways, and each one enables the next one to learn from its learnings. And this has enabled both higher quality in new ways and new techniques that have enabled organizations to do this more efficiently and easily, which has been a huge advancement that comes from that open source movement and that the big labs are now copying, things like LoRA fine tunes and other things.
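As a rough illustration of what a technique like LoRA looks like in practice, here is a minimal PyTorch sketch: the pretrained weight matrix stays frozen, and only a small low-rank pair of matrices is trained and added on top. The layer sizes, ranks, and class name are illustrative assumptions, not taken from any particular library or from the models discussed here.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pretrained weights
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base projection plus the low-rank correction learned during fine tuning.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


# Usage: wrap a projection layer and train only the LoRA parameters.
layer = LoRALinear(nn.Linear(4096, 4096))
trainable = [p for p in layer.parameters() if p.requires_grad]  # just lora_A and lora_B
```

Because only the two small matrices are updated, the fine tune is cheap to train and to share, which is part of why these community variants proliferate so quickly.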
[00:12:08] Tobias Macey:
Delaying again the question about the semantics of open and open source, I think it's worth digging into your point of the proliferation of these different fine tuned models and distillations. The number of models has been growing faster and faster, which also leads to a lot of confusion and challenge as far as which model do I use for problem X, and the challenge of figuring out, based on what you're trying to achieve, how do I even start to sift through these different models and understand the implications of how they're going to perform on my particular problem set versus, oh, there are too many options, I'm just going to go with one of the big model providers and use that, or maybe I start with the big model and do my own distillation, which then compounds the problem further. And I'm curious how you're seeing people address that challenge of the paradox of choice that we're faced with now.
[00:13:01] Jamie De Guerre:
Yeah. I think that that is super challenging, and it speaks to kind of, you know, the biggest challenge in general working in this space, which is just the pace of change. Things are moving so quickly with new models, new techniques, that it's very, very difficult to keep up and know exactly which techniques to use or which models to use or other things. And I think that this is one of the key areas we're gonna see improve in the coming couple of years, maybe even the coming several months, which is that increasingly the providers are gonna provide an AI system, an AI platform, instead of just a model. You already are seeing this today with OpenAI, for example.
Their latest model releases are models that are an AI system built to use data that they pull in through RAG, built to use tools as part of the tool chain of every request and, you know, compile code and look at the output of the code or access a mathematics calculator or other things. And I think that this trend will continue and will start to move to a point where you abstract away which model gets used. The AI system will actually choose the model for the task, both at a request level and at the level of helping an organization. Let's say, for example, that you are an insurance company, and you've got a model that is intended to interact with customers around insurance claims, and you've done some work to fine tune one model. Let's say you started with Google's Gemma model. You fine tuned that. That model will get used in production, but it will get monitored and evaluated in an automatic way on an ongoing basis. And as new models come out, let's say Meta just released the Maverick Llama 4 model, that fine tuning data will automatically be applied to the Maverick model. You'll have a new candidate model that is now available in this production AI system. A subset of requests will be sent to the new fine tuned model in sort of a shadow mode and evaluated. And then if it's achieving higher accuracy, either through prompting the administrator or developer running this system to approve starting to use it based on the evals, or eventually automatically, the AI system will actually switch the models. And so you'll start to have this system that is constantly creating and evaluating new candidate models for you based on your fine tuning data, based on your production traffic, based on new models that are coming out from the community, and just greenlighting the one that's best for the task, acting as a total platform and system as opposed to a specific model.
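To make that shadow-mode idea a bit more concrete, here is a minimal, hypothetical sketch of the routing layer such a system might use: production traffic is always answered by the current model, a small fraction of requests is mirrored to a candidate, both outputs are scored by an evaluation callback, and the candidate is promoted once it clearly wins. The class name, thresholds, and the `evaluate` callback are all illustrative assumptions, not any vendor's API.

```python
import random

class ShadowRouter:
    """Serve production traffic while evaluating a candidate model in shadow mode."""

    def __init__(self, prod_model, candidate_model, shadow_rate=0.05, promote_margin=0.02):
        self.prod = prod_model
        self.candidate = candidate_model
        self.shadow_rate = shadow_rate          # fraction of requests mirrored to the candidate
        self.promote_margin = promote_margin    # how much better the candidate must score
        self.scores = {"prod": [], "candidate": []}

    def handle(self, request, evaluate):
        response = self.prod.generate(request)            # the user always gets the production answer
        if self.candidate and random.random() < self.shadow_rate:
            shadow = self.candidate.generate(request)     # mirrored call, never shown to the user
            self.scores["prod"].append(evaluate(request, response))
            self.scores["candidate"].append(evaluate(request, shadow))
            self._maybe_promote()
        return response

    def _maybe_promote(self):
        if len(self.scores["candidate"]) < 200:           # wait for enough shadow samples
            return
        avg = lambda xs: sum(xs) / len(xs)
        if avg(self.scores["candidate"]) > avg(self.scores["prod"]) + self.promote_margin:
            self.prod, self.candidate = self.candidate, None   # green-light the new model
```

In a real platform the promotion step would usually go through the approval flow described above rather than flipping automatically, but the control loop has the same shape.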
[00:15:33] Tobias Macey:
And now exploring that complexity of the terminology being used around these open models, as far as the first set of them were very freely called open source. And then there was a lot of pushback saying, no, it's not actually open source because all we have is the model and the weights. And maybe we have the code, maybe we don't. For most of them, you don't have the data. And so now there's been enough iteration that there is actually an OSI approved definition of open source in the context of these AI systems. And, again, there's a variety of levels of compliance with that as far as the models that are out there. I'm wondering how you're seeing the current language used around the idea of openness and the generalized understanding of what those different terms mean in the context of what types of rights and freedoms are being granted to the users of those models.
[00:16:26] Jamie De Guerre:
Yeah. I think it's a great point. I think that there's a lot of variety here, and, you know, this is not new to the open source community. The open source community has grappled with this since the very beginning in terms of, you know, different licenses, whether it was, you know, the MIT license or an Apache license or licenses that were not as legally permissive for different use cases. And there's a new layer of complexity with this when it comes to the AI models, as you're pointing out, because the model could be open weights, meaning that the weights are available for anyone to use with a certain license attached to it. And you can download them and you can use them on your own system. But the source code of how the model was created might not be available. And so those are sort of referred to as open weights models, but not open source, technically speaking. The other aspect is, like you mentioned, the data that the model is trained on. You know, a fully open source model release should really include something that you can reproduce.
And to reproduce it, you would need the code that was used to train it. You would need the resulting weights that were outputted, but you'd also need the data that was used to train the model. And so that's sort of an open data model. I think it's helpful that we create these distinctions. I think that it is helpful that different organizations do this at different levels. You know, some organizations do all three, which really helps the research community learn even more from the release of what's being put out. But I don't think that we should think about this in terms of, you know, if an organization doesn't release the source and does not release the data, then it's not a truly good open release and it's not helpful. That's definitely not the case. Just having an open weights model is tremendously helpful to the community, and it enables so much of an ecosystem to occur around that. Meta's Llama models are just open weights models. Their source code is not released. The training data is not released. But they still enable a tremendous ecosystem to build on top of those models and use those models. And so I think that all of these levels are valuable, and it's great for there to be a variety from the open community.
[00:18:34] Tobias Macey:
From the perspective of organizations who are investing in the adoption and use of AI for their business systems, whether that's internal or customer facing, what do you see as the typical phases of evolution? From the initial set of, hey, I've played with OpenAI, this seems great. Oh, wait, I don't wanna take on the platform risk of some proprietary company deciding that they wanna yank out some feature from under me, so now I'm gonna go to these open models. Up through to, hey, I've built this self evolving AI system. It's doing what you were saying before as far as the automatic fine tuning, automatic candidate generation, automatic shadowing. Obviously, there are a lot of steps and a lot of layers of complexity on that path.
And then even eventually to the point of, hey, I've got enough of my own source data, I'm going to build my own foundation model from scratch. And I'm wondering how you're seeing organizations tackle that evolution of complexity and sophistication on that path.
[00:19:34] Jamie De Guerre:
I think that there's kind of two vectors of this. One is on achieving model quality, and the other is on dealing with operational scale and performance and hosting. On the achieving model quality side, I think what you mentioned is very typical, where an organization will start with a basic prototype using a model like OpenAI's or Anthropic's Claude. Very easy to get started. These models tend to be of the highest quality out of the box, and you can quickly prototype to see if the application makes sense and if AI can be leveraged for the task.
Once you see that it can be, a lot of these organizations run into challenges. One of the challenges is, as you mentioned, being beholden to a single vendor and single platform, and it could change under you. Another is the cost. You know, if you're starting with the biggest model, it's often the highest cost. Another is the performance and scalability and reliability. It's one thing to prototype for, you know, a thousand employees. It's another thing to roll it out to millions of consumers. And so all of these reasons often lead them to saying, okay, I want to invest in tuning an open source model to my application and be able to operate this with more control and ownership at higher scale and lower cost. And so they switch to open source models. And I think on the accuracy vector, that usually starts just with prompt engineering, in a similar way to how they probably started with the closed source model. Then the next phase is typically adding data to the prompt through RAG, retrieval augmented generation, so that you can imbue knowledge from your internal enterprise into the context that the model uses to respond to a request.
And in many cases, that is all you need to kind of get a successful application. In many cases, organizations go further to do a combination of RAG and post training or fine tuning of the model, which can help to adjust the behavior of the model or also imbue some knowledge into the model at the time of the training. And we usually see, for the most advanced deployments, that using a combination of both is the most successful. I think that the next stage we'll start to see more and more is sort of a regular pipeline for that constant improvement. Today, that's kind of done through human methodologies of, like, research teams having a set of sprints where they retrain and experiment. But increasingly, we'll see the AI platforms help customers do that more and more easily. I think, really quickly, the other vector is around the operational performance and scalability. And I think often we see organizations, when they first start to switch to open source, they'll self host on maybe their existing cloud provider, like AWS EC2 instances or something along those lines, using an open source inference system like, say, vLLM or SGLang, and operating it on their own. And this is a great way to get started using these open source models. But as they start to deploy into production, that's a typical time when they come to us and say, like, this was tougher to operate at scale than we expected. There's lots of issues that are challenging that we haven't developed approaches for yet, and the overall cost is really, really high. And by switching to a platform like Together.ai, we take care of all that operational burden for them and are usually able to give them better efficiency, where we reduce the amount of GPU compute needed by half or a third or sometimes even more, reducing that cost and improving that total cost of ownership.
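For a sense of what that early self-hosting step looks like, here is a minimal sketch of serving an open-weights model with vLLM's offline API. The model name is only an example (and may require its own access terms), and this assumes a machine with a suitable GPU; the operational work described above, capacity planning, scaling, and keeping up with new architectures, sits well beyond a snippet like this.

```python
# pip install vllm  -- assumes a machine with a supported GPU
from vllm import LLM, SamplingParams

# Example open-weights checkpoint; swap in whichever model you are evaluating.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Summarize our refund policy for a customer."], params)
print(outputs[0].outputs[0].text)
```

vLLM also ships an OpenAI-compatible HTTP server for the same models, which is the mode most teams use once they move past a single notebook or script.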
[00:23:04] Tobias Macey:
And as far as that transition point, from I have moved from the proprietary model, I'm now using open models, I've got my RAG workflow in place, and everything is working, to then saying, okay, now I'm going to take the next step into actually fine tuning my own models, or even progressing further to building your own foundation model. What are some of the heuristics that you're seeing organizations apply as far as whether and when they actually want to take that next step beyond just being consumers of the existing off the shelf models and into actually investing in their own capacity and capability to generate new models, either from base models or from whole cloth?
[00:23:47] Jamie De Guerre:
Yeah. That's a good question. I mean, I think the first is kind of an organization-level, strategic-level heuristic, not a technical one, which is sort of, how strategically important is our investment in generative AI? And for a lot of organizations, this is a huge strategic imperative. You know, an insurance company may view it as existential that they become the leader at leveraging generative AI most effectively to help automate the process of making the best predictions on whether or not to insure someone, or the cost to insure someone, or a location to insure in or not, and other things. Because if they don't become the best at that quickly, their competitors will, and they will be at a tremendous competitive disadvantage and sort of lose the market. You know, that's kind of stated in an extreme way, but I think that that is a reality for some of the impact that generative AI is gonna have on so many industries.
And so for this sort of first heuristic, that question becomes, like, is this critically important to our organization, that we strategically invest in generative AI and start to create the muscle to be great at leveraging generative AI amongst our internal organization? And if it is that strategically important, they want to get good at doing that process. They want to own the result of their investment in generative AI and control it. They want to be able to repeat that process quickly when new models come out. And often that then leads to wanting to fine tune and post train a model, so that they own the resulting model and control it, and so that they have the muscle, the team, the staff, etcetera, to be able to repeat that process. I think the second heuristic is a technical one, which comes down to what type of change in model behavior we are trying to create. If the change is solely kind of information based or knowledge based, like this model on its own doesn't know about our products or our internal process documentation, then RAG is very well suited, obviously. You can pull that information in, as long as that information quantity is small enough that it can fit into the context of a request and you have a good enough search capability to find the right documents. You can pull that in before the model actually processes the request and imbue that knowledge at request time. But if the amount of that knowledge goes beyond sort of what you can put into context, or you're trying to teach the model a new reasoning technique, or get the model to kind of understand broadly how your whole industry functions or something of that nature. Let's say it's weather prediction, or, you know, insurance prediction could be a good one. Like, that's not just that you need to have the weather data for the past two years in this location. You actually want it to understand the impact of weather changes on insurability.
That's kind of imbuing new reasoning capability. Or maybe you're just trying to change how the model behaves in terms of how it communicates and how it outputs. Sometimes that can be done with simple prompt engineering, but sometimes post training does it more effectively. And so this sort of heuristic of what type of change you're trying to create in the model behavior becomes the other main reason for deciding whether or not to do post training.
[00:27:08] Tobias Macey:
Another major avenue of conversation that's happening in both the open and proprietary space, although obviously more transparently in the open models, is the type of architecture that's being employed for the building of these models, particularly when it comes to inference time efficiency, where the transformers paper was what catapulted us into the current generation of generative AI, and transformers are the predominant architecture of most of the models that are in this space right now. But there has been a lot of conversation in recent months about alternative architectures that are better for either training compute, or being able to run on other types of hardware that doesn't necessarily lock you into the NVIDIA ecosystem, as well as efficiencies at inference time or the ability to have a smaller number of weights that produce a more outsized capability compared to similarly weighted transformer models.
And I'm curious how you're seeing that start to take shape in terms of which models are being built, which models are being used, particularly for cases where efficiency of compute and cost are paramount for a given use case.
[00:28:28] Jamie De Guerre:
Yeah. I love this question, and it gets me excited because I think this is a core area of research contribution that Together.ai has made. One of our founders, Chris Ré, leads a leading lab at Stanford called Hazy Research. And a lot of these new techniques have come out of his lab at Hazy Research and some of his team. And I think it's such early days. You know, the transformer has achieved incredible things with the ability to do these huge models at scale and achieve tremendous quality in terms of the accuracy that they're able to achieve on tasks.
But I think it is early days, and we're gonna see more and more efficient architectures for models over time that get used to improve the compute time characteristics at either training or inference while achieving those same quality levels. One of the areas of research has been around using a different type of model architecture called state space models for these generative AI models. State space models weren't traditionally used for generative AI, but there are techniques to use a state space model instead. Transformers have a quadratic performance characteristic; state space models have a sub-quadratic, near linear performance characteristic, which is a dramatic difference in the compute that is needed to do things like attention. And so there are model techniques like Hyena and Mamba that use state space models to enable much better performance characteristics and much larger long context, and, we've shown, have been able to achieve roughly on-par quality and accuracy in the end model output. And increasingly, some of the big models are starting to use these techniques. You don't have to necessarily use that for the entire model. You can use a bit of a hybrid architecture where parts of what the model does use transformers and parts use a Hyena or Mamba or state space model technique. Another technique that's come up more recently is actually from one of the advisors of Together.ai, who has now started a new startup for using diffusion based model architectures for large language models. So text to image generation models, like Stable Diffusion and Black Forest Labs' Flux models and other things, typically use diffusion based models. But what this research has shown is you can use a diffusion based technique to create an LLM, and it achieves dramatically better performance characteristics. And so I don't know what the future model architecture will be, but I suspect it won't be a traditional transformer. And we'll have new architectures that enable us to train more efficiently and run these models at inference time more efficiently, which is really exciting because it then further lowers the cost and makes this technology even more accessible.
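To give a rough sense of why state space models change the cost profile, here is a toy sketch of the linear recurrence they are built on: each step updates a fixed-size hidden state, so generating a sequence costs roughly linear time in its length, versus the quadratic cost of full self-attention. The dimensions and matrices are purely illustrative and are not the Hyena or Mamba formulation.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Toy discrete state space model: h_t = A h_{t-1} + B x_t,  y_t = C h_t.

    The hidden state h has a fixed size, so the cost grows linearly with the
    sequence length, unlike full self-attention, which grows quadratically.
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                 # one fixed-cost update per token
        h = A @ h + B * x_t
        ys.append(C @ h)
    return np.array(ys)

# Illustrative sizes: a scalar input sequence and a 16-dimensional hidden state.
rng = np.random.default_rng(0)
A = rng.normal(scale=0.1, size=(16, 16))
B = rng.normal(size=16)
C = rng.normal(size=16)
y = ssm_scan(rng.normal(size=1024), A, B, C)
```

Production SSM layers add careful parameterization and hardware-aware scan implementations, but the fixed-size state is the reason the long-context cost curve looks so different from attention.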
[00:31:15] Tobias Macey:
And then moving on to the complexities of inference time compute. Obviously, that's your whole business, and you have a lot of domain knowledge around that. You also mentioned that, as part of the evolution of organizations moving from proprietary to open models, they will typically try to run their own inference using something like vLLM or SGLang or one of the other inference engines that are available. And I'm curious, what are some of the sharp corners, some of the challenges that organizations run into trying to self host inference time compute, and some of the reasons that they decide to offload that onto a platform like Together.ai?
[00:31:57] Jamie De Guerre:
Yeah. This is kinda one of those things where there's, like, a really clear tactical tip of the iceberg that's sort of above the water level that you can see and expect. And then there's this massive rest of the iceberg that is kind of under the surface and a little bit more fuzzy to describe. But above the surface, I think that these models, particularly when it's a large model, are very costly to run. They require significant amounts of, you know, high end GPU compute, and that is quite expensive to run and operate. You can see that even from the start when you're starting to build one of these applications. The second thing that's sort of above the water level is that load can be unpredictable, as in any Internet service, any Internet-scale service. And there's lots of knowledge in dealing with, you know, dynamic load in the operations space of these Internet applications. But I think that you go from needing to deal with varying load characteristics on a CPU application, where the CPUs are commodities and incredibly cheap to elastically scale, to doing so on a GPU level application, where you can't really get on demand GPUs added to your cluster because on demand is always sold out. And there's no elastic compute for GPUs typically in these clouds. And it's a lot more challenging to then deal with these load characteristics, how you plan for the peaks. And if your peak is eight times your normal load, are you paying for eight times the compute a hundred percent of the time, which is awful and tremendously inefficient and expensive? And so if you don't have other jobs to be done on those GPUs, how do you get more efficient use of that compute? So these are kind of, like, really hard and challenging things, but they're kind of expected and predictable as you start to think about some of this and as you start to run into these issues. I think what makes it so much more difficult is the things that are a little less expected, which is the pace of change, fundamentally, because the techniques to run these models, the techniques to run them efficiently, are changing so rapidly.
You know, there's a new model coming out every month, and so you're operating one model in production, and then your team that's doing the post training and such says, oh, we've switched. We've switched from this, you know, 8 billion parameter model; now a 23 billion parameter model from another company with a completely different model architecture is achieving way better gains in the application. And the user engagement went through the roof. There's four times the user requests when we experiment with this other model. So now we wanna roll this out to production. It's like, oh my goodness. Well, this is much more expensive and difficult to run. It doesn't work with the inference engine that we had yet because it doesn't support this new architecture that this 23 billion parameter model has. And you said that the engagement went up by four times with users, so you want me to have four times the capacity on this model that's, you know, three times the size? That just explodes the complexity and the challenge. And then other techniques come in, like disaggregated serving, where, you know, the two main phases of inference are prefill and decoding, where you're loading in the input of the request and mapping it to the weights of the model during prefill, and then you're generating the response during the decode phase. And, traditionally, this was all done on, like, a single node. But today, that's done on separate clusters that you scale independently with different numbers of GPUs.
And developing the system to be able to do that is challenging from a development perspective. And then from an operational perspective, it brings even more complexity as well. And so this is all, like, everything I just mentioned probably could have happened in the last, like, six months maybe. And so it moves very, very, very quickly. And so it's not only being able to plan for operation at a point in time of technology. It's planning for operation at this dynamically moving pace of AI and the AI space. And I think that that is the, you know, the rest of the iceberg that's under the water that makes it so much more challenging than you would expect when you kinda go in eyes open thinking about the expected challenges.
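A very simplified sketch of that prefill/decode split is below: prefill processes the whole prompt once and builds a cache, and decode then generates one token at a time against that cache. The toy model, function names, and "cache" here are all illustrative stand-ins; in a disaggregated deployment the two stages run on separately scaled GPU pools and hand the real KV cache off between them.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ToyModel:
    """Stand-in for a real LLM; here the 'KV cache' is just the token history."""
    vocab_size: int = 50

    def forward_prompt(self, prompt: List[int]) -> Tuple[List[int], List[int]]:
        # Prefill: process the whole prompt in one pass and build the cache.
        cache = list(prompt)
        logits = [(sum(cache) + v) % 7 for v in range(self.vocab_size)]
        return cache, logits

    def forward_one(self, token: int, cache: List[int]) -> Tuple[List[int], List[int]]:
        # Decode: extend the cache by one token and score the next token.
        cache = cache + [token]
        logits = [(sum(cache) + v) % 7 for v in range(self.vocab_size)]
        return cache, logits

def prefill(model: ToyModel, prompt: List[int]):
    return model.forward_prompt(prompt)

def decode(model: ToyModel, cache, logits, max_new_tokens: int) -> List[int]:
    generated = []
    for _ in range(max_new_tokens):
        token = max(range(len(logits)), key=logits.__getitem__)  # greedy pick
        generated.append(token)
        cache, logits = model.forward_one(token, cache)
    return generated

# Prefill and decode could run on independently sized pools, with the cache
# transferred between them once the prompt has been processed.
model = ToyModel()
cache, logits = prefill(model, [3, 1, 4])
print(decode(model, cache, logits, max_new_tokens=5))
```

The operational point is that prefill is compute-heavy and decode is latency- and memory-bound, so scaling them separately lets each pool be sized for its own bottleneck.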
[00:36:14] Tobias Macey:
And with the experience of the Together AI team and the requirements around being able to fulfill those use cases and scale those operational characteristics across multiple different users, with an even higher degree of variability in terms of usage and model selection, what are some of the most significant technical and strategic investments and challenges that you've had to overcome, particularly in the past six to twelve months, given the rate of change that we're dealing with?
[00:36:45] Jamie De Guerre:
Yeah. And I think that that's a great lead-in, because I do think that the biggest challenge is this sort of pace of change. And I mentioned some of the aspects of that pace of change in terms of new models and new techniques for the inference engines. There's other parts of that as well, though, like new infrastructure: the new versions of GPUs coming out from NVIDIA and potentially others, with new architectures and different sizes of memory that, you know, require totally new optimization and techniques for how you operate, new storage infrastructure with, you know, GPU direct storage being available and other things, and new techniques for how to leverage that storage across an inference cluster. The networking is changing rapidly.
And so I think that, you know, for me, as I'm not the engineering leader or research leader, I'm on the product side, there's an organizational challenge to how to build and prepare and have process for how we are setting ourselves up to be able to move the fastest. One of the things we've built recently that's actually interesting is, you know, we try and build kernels that provide the fastest performance. So, you know, an attention kernel that can do the attention calculations faster than NVIDIA's kernel, for example. But we've really been building this into our organization to not only build something that's faster, but build the ability to build it faster. So we wanna be able to make new kernels very quickly. And an example of this recently was NVIDIA's been working on Blackwell for at least a year or more.
And internally at NVIDIA, they've obviously had access to the plans for Blackwell instruction sets and whatever other things there might be. We got our first Blackwell chips, and within a week of access to those, we released a new kernel that was outperforming the main kernel from NVIDIA for the attention calculation, not by a huge margin, by, like, 2% or 3%, but it was done in a week. And this is because we've developed this harness, this platform, for building new kernels quickly. And so I think a huge part of what we've focused on at Together AI is not just the ability to make things fast, but the ability to adjust and move to something new really quickly, in our human processes and in the harnesses and techniques that we have for doing it. Our inference engine is the same. You know, the inference engine from some other organizations is written in native code in C and C++, and it ekes out the most performance in many respects. But when a new architecture comes out for a totally new model, it takes the organization, like, a month or more to support it. We need to have support for a new model in production, like, day one when it comes out, usually within hours of when it comes out. So we built our inference engine mostly in PyTorch, with some components in Rust, and built it to be able to move to new architectures really, really quickly when they come out. We had DeepSeek R1 running in production in twenty four hours, for example. And so this is a key part of, I would say, the challenge that we've really had to focus on at Together AI.
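For readers less familiar with what an "attention kernel" buys you, the sketch below contrasts attention composed from separate PyTorch ops, which materializes large intermediate tensors, with the fused scaled_dot_product_attention call, which dispatches to an optimized kernel under the hood. It is only meant to show where kernel-level work slots in, not how any particular company's kernels are implemented.

```python
import math
import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    """Attention built from separate ops: the full score matrix is materialized."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 8, 1024, 64)   # (batch, heads, seq_len, head_dim)

out_naive = naive_attention(q, k, v)
out_fused = F.scaled_dot_product_attention(q, k, v)   # dispatches to a fused kernel

print(torch.allclose(out_naive, out_fused, atol=1e-4))  # same math, different execution
```

Writing a faster kernel means replacing that fused call with hand-tuned GPU code for a specific chip, which is why having a harness to produce new kernels quickly matters every time the hardware generation changes.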
[00:39:52] Tobias Macey:
The model architecture question is another interesting one too, because you see from all of these various tool providers, oh, hey, here's my new release, and the release notes are, hey, we added support for Google Gemma, or, you know, hey, we added support for DeepSeek R1. And then also looking at some of the desktop tools. So LM Studio is one that I use fairly regularly, and it says, oh, well, here's the GGUF-encoded model, or, you know, an Ollama model that needs to be a certain distillation or a certain quantization to be able to run. And I'm curious how you're also seeing that impact the complexity of model selection and runtime selection, particularly for nonproduction use cases of, hey, I just wanna be able to use a local model to be able to chat with my documents on my laptop, or I wanna be able to deploy a model to a Kubernetes environment so that I can use that as a private copilot or something like that.
What are some of the main aspects of that model architecture that teams should be aware of and ways that they should get that familiarity to be able to understand what are the distinctions when they're doing that model selection?
[00:41:02] Jamie De Guerre:
Yeah. I think this speaks to, like, just all the layers of the complexity, that it's not as simple as, you know, is it a transformer or is it a different model architecture? Even within something like transformers, there's a huge variation in the techniques that can be used and the architecture that the model might be using. And then beyond just the core model architecture, there are techniques that are applied to the model, like quantization, that make it more efficient to run but might have an impact on the quality of the model and the accuracy that you receive. There are techniques like distillation and many others. And, you know, I think that today, keeping on top of all of those is pretty necessary to build these applications well at scale.
And doing that yourself, you know, as an engineer or an engineering leader or even a researcher is very challenging. So starting to work with an organization that can help you and support you on that, an organization that makes that their full focus and purpose and has built teams around it, can be immensely helpful. I think also that's why, again, I really feel that a lot of the evolution we're gonna see in the next year or more will be in these platforms becoming more and more of a system that will reduce how much of that complexity you have to deal with as an individual developer or researcher, so that it supports these variations and automatically helps you to optimize within them without having to go quite as deep into understanding and building for each of them.
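Picking up the quantization point from a moment ago: here is a toy sketch of symmetric int8 weight quantization, just to show why it shrinks memory but can shave accuracy. Real serving stacks use far more careful schemes (per-channel scales, activation-aware methods, formats like GGUF's block quantization, and so on); the sizes below are arbitrary.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: store int8 weights plus one float scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(1024, 1024)).astype(np.float32)
q, scale = quantize_int8(w)

print("memory ratio:", q.nbytes / w.nbytes)                       # ~0.25: int8 vs float32
print("mean abs error:", np.abs(dequantize(q, scale) - w).mean()) # the accuracy cost
```

The trade-off is exactly the one described above: a quarter of the memory and faster memory-bound inference, paid for with a small, workload-dependent loss in fidelity that has to be measured with your own evals.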
[00:42:50] Tobias Macey:
And in your experience of building and growing the Together.ai platform and working with your community of customers and the open model community, what are some of the most interesting or innovative or unexpected ways that you've seen either open models or Together.ai or that combination applied?
[00:43:11] Jamie De Guerre:
Great question. Yeah. I think that's one of those things that's so exciting about working at this level of the stack: the applications are kind of endless with this technology. So we have companies that are building biotech models for new drug research on our platform. We have health sciences applications using existing models to analyze ultrasounds or x-rays through post training and fine tuning and achieving these incredible accuracy rates that are helping hospitals and doctors get dramatic efficiency and quality gains in their work. We have companies building text to video models, like Pika, where you get these incredible consumer applications and really fun, silly kinds of videos out of a simple prompt really quickly that are really engaging. And then I think finally, one of the biggest surprises, and now it feels not surprising at all, but, you know, two years ago we wouldn't have expected it, is coding. You know, these models have just been tremendous at coding.
We have multiple companies that are building integrated development environments for customers that use our platform for running a lot of the inference for these coding applications. And the productivity that we're getting to for our engineers and the amount that these models can take care of for coding is just tremendous. And so all of these have been, you know, surprising and shocking over the last two years.
[00:44:39] Tobias Macey:
And in your own personal experience of working in this space, trying to stay up to date with everything that's happening, understand what direction to take the product, and how to manage the messaging around it to your potential customers, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:44:59] Jamie De Guerre:
Yeah. Great question. I think that one that I would say is, you know, no change is durable. You can be so excited about one improvement that's gonna be made or one model that's gonna be released that's gonna be the next great thing. But you have to expect that it's going to be so quickly competed with by something that is going to be even better, even if it's from, like, the open community and not an actual other company or whatever else. But just start to change your frame of mind that it's really not about building one solid point in time release that is going to, you know, last for a year or something until the next release.
I think that it has to be much more of a living, constant evolution with tons of these improvements happening, and you have to be very fluid in adjusting to what's happening from the industry and others in this space, to be nimble and adjust to that in your roadmap and your plans. And I think this is the same for organizations adopting generative AI for their own use. Don't think of what you're building as, you're gonna build a model that is tuned to your needs and achieving the accuracy you want, and you're done for the next year and it's in production for your customers. Really think of it as you're building a muscle. You're building this ability to be on the latest, constantly evolving and iterating. And, you know, you should be getting to the point where new iterations of your model are coming out very regularly, whatever that regularity is. Maybe it's a month, maybe it's a week, maybe it's an hour eventually, on an automated harness that's just getting evaled. I think that that is much more the way that you have to think about this space, because it's still early days, and there's constant innovation and improvement coming from all sides.
[00:46:53] Tobias Macey:
And are there any other aspects of the work that you're doing at Together.ai, the overall space of open models, inference time compute, the challenges of building and fine tuning these foundation models, or just anything else about the work that you're doing that we didn't discuss yet that you'd like to cover before we close out the show?
[00:47:19] Jamie De Guerre:
You know, I think the last thing I would just say is we're tremendously excited for what's happening in the open community. And the pace at which open source or open weights models have caught up to the biggest and best closed models is much faster than I thought it would happen. And we founded our company on the thesis that this would happen. So I was one of the most optimistic people that thought this would happen, but it's much faster than I thought. It has shocked us how quickly new models have come out that are really achieving the same quality as the leading closed source labs. And, you know, one of those releases was shocking, right? The DeepSeek R1 release. It was a tremendous gain above and beyond what had ever been achieved in an open source model, achieving the same as o1 or o3 in many respects on a lot of the accuracy measures. And within three months, you know, two other organizations released models at the same level as open source models as well. And so you have this democratization that's happening of the ability to provide these models and leverage them in new ways in the open community, and I think that's a great direction for AI, for us to better understand these systems and be able to leverage them more deeply and know how to invest in applications for them in our organizations.
It has shocked me, and it's really exciting how quickly it's moving.
[00:48:35] Tobias Macey:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you and the rest of the Together AI team are doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gaps in the tooling, technology, or human training that's available for AI systems today.
[00:48:55] Jamie De Guerre:
The biggest gaps. I think that the biggest gaps, you know, go back to a lot of what we talked about today. So much of the way we work today is still being thought about in terms of building for one model and optimizing for one model. And with the pace of improvements to models and new iterations of models being so rapid, I think that that is a gap that's gonna need to shift, where better tooling comes out to help you constantly evaluate different versions of models, automatically tune multiple models, and get help from the AI system with figuring out which combination of models achieves the best outcome for your application. I think that that is the next sort of iteration that's needed to make building and developing these applications on AI much more robust and easier to manage.
[00:49:51] Tobias Macey:
Alright. Well, thank you very much for taking the time today to join me and share the work that you're doing at Together.ai and your thoughts and experiences in the open model ecosystem. It's definitely a very interesting and, obviously, very fast moving space, so I appreciate you taking the time to share your thoughts and opinions and expertise and help all of us figure out a little bit more about what we should be doing. So thank you for that, and I hope you enjoy the rest of your day.
[00:50:16] Jamie De Guerre:
Absolutely, Tobias. Thank you so much. Really great questions, and I really enjoyed this.
[00:50:24] Tobias Macey:
Thank you.
[00:50:25] Tobias Macey:
Thank you for listening. And don't forget to check out our other shows, the Data Engineering Podcast, which covers the latest in modern data management, and Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used. You can visit the site at themachinelearningpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@themachinelearningpodcast.com with your story. To help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction to AI Engineering Podcast
Interview with Jamie De Guerre
Overview of Together.ai's Mission
The Importance of Open Models in AI
Evolution of Open Models Ecosystem
Challenges in Model Selection
Understanding Open Source in AI
Strategic Importance of Generative AI
Heuristics for Model Development
Exploring Model Architectures
Inference Time Compute Complexities
Technical Challenges and Innovations
Model Architecture and Selection
Innovative Applications of Open Models
Future Directions and Closing Remarks