In this episode of the AI Engineering Podcast Daniel Sodickson, Chief of Innovation in Radiology at NYU Grossman School of Medicine, talks about harnessing AI systems to truly understand images and revolutionize science and healthcare. Dan shares his journey from linear reconstruction to early deep learning for accelerated MRI, highlighting the importance of domain expertise when adapting models to specialized modalities. He explores "upstream" AI that changes what and how we measure, using physics-guided networks, prior knowledge, and personal baselines to enable faster, cheaper, and more accessible imaging. The conversation covers multimodal world models, cross-disciplinary translation, explainability, and a future where agents flag abnormalities while humans apply judgment, as well as provocative frontiers like "imaging without images," continuous health monitoring, and decoding brain activity. Dan stresses the need to preserve truth, context, and human oversight in AI-driven imaging, and calls for tools that distill core methodologies across disciplines to accelerate understanding and progress.
Announcements
- Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems
- When ML teams try to run complex workflows through traditional orchestration tools, they hit walls. Cash App discovered this with their fraud detection models - they needed flexible compute, isolated environments, and seamless data exchange between workflows, but their existing tools couldn't deliver. That's why Cash App relies on Prefect. Now their ML workflows run on whatever infrastructure each model needs across Google Cloud, AWS, and Databricks. Custom packages stay isolated. Model outputs flow seamlessly between workflows. Companies like Whoop and 1Password also trust Prefect for their critical workflows. But Prefect didn't stop there. They just launched FastMCP - production-ready infrastructure for AI tools. You get Prefect's orchestration plus instant OAuth, serverless scaling, and blazing-fast Python execution. Deploy your AI tools once, connect to Claude, Cursor, or any MCP client. No more building auth flows or managing servers. Prefect orchestrates your ML pipeline. FastMCP handles your AI tool infrastructure. See what Prefect and FastMCP can do for your AI workflows at aiengineeringpodcast.com/prefect today.
- Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most - building intelligent systems. Write Python code for your business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML/AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch. Build end-to-end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward-thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at aiengineeringpodcast.com/bruin, and for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud.
- Your host is Tobias Macey and today I'm interviewing Daniel Sodickson about the impact and applications of AI that is capable of image understanding
- Introduction
- How did you get involved in machine learning?
- Images and vision are concepts that we understand intuitively, but which have a large potential semantic range. How would you characterize the scope and application of imagery in the context of AI and other autonomous technologies?
- Can you give an overview of the current state of image/vision capabilities in AI systems?
- A predominant application of machine vision has been for object recognition/tracking. How are advances in AI changing the range of problems that can be solved with computer vision systems?
- A substantial amount of work has been done on processing of images such as the digital pictures taken by smartphones. As you move to other types of image data, particularly in non-visible light ranges, what are the areas of similarity and in what ways do we need to develop new processing/analysis techniques?
- What are some of the ways that AI systems will change the ways that we conceive of and capture imagery?
- What are the most interesting, innovative, or unexpected ways that you have seen AI vision used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on imaging technologies and techniques?
- When is AI the wrong choice for vision/imaging applications?
- What are your predictions for the future of AI image understanding?
Parting Question
- From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
- Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
- MRI == Magnetic Resonance Imaging
- Linear Algorithm
- Non-Linear Algorithm
- Compressed Sensing
- Dictionary Learning Algorithm
- Deep Learning
- CT Scan
- Cambrian Explosion
- LIDAR Point Cloud
- Synthetic Aperture Radar
- Geoffrey Hinton
- Co-Intelligence by Ethan Mollick (affiliate link)
- Tomography
- X-Ray Crystallography
- CERN
- CLIP Model
- Physics-Guided Neural Network
- Functional MRI
- A Path Toward Autonomous Machine Intelligence by Yann LeCun
Hello, and welcome to the AI Engineering podcast, your guide to the fast moving world of building scalable and maintainable AI systems. When ML teams try to run complex workflows through traditional orchestration tools, they hit walls. Cash App discovered this with their fraud detection models. They needed flexible compute, isolated environments, and seamless data exchange between workflows, but their existing tools couldn't deliver. That's why Cash App relies on Prefect. Now their ML workflows run on whatever infrastructure each model needs across Google Cloud, AWS, and Databricks. Custom packages stay isolated.
Model outputs flow seamlessly between workflows. Companies like Whoop and 1Password also trust Prefect for their critical workflows, but Prefect didn't stop there. They just launched FastMCP, production ready infrastructure for AI tools. You get Prefect's orchestration plus instant OAuth, serverless scaling, and blazing fast Python execution. Deploy your AI tools once. Connect to Claude, Cursor, or any MCP client. No more building auth flows or managing servers. Prefect orchestrates your ML pipeline. FastMCP handles your AI tool infrastructure. See what Prefect and FastMCP can do for your AI workflows at aiengineeringpodcast.com/prefect today.
Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most, building intelligent systems. Write Python code for your business logic and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML and AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch.
Build end to end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at aiengineeringpodcast.com/bruin. And for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud. Your host is Tobias Macey, and today I'm interviewing Daniel Sodickson about the impact and applications of AI that is capable of image understanding. So, Daniel, can you start by introducing yourself?
[00:02:33] Daniel Sodickson:
Absolutely. Yeah. First of all, Tobias, it's great to be here and talk with you. I'm Dan. I am currently the chief of innovation in the Department of Radiology at NYU Grossman School of Medicine. I'm really a physicist in medicine who spent his entire career trying to develop new ways of seeing.
[00:02:52] Tobias Macey:
And do you remember how you first got started working in the ML and AI space?
[00:02:57] Daniel Sodickson:
Yeah. Not, I can tell you, by design at all. Really more by degrees. It kinda gradually wormed its way into my research. I remember beginning my career with linear algorithms that worked really well to reconstruct images from the raw data that was coming off of MRI machines. Then we moved to nonlinear algorithms when something called compressed sensing appeared on the scene, which allowed us to sort of pre compress the images that we were gathering rather than spending all this time gathering them and then throwing away much of that hard won information. And then we started moving to so called dictionary learning algorithms, sort of primitive AI that selected the best of a concrete library of possibilities. And then it was around 2014 or 2015 that we started digging into actual deep learning, sort of modern AI. And our group was actually among the first to introduce deep learning for accelerated medical imaging back in around 2016.
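For listeners who want to see what that progression looks like concretely, here is a minimal sketch of a compressed-sensing style reconstruction, assuming a toy object that is sparse in the pixel domain and a random k-space sampling mask; real systems use tuned sampling patterns and richer sparsifying transforms, and this is not the specific algorithm Dan's group used. The idea is simply to enforce agreement with the undersampled measurements while penalizing non-sparse solutions.

```python
# Toy compressed-sensing reconstruction sketch (illustrative assumptions only):
# recover a pixel-sparse object from randomly undersampled Fourier (k-space)
# samples with iterative soft-thresholding (ISTA).
import numpy as np

rng = np.random.default_rng(0)

N = 64
x_true = np.zeros((N, N))
x_true[rng.integers(0, N, 40), rng.integers(0, N, 40)] = 1.0   # sparse "object"

mask = rng.random((N, N)) < 0.3                     # keep ~30% of k-space
y = mask * np.fft.fft2(x_true, norm="ortho")        # undersampled measurements

def A(x):        # forward model: image -> undersampled k-space
    return mask * np.fft.fft2(x, norm="ortho")

def At(k):       # adjoint: undersampled k-space -> image
    return np.fft.ifft2(mask * k, norm="ortho")

# Minimize ||A x - y||^2 / 2 + lam * ||x||_1 by gradient step + soft threshold.
lam = 0.02
x = np.zeros((N, N), dtype=complex)
for _ in range(200):
    z = x - At(A(x) - y)                            # data-consistency gradient step (step size 1)
    mag = np.abs(z)
    x = z * np.maximum(mag - lam, 0) / np.maximum(mag, 1e-12)   # sparsity prox

err = np.linalg.norm(x.real - x_true) / np.linalg.norm(x_true)
print("relative reconstruction error:", round(float(err), 3))
```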
[00:03:54] Tobias Macey:
And so imaging is definitely very prevalent in the medical field in the form of things like X rays, MRIs, as you mentioned, CT scans, but it's also something that virtually everyone is familiar with in their daily life by virtue of things like smartphone cameras, digital cameras, Polaroid photos from back in the day. And so I'm just wondering if you can give your sense of the kind of scope and application of imagery in the context of the AI arena and also in other autonomous technology situations.
[00:04:30] Daniel Sodickson:
Yes. Good question. And I think that question raises the fundamental question of what exactly is an image. I tend to think of images as maps of spatially organized information. So like an actual map of the world, or a representation of a scene in nature, or a view of a star field, or a slice through a living body. They're all spatially organized information. They all help us to tell here from there. Actually, in the early oceans, the ability to tell here from there was such a powerful competitive advantage that some people say the development of eyes drove the Cambrian explosion, that sort of well known explosion of biodiversity in the Cambrian era. When it comes to AI, images are the domain of what's sometimes called computer vision. Right? Basically, ways of processing artificially created and digitally stored images in much the same way that our brains process the visual input from our eyes.
So the images, like you said, can be anything. The view through telescopes or microscopes or cameras or MRI machines, you name it. And then, basically, AI has the task of either generating those images in different ways or taking those images and distilling information from them in various ways.
[00:05:45] Tobias Macey:
And so given the breadth of applications for image understanding and computer vision, but also just given the wide variety of potential input formats, I'm wondering what you see as some of the complexities and challenges facing the development and fine tuning of some of these ML and AI routines for being able to interpret these different styles of image data, because we are very familiar with JPEGs and PNGs as people who are taking pictures and viewing pictures online. And there's also, as you pointed out, the spatial aspect of imagery. In some cases, you might even think of things like LIDAR point clouds as a form of imagery, or radar depictions. There's the synthetic aperture radar that we're using a lot for global mapping technologies now with the various satellite capabilities. And just because of that breadth of representation, how does that either aid or complicate the use of AI for being able to interpret and process those inputs?
[00:06:57] Daniel Sodickson:
Yeah. There are notable differences between different imaging modalities, different imaging types, and there are also remarkable similarities. And I can actually give you a little bit of a story about each to kind of illustrate the complexities and also the connections. So, you know, the content of so called natural images is very different in many ways than that of, say, medical images. How a scene is filled, what's normal, what needs to be attended to. And so neural nets, for example, do require careful tuning. You can't just train on natural images and expect perfect performance on a medical imaging task. And it also requires domain expertise because you need to know what is important to look at. A radiologist will basically see right through very prominent features, which he or she happens to know are artifacts, that a machine would need to learn are artifactual and unimportant.
So I had this example of, actually at NYU, we were in a collaboration with, back then, Facebook, now Meta, to try to accelerate MRI scans by a factor of 10. It's a different story how Facebook got interested in a problem like that. But basically, they trained their first neural net in a week to generate fast images. And the images were completely unacceptable to our radiologists. You know, they looked nice and crisp and sharp, and the radiologists said they're just wrong. And it took us several months of sitting down and talking about the physics of MRI together and iterating with the radiologists so that the machine learning experts could kind of figure out what it is the radiologist needed to attend to. And after that, the images got better and better and better until they were indistinguishable from traditional unprocessed images acquired much slower. So that's an example where basically you need that domain expertise and that tuning in order to provide the right answers. And then one other quick story, if I may, in the other direction. One time I was giving a talk at a conference that was bringing astronomers and medical imagers together, and I was supposed to give a talk introducing medical imaging to astronomers, and my opposite number, doctor Rovashi Rao, was supposed to introduce astronomy to medical imagers. We sat down to look at our slides and coordinate, and we realized that the slides basically said exactly the same thing. We were actually using the same mathematics to generate our images from radio telescopes and MRI machines. It was like discovering a long lost sibling. With notation changes, the equations were identical. It was remarkable. So there are a lot of differences, but some surprising similarities as well.
[00:09:29] Tobias Macey:
Yeah. That's something that came up in another conversation I had a little while ago, I think on this podcast, but I'm blanking on exactly which conversation. But bringing up the fact that there are a lot of commonalities in terms of technique across different problem domains and scientific disciplines. And the challenge is that the nomenclature is different because people are using different terms of art to talk about the same things. And once you sit down long enough with somebody working in one of those fields as an expert in another field, you start to understand, oh, we're actually talking about the same thing. And so because of that fact, there is a lot of duplicative work that's happening in those various subfields. And so they were talking about the capability of AI to act as some of that translation layer so that if you, as a specialist in MRI technologies, are searching for useful papers to help direct your research, using something like a deep research agent to do some of that exploration can help with some of that automatic translation to say, hey, these two different terms of art are actually talking about the exact same thing. So now maybe I'll start pulling in reference material from astronomy to help you understand how to approach the problems that you're trying to solve in MRI systems.
[00:10:51] Daniel Sodickson:
I completely agree with that notion, and I think it's a fabulous idea and a really neat use of AI. Certainly, I've spent more than a quarter of a century as an imager, and I've had to hunt out those connections sort of individually, one by one. I'm guessing what has taken me years could take minutes for a suitable agent.
[00:11:12] Tobias Macey:
And so now digging a bit more into the current situation of AI applications in this broad category of imaging and vision, what is your understanding of the overall landscape of how AI is being used and maybe some of the ways that it can be used going forward as we continue to iterate on these capabilities and improve the overall subsystems?
[00:11:41] Daniel Sodickson:
I think of it in two broad categories. First, there's downstream AI, by which I mean processing images gathered in the usual way with whatever machines we have to gather them, and then using deep learning to hunt for patterns, to put on labels, to find cancers, to find new stars in the sky, things like that. Basically developing human like or, in some cases, superhuman capabilities for recognition. I believe this area is interesting but not by any means the last word. And I think also this notion of kind of replacing humans is an interesting one. You hear this a lot in AI conversations. Right? And in fact, back in 2016, Geoff Hinton, who is now a Nobel Laureate in physics for his work in AI, basically said at a public conference we should stop training radiologists now because machine learning's gonna do their job better than they can in five years. It's more than five years from 2016, and I actually don't know a single radiologist who's been replaced. Other jobs have arguably been replaced. But I think it's a much more kind of collaborative picture now. And there's a colleague of mine, Curt Langlotz, who said a while back that AI is not gonna replace radiologists. Radiologists who use AI are gonna replace those who don't. So I think that's kind of downstream AI, but there is also a less well known area that I call upstream AI. Basically, that allows us to gather different data and maybe even to redesign the machines we use to gather that data. And I think this has an even more powerful capability: it can let us get by with dramatically worse data than we usually need, which means faster imaging with existing devices, or it could even mean more accessible devices, an MRI machine I build into a chair that you sit on. But because I can use AI to connect different imaging sessions, if I've seen you before, that sort of slender thread of data is enough to determine if there are changes in you today. So that type of upstream AI, I think, is gonna be truly transformative. It's gonna allow us to image people in different settings. It's gonna allow us to give our scanners memory, for example.
[00:13:51] Tobias Macey:
Backtracking a little bit to your point of the potential for AI to replace areas of work: speaking as a software engineer, I've definitely been hearing a lot of that myself as well. Oh, we don't need software engineers anymore. We've got AI that can write code. But that fails to recognize that the specific action of generating code is not the entirety of a software engineer's job. And Ethan Mollick in his book, Co-Intelligence, did a very good job of highlighting this, that a job is more than just the sum of its tasks. It's the judgment and interpretation that goes into it as well that makes the human valuable. But the sort of centaur situation of humans with AI, where we can run much farther and faster with AI than we can by our own means, is also something that is ignored at your own peril. And so I think that aspect of using AI as either a pre filter or a post filter on work to be done in the radiology departments is something that is valuable and useful, particularly because of the fact that, as you point out in your book, no human can acquire as much experience and exposure to the amount of data that is fed into these AI models in their lifetime.
And so there is that difference in terms of magnitudes of scale as far as information processing, but the AIs do not have our interpretation and reasoning capabilities that help us to pull from other disparate domains and experiences that help to identify some of the anomalies and outliers that the AI may never see.
[00:15:27] Daniel Sodickson:
Absolutely agreed. I think, first of all, what you said about coders is very much true about radiologists and many other disciplines. Right? Everyone thinks radiologists kind of scan through an image to look for the tumor. Oh, boom. There it is. That's a tiny fraction of what they do and probably not their favorite part of what they do. In fact, that part is done sometimes in a fraction of a second, at a glance. A lot of the other judgment and detective work is part of the cognitively interesting tasks that machines have by no means replaced at this point. So, yeah, I agree entirely.
[00:15:59] Tobias Macey:
The other interesting aspect of using machines in the context of vision is that for the better part of the last decade, the primary application of machine vision has been in the space of object detection and object recognition and when dealing with video, maybe object tracking. And now as we have entered the generative era, a lot of it too is now being applied to creation of imagery and video. But I'm wondering what you're seeing as some of the other applications of AI to change the ways that we think about vision and spatial sensing beyond just our human facilities and some of the untapped areas of potential that we should be looking to for AI in the context of imagery, whether visual information or otherwise?
[00:16:49] Daniel Sodickson:
Yeah. It's a big question, but a very interesting one. I think in some ways, maybe we're emulating the wrong things. We're emulating tasks. Can we do this task with a machine learning system better than a human? Can we see this thing? Can we track this object? I think where a lot of the interesting work is gonna be, in my view, is emulating not tasks but sort of the deeper functions of our perception. So our eyes are just the sensors, but actually our brains play an outsized role in vision. One thing I say to begin one of the chapters of the book is the world we see is a lie. Actually, when you're looking at me through the screen or in real life, you can tell how far away I am. And we have this nice classic picture that everyone talks about of depth perception. Oh, we've got two eyes. We triangulate.
We can tell depth. Actually, as a tomographer, I can tell you that's nonsense. Two projections of a three d scene is not nearly enough to tell full depth. Actually, what we're doing, I think, is we're bringing to bear all of these concepts we've learned about how the world works. You know I'm not a giant a football field away. You know I'm a human sized person, which immediately narrows the options. I think that type of AI that basically is built on world models, that sort of knows something about how the world works and then brings that into vision, to do things like object detection for self driving cars, to do things like reconstruction of images from a tiny bit of undersampled data. I think that's where a lot of this not even superhuman but entirely, you know, nonhuman capability can come from, emulating some of these deeper human functions.
[00:18:38] Tobias Macey:
And there's also a substantial body of work that's been happening, particularly as we've been building these generative models focused on visual imagery and the styles of photographic media that have been commonplace. And I'm wondering from your experience how much additional work is being done in that approximate space to incorporate things such as MRI imaging. Obviously, radiology is a very high profile field, so I'm sure that there's a lot of information from things like MRI and X-ray. But maybe as we move to other more niche areas of imagery, like maybe X-ray crystallography in the subatomic space, or maybe even some of the information that we're collecting from systems such as CERN where we have spatial data about the interaction of the particles, but it's such a massive volume of data. What are some of the types of information that are maybe not getting as much attention as they deserve?
[00:19:40] Daniel Sodickson:
That's a very interesting question. I think, first of all, we are at the moment leaning incredibly heavily on LLMs. So even though there are image foundation models that people are building, the underlying structure is still sort of similar to the way we're processing volumes of text. And I think that's gonna be a mistake in the long run, or at least it's gonna be a stop on the way. I don't think just scaling up LLMs is gonna get us to where we need to be, particularly for visual tasks, for example. So I think one thing that's interesting is people are, yes, starting to incorporate, say, CT scans, MR scans into foundation models. But I think what's gonna be really interesting is to do, again, something more like what the brain does, merging text information with image information with, say, blood test information or other sensor information.
That's what our brains are doing. Right? We're integrating multi sensory input, distilling it down to compact representations, and then farming out those representations for whatever task is required. So I actually think that sort of representation learning type picture is a really interesting direction to pursue in the future.
[00:20:59] Tobias Macey:
The other interesting point that you raised there is that language models have gained our attention because of the fact that language is our natural means of interacting with the world and conceiving of the world, at least for a majority of people. I know that there are some people who don't think in language. But to that point, because we're dealing with visual imagery, there has to be some sort of translation layer where we bridge the divide to language in order for us to be able to issue and interpret instructions to and from these models to understand what is it that it's actually sensing or what is the insight that is to be gained through the use of that model. And I'm wondering, as we move between those different modalities, how we can try and preserve some of the semantic context that gets lost in translation between, you know, human language and the language of imagery.
[00:21:51] Daniel Sodickson:
Yeah. Well, first of all, I would actually say that vision precedes language, both from an evolutionary point of view, because it certainly did, and in some ways, in our interpretation of the world. Language is sort of the conceptual veneer we put on top of it, but the first thing coming into our senses is actually visual. And if you look at the real estate in the brain occupied by visual centers, it certainly gives language a run for its money. So I actually think that there's a lot we can learn about the world, and a lot that, say, infants learn about the world before they have language, that is entirely visual. Getting a little more technical, one of the things that I'm really looking to see is, when we do try to merge semantic content between images and language, I'd like to see us move away from things like, say, CLIP models that basically just try to take the representations, the embeddings coming off of images, and those coming off of language and mush them together, bring them close together. Because I think though that's valuable, and you can say, okay, well, this image and this text are both associated with the concept of a cup, I think you also risk losing a lot of complementary information, because an image of a cup is a lot more than just the concept that the word conveys.
And so what I'd love to see is actually things like cross-modality masked autoencoding. So in other words, you block out some words in the text, and based on the combination of the other words and the imagery, you fill that in. You block out some parts of the image, and based on everything else you have, you fill it in. You basically learn the sort of rich correlations between, say, language and vision or other information. I think that's gonna be a more robust way of dealing with multimodal data.
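As a toy illustration of the cross-modality masked autoencoding idea (with an assumed patch size, vocabulary, and tiny transformer that are placeholders rather than any real model), the sketch below embeds image patches and text tokens into one sequence, hides a random subset of both, and trains the network to reconstruct what was hidden from everything that remains.

```python
# Sketch of cross-modality masked autoencoding (assumed toy setup): mask
# random image patches and text tokens, then train a shared transformer
# encoder to reconstruct what was hidden from the rest.
import torch
import torch.nn as nn

D, VOCAB, PATCH = 128, 1000, 16              # embedding dim, toy vocab, patch size

patch_embed = nn.Linear(PATCH * PATCH, D)     # flattened 16x16 grayscale patches
token_embed = nn.Embedding(VOCAB, D)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(D, nhead=4, batch_first=True), num_layers=2)
pixel_head = nn.Linear(D, PATCH * PATCH)      # reconstruct masked patches
token_head = nn.Linear(D, VOCAB)              # predict masked tokens
mask_vec = nn.Parameter(torch.zeros(D))       # learned [MASK] embedding

def masked_multimodal_loss(patches, tokens, mask_ratio=0.5):
    # patches: (B, Np, PATCH*PATCH) floats, tokens: (B, Nt) integer ids
    img = patch_embed(patches)
    txt = token_embed(tokens)
    seq = torch.cat([img, txt], dim=1)                     # one multimodal sequence
    mask = torch.rand(seq.shape[:2]) < mask_ratio          # hide a random subset of both modalities
    seq = torch.where(mask.unsqueeze(-1), mask_vec.expand_as(seq), seq)
    h = encoder(seq)
    h_img, h_txt = h[:, : img.shape[1]], h[:, img.shape[1]:]
    # Reconstruction losses only on the masked positions.
    img_loss = ((pixel_head(h_img) - patches) ** 2)[mask[:, : img.shape[1]]].mean()
    txt_loss = nn.functional.cross_entropy(
        token_head(h_txt)[mask[:, img.shape[1]:]], tokens[mask[:, img.shape[1]:]])
    return img_loss + txt_loss

# One toy step: random "scan patches" paired with random "report tokens".
patches = torch.rand(2, 32, PATCH * PATCH)
tokens = torch.randint(0, VOCAB, (2, 20))
loss = masked_multimodal_loss(patches, tokens)
loss.backward()
print(float(loss))
```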
[00:23:46] Tobias Macey:
That aspect of masking the information so that the model doesn't depend on implicit detail is definitely one that's very important and one that we've developed a lot in recent iterations of AI research. And I think too that while that is useful for improving the efficacy of the model to do the task that it's assigned, the other major challenge that we have, particularly if we're dealing in fields such as medicine or other high risk situations, is the lack of interpretability or explainability of a lot of these deep learning systems, which, you know, is another active area of research. But I'm curious how you're seeing some of that in terms of the adoption of AI technologies in these regulated fields as well as some of the work that is being done to provide more of that contextual understanding of what decisions a model is making in the process of executing a particular task and particularly auditability of the model as we move from a human directed interaction pattern to more of an agentic and autonomous usage of these models for different problem domains.
[00:25:08] Daniel Sodickson:
Yes. Well, once again, I think I'll break it up into downstream and upstream cases. For downstream AI, where basically you're making an interpretation off of an image, obviously interpretability is key if you wanna trust perhaps a life or death type of decision that's being made. I think more and more I'm seeing people using Grad-CAMs, heat maps, you know, to identify areas that a machine learning algorithm is using to make its decision, and we can very quickly, at a glance sometimes, say, you know what? It's looking at the same things a radiologist would look at. That's a little reassuring. Or it's looking at these different things. Is that interesting? Is there something we can learn from that? So I think there's definitely some work there. In the upstream AI area, where we're talking about maybe generating images from a tiny sliver of data, there we have some other advantages. We actually often use what we call physics guided neural nets, because we actually know the physics of how the data is being gathered by our machines rather than just letting the neural net try to learn that itself.
We actually incorporate that as constraints, effectively, into the neural net search. And what that does is it guarantees that any solution we get is a physically plausible solution. And right off the bat, that tosses out the majority of possible hallucinations. So I think there are some very interesting strategies. And if you think about it, in a way, using guidance by physics, that's also what we do as humans to check plausibility. Right? We make sure that something doesn't violate the laws of physics. That's immediately a good check against unreal things. So I think we can use similar strategies in machine learning.
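To make the physics-guided pattern a bit more concrete, here is a heavily simplified sketch, with assumed toy dimensions and a placeholder network, of the kind of unrolled architecture being alluded to: a small learned refinement step alternates with a data-consistency step that re-imposes the measured k-space samples, so every intermediate estimate stays consistent with the known acquisition physics. This is an illustration of the general idea, not the architecture used at NYU.

```python
# Toy "unrolled" physics-guided reconstruction: a learned refinement network
# interleaved with a hard data-consistency step on the measured k-space.
import torch
import torch.nn as nn

class UnrolledRecon(nn.Module):
    def __init__(self, iters: int = 5):
        super().__init__()
        # Tiny CNN refiner shared across iterations; real/imag as two channels.
        self.refine = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1))
        self.iters = iters

    @staticmethod
    def data_consistency(x, y, mask):
        # Re-impose the measured k-space samples wherever data was acquired,
        # so the estimate cannot contradict the physics of the measurement.
        k = torch.fft.fft2(x, norm="ortho")
        m = mask.to(k.dtype)
        k = y * m + k * (1 - m)
        return torch.fft.ifft2(k, norm="ortho")

    def forward(self, y, mask):
        x = torch.fft.ifft2(y, norm="ortho")           # zero-filled starting estimate
        for _ in range(self.iters):
            ri = torch.stack([x.real, x.imag], dim=1)  # (B, 2, H, W)
            dx = self.refine(ri)
            x = torch.complex(ri[:, 0] + dx[:, 0], ri[:, 1] + dx[:, 1])
            x = self.data_consistency(x, y, mask)      # the physics-guided step
        return x

# Toy usage with random undersampled measurements of a random complex "image".
B, H, W = 1, 32, 32
mask = torch.rand(B, H, W) < 0.3
truth = torch.randn(B, H, W, dtype=torch.complex64)
y = torch.fft.fft2(truth, norm="ortho") * mask.to(torch.complex64)
recon = UnrolledRecon()(y, mask)
print(recon.shape)
```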
[00:26:55] Tobias Macey:
Now, with the introduction of AI and ML technologies into these fields where imagery is the core of the work being done, and is so essential to the research that's being done or the discoveries that are happening or the care that is being provided, how does that change the ways that we think of what the role of the human is in those applications, and how we can move beyond our current applications and practices and start to become more AI native in these areas of research or areas of care?
[00:27:41] Daniel Sodickson:
Well, I'll tell you one example in medicine that I think is gonna be enabled by AI, and where we're gonna lean heavily on AI to the exclusion of humans, but then hand off to humans. Right now, health care is in many ways reactive. Right? You develop symptoms. You deny them for a while. Eventually, you decide, okay, I can't ignore them anymore. You go to the doctor, and then they use the big guns. Then they bring in the MRI machines and the other tests, and you discover, oh, you have this disease. And, unfortunately, sometimes you discover it too late. Why in the world, I've started to wonder, are we using all these amazing types of medical tests, including imaging, only once we already know you're sick? Could we actually use them earlier in the diagnostic process? And the answer is yes. But if we had humans interpreting those images, we'd be spitting out false positive findings left and right. Because if you take an image of you, you're gonna find a few little things here that are off. You're gonna find something or other. And a human is gonna feel sort of obliged, and also medicolegally obliged, to call it. But if we have AI agents, we can kinda tune them. And also if we have history, in other words, maybe the first time we image you, we ignore a lot of things that might be uncertain because we know we're gonna image you again in six months or in a month or whatever. And then we basically train neural nets to look specifically for changes from your normal baseline. That process would eliminate almost all of the false positives. And all of a sudden, we'd have an early detection engine that could tell us, hey, you know what? You don't have cancer yet, but you're headed there. Better go see the doctor. And then in comes the human. And then the current reactive medical care system can do what it does best and cure you. So that's one of the ways I think that AI could be really transformative.
[00:29:36] Tobias Macey:
That aspect of more customized and personal care is one that has been put forth as one of the major benefits of AI and the advancement of technology. And one of the reasons that we don't typically have that is because humans are expensive. But we're also in a situation right now where AI is also very expensive, just not on the same scale. And I'm wondering what you're seeing as opportunities to improve the efficiency of these systems so that their cost goes down, and so that the hardware necessary to execute them can be scaled down, so that we can distribute it equitably to everyone and make it so that it is more readily accessible. And, also, just as a side note, you jokingly made the aside that you're going to ignore your problems for long enough until they become critical, and maybe this too will help reduce some of the reticence to go and seek a health care provider. One, because it's not going to cost as much, but also because you don't have to disrupt your day as much, or you don't have to go and confront another human to admit your own mortal failings.
[00:30:52] Daniel Sodickson:
I certainly hope that we are heading in that direction, where essentially medical monitoring is commonplace. It's what we do. It's like when you're walking across the street. Your eyes check: hey, is there a car coming that's gonna hit me? No. Okay. I go. Likewise, you wake up. You have a medical check. You're fine. You move on. Or, you know what, there's something that's a little dangerous. You deal with it. I think our whole relationship to medicine is gonna change. But back to your question of accessibility and cost, I actually think we have remarkable opportunities now to bring cost down. First of all, for the type of agents I'm talking about that are basically doing that monitoring, they don't need to copy all of what a radiologist does. They don't need to find every little thing. They actually have a pretty simple job.
Is there a change from your baseline? So actually, I don't think we need dramatically expensive agents here. I think, you know, you can train something up that can be blindingly fast in inference and doesn't necessarily require a lot of resources, particularly if you've distilled all of your past history into a nice compact set of embeddings. They're easy to store. They're easy to incorporate. All the agent needs to do is say, you're good, or, oh, something's up. Second of all, if we have these agents, one of the things we've actually found in some of our work in the lab is that we can make really good diagnoses or really good predictions of risk off of really lousy data, as long as we have priors. If we have enough priors, basically all we need to do is detect change. So that means the machines themselves are gonna be way less expensive. We don't need to build a multimillion dollar MRI machine in a big tube.
Maybe we can build an MRI machine in a seat that's good enough to check your prostate, and it can be in a CVS. It can be in your home. So I think we really are at an inflection point of cost if we play our cards right.
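As a rough sketch of how lightweight such a monitoring agent could be, the snippet below (a hypothetical example, not a clinical tool) stores each prior visit as a compact embedding vector and flags a new visit only when it drifts well beyond that person's own historical variation; the embedding model, distance metric, and threshold are all assumptions.

```python
# Hypothetical baseline-change monitor (a sketch, not a clinical tool): keep
# compact embeddings of a person's prior scans and flag a new scan only when
# it drifts well outside that person's own historical variation.
import numpy as np

class BaselineMonitor:
    def __init__(self, z_threshold: float = 3.0):
        self.history = []                 # list of per-visit embedding vectors
        self.z_threshold = z_threshold

    def add_visit(self, embedding):
        self.history.append(np.asarray(embedding, dtype=float))

    def check(self, embedding):
        """Return (flag, z_score) comparing a new visit to the personal baseline."""
        prior = np.stack(self.history)
        center = prior.mean(axis=0)
        # Typical distance of past visits from the personal baseline.
        past = np.linalg.norm(prior - center, axis=1)
        scale = past.std() + 1e-6
        dist = np.linalg.norm(np.asarray(embedding, dtype=float) - center)
        z = (dist - past.mean()) / scale
        return z > self.z_threshold, z

# Toy usage: three stable prior visits, then a visit with a shifted embedding.
rng = np.random.default_rng(0)
monitor = BaselineMonitor()
for _ in range(3):
    monitor.add_visit(rng.normal(0.0, 0.1, size=64))
flag, z = monitor.check(rng.normal(0.0, 0.1, size=64) + 0.5)
print(flag, round(float(z), 2))
```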
[00:32:56] Tobias Macey:
Beyond the medical field, what are some of the other changes that you're seeing in the science or research community about how imagery and vision is used as an input, where maybe the cost of having a human do the analysis and annotation was prohibitive, but now it's dropped to the point where, if you can get an AI to give you a good enough first pass, you can actually incorporate more of that spatial and visual data into areas of research where it was typically ignored?
[00:33:32] Daniel Sodickson:
Oh, interesting. I mean, the first place my mind goes when you ask that question is something like remote sensing. You know, satellite imagery, for example, where to have a human be scanning every little bit of video would just be mind numbing. But if you have an agent on the lookout all the time, you can flag whenever something interesting happens, and then that triggers all of this research. Another area that comes to mind is sky surveys and astronomy. Right? You're looking for these rare events, and when they happen, then you can hone in and maybe even train advanced devices onto those events so you can pick up the interesting supernovas. You can pick up the, you know, collision of black holes.
So in some ways, that applies likewise in medicine. I mean, what physician wants to spend their time just hunting through billions of pixels for something that's interesting and important? I've actually had a radiologist, my chair of radiology, Michael Recht, tell me, I wish the images coming to me could just be labeled normal or abnormal. I'll do the rest. That's what I love to do. But if it's normal, I don't wanna waste a lot of time scanning through everything. So I think that applies across a wide range of image types.
[00:34:54] Tobias Macey:
And as you have been working in this field of imagery for so long and navigating this transition to computer vision and autonomous systems being able to play a larger role in that activity, what are some of the most interesting or innovative or unexpected ways that you have seen vision applied within the context of AI and your overall experience and exposure to the broad space of imagery?
[00:35:21] Daniel Sodickson:
Let me think. There are so many different ways that images are ingested and used; I'm trying to settle on one. Well, so here's something that's a little on the sci fi side. I mean, I've already talked a bunch about how imaging and AI are changing health care and how AI is gonna be processing some of the huge volume of images that, you know, are now created in the world. There are some people who are thinking a little bit further out even and thinking, well, okay, can I use images to tell what you're thinking? We have this thing called functional MRI, which tracks your brain activity, and, you know, you can make an image light up basically where there's thinking activity, if you will.
And some people are starting to wonder, can I use AI to interpret that raw data and identify what thought you're having? That sounds utterly terrifying in many ways, obviously. But also, there are other possibilities, like, could we share each other's perspectives in more than just a simple visual way? Could we develop a kind of, you know, species wide empathy? It sounds like it's super far away, and it probably is in that big, broad way. But there are actually people who have, for example, processed functional MRI data from somebody who's looking at an image and then tried to guess what image they're looking at. And the neural nets get pretty close. Like, based on the functional information flowing in your brain from what's flowing into your eyes, they can get, like, a posterized version of what it is somebody's looking at. So food for thought.
[00:37:20] Tobias Macey:
I think that that's also very interesting in the context of psychological disorders of being able to gain a better understanding and interpretation of the brain activity that's happening with people who are undergoing treatment or being able to understand at a more detailed level some of the characteristics of neurodivergent symptoms.
[00:37:44] Daniel Sodickson:
A hundred percent. And already, people are using AI, I'm sure you know, to identify, say, early onset schizophrenia, or to find, from just raw video, signs of psychiatric disease that then can be treated. I think in some ways we are using artificial eyes now to substitute for the expensive and limited eyes of trained clinicians, and I think it's gonna help a lot of people down the line.
[00:38:14] Tobias Macey:
That's another aspect that's interesting: the majority of the imaging technology that we have is designed with the human as the end consumer, where we are biasing towards maybe translating nonvisual inputs into some form of visual representation, either by converting mathematical formulas into a pictographic representation, or changing the color distributions, or reducing dimensionality so that it's interpretable by a human. And as we bring these computer systems more into the workflow, what are some of the ways that maybe we change the ways we think about the capture of that information because it's never going to be interpreted by a human?
[00:38:59] Daniel Sodickson:
So I'm actually working on a paper with a colleague, Sumit Chopra, right now with the title Imaging Without Images. In fact, if we have machine learning systems on the other end, you're absolutely right. We don't necessarily need to spit out pristine images. Images are just a really good interface to our optical pathways and the associated brain pathways. But if we've got other types of algorithms, sure, an imaging device can really just present a signature, some sort of fingerprint of an interesting object. And that's all that a machine learning system would need to detect it. Which again means that a lot of the, I don't know, bells and whistles that go into a modern imaging device to make its image nice and crisp and sharp, we can dispense with. Make them cheaper, or just really focus on getting really good fingerprints of objects of interest. I think that's gonna be a really interesting area of, again, what I call upstream AI.
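A toy illustration of that "imaging without images" idea, with an entirely made-up task and forward model standing in for a real scanner: a small classifier is trained to decide whether a bright target is present directly from a handful of raw Fourier-domain measurements, with no reconstructed picture anywhere in the pipeline.

```python
# "Imaging without images" sketch (assumed toy task): classify whether a
# bright target is present directly from sparse raw measurements, with no
# image reconstruction step in between.
import torch
import torch.nn as nn

N, KEEP = 32, 64                                    # scene size, raw samples kept

torch.manual_seed(0)
sample_idx = torch.randperm(N * N)[:KEEP]           # fixed sparse sampling pattern

def measure(img):
    # Forward model: magnitudes of a few Fourier samples of the scene.
    k = torch.fft.fft2(img, norm="ortho").reshape(img.shape[0], -1)
    return k[:, sample_idx].abs()

def make_batch(batch=64):
    img = torch.zeros(batch, N, N)
    label = torch.randint(0, 2, (batch,))
    for i in range(batch):
        if label[i] == 1:                           # drop in a small bright "lesion"
            r, c = torch.randint(0, N - 3, (2,))
            img[i, r:r + 3, c:c + 3] = 1.0
    img += 0.05 * torch.randn_like(img)             # background noise
    return measure(img), label

clf = nn.Sequential(nn.Linear(KEEP, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
for _ in range(300):
    x, y = make_batch()
    loss = nn.functional.cross_entropy(clf(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

x, y = make_batch(256)
print("accuracy:", (clf(x).argmax(1) == y).float().mean().item())
```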
[00:39:57] Tobias Macey:
It also changes the ways that we think about the trade offs of converting to those images because many of the image formats are in some way lossy where we are throwing away some of the detail of the initial capture just in order to be able to make it interpretable by the human who is viewing it.
[00:40:16] Daniel Sodickson:
Absolutely. In radiology, actually, we do that all the time. We might take input from 20 different MRI detectors that actually is complex, real and imaginary, and we distill all of that down into a single real image that is then presented to the radiologist. There's a fair bit of information that we are tossing away there in order to make that translation. So, yeah, I think at some point, remember, images are just spatially resolved information. If we have a signal that has spatial information in it, that's gonna be enough in many cases.
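For listeners unfamiliar with that step, one standard way to collapse many complex-valued detector (coil) channels into a single real image is a root-sum-of-squares combination, sketched below with synthetic data and an assumed coil count; the per-coil phase information discarded here is the kind of detail being described.

```python
# Root-sum-of-squares coil combination with synthetic data: many complex
# coil images in, one real image out. The per-coil phase maps are discarded.
import numpy as np

rng = np.random.default_rng(0)
coils, H, W = 20, 64, 64

# Synthetic complex-valued coil images: shared object times per-coil gain/phase.
obj = rng.random((H, W))
coil_maps = rng.random((coils, 1, 1)) * np.exp(1j * rng.uniform(0, 2 * np.pi, (coils, 1, 1)))
coil_images = coil_maps * obj + 0.01 * (rng.standard_normal((coils, H, W))
                                        + 1j * rng.standard_normal((coils, H, W)))

# Combine: magnitude-only, one real image for the radiologist's screen.
combined = np.sqrt((np.abs(coil_images) ** 2).sum(axis=0))
print(combined.shape, combined.dtype)    # (64, 64) float64 (phase is gone)
```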
[00:40:53] Tobias Macey:
And as you have been working in this field of imagery and working as the overall applications and sophistication of ML and AI capabilities have grown, what are some of the most interesting or unexpected or challenging lessons that you've learned personally?
[00:41:12] Daniel Sodickson:
One is certainly the importance of not just sort of plug and play, shove some input into a neural net and, you know, expect that it's gonna get the answer you want. That's been disproven many times in my practical experience. I think one of the lessons that I've learned, both in my research and in preparing and writing the book, is that there are lots of lessons we can learn from our deep biology. Things that the brain has evolved over time, tricks basically, to efficiently distill information down to useful representations that we can really follow. And, by the way, along those lines, I'm a big fan of a 2022 article by Yann LeCun called A Path Toward Autonomous Machine Intelligence, which lays out this architecture for developing kind of a world model engine that takes all of these distilled representations from different tasks and then farms them out to other tasks and shares them around, just like one might imagine the human brain does.
I think that's, at the moment, a challenge for us to step back from given tasks and start thinking about building up models of the world. But I also think it's one of the greatest opportunities that's gonna give us remarkable new capabilities.
[00:42:32] Tobias Macey:
And given all of the potential applications and all of the potential value to be realized through the application of AI in the vision space, what are the situations where you advocate against it and instead defer back to human interpretation and human expertise?
[00:42:54] Daniel Sodickson:
Again, I think these sorts of plug and play models, like, I wanna find a pulmonary nodule, so I've trained a neural net to find a pulmonary nodule on a CT scan. I think there's a real risk if we start relying just on those, because we're gonna miss the other important incidental findings that a cross trained human radiologist might find. I think really the place that AI adds the best value now, at least in sort of clinical interpretation of images, is in making sure something is seen. Once again, in kind of raising a flag: hey, this is abnormal, you should really look at it. Rather than just training it on an image and assuming that we're done. Again, I think we've gotta avoid the plug and play. We gotta preserve judgment, because a lot of medicine is about judgment. And if we can enhance our judgment, then we're cooking with gas.
[00:43:47] Tobias Macey:
What are your hopes and predictions as we barrel into the future and continue to invest in and explore the capabilities of AI in the context of imagery and particularly as they move beyond interpretation and into understanding?
[00:44:06] Daniel Sodickson:
I think one way of looking at it is that the way we see is actually changing, and I think we need to keep our eyes open for that because it can be subtle. But if you look back at the history of imaging, every time we've discovered a new imaging modality, a new probe that brings us spatial information, inevitably it has brought societal change, changes to science, changes to the way we understand ourselves in the world. Now we've got cameras everywhere. We've got AI systems that are processing and digesting those images for us. We're in a situation where you could argue our vision is capable of being hacked. Somebody can start showing us only what we want to see, or only what they want us to see. And I think that's a world where we need to be very careful that we preserve a notion of truth. So, basically, my bottom line message is keep your eyes open, because the way we see is changing.
[00:45:04] Tobias Macey:
Yeah. There's an interesting anecdote from, I think, early on when I started working on this podcast, where somebody was discussing some of that aspect of the potential for AI to mass produce imagery and information. And they were relating a conversation they had where somebody said, oh, well, we should watermark all of the AI generated content so we know that it was artificial. And the response was, no, we should actually watermark all the stuff that humans actually had a part in, because that's going to be the minority going forward.
[00:45:36] Daniel Sodickson:
What a fascinating and terrifying thought, isn't it? Yeah. Absolutely. I think we are fundamentally visual creatures. We are all creatures of imaging. And in a world where, just like in the early oceans, interestingly enough, there are eyes everywhere, I think we need to give some thought to how we want to ingest that visual information. We wanna make sure that it keeps being a survival benefit rather than the opposite.
[00:46:08] Tobias Macey:
Are there any other aspects of your work in imaging, the applications of AI in that space, or, any of the other topics that we touched on that we didn't discuss yet that you'd like to cover before we close out the show?
[00:46:20] Daniel Sodickson:
Let's see. I think we actually covered a lot. This was a great conversation. So, no, I think we've got the basics.
[00:46:28] Tobias Macey:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being some of the biggest gaps in tooling, technology, or human training that's available for AI systems today?
[00:46:44] Daniel Sodickson:
Gosh. There's so much tooling that has been built. I guess I'm gonna go back to one of the things you mentioned earlier in the conversation. The idea of an agent that can cross disciplines and find connections, that's such a powerful capability. I think we've got all of these different algorithms. We've got all of these arXiv papers touting the latest. I'd love something that could distill: what are the five key methodologies? What are the 10 key concepts that underlie all of these different network topologies and all of these different training approaches, so that we have a sort of simpler building block set to work from? I'm a physicist, so I like fundamental concepts.
I'd love to see that as an aid to really understanding what we're doing as we build machine learning systems.
[00:47:49] Tobias Macey:
Absolutely. That sounds like a very interesting project for someone to take on. Well, thank you very much for taking the time today to join me and share all of your experience and insight into the world of imagery and some of the ways that AI is playing a part in that. I appreciate all of the time and effort that you're putting into helping to drive the industry forward in that regard, and I hope you enjoy the rest of your day. Thank you so much. I've really enjoyed this conversation with you. Thank you for listening. Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management, and Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@aiengineeringpodcast.com with your story.
Guest intro and career path into AI imaging
From linear reconstructions to deep learning in MRI
What is an image? Maps of spatially organized information
Modalities differ, but physics rhymes across domains
Downstream vs. upstream AI in imaging
Human judgment, centaurs, and collaboration with AI
Beyond tasks: world models and perception in vision
Multimodal foundations and representation learning
Bridging language and vision without losing semantics
Interpretability: heat maps and physics-guided nets
From reactive care to proactive monitoring with AI
Cost, accessibility, and lightweight monitoring agents
Scaling scientists: remote sensing and sky surveys
Sci‑fi adjacent: reading thoughts with fMRI signals
Imaging without images: machines as the end consumer
Lessons learned and world‑model architectures
Where not to use AI: avoid plug‑and‑play diagnostics
Our vision can be hacked: truth in an AI‑saturated world
Closing thoughts and gaps in tools, tech, and training