In this episode of the AI Engineering Podcast Daniel Sodickson, Chief of Innovation in Radiology at NYU Grossman School of Medicine, talks about harnessing AI systems to truly understand images and revolutionize science and healthcare. Dan shares his journey from linear reconstruction to early deep learning for accelerated MRI, highlighting the importance of domain expertise when adapting models to specialized modalities. He explores "upstream" AI that changes what and how we measure, using physics-guided networks, prior knowledge, and personal baselines to enable faster, cheaper, and more accessible imaging. The conversation covers multimodal world models, cross-disciplinary translation, explainability, and a future where agents flag abnormalities while humans apply judgment, as well as provocative frontiers like "imaging without images," continuous health monitoring, and decoding brain activity. Dan stresses the need to preserve truth, context, and human oversight in AI-driven imaging, and calls for tools that distill core methodologies across disciplines to accelerate understanding and progress.
Announcements
- Hello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systems
- When ML teams try to run complex workflows through traditional orchestration tools, they hit walls. Cash App discovered this with their fraud detection models - they needed flexible compute, isolated environments, and seamless data exchange between workflows, but their existing tools couldn't deliver. That's why Cash App relies on Prefect. Now their ML workflows run on whatever infrastructure each model needs across Google Cloud, AWS, and Databricks. Custom packages stay isolated. Model outputs flow seamlessly between workflows. Companies like Whoop and 1Password also trust Prefect for their critical workflows. But Prefect didn't stop there. They just launched FastMCP - production-ready infrastructure for AI tools. You get Prefect's orchestration plus instant OAuth, serverless scaling, and blazing-fast Python execution. Deploy your AI tools once, connect to Claude, Cursor, or any MCP client. No more building auth flows or managing servers. Prefect orchestrates your ML pipeline. FastMCP handles your AI tool infrastructure. See what Prefect and FastMCP can do for your AI workflows at aiengineeringpodcast.com/prefect today.
- Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most - building intelligent systems. Write Python code for your business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML/AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch. Build end-to-end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward-thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at aiengineeringpodcast.com/bruin, and for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud.
- Your host is Tobias Macey and today I'm interviewing Daniel Sodickson about the impact and applications of AI that is capable of image understanding
- Introduction
- How did you get involved in machine learning?
- Images and vision are concepts that we understand intuitively, but which have a large potential semantic range. How would you characterize the scope and application of imagery in the context of AI and other autonomous technologies?
- Can you give an overview of the current state of image/vision capabilities in AI systems?
- A predominant application of machine vision has been for object recognition/tracking. How are advances in AI changing the range of problems that can be solved with computer vision systems?
- A substantial amount of work has been done on processing of images such as the digital pictures taken by smartphones. As you move to other types of image data, particularly in non-visible light ranges, what are the areas of similarity and in what ways do we need to develop new processing/analysis techniques?
- What are some of the ways that AI systems will change the ways that we conceive of and capture imagery?
- What are the most interesting, innovative, or unexpected ways that you have seen AI vision used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on imaging technologies and techniques?
- When is AI the wrong choice for vision/imaging applications?
- What are your predictions for the future of AI image understanding?
Parting Question
- From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?
- Thank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
- MRI == Magnetic Resonance Imaging
- Linear Algorithm
- Non-Linear Algorithm
- Compressed Sensing
- Dictionary Learning Algorithm
- Deep Learning
- CT Scan
- Cambrian Explosion
- LIDAR Point Cloud
- Synthetic Aperture Radar
- Geoffrey Hinton
- Co-Intelligence by Ethan Mollick (affiliate link)
- Tomography
- X-Ray Crystallography
- CERN
- CLIP Model
- Physics-Guided Neural Network
- Functional MRI
- A Path Toward Autonomous Machine Intelligence by Yann LeCun
Hello, and welcome to the AI Engineering podcast, your guide to the fast moving world of building scalable and maintainable AI systems. When ML teams try to run complex workflows through traditional orchestration tools, they hit walls. Cash App discovered this with their fraud detection models. They needed flexible compute, isolated environments, and seamless data exchange between workflows, but their existing tools couldn't deliver. That's why Cash App relies on Prefect. Now their ML workflows run on whatever infrastructure each model needs across Google Cloud, AWS, and Databricks. Custom packages stay isolated.
Model outputs flow seamlessly between workflows. Companies like Whoop and 1Password also trust Prefect for their critical workflows, but Prefect didn't stop there. They just launched FastMCP, production ready infrastructure for AI tools. You get Prefect's orchestration plus instant OAuth, serverless scaling, and blazing fast Python execution. Deploy your AI tools once. Connect to Claude, Cursor, or any MCP client. No more building auth flows or managing servers. Prefect orchestrates your ML pipeline. FastMCP handles your AI tool infrastructure. See what Prefect and FastMCP can do for your AI workflows at aiengineeringpodcast.com/prefect today.
Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most, building intelligent systems. Write Python code for your business logic and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML and AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch.
Build end to end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at aiengineeringpodcast.com/bruin. And for dbt Cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud. Your host is Tobias Macey, and today I'm interviewing Daniel Sodickson about the impact and applications of AI that is capable of image understanding. So, Daniel, can you start by introducing yourself?
[00:02:33] Daniel Sodickson:
Absolutely. Yeah. First of all, Tobias, it's great to be here and talk with you. I'm Dan. I am currently the chief of innovation in the Department of Radiology at NYU Grossman School of Medicine. I'm really a physicist in medicine who spent his entire career trying to develop new ways of seeing.
[00:02:52] Tobias Macey:
And do you remember how you first got started working in the ML and AI space?
[00:02:57] Daniel Sodickson:
Yeah. Not, I can tell you, by design at all. Really more by degrees. It kinda gradually wormed its way into my research. I remember beginning my career with linear algorithms that worked really well to reconstruct images from the raw data that was coming off of MRI machines. Then we moved to nonlinear algorithms when something called compressed sensing appeared on the scene, which allowed us to sort of pre compress the images that we were gathering rather than spending all this time gathering them and then throwing away much of that hard won information. And then we started moving to so called dictionary learning algorithms, sort of primitive AI that selected the best of a concrete library of possibilities. And then it was around 2014 or 2015 that we started digging into actual deep learning, sort of modern AI. And our group was actually among the first to introduce deep learning for accelerated medical imaging back in around 2016.
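For listeners who want to see what that progression looks like concretely, here is a minimal sketch of a compressed-sensing style reconstruction, assuming a toy object that is sparse in the pixel domain and a random k-space sampling mask; real systems use tuned sampling patterns and richer sparsifying transforms, and this is not the specific algorithm Dan's group used. The idea is simply to enforce agreement with the undersampled measurements while penalizing non-sparse solutions.

```python
# Toy compressed-sensing reconstruction sketch (illustrative assumptions only):
# recover a pixel-sparse object from randomly undersampled Fourier (k-space)
# samples with iterative soft-thresholding (ISTA).
import numpy as np

rng = np.random.default_rng(0)

N = 64
x_true = np.zeros((N, N))
x_true[rng.integers(0, N, 40), rng.integers(0, N, 40)] = 1.0   # sparse "object"

mask = rng.random((N, N)) < 0.3                     # keep ~30% of k-space
y = mask * np.fft.fft2(x_true, norm="ortho")        # undersampled measurements

def A(x):        # forward model: image -> undersampled k-space
    return mask * np.fft.fft2(x, norm="ortho")

def At(k):       # adjoint: undersampled k-space -> image
    return np.fft.ifft2(mask * k, norm="ortho")

# Minimize ||A x - y||^2 / 2 + lam * ||x||_1 by gradient step + soft threshold.
lam = 0.02
x = np.zeros((N, N), dtype=complex)
for _ in range(200):
    z = x - At(A(x) - y)                            # data-consistency gradient step (step size 1)
    mag = np.abs(z)
    x = z * np.maximum(mag - lam, 0) / np.maximum(mag, 1e-12)   # sparsity prox

err = np.linalg.norm(x.real - x_true) / np.linalg.norm(x_true)
print("relative reconstruction error:", round(float(err), 3))
```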
[00:03:54] Tobias Macey:
And so imaging is definitely very prevalent in the medical field in the form of things like X rays, MRIs, as you mentioned, CT scans, but it's also something that virtually everyone is familiar with in their daily life by virtue of things like smartphone cameras, digital cameras, Polaroid photos from back in the day. And so I'm just wondering if you can give your sense of the kind of scope and application of imagery in the context of the AI arena and also in other autonomous technology situations.
[00:04:30] Daniel Sodickson:
Yes. Good question. And I think that question raises the fundamental question of what exactly is an image. I tend to think of images as maps of spatially organized information. So like an actual map of the world, or a representation of a scene in nature, or a view of a star field, or a slice through a living body. They're all spatially organized information. They all help us to tell here from there. Actually, in the early oceans, the ability to tell here from there was such a powerful competitive advantage that some people say the development of eyes drove the Cambrian explosion, that sort of well known explosion of biodiversity in the Cambrian era. When it comes to AI, images are the domain of what's sometimes called computer vision. Right? Basically, ways of processing artificially created and digitally stored images in much the same way that our brains process the visual input from our eyes.
So the images, like you said, can be anything. The view through telescopes or microscopes or cameras or MRI machines, you name it. And then, basically, AI has the task of either generating those images in different ways or taking those images and distilling information from them in various ways.
[00:05:45] Tobias Macey:
And so given the breadth of applications for image understanding and computer vision, but also just given the wide variety of potential input formats, I'm wondering what you see as some of the complexities and challenges facing the development and fine tuning of some of these ML and AI routines for being able to interpret these different styles of image data, because we are very familiar with JPEGs and PNGs as people who are taking pictures and viewing pictures online. And there's also, as you pointed out, the spatial aspect of imagery. In some cases, you might even think of things like LIDAR point clouds as a form of imagery, or radar depictions. There's the synthetic aperture radar that we're using a lot for global mapping technologies now with the various satellite capabilities. And just because of that breadth of representation, how does that either aid or complicate the use of AI for being able to interpret and process those inputs?
[00:06:57] Daniel Sodickson:
Yeah. There are notable differences between different imaging modalities, different imaging types, and there are also remarkable similarities. And I can actually give you a little bit of a story about each to kind of illustrate the complexities and also the connections. So, you know, the content of so called natural images is very different in many ways than that of, say, medical images. How a scene is filled, what's normal, what needs to be attended to. And so neural nets, for example, do require careful tuning. You can't just train on natural images and expect perfect performance on a medical imaging task. And it also requires domain expertise because you need to know what is important to look at. A radiologist will basically see right through very prominent features, which he or she happens to know are artifacts, that a machine would need to learn are artifactual and unimportant.
So I had this example of, actually at NYU, we were in a collaboration with, back then, Facebook, now Meta, to try to accelerate MRI scans by a factor of 10. It's a different story how Facebook got interested in a problem like that. But basically, they trained their first neural net in a week to generate fast images. And the images were completely unacceptable to our radiologists. You know, they looked nice and crisp and sharp, and the radiologists said they're just wrong. And it took us several months of sitting down and talking about the physics of MRI together and iterating with the radiologists so that the machine learning experts could kind of figure out what it is the radiologist needed to attend to. And after that, the images got better and better and better until they were indistinguishable from traditional unprocessed images acquired much slower. So that's an example where basically you need that domain expertise and that tuning in order to provide the right answers. And then one other quick story, if I may, in the other direction. One time I was giving a talk at a conference that was bringing astronomers and medical imagers together, and I was supposed to give a talk introducing medical imaging to astronomers, and my opposite number, doctor Rovashi Rao, was supposed to introduce astronomy to medical imagers. We sat down to look at our slides and coordinate, and we realized that the slides basically said exactly the same thing. We were actually using the same mathematics to generate our images from radio telescopes and MRI machines. It was like discovering a long lost sibling. With notation changes, the equations were identical. It was remarkable. So there are a lot of differences, but some surprising similarities as well.
[00:09:29] Tobias Macey:
Yeah. That's something that came up in another conversation I had a little while ago, I think on this podcast, but I'm blanking on exactly which conversation. But bringing up the fact that there are a lot of commonalities in terms of technique across different problem domains and scientific disciplines. And the challenge is that the nomenclature is different because people are using different terms of art to talk about the same things. And once you sit down long enough with somebody working in one of those fields as an expert in another field, you start to understand, oh, we're actually talking about the same thing. And so because of that fact, there is a lot of duplicative work that's happening in those various subfields. And so they were talking about the capability of AI to act as some of that translation layer so that if you, as a specialist in MRI technologies, are searching for useful papers to help direct your research, using something like a deep research agent to do some of that exploration can help with some of that automatic translation to say, hey, these two different terms of art are actually talking about the exact same thing. So now maybe I'll start pulling in reference material from astronomy to help you understand how to approach the problems that you're trying to solve in MRI systems.
[00:10:51] Daniel Sodickson:
I completely agree with that notion, and I think it's a fabulous idea and a really neat use of AI. Certainly, I've spent more than a quarter of a century as an imager, and I've had to hunt out those connections sort of individually, one by one. I'm guessing what has taken me years could take minutes for a suitable agent.
[00:11:12] Tobias Macey:
And so now digging a bit more into the current situation of AI applications in this broad category of imaging and vision, what is your understanding of the overall landscape of how AI is being used and maybe some of the ways that it can be used going forward as we continue to iterate on these capabilities and improve the overall subsystems?
[00:11:41] Daniel Sodickson:
I think of it in two broad categories. First, there's downstream AI, by which I mean processing images gathered in the usual way with whatever machines we have to gather them, and then using deep learning to hunt for patterns, to put on labels, to find cancers, to find new stars in the sky, things like that. Basically developing human like or, in some cases, superhuman capabilities for recognition. I believe this area is interesting but not by any means the last word. And I think also this notion of kind of replacing humans is an interesting one. You hear this a lot in AI conversations. Right? And in fact, back in 2016, Geoff Hinton, who is now a Nobel Laureate in physics for his work in AI, basically said at a public conference we should stop training radiologists now because machine learning's gonna do their job better than they can in five years. It's more than five years from 2016, and I actually don't know a single radiologist who's been replaced. Other jobs have arguably been replaced. But I think it's a much more kind of collaborative picture now. And there's a colleague of mine, Curt Langlotz, who said a while back that AI is not gonna replace radiologists. Radiologists who use AI are gonna replace those who don't. So I think that's kind of downstream AI, but there is also a less well known area that I call upstream AI. Basically, that allows us to gather different data and maybe even to redesign the machines we use to gather that data. And I think this has an even more powerful capability: it can let us get by with dramatically worse data than we usually need, which means faster imaging with existing devices, or it could even mean more accessible devices, an MRI machine I build into a chair that you sit on. But because I can use AI to connect different imaging sessions, if I've seen you before, that sort of slender thread of data is enough to determine if there are changes in you today. So that type of upstream AI, I think, is gonna be truly transformative. It's gonna allow us to image people in different settings. It's gonna allow us to give our scanners memory, for example.
[00:13:51] Tobias Macey:
Backtracking a little bit to your point of the potential for AI to replace areas of work: speaking as a software engineer, I've definitely been hearing a lot of that myself as well. Oh, we don't need software engineers anymore. We've got AI that can write code. But that fails to recognize that the specific action of generating code is not the entirety of a software engineer's job. And Ethan Mollick in his book, Co-Intelligence, did a very good job of highlighting this, that a job is more than just the sum of its tasks. It's the judgment and interpretation that goes into it as well that makes the human valuable. But the sort of centaur situation of humans with AI, where we can run much farther and faster with AI than we can by our own means, is also something that is ignored at your own peril. And so I think that aspect of using AI as either a pre filter or a post filter on work to be done in the radiology departments is something that is valuable and useful, particularly because of the fact that, as you point out in your book, no human can acquire as much experience and exposure to the amount of data that is fed into these AI models in their lifetime.
And so there is that difference in terms of magnitudes of scale as far as information processing, but the AIs do not have our interpretation and reasoning capabilities that help us to pull from other disparate domains and experiences that help to identify some of the anomalies and outliers that the AI may never see.
[00:15:27] Daniel Sodickson:
Absolutely agreed. I think, first of all, what you said about coders is very much true about radiologists and many other disciplines. Right? Everyone thinks radiologists kind of scan through an image to look for the tumor. Oh, boom. There it is. That's a tiny fraction of what they do and probably not their favorite part of what they do. In fact, that part is done sometimes in a fraction of a second, at a glance. A lot of the other judgment and detective work is part of the cognitively interesting tasks that machines have by no means replaced at this point. So, yeah, I agree entirely.
[00:15:59] Tobias Macey:
The other interesting aspect of using machines in the context of vision is that for the better part of the last decade, the primary application of machine vision has been in the space of object detection and object recognition and when dealing with video, maybe object tracking. And now as we have entered the generative era, a lot of it too is now being applied to creation of imagery and video. But I'm wondering what you're seeing as some of the other applications of AI to change the ways that we think about vision and spatial sensing beyond just our human facilities and some of the untapped areas of potential that we should be looking to for AI in the context of imagery, whether visual information or otherwise?
[00:16:49] Daniel Sodickson:
Yeah. It's a big question, but a very interesting one. I think in some ways, maybe we're emulating the wrong things. We're emulating tasks. Can we do this task with a machine learning system better than a human? Can we see this thing? Can we track this object? I think where a lot of the interesting work is gonna be, in my view, is emulating not tasks but sort of the deeper functions of our perception. So our eyes are just the sensors, but actually our brains play an outsized role in vision. One thing I say to begin one of the chapters of the book is the world we see is a lie. Actually, when you're looking at me through the screen or in real life, you can tell how far away I am. And we have this nice classic picture that everyone talks about of depth perception. Oh, we've got two eyes. We triangulate.
We can tell depth. Actually, as a tomographer, I can tell you that's nonsense. Two projections of a three d scene is not nearly enough to tell full depth. Actually, what we're doing, I think, is we're bringing to bear all of these concepts we've learned about how the world works. You know I'm not a giant a football field away. You know I'm a human sized person, which immediately narrows the options. I think that type of AI that basically is built on world models, that sort of knows something about how the world works and then brings that into vision, to do things like object detection for self driving cars, to do things like reconstruction of images from a tiny bit of undersampled data. I think that's where a lot of this not even superhuman but entirely, you know, nonhuman capability can come from, emulating some of these deeper human functions.
[00:18:38] Tobias Macey:
And there's also a substantial body of work that's been happening, particularly as we've been building these generative models focused on visual imagery and the styles of photographic media that have been commonplace. And I'm wondering from your experience how much additional work is being done in that approximate space to incorporate things such as MRI imaging. Obviously, radiology is a very high profile field, so I'm sure that there's a lot of information from things like MRI and X-ray. But maybe as we move to other more niche areas of imagery, like maybe X-ray crystallography in the subatomic space, or maybe even some of the information that we're collecting from systems such as CERN where we have spatial data about the interaction of the particles, but it's such a massive volume of data. What are some of the types of information that are maybe not getting as much attention as they deserve?
[00:19:40] Daniel Sodickson:
That's a very interesting question. I think, first of all, we are at the moment leaning incredibly heavily on LLMs. So even though there are image foundation models that people are building, the underlying structure is still sort of similar to the way we're processing volumes of text. And I think that's gonna be a mistake in the long run, or at least it's gonna be a stop on the way. I don't think just scaling up LLMs is gonna get us to where we need to be, particularly for visual tasks, for example. So I think one thing that's interesting is people are, yes, starting to incorporate, say, CT scans, MR scans into foundation models. But I think what's gonna be really interesting is to do, again, something more like what the brain does, merging text information with image information with, say, blood test information or other sensor information.
That's what our brains are doing. Right? We're integrating multi sensory input, distilling it down to compact representations, and then farming out those representations for whatever task is required. So I actually think that sort of representation learning type picture is a really interesting direction to pursue in the future.
[00:20:59] Tobias Macey:
The other interesting point that you raised there is that language models have gained our attention because of the fact that language is our natural means of interacting with the world and conceiving of the world, at least for a majority of people. I know that there are some people who don't think in language. But to that point, because we're dealing with visual imagery, there has to be some sort of translation layer where we bridge the divide to language in order for us to be able to issue and interpret instructions to and from these models to understand what is it that it's actually sensing or what is the insight that is to be gained through the use of that model. And I'm wondering, as we move between those different modalities, how we can try and preserve some of the semantic context that gets lost in translation between, you know, human language and the language of imagery.
[00:21:51] Daniel Sodickson:
Yeah. Well, first of all, I would actually say that vision precedes language, both from an evolutionary point of view, because it certainly did, and in some ways, in our interpretation of the world. Language is sort of the conceptual veneer we put on top of it, but the first thing coming into our senses is actually visual. And if you look at the real estate in the brain occupied by visual centers, it certainly gives language a run for its money. So I actually think that there's a lot we can learn about the world, and a lot that, say, infants learn about the world before they have language, that is entirely visual. Getting a little more technical, one of the things that I'm really looking to see is, when we do try to merge semantic content between images and language, I'd like to see us move away from things like, say, CLIP models that basically just try to take the representations, the embeddings coming off of images, and those coming off of language and mush them together, bring them close together. Because I think though that's valuable, and you can say, okay, well, this image and this text are both associated with the concept of a cup, I think you also risk losing a lot of complementary information, because an image of a cup is a lot more than just the concept that the word conveys.
And so what I'd love to see is actually things like cross-modality masked autoencoding. So in other words, you block out some words in the text, and based on the combination of the other words and the imagery, you fill that in. You block out some parts of the image, and based on everything else you have, you fill it in. You basically learn the sort of rich correlations between, say, language and vision or other information. I think that's gonna be a more robust way of dealing with multimodal data.
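As a toy illustration of the cross-modality masked autoencoding idea (with an assumed patch size, vocabulary, and tiny transformer that are placeholders rather than any real model), the sketch below embeds image patches and text tokens into one sequence, hides a random subset of both, and trains the network to reconstruct what was hidden from everything that remains.

```python
# Sketch of cross-modality masked autoencoding (assumed toy setup): mask
# random image patches and text tokens, then train a shared transformer
# encoder to reconstruct what was hidden from the rest.
import torch
import torch.nn as nn

D, VOCAB, PATCH = 128, 1000, 16              # embedding dim, toy vocab, patch size

patch_embed = nn.Linear(PATCH * PATCH, D)     # flattened 16x16 grayscale patches
token_embed = nn.Embedding(VOCAB, D)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(D, nhead=4, batch_first=True), num_layers=2)
pixel_head = nn.Linear(D, PATCH * PATCH)      # reconstruct masked patches
token_head = nn.Linear(D, VOCAB)              # predict masked tokens
mask_vec = nn.Parameter(torch.zeros(D))       # learned [MASK] embedding

def masked_multimodal_loss(patches, tokens, mask_ratio=0.5):
    # patches: (B, Np, PATCH*PATCH) floats, tokens: (B, Nt) integer ids
    img = patch_embed(patches)
    txt = token_embed(tokens)
    seq = torch.cat([img, txt], dim=1)                     # one multimodal sequence
    mask = torch.rand(seq.shape[:2]) < mask_ratio          # hide a random subset of both modalities
    seq = torch.where(mask.unsqueeze(-1), mask_vec.expand_as(seq), seq)
    h = encoder(seq)
    h_img, h_txt = h[:, : img.shape[1]], h[:, img.shape[1]:]
    # Reconstruction losses only on the masked positions.
    img_loss = ((pixel_head(h_img) - patches) ** 2)[mask[:, : img.shape[1]]].mean()
    txt_loss = nn.functional.cross_entropy(
        token_head(h_txt)[mask[:, img.shape[1]:]], tokens[mask[:, img.shape[1]:]])
    return img_loss + txt_loss

# One toy step: random "scan patches" paired with random "report tokens".
patches = torch.rand(2, 32, PATCH * PATCH)
tokens = torch.randint(0, VOCAB, (2, 20))
loss = masked_multimodal_loss(patches, tokens)
loss.backward()
print(float(loss))
```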
[00:23:46] Tobias Macey:
That aspect of masking the information so that the model doesn't depend on implicit detail is definitely one that's very important and one that we've developed a lot in recent iterations of AI research. And I think too that while that is useful for improving the efficacy of the model to do the task that it's assigned, the other major challenge that we have, particularly if we're dealing in fields such as medicine or other high risk situations, is the lack of interpretability or explainability of a lot of these deep learning systems, which, you know, is another active area of research. But I'm curious how you're seeing some of that in terms of the adoption of AI technologies in these regulated fields as well as some of the work that is being done to provide more of that contextual understanding of what decisions a model is making in the process of executing a particular task and particularly auditability of the model as we move from a human directed interaction pattern to more of an agentic and autonomous usage of these models for different problem domains.
[00:25:08] Daniel Sodickson:
Yes. Well, once again, I think I'll break it up into downstream and upstream cases. For downstream AI, where basically you're making an interpretation off of an image, obviously interpretability is key if you wanna trust perhaps a life or death type of decision that's being made. I think more and more I'm seeing people using Grad-CAMs, heat maps, you know, to identify areas that a machine learning algorithm is using to make its decision, and we can very quickly, at a glance sometimes, say, you know what? It's looking at the same things a radiologist would look at. That's a little reassuring. Or it's looking at these different things. Is that interesting? Is there something we can learn from that? So I think there's definitely some work there. In the upstream AI area, where we're talking about maybe generating images from a tiny sliver of data, there we have some other advantages. We actually often use what we call physics guided neural nets, because we actually know the physics of how the data is being gathered by our machines rather than just letting the neural net try to learn that itself.
We actually incorporate that as constraints, effectively, into the neural net search. And what that does is it guarantees that any solution we get is a physically plausible solution. And right off the bat, that tosses out the majority of possible hallucinations. So I think there are some very interesting strategies. And if you think about it, in a way, using guidance by physics, that's also what we do as humans to check plausibility. Right? We make sure that something doesn't violate the laws of physics. That's immediately a good check against unreal things. So I think we can use similar strategies in machine learning.
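To make the physics-guided pattern a bit more concrete, here is a heavily simplified sketch, with assumed toy dimensions and a placeholder network, of the kind of unrolled architecture being alluded to: a small learned refinement step alternates with a data-consistency step that re-imposes the measured k-space samples, so every intermediate estimate stays consistent with the known acquisition physics. This is an illustration of the general idea, not the architecture used at NYU.

```python
# Toy "unrolled" physics-guided reconstruction: a learned refinement network
# interleaved with a hard data-consistency step on the measured k-space.
import torch
import torch.nn as nn

class UnrolledRecon(nn.Module):
    def __init__(self, iters: int = 5):
        super().__init__()
        # Tiny CNN refiner shared across iterations; real/imag as two channels.
        self.refine = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1))
        self.iters = iters

    @staticmethod
    def data_consistency(x, y, mask):
        # Re-impose the measured k-space samples wherever data was acquired,
        # so the estimate cannot contradict the physics of the measurement.
        k = torch.fft.fft2(x, norm="ortho")
        m = mask.to(k.dtype)
        k = y * m + k * (1 - m)
        return torch.fft.ifft2(k, norm="ortho")

    def forward(self, y, mask):
        x = torch.fft.ifft2(y, norm="ortho")           # zero-filled starting estimate
        for _ in range(self.iters):
            ri = torch.stack([x.real, x.imag], dim=1)  # (B, 2, H, W)
            dx = self.refine(ri)
            x = torch.complex(ri[:, 0] + dx[:, 0], ri[:, 1] + dx[:, 1])
            x = self.data_consistency(x, y, mask)      # the physics-guided step
        return x

# Toy usage with random undersampled measurements of a random complex "image".
B, H, W = 1, 32, 32
mask = torch.rand(B, H, W) < 0.3
truth = torch.randn(B, H, W, dtype=torch.complex64)
y = torch.fft.fft2(truth, norm="ortho") * mask.to(torch.complex64)
recon = UnrolledRecon()(y, mask)
print(recon.shape)
```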
[00:26:55] Tobias Macey:
Now, with the introduction of AI and ML technologies into these fields where imagery is the core of the work being done, and is so essential to the research that's being done or the discoveries that are happening or the care that is being provided, how does that change the ways that we think of what the role of the human is in those applications, and how we can move beyond our current applications and practices and start to become more AI native in these areas of research or areas of care?
[00:27:41] Daniel Sodickson:
Well, I'll tell you one example in medicine that I think is gonna be enabled by AI, and where we're gonna lean heavily on AI to the exclusion of humans, but then hand off to humans. Right now, health care is in many ways reactive. Right? You develop symptoms. You deny them for a while. Eventually, you decide, okay, I can't ignore them anymore. You go to the doctor, and then they use the big guns. Then they bring in the MRI machines and the other tests, and you discover, oh, you have this disease. And, unfortunately, sometimes you discover it too late. Why in the world, I've started to wonder, are we using all these amazing types of medical tests, including imaging, only once we already know you're sick? Could we actually use them earlier in the diagnostic process? And the answer is yes. But if we had humans interpreting those images, we'd be spitting out false positive findings left and right. Because if you take an image of you, you're gonna find a few little things here that are off. You're gonna find something or other. And a human is gonna feel sort of obliged, and also medicolegally obliged, to call it. But if we have AI agents, we can kinda tune them. And also if we have history, in other words, maybe the first time we image you, we ignore a lot of things that might be uncertain because we know we're gonna image you again in six months or in a month or whatever. And then we basically train neural nets to look specifically for changes from your normal baseline. That process would eliminate almost all of the false positives. And all of a sudden, we'd have an early detection engine that could tell us, hey, you know what? You don't have cancer yet, but you're headed there. Better go see the doctor. And then in comes the human. And then the current reactive medical care system can do what it does best and cure you. So that's one of the ways I think that AI could be really transformative.
[00:29:36] Tobias Macey:
That aspect of more customized and personal care is one that has been put forth as one of the major benefits of AI and the advancement of technology. And one of the reasons that we don't typically have that is because humans are expensive. But we're also in a situation right now where AI is also very expensive, just not on the same scale. And I'm wondering what you're seeing as opportunities to improve the efficiency of these systems so that their cost goes down, and so that the hardware necessary to execute them can be scaled down, so that we can distribute it equitably to everyone and make it so that it is more readily accessible. And, also, just as a side note, you jokingly made the aside that you're going to ignore your problems for long enough until they become critical, and maybe this too will help reduce some of the reticence to go and seek a health care provider. One, because it's not going to cost as much, but also because you don't have to disrupt your day as much, or you don't have to go and confront another human to admit your own mortal failings.
[00:30:52] Daniel Sodickson:
I certainly hope that we are heading in that direction, where essentially medical monitoring is commonplace. It's what we do. It's like when you're walking across the street. Your eyes check: hey, is there a car coming that's gonna hit me? No. Okay. I go. Likewise, you wake up. You have a medical check. You're fine. You move on. Or, you know what, there's something that's a little dangerous. You deal with it. I think our whole relationship to medicine is gonna change. But back to your question of accessibility and cost, I actually think we have remarkable opportunities now to bring cost down. First of all, for the type of agents I'm talking about that are basically doing that monitoring, they don't need to copy all of what a radiologist does. They don't need to find every little thing. They actually have a pretty simple job.
Is there a change from your baseline? So actually, I don't think we need dramatically expensive agents here. I think, you know, you can train something up that can be blindingly fast in inference and doesn't necessarily require a lot of resources, particularly if you've distilled all of your past history into a nice compact set of embeddings. They're easy to store. They're easy to incorporate. All the agent needs to do is say, you're good, or, oh, something's up. Second of all, if we have these agents, one of the things we've actually found in some of our work in the lab is that we can make really good diagnoses or really good predictions of risk off of really lousy data, as long as we have priors. If we have enough priors, basically all we need to do is detect change. So that means the machines themselves are gonna be way less expensive. We don't need to build a multimillion dollar MRI machine in a big tube.
Maybe we can build an MRI machine in a seat that's good enough to check your prostate, and it can be in a CVS. It can be in your home. So I think we really are at an inflection point of cost if we play our cards right.
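As a rough sketch of how lightweight such a monitoring agent could be, the snippet below (a hypothetical example, not a clinical tool) stores each prior visit as a compact embedding vector and flags a new visit only when it drifts well beyond that person's own historical variation; the embedding model, distance metric, and threshold are all assumptions.

```python
# Hypothetical baseline-change monitor (a sketch, not a clinical tool): keep
# compact embeddings of a person's prior scans and flag a new scan only when
# it drifts well outside that person's own historical variation.
import numpy as np

class BaselineMonitor:
    def __init__(self, z_threshold: float = 3.0):
        self.history = []                 # list of per-visit embedding vectors
        self.z_threshold = z_threshold

    def add_visit(self, embedding):
        self.history.append(np.asarray(embedding, dtype=float))

    def check(self, embedding):
        """Return (flag, z_score) comparing a new visit to the personal baseline."""
        prior = np.stack(self.history)
        center = prior.mean(axis=0)
        # Typical distance of past visits from the personal baseline.
        past = np.linalg.norm(prior - center, axis=1)
        scale = past.std() + 1e-6
        dist = np.linalg.norm(np.asarray(embedding, dtype=float) - center)
        z = (dist - past.mean()) / scale
        return z > self.z_threshold, z

# Toy usage: three stable prior visits, then a visit with a shifted embedding.
rng = np.random.default_rng(0)
monitor = BaselineMonitor()
for _ in range(3):
    monitor.add_visit(rng.normal(0.0, 0.1, size=64))
flag, z = monitor.check(rng.normal(0.0, 0.1, size=64) + 0.5)
print(flag, round(float(z), 2))
```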
[00:32:56] Tobias Macey:
Beyond the medical field, what are some of the other changes that you're seeing in the science or research community about how imagery and vision is used as an input, where maybe the cost of having a human do the analysis and annotation was prohibitive, but now it's dropped to the point where, if you can get an AI to give you a good enough first pass, you can actually incorporate more of that spatial and visual data into areas of research where it was typically ignored?
[00:33:32] Daniel Sodickson:
Oh, interesting. I mean, the first place my mind goes when you ask that question is something like remote sensing. You know, satellite imagery, for example, where to have a human be scanning every little bit of video would just be mind numbing. But if you have an agent on the lookout all the time, you can flag whenever something interesting happens, and then that triggers all of this research. Another area that comes to mind is sky surveys and astronomy. Right? You're looking for these rare events, and when they happen, then you can hone in and maybe even train advanced devices onto those events so you can pick up the interesting supernovas. You can pick up the, you know, collision of black holes.
So in some ways, that applies likewise in medicine. I mean, what physician wants to spend their time just hunting through billions of pixels for something that's interesting and important? I've actually had a radiologist, my chair of radiology, Michael Recht, tell me, I wish the images coming to me could just be labeled normal or abnormal. I'll do the rest. That's what I love to do. But if it's normal, I don't wanna waste a lot of time scanning through everything. So I think that applies across a wide range of image types.
[00:34:54] Tobias Macey:
And as you have been working in this field of imagery for so long and navigating this transition to computer vision and autonomous systems being able to play a larger role in that activity, what are some of the most interesting or innovative or unexpected ways that you have seen vision applied within the context of AI and your overall experience and exposure to the broad space of imagery?
[00:35:21] Daniel Sodickson:
Let me think. There are so many different ways that images are ingested and used; I'm trying to settle on one. Well, so here's something that's a little on the sci fi side. I mean, I've already talked a bunch about how imaging and AI are changing health care and how AI is gonna be processing some of the huge volume of images that, you know, are now created in the world. There are some people who are thinking a little bit further out even and thinking, well, okay, can I use images to tell what you're thinking? We have this thing called functional MRI, which tracks your brain activity, and, you know, you can make an image light up basically where there's thinking activity, if you will.
And some people are starting to wonder, can I use AI to interpret that raw data and identify what thought you're having? That sounds utterly terrifying in many ways, obviously. But also, there are other possibilities, like, could we share each other's perspectives in more than just a simple visual way? Could we develop a kind of, you know, species wide empathy? It sounds like it's super far away, and it probably is in that big, broad way. But there are actually people who have, for example, processed functional MRI data from somebody who's looking at an image and then tried to guess what image they're looking at. And the neural nets get pretty close. Like, based on the functional information flowing in your brain from what's flowing into your eyes, they can get, like, a posterized version of what it is somebody's looking at. So food for thought.
[00:37:20] Tobias Macey:
I think that that's also very interesting in the context of psychological disorders of being able to gain a better understanding and interpretation of the brain activity that's happening with people who are undergoing treatment or being able to understand at a more detailed level some of the characteristics of neurodivergent symptoms.
[00:37:44] Daniel Sodickson:
A hundred percent. And already, people are using AI, I'm sure you know, to identify, say, early onset schizophrenia, or to find, from just raw video, signs of psychiatric disease that then can be treated. I think in some ways we are using artificial eyes now to substitute for the expensive and limited eyes of trained clinicians, and I think it's gonna help a lot of people down the line.
[00:38:14] Tobias Macey:
That's another aspect that's interesting: the majority of the imaging technology that we have is designed with the human as the end consumer, where we are biasing towards maybe translating nonvisual inputs into some form of visual representation, either by converting mathematical formulas into a pictographic representation, or changing the color distributions, or reducing dimensionality so that it's interpretable by a human. And as we bring these computer systems more into the workflow, what are some of the ways that maybe we change the ways we think about the capture of that information because it's never going to be interpreted by a human?
[00:38:59] Daniel Sodickson:
So I'm actually working on a paper with a colleague, Sumit Chopra, right now with the title Imaging Without Images. In fact, if we have machine learning systems on the other end, you're absolutely right. We don't necessarily need to spit out pristine images. Images are just a really good interface to our optical pathways and the associated brain pathways. But if we've got other types of algorithms, sure, an imaging device can really just present a signature, some sort of fingerprint of an interesting object. And that's all that a machine learning system would need to detect it. Which again means that a lot of the, I don't know, bells and whistles that go into a modern imaging device to make its image nice and crisp and sharp, we can dispense with. Make them cheaper, or just really focus on getting really good fingerprints of objects of interest. I think that's gonna be a really interesting area of, again, what I call upstream AI.
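A toy illustration of that "imaging without images" idea, with an entirely made-up task and forward model standing in for a real scanner: a small classifier is trained to decide whether a bright target is present directly from a handful of raw Fourier-domain measurements, with no reconstructed picture anywhere in the pipeline.

```python
# "Imaging without images" sketch (assumed toy task): classify whether a
# bright target is present directly from sparse raw measurements, with no
# image reconstruction step in between.
import torch
import torch.nn as nn

N, KEEP = 32, 64                                    # scene size, raw samples kept

torch.manual_seed(0)
sample_idx = torch.randperm(N * N)[:KEEP]           # fixed sparse sampling pattern

def measure(img):
    # Forward model: magnitudes of a few Fourier samples of the scene.
    k = torch.fft.fft2(img, norm="ortho").reshape(img.shape[0], -1)
    return k[:, sample_idx].abs()

def make_batch(batch=64):
    img = torch.zeros(batch, N, N)
    label = torch.randint(0, 2, (batch,))
    for i in range(batch):
        if label[i] == 1:                           # drop in a small bright "lesion"
            r, c = torch.randint(0, N - 3, (2,))
            img[i, r:r + 3, c:c + 3] = 1.0
    img += 0.05 * torch.randn_like(img)             # background noise
    return measure(img), label

clf = nn.Sequential(nn.Linear(KEEP, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
for _ in range(300):
    x, y = make_batch()
    loss = nn.functional.cross_entropy(clf(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

x, y = make_batch(256)
print("accuracy:", (clf(x).argmax(1) == y).float().mean().item())
```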
[00:39:57] Tobias Macey:
It also changes the ways that we think about the trade offs of converting to those images because many of the image formats are in some way lossy where we are throwing away some of the detail of the initial capture just in order to be able to make it interpretable by the human who is viewing it.
[00:40:16] Daniel Sodickson:
Absolutely. In radiology, actually, we do that all the time. We might take input from 20 different MRI detectors that actually is complex, real and imaginary, and we distill all of that down into a single real image that is then presented to the radiologist. There's a fair bit of information that we are tossing away there in order to make that translation. So, yeah, I think at some point, remember, images are just spatially resolved information. If we have a signal that has spatial information in it, that's gonna be enough in many cases.
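For listeners unfamiliar with that step, one standard way to collapse many complex-valued detector (coil) channels into a single real image is a root-sum-of-squares combination, sketched below with synthetic data and an assumed coil count; the per-coil phase information discarded here is the kind of detail being described.

```python
# Root-sum-of-squares coil combination with synthetic data: many complex
# coil images in, one real image out. The per-coil phase maps are discarded.
import numpy as np

rng = np.random.default_rng(0)
coils, H, W = 20, 64, 64

# Synthetic complex-valued coil images: shared object times per-coil gain/phase.
obj = rng.random((H, W))
coil_maps = rng.random((coils, 1, 1)) * np.exp(1j * rng.uniform(0, 2 * np.pi, (coils, 1, 1)))
coil_images = coil_maps * obj + 0.01 * (rng.standard_normal((coils, H, W))
                                        + 1j * rng.standard_normal((coils, H, W)))

# Combine: magnitude-only, one real image for the radiologist's screen.
combined = np.sqrt((np.abs(coil_images) ** 2).sum(axis=0))
print(combined.shape, combined.dtype)    # (64, 64) float64 (phase is gone)
```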
[00:40:53] Tobias Macey:
And as you have been working in this field of imagery and working as the overall applications and sophistication of ML and AI capabilities have grown, what are some of the most interesting or unexpected or challenging lessons that you've learned personally?
[00:41:12] Daniel Sodickson:
One is certainly the importance of not just sort of plug and play, shove some input into a neural net and, you know, expect that it's gonna get the answer you want. That's been disproven many times in my practical experience. I think one of the lessons that I've learned, both in my research and in preparing and writing the book, is that there are lots of lessons we can learn from our deep biology. Things that the brain has evolved over time, tricks basically, to efficiently distill information down to useful representations that we can really follow. And, by the way, along those lines, I'm a big fan of a 2022 article by Yann LeCun called A Path Toward Autonomous Machine Intelligence, which lays out this architecture for developing kind of a world model engine that takes all of these distilled representations from different tasks and then farms them out to other tasks and shares them around, just like one might imagine the human brain does.
I think that's, at the moment, a challenge for us to step back from given tasks and start thinking about building up models of the world. But I also think it's one of the greatest opportunities that's gonna give us remarkable new capabilities.
[00:42:32] Tobias Macey:
And given all of the potential applications and all of the potential value to be realized through the application of AI in the vision space, what are the situations where you advocate against it and instead defer back to human interpretation and human expertise?
[00:42:54] Daniel Sodickson:
Again, I think these sorts of plug and play models, like, I wanna find a pulmonary nodule, so I've trained a neural net to find a pulmonary nodule on a CT scan. I think there's a real risk if we start relying just on those, because we're gonna miss the other important incidental findings that a cross trained human radiologist might find. I think really the place that AI adds the best value now, at least in sort of clinical interpretation of images, is in making sure something is seen. Once again, in kind of raising a flag: hey, this is abnormal, you should really look at it. Rather than just training it on an image and assuming that we're done. Again, I think we've gotta avoid the plug and play. We gotta preserve judgment, because a lot of medicine is about judgment. And if we can enhance our judgment, then we're cooking with gas.
[00:43:47] Tobias Macey:
What are your hopes and predictions as we barrel into the future and continue to invest in and explore the capabilities of AI in the context of imagery and particularly as they move beyond interpretation and into understanding?
[00:44:06] Daniel Sodickson:
I think one way of looking at it is that the way we see is actually changing, and I think we need to keep our eyes open for that because it can be subtle. But if you look back at the history of imaging, every time we've discovered a new imaging modality, a new probe that brings us spatial information, inevitably it has brought societal change, changes to science, changes to the way we understand ourselves in the world. Now we've got cameras everywhere. We've got AI systems that are processing and digesting those images for us. We're in a situation where you could argue our vision is capable of being hacked. Somebody can start showing us only what we want to see, or only what they want us to see. And I think that's a world where we need to be very careful that we preserve a notion of truth. So, basically, my bottom line message is keep your eyes open, because the way we see is changing.
[00:45:04] Tobias Macey:
Yeah. There's an interesting anecdote from, I think, early on when I started working on this podcast, where somebody was discussing some of that aspect of the potential for AI to mass produce imagery and information. And they were relating a conversation they had where somebody said, oh, well, we should watermark all of the AI generated content so we know that it was artificial. And the response was, no, we should actually watermark all the stuff that humans actually had a part in, because that's going to be the minority going forward.
[00:45:36] Daniel Sodickson:
What a fascinating and terrifying thought, isn't it? Yeah. Absolutely. I think we are fundamentally visual creatures. We are all creatures of imaging. And in a world where, just like in the early oceans, interestingly enough, there are eyes everywhere, I think we need to give some thought to how we want to ingest that visual information. We wanna make sure that it keeps being a survival benefit rather than the opposite.
[00:46:08] Tobias Macey:
Are there any other aspects of your work in imaging, the applications of AI in that space, or, any of the other topics that we touched on that we didn't discuss yet that you'd like to cover before we close out the show?
[00:46:20] Daniel Sodickson:
Let's see. I think we actually covered a lot. This was a great conversation. So, no, I think we've got the basics.
[00:46:28] Tobias Macey:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being some of the biggest gaps in tooling, technology, or human training that's available for AI systems today?
[00:46:44] Daniel Sodickson:
Gosh. There's so much tooling that has been built. I guess I'm gonna go back to one of the things you mentioned earlier in the conversation. The idea of an agent that can cross disciplines and find connections, that's such a powerful capability. I think we've got all of these different algorithms. We've got all of these arXiv papers touting the latest. I'd love something that could distill: what are the five key methodologies? What are the 10 key concepts that underlie all of these different network topologies and all of these different training approaches, so that we have a sort of simpler building block set to work from? I'm a physicist, so I like fundamental concepts.
I'd love to see that as an aid to really understanding what we're doing as we build machine learning systems.
[00:47:49] Tobias Macey:
Absolutely. That sounds like a very interesting project for someone to take on. Well, thank you very much for taking the time today to join me and share all of your experience and insight into the world of imagery and some of the ways that AI is playing a part in that. I appreciate all of the time and effort that you're putting into helping to drive the industry forward in that regard, and I hope you enjoy the rest of your day. Thank you so much. I've really enjoyed this conversation with you. Thank you for listening. Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management, and Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@aiengineeringpodcast.com with your story.
Guest intro and career path into AI imaging
From linear reconstructions to deep learning in MRI
What is an image? Maps of spatially organized information
Modalities differ, but physics rhymes across domains
Downstream vs. upstream AI in imaging
Human judgment, centaurs, and collaboration with AI
Beyond tasks: world models and perception in vision
Multimodal foundations and representation learning
Bridging language and vision without losing semantics
Interpretability: heat maps and physics-guided nets
From reactive care to proactive monitoring with AI
Cost, accessibility, and lightweight monitoring agents
Scaling scientists: remote sensing and sky surveys
Sci‑fi adjacent: reading thoughts with fMRI signals
Imaging without images: machines as the end consumer
Lessons learned and world‑model architectures
Where not to use AI: avoid plug‑and‑play diagnostics
Our vision can be hacked: truth in an AI‑saturated world
Closing thoughts and gaps in tools, tech, and training