Ex-Apple Engineers: AI and the Future of Smartphone Photography

In this exclusive CXOTalk episode 872, former Apple engineers Ziv Attar and Tom Bishop of Glass Imaging discuss how AI and computational photography innovations, from neural image processing to neural zoom technology, are redefining smartphone cameras.

How is artificial intelligence transforming smartphone photography, and will it finally bridge the gap between mobile devices and professional DSLR cameras? Join Tom Bishop and Ziv Attar, co-founders of Glass Imaging and former Apple engineers behind the groundbreaking iPhone Portrait Mode, as they explore these questions.

In CXOTalk episode 872, Bishop (CTO) and Attar (CEO) share their expertise in computational photography, optics, deep learning, and AI-driven imaging to explain the future of digital imaging. Glass Imaging's neural image processing platform processes raw camera data to dramatically improve image quality, detail, and low-light performance. The discussion explores how neural zoom technology, custom neural networks, and advanced image-processing algorithms can overcome hardware limitations.

This conversation highlights key industry challenges, the future of mobile imaging technology, Glass Imaging's strategic vision to democratize high-quality photography, and valuable advice for aspiring entrepreneurs and technologists.

Episode Participants

Ziv Attar is CEO of Glass Imaging and an imaging expert with 20 years of experience in Optics and Image Processing. Until recently, Ziv worked at Apple, leading various computational photography projects, including the famous Portrait Mode that launched with the iPhone 7+ and its derivatives.

Tom Bishop is the CTO of Glass Imaging and an algorithm expert with 15 years of experience in AI/Deep Learning, Computer Vision, Computational Photography, and Image Processing. From 2013 to 2018, Tom developed core technology at Apple that powers the iPhone’s Portrait Mode.

Michael Krigsman is a globally recognized analyst, strategic advisor, and industry commentator known for his deep expertise in business transformation, innovation, and leadership. He has presented at industry events worldwide and written extensively on the reasons for IT failures. His work has been referenced in the media over 1,000 times and in more than 50 books and journal articles; his commentary on technology trends and business strategy reaches a global audience.

Transcript

Michael Krigsman: Are AI-powered smartphones truly ready to replace professional cameras or is that just Silicon Valley hype? Today on CXOTalk episode 872, we're going behind the scenes with two former Apple engineers who created the iconic portrait mode on your iPhone. Now, as founders of GLASS Imaging, Ziv Attar and Tom Bishop are using neural networks to extract stunning detail from smartphone cameras.

Ziv, tell us about GLASS Imaging.

Ziv Attar: We want to revolutionize digital photography, specifically using AI. And just to clarify, it doesn't mean taking images from cameras and making them better. That's something that a lot of people are doing in various industries, including smartphones, drones, and photography in general. Our goal, in the longer term, is actually to drive changes in hardware that are only enabled by AI that we're developing.

For example, you can make smaller cameras, cheaper cameras, but reach higher quality. You can make thinner cameras, if we're talking about smartphones, that fit into this form factor, which is very limited in its size. Also, to enable new types of optics and new types of camera module architectures that give you some type of significant benefit.

Michael Krigsman: Tom, we hear terms like computational photography. Take us behind the scenes. What are we talking about there?

Tom Bishop: In traditional photography, you have a camera and you get an image out at the end. Maybe you can adjust it in an editing program like Photoshop. With computational photography, there's a lot of power that is put into creating that image, typically on the camera or in algorithms that are embedded into the digital processing on the camera.

What that enables is collecting a lot more information, and extracting more information from the scene that's there, to give you a much better quality image. What it's actually doing is typically combining various captures together, pushing the limits on the hardware capabilities, or even designing new kinds of hardware, lenses and sensors, that can benefit from using algorithms to get a better finished image. It's rethinking the whole camera from a computational perspective.

Michael Krigsman: When you talk about something like portrait mode that you were both involved with, where does computational photography fit in? Or more generally, maybe you can just give us a brief primer on how something like portrait mode works. Just as an example.

Ziv Attar: Portrait mode is an example of computational photography because it's not something that existed before we had enough compute power. What portrait mode does is basically take an image, a normal color image, like existed before portrait mode, and use multiple cameras, it's called stereo, to create a depth map.

Now, along with the color image, you have another dimension to that image, which is the depth, the distance of each pixel. Obviously, it's not perfectly accurate, but it's good enough to mimic this out-of-focus blur which you have on big cameras. That was the intention with portrait mode, basically to mimic what happens on a big camera.

On big cameras, it happens optically. With optics, when something is not in focus, it gets very blurry. But it's also a nice thing; it kind of helps separate the background and foreground, and it's part of what sets professional big cameras apart from smartphones. Portrait mode was an attempt to bridge that gap and basically give smartphone users the ability to blur the background.
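
To make the depth-map idea concrete, here is a minimal, hypothetical sketch of the principle Ziv describes: use a per-pixel depth map to blur pixels that sit away from the in-focus plane, mimicking the optical background blur of a large camera. This is not Apple's actual Portrait Mode pipeline; the function name, parameters, and blending scheme are illustrative only.

```python
# Hypothetical sketch of depth-based background blur ("synthetic bokeh").
import numpy as np
from scipy.ndimage import gaussian_filter

def synthetic_bokeh(rgb, depth, focus_depth, max_sigma=8.0, levels=4):
    """rgb: HxWx3 float image; depth: HxW depth map in the same units as focus_depth."""
    # Blur strength grows with distance from the in-focus plane.
    blur_amount = np.clip(np.abs(depth - focus_depth) / depth.max(), 0.0, 1.0)
    # Precompute a small stack of progressively blurred copies of the image.
    sigmas = np.linspace(0.0, max_sigma, levels)
    stack = [rgb if s == 0 else np.stack(
        [gaussian_filter(rgb[..., c], s) for c in range(3)], axis=-1) for s in sigmas]
    # Pick the nearest blur level per pixel; real systems blend far more smoothly.
    idx = np.round(blur_amount * (levels - 1)).astype(int)
    out = np.zeros_like(rgb)
    for i, img in enumerate(stack):
        mask = idx == i
        out[mask] = img[mask]
    return out
```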

Now back to your question, Michael. You're asking about portrait mode, but also more generally about computational photography. Computational photography has been around, I would say, pretty much since, or a little bit after, smartphone photography started. You could say that digital cameras are all computational because they have what's called a Bayer pattern on the sensor, and you have to interpolate some data that's not there, and that's compute.

But I would actually call it computational photography when you start to do tricks like HDR, when you start to do complicated denoising and multi-frame image fusion, which is happening on any smartphone today and in some other products as well. It just means you're using heavy compute to do things.

Now, this is not new. This has been going on for, let's say, 10 years. And of course, every year the chips on phones, computers, laptops, everything, are getting much stronger, so you can use more advanced computational photography.

But what we're seeing, and it comes in kind of two phases, is the use of AI, more specifically neural networks, in this space. We started seeing the use of AI on smartphones quite a few years ago, for example, for things like face detection and segmentation. "This is a sky, this is skin. We want to have a smoother sky. We can be more aggressive with denoising," for example. All these semantic things, but they happen at low resolution.

If you have an image from a phone and you want to say, "Hey, there's a face there," and then you want to do something with the face, you can run a face detector on a thumbnail image. You don't need the full resolution. Those things that happen at low resolution, AI that happens at low resolution, started a few years ago. We're now, in the last year or two, at a point where the AI engines on phones are able to handle full-resolution AI compute, and that opens up possibilities for doing a lot of very innovative and very impactful things like what we're doing at GLASS Imaging. We'll talk about it in a little more detail in a minute.

Tom Bishop: The term computational photography, I think, came out with regard to smartphones maybe five to eight years ago, when they first started fusing together multiple images. You capture a burst of images, and then, to reduce noise, increase detail, and maintain sharpness, they try to fuse together and incorporate information from that sequence of images to get one still photo.
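
As a toy illustration of the burst fusion Tom mentions (and only that; real pipelines also align frames, reject motion and ghosting, and merge in the raw domain), simple averaging of N aligned frames reduces noise by roughly a factor of the square root of N:

```python
# Minimal sketch of burst fusion for noise reduction; assumes frames are already aligned.
import numpy as np

def fuse_burst(frames):
    """frames: list of HxW or HxWx3 arrays from a burst capture of the same scene."""
    stack = np.stack([f.astype(np.float32) for f in frames], axis=0)
    # Averaging N aligned frames cuts the noise standard deviation by ~sqrt(N).
    return stack.mean(axis=0)
```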

For a while, to many people, that was computational imaging. But there are the things I've mentioned, quite a lot of other use cases, and things we're now doing with computational photography where it's really a new paradigm: how do you design the whole system? How do you process with AI? How do you manipulate the hardware, even, to give you a better image, knowing that you have this computational power available to you?

Michael Krigsman: You've both mentioned hardware and software. Can you drill into that a little bit? And then Tom, maybe you can show us some example images. But first, I just want to tell everybody that you can ask questions. If you're watching on Twitter, pop your questions onto Twitter, onto X, using the hashtag #cxotalk. If you're watching on LinkedIn, just pop your questions into the chat.

Tom, just drill into this a little bit more.

Tom Bishop: When we're talking about use of AI to improve image quality, that's actually something, in the case of a phone, that's running typically on the phone itself. Most of the processing happens on the phone to produce an image. There are some cases coming to the forefront today with generative AI, where there are services available to process images after the fact.

But when you capture an image, the sensor captures raw data, which is really just the light that hits the sensor and the electronic signal that comes from that. There are a lot of processing steps that typically occur to give you the finished image you can view on a screen. Those steps are traditionally done by software engineers who create algorithms.

Many different blocks go together to try to increase sharpness, reduce noise, get the right color, and so on. Then there are many knobs they have to adjust in order to get a good quality image. What we're doing at GLASS Imaging is trying to create a neural network, an end-to-end AI solution, that replaces all of those algorithms with one process that can extract the most quality from a given image. It's adapted to particular cameras as well.

In the traditional image processing sense, all that demosaicing, denoising, sharpening and detail extraction, and correcting for lens issues, those are all things that we can do now with AI, and we run it on the camera on the smartphone to extract the best possible image.
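
A rough, runnable sketch of the contrast Tom is drawing, with deliberately simplistic block implementations of my own (not any vendor's pipeline and not Glass Imaging's code): a traditional ISP as a chain of hand-tuned blocks, each with knobs, versus a single learned function standing in for the end-to-end network.

```python
# Illustrative only: hand-tuned ISP blocks versus one end-to-end learned model.
import numpy as np
from scipy.ndimage import gaussian_filter

def denoise(rgb, sigma=1.0):
    return np.stack([gaussian_filter(rgb[..., c], sigma) for c in range(3)], axis=-1)

def white_balance(rgb):
    # Gray-world assumption: scale each channel toward a common mean.
    means = rgb.reshape(-1, 3).mean(axis=0)
    return rgb * (means.mean() / means)

def sharpen(rgb, amount=0.8, sigma=1.5):
    blurred = np.stack([gaussian_filter(rgb[..., c], sigma) for c in range(3)], axis=-1)
    return np.clip(rgb + amount * (rgb - blurred), 0.0, 1.0)

def tone_map(rgb, gamma=2.2):
    return np.clip(rgb, 0.0, 1.0) ** (1.0 / gamma)

def traditional_isp(linear_rgb):
    # Each block is tuned by hand and loses a little information along the way.
    x = denoise(linear_rgb, sigma=1.0)
    x = white_balance(x)
    x = sharpen(x, amount=0.8)
    return tone_map(x)

def neural_isp(raw, model):
    # One learned function, trained per device from lab measurements, replaces the chain.
    return model(raw)
```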

Michael Krigsman: Check out CXOTalk.com. Subscribe to our newsletter so that you can join our community. We have really awesome shows coming up.

What about the quality of traditional photography using lenses, of course, versus the quality that you're able to get through AI? And would it be correct to say that this is an artificial process as opposed to traditional photography using lenses?

Ziv Attar: I wouldn't say it's artificial. There is interpolation going on, and there has been, basically, for any color camera that has ever existed. Like I mentioned earlier, there is a color pattern on the sensor, even on big traditional Canon and Nikon cameras, which means that in every pixel location you have either red or green or blue information.

It also means that if you're at a green pixel, you don't know what the red value or the blue value is there, and you have to guess it. That process is traditionally called demosaicing, which means guessing, interpolating, the missing information.
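
To make the idea of demosaicing concrete, here is a toy sketch that estimates the green plane of an RGGB Bayer mosaic by averaging the four green neighbors at each red or blue site. Real ISPs, and the neural approaches discussed here, are far more sophisticated; the RGGB layout and bilinear method are just illustrative assumptions.

```python
# Toy bilinear interpolation of the green channel on an RGGB Bayer mosaic.
import numpy as np
from scipy.ndimage import convolve

def bilinear_green(bayer):
    """bayer: HxW raw mosaic (float), assumed RGGB layout. Returns an estimated green plane."""
    H, W = bayer.shape
    green_mask = np.zeros((H, W), dtype=bool)
    green_mask[0::2, 1::2] = True   # G sites on even rows (R G R G ...)
    green_mask[1::2, 0::2] = True   # G sites on odd rows  (G B G B ...)
    green = np.where(green_mask, bayer, 0.0)
    # At red/blue sites, the 4-neighborhood consists of green samples: average them.
    kernel = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float) / 4.0
    interpolated = convolve(green, kernel, mode="mirror")
    return np.where(green_mask, bayer, interpolated)
```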

The question is, do we call this hallucination or guessing? It's on a small scale. On a pixel level, it is making up information. I think this question, Michael, that you bring up becomes more interesting nowadays with generative AI, because now you're not only able to guess the value of a pixel, you can guess what the mouth of a person looks like, or the eye, or a house, or a bird, or a flower, and you can make up the whole flower. You don't just guess one pixel.

I think if you're talking about replacing multiple pixels, like a chunk of 100 by 100 pixels, and let's say there's a flower in there, that's purely generative. If you're replacing a few pixels here and there and removing some noise, then I wouldn't go out and call it generative. Although, again, it's totally a gray area. It's not black and white, as I tried to explain.

Michael Krigsman: There really is this question of reality versus what's made up. And as you said, at one extreme you have generative AI that is concocting something that never existed. At the other extreme, you have traditional photography. How do you make that distinction? And how far do you at GLASS Imaging push the envelope in terms of adding or changing or adapting the original photograph in order to make it better?

Ziv Attar: Whatever "better" happens to mean. Regarding what "better" is: if you're taking a picture of something and you're trying to test some algorithms, let's say it's a business card you put on the wall, you can actually walk up to that business card, or just take a picture with a very high-resolution camera, so you have a ground truth.

You can judge what you're doing in terms of how well or badly it interpolated or hallucinated the missing information. We at GLASS Imaging are specifically focused on being as little generative as possible, in the sense of not making up information that's completely not there.

Some other companies or other pieces of software are offering that. You have DALL-E and Sora and all these things. We see it as post-processing; it kind of complements what we're doing. Because even if you have an image and you want to do some generative things on it, like remove skin imperfections or something like that, you're always better off having the input image at the highest possible resolution and as true as possible to reality.

Then you can go ahead and apply some, let's say, fake effects on it, which is fine. I'm not criticizing it. I think a lot of people love it. But I think it's also important to start with something that is as true as possible to reality. And we're very proud that we're able to achieve, let's say, a high confidence level, a high correspondence with the actual scene, with reality, with the ground truth.

Maybe I'll say a little bit about how we're doing it, because we talked about what we're doing, but we haven't explained how. When you look at a camera, and it doesn't matter if it's a phone or a drone or a professional camera or a medical device, a camera consists of a lens and a sensor, then a bunch of algorithms.

The more constraints you have on the camera in terms of size, cost, weight, and the materials you can use, the more aberration you'll usually have. Aberration means imperfections in the lens, and it means the lens is going to be a little bit blurry. Being cramped for space, like phones or electronics in general, also means you're likely using a very small sensor, which typically means a lot of noise in the image.

If you look at the image that just comes out of the sensor on all these small devices, the image is bad. It's really bad. We look at it internally, but most people never see a raw image from a phone. It's super noisy. It has color noise, color speckles all over the place. It's not sharp at all. It has lots of optical aberrations. The most visible ones are usually chromatic aberration.

If you have black and white text, suddenly on the raw image, when you look at it, it has purple and blue and green, but you look at the real paper and see it's black and white. These are chromatic aberrations, plus noise and color noise. It all adds up to create a very bad image. The process of correcting all these things is, let's say, algorithms, computational algorithms, computational photography.
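
As a hedged illustration of why raw frames from tiny cameras look so bad, this sketch degrades a clean image with per-channel blur, a small per-channel shift (a crude stand-in for the lateral chromatic aberration that causes the purple and green fringes Ziv describes), and shot plus read noise. Every parameter here is made up for demonstration.

```python
# Simulate a degraded "raw-like" frame: aberration-ish blur/shift plus sensor noise.
import numpy as np
from scipy.ndimage import gaussian_filter, shift

def degrade(clean_rgb, blur_sigma=2.0, ca_shift_px=1.5, read_noise=0.02, photon_scale=200.0):
    """clean_rgb: HxWx3 float image in [0, 1]."""
    out = np.empty_like(clean_rgb)
    # Red and blue are shifted slightly relative to green, giving color fringes on edges.
    shifts = {0: (+ca_shift_px, 0.0), 1: (0.0, 0.0), 2: (-ca_shift_px, 0.0)}
    for c in range(3):
        ch = gaussian_filter(clean_rgb[..., c], blur_sigma)
        out[..., c] = shift(ch, shifts[c], mode="nearest")
    # Shot (Poisson) noise plus Gaussian read noise.
    rng = np.random.default_rng(0)
    noisy = rng.poisson(np.clip(out, 0.0, 1.0) * photon_scale) / photon_scale
    noisy = noisy + rng.normal(0.0, read_noise, size=noisy.shape)
    return np.clip(noisy, 0.0, 1.0)
```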

Now back to what GLASS Imaging is doing, which is also different from what others have been doing in the last few years. We build special labs that can take any camera. For example, we'll take an iPhone; we're showing some iPhone examples. We took the physical iPhone and put it in the lab. We have labs that we built dedicated to this.

What the lab does, and I can't describe all the details, is learn how to characterize the optics and the sensor in an extremely detailed manner. For every pixel, for every angle, for every type of light, for every amount of light, it characterizes the optics and the sensor and the interactions between the two.

Then all this data is collected, terabytes of data. It takes us a few days to collect that. All of that is fed into software that trains a neural network. Specifically, if we're talking about smartphones, it's something that can actually run on a smartphone. Currently we're offering our solutions on Qualcomm chips, but we also have demos on iPhone and other devices.

Once we do it, we have a network that basically learns to take these bad images and create good images, removing as much as possible of the optical imperfections, the sensor noise, and the sensor imperfections. Now, everything I just explained, I gave with the example of an iPhone. With the camera on the iPhone, they're trying to make it as good as possible in terms of sharpness, lens aberration, and sensor.

But because we can correct these things, that kind of opens up the question: okay, if you can correct all these things, why does Apple or Samsung or any other phone maker need to make the camera perfect, right? And the answer is, they don't.

We can make the lens much worse than it is today, or the camera in general, and still get a good image out of it. Then the next question, which you know I'm asking myself, is: okay, why make it bad if you can make it good? And the answers to this are what make things interesting.

You make it bad, if you can correct it, only to gain something. We're not a university; we're not trying to prove something. We're trying to create something tangible that is actually useful for products. By making something worse to gain something, you can gain on cost: you can make lenses cheaper, you can make them smaller, you can make them support a bigger sensor without going crazy with the number of elements you have inside the lens. The design space you have now is way bigger than what you had when you needed to make a good lens and a good sensor.

Michael Krigsman: Very interesting. The source of data, correct me if I'm wrong, if I understood this correctly, the source of data is the profiling that you do of the camera, of the hardware. That data then gets fed into your neural network, which then can essentially construct a simulation, a digital twin. Would that be a correct way of saying it?

Tom Bishop: Yeah, essentially there's kind of three steps, I think. One, you characterize the camera. We take any camera, smartphone or other type of camera, we put it in our lab and we do many measurements that completely profile how the lens and the sensor behave in different lighting conditions, different kinds of brightnesses and scenes.

The second step is we take that data and use it to train a neural network. In a way, it's a self-supervised process; we don't have any data annotation or labeling going on from an AI point of view. We train a neural network that's dedicated to that device.

The third step is we have to port it to run efficiently on the edge. Most AI today is running in the cloud, where you have big data centers, lots of GPUs, lots of power. On these mobile devices, you actually have some pretty capable chips today. Qualcomm is in the majority of Android phones today. Google Pixel and Apple iPhone have their own processors that also have some kind of neural processing unit.

You have a CPU, a GPU, and an NPU, three different systems which all have different merits. But for these kinds of neural networks, the NPU, the neural processing unit, is super capable and very power efficient. We can put that neural network running on that NPU.

That requires some careful software engineering to make it run efficiently. We want to process the 12 or 50 or even 200 megapixels coming out of some of these sensors today in close to real time, meaning you take the picture and you don't notice that there's some background processing going on. And you render that image on the screen and you save it to disk using that neural network, which is taking those bursts of raw images that come in from the sensor and creating one final image file that has all the sharpness and detail and low noise and clarity that you expect.
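
Here is a hedged, minimal sketch in PyTorch of the three steps Tom outlines: pairs derived from lab characterization serve as training data, a small raw-to-RGB network is trained for a specific device, and the result is exported (ONNX here, as one generic interchange format) for an edge or NPU toolchain to pick up. The architecture, tensor shapes, and data are placeholders, not Glass Imaging's actual system or any vendor's deployment path.

```python
# Placeholder raw-to-RGB network, one training step, and export for edge deployment.
import torch
import torch.nn as nn

class TinyNeuralISP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),   # 4 = packed RGGB planes
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),              # linear RGB out
        )

    def forward(self, raw):
        return self.net(raw)

model = TinyNeuralISP()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(raw_batch, target_rgb_batch):
    """raw_batch: Nx4xHxW packed Bayer planes; target: Nx3xHxW reference RGB
    derived from the lab characterization (dummy tensors below)."""
    optimizer.zero_grad()
    loss = nn.functional.l1_loss(model(raw_batch), target_rgb_batch)
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy data for illustration, then export so an edge/NPU toolchain can ingest the model.
raw = torch.rand(2, 4, 128, 128)
target = torch.rand(2, 3, 128, 128)
training_step(raw, target)
torch.onnx.export(model, raw, "tiny_neural_isp.onnx")
```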

Michael Krigsman: Your neural network then is closely bound to the specific hardware of the phone, to that particular model, with the characteristics of that particular lens and processor and sensor.

Tom Bishop: I'd say that's one of the advantages, because there are solutions available today for enhancing images. You'll see some of them in Photoshop, for example, or Lightroom. You'll see upsampling websites where you can upload an image and produce an AI-generated upsampling.

Those are not specific to the type of camera that was used to take the picture; they can improve the image quality somewhat. And as Ziv was explaining before, there's this continuum of how much is generated versus how much is recovered from the image.

What we're focused on is really trying to recover detail that's encoded in the signal from a sort of information theory or signal processing point of view. There's image content. The image has information that is scrambled up in a way. It's mixed up, it's blurred, it's noisy, it has some uncertainty about it.

There is a true underlying signal, which is the image that comes from the scene, and we're trying to reconstruct that as faithfully as possible using the information that's there; it's just scrambled up. If you take one of those other approaches, they don't actually do that unscrambling. They just try to generate or create or invent plausible detail that might be there. We're actually compensating for issues of the lens, like softness, aberrations, and noise, using a profile that's measured in our lab.
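
The "unscrambling" Tom describes can be illustrated with a classical, non-learned baseline: Wiener deconvolution, which inverts a known blur (PSF) while backing off where noise would be amplified. The neural approach goes much further, handling spatially varying blur, noise, and demosaicing jointly, but the information-recovery idea is similar. The PSF and noise level in this sketch are assumed to be known in advance.

```python
# Classical Wiener deconvolution: recover detail scrambled by a known blur.
import numpy as np

def wiener_deconvolve(blurred, psf, noise_to_signal=0.01):
    """blurred: HxW image; psf: small 2D kernel that sums to 1."""
    H, W = blurred.shape
    kh, kw = psf.shape
    psf_padded = np.zeros((H, W))
    psf_padded[:kh, :kw] = psf
    # Center the PSF at the origin so the result is not spatially shifted.
    psf_padded = np.roll(psf_padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    Hf = np.fft.fft2(psf_padded)
    G = np.fft.fft2(blurred)
    # H* / (|H|^2 + NSR): acts as an inverse filter where the signal is strong,
    # and backs off where inversion would mostly amplify noise.
    F_hat = np.conj(Hf) / (np.abs(Hf) ** 2 + noise_to_signal) * G
    return np.real(np.fft.ifft2(F_hat))
```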

Michael Krigsman: In your lab, you're tightly bound to the hardware, and you're compensating for deficiencies that may exist in the lens or in the software platform, whatever it may be that's part of that overall mobile system, that phone?

Tom Bishop: Any lens, any sensor has deficiencies. It's a measurement of the physical world. When you have a sensor, you're collecting photons. When you have a lens, you're trying to focus rays onto small points on the sensor. I mean, we can get into the physics of it, but it has physical limits, and we try to overcome those as best as possible using software.

Ziv Attar: You asked, or you mentioned, that this neural network we just described would be tailor-made to that specific camera. It has pros and cons, and we solved the con. I'll explain how.

The big pro is that it's tailor-made to a specific camera, so it's able to correct the specific aberrations of that camera and the specific noise of that sensor. That is what allows us to achieve the maximum potential quality of that camera, to basically maximize the image quality you can get from a given system.

The downside of this approach is that you have to train for each specific camera. Basically, most of the time we spent in the last few years was actually building a system, which is a physical lab plus a bunch of software for training and software for controlling the labs. There's some robotics inside.

We concluded a while ago that, okay, if the way to get the maximum quality is to train for every different camera, we need to make it fast. We spent a lot of effort and resources on making these labs, which can take any device with a camera on it, or multiple cameras like phones. The capture can be as fast as a few hours and go up to one or two days.

We can capture all the data needed to train the network for a specific device. Then the training takes anything between a few hours and maybe a day or two. The whole process can start on Monday and finish by Thursday.

Just to put in perspective where this stands: without GLASS Imaging, how would you do it? All the phone makers design new hardware for their new generation phone. Apple's now working on the iPhone 17, I assume. At some point, several months before the launch, these companies get the first hardware prototypes, like the phone with the new lens and the new sensor, new everything.

Then they need to tune, it's called tuning, the image quality. Typically, depending on the company size, there's anything between a few hundred and a few thousand people working for a good few months, maybe even six months, on tuning these things right.

When we say four days, it's very fast. Also worth mentioning: in those four days I mentioned, Monday to Thursday, there are almost zero human hours involved. There's maybe 20 or 30 minutes setting up the device in the lab. Once it's set up, we close the door and we let it go.

Then the training, obviously, you know, is Nvidia GPUs crunching numbers; it's not people doing actual work. What I want to say is, we put tons of effort into automating the process. This way we can support thousands of different projects at the same time and provide the output, which is a neural network designed to run on a specific platform, whether it's a Qualcomm chip or some PC or an Nvidia chip, within a few days. We could support a very high number of customers every week.

Michael Krigsman: We have a question from Kamaru-dean Lawal: Is GLASS Imaging hiring?

Ziv Attar: Good question. Yeah, thanks for bringing it up. We have a few open roles on our website and we are going to add several more open roles. There are some software roles, some business roles, and some machine learning and AI roles. Tom, maybe you want to add something.

Tom Bishop: Yeah, optics, computational.

Michael Krigsman: Kamaru-dean, look at their website. Gloria Putnam asks: Where do I get one of these GLASS Imaging AI tracksuits? They look sharp.

Michael Krigsman: And here we have a more technical question. Ayush Jamdar asks: Do the trained neural networks learn the specific camera's response function, the mapping from sensor irradiance to pixel color, therefore entirely replacing the camera pipeline processing at the phone's ISP?

Basically what he's wondering is where do you guys fit in relation to the camera?

Tom Bishop: We can essentially replace the whole ISP, the image signal processing pipeline, that exists today. We can also replace part of it, but we sit, as I said, at the front end, where we take in those raw images directly from the sensor, all the way up to producing a finished RGB image.

Typically we say it's a linear RGB image, meaning it hasn't had any tone response or color manipulation applied on top of it, which is more of an aesthetic treatment that every camera or phone or software does today. We leave that up to vendors to apply their own, or we can provide a solution that gives optimal tone and color. It can all be done end to end. It's flexible.

Michael Krigsman: Would it be correct to say, then, that you are performing your magic, improving the sharpness and reducing the noise, while leaving the colors as they are, so that the phone platform can then apply its own processing, like Samsung does certain things and other manufacturers do other things within their phone software, or their end-user software as well?

Ziv Attar: That's very correct, Michael. The reason behind it is not necessarily what we can do or what we want to do. If you look at smartphones, some of them have their own, let's say, color tendency: warmer looks, this type of white balance, that type of white balance.

If we were selling the same post-processing pipeline to 10 smartphone companies, they would all end up with the exact same colors, and color is one of the ways they differentiate, you know, between phones.

Also, it's worth mentioning that these color preferences vary a lot geographically. If we're looking at phones that are designed in China and sold in China, versus phones designed for the Korean market or the Japanese market, versus the US or European market, you'll see very different trends in terms of white balance, exposure, color vibrancy, things like that.

It's also worth mentioning these trends are not fixed. We've seen in the past, let's say five years ago, that Europe and US devices, mostly Apple and Samsung, were trying to stick to more realistic-looking colors and tones, while the leading phone makers in China were promoting very saturated colors.

Now, or really over the last two or three years, we see some inversion of that. This trend is not constant; it changes with time, with a very slow cycle, maybe every three years or so.

This is just my personal opinion, but I think the reason to go more true to life is coupled with the quality, the sharpness, of the images. The closer you get to, let's say, SLR quality or professional quality in images, the more I think that will drive demand for realistic-looking colors and not very overly vibrant, saturated colors.

Tom Bishop: A good time to look at some images, if you can pull them up, Michael.

Michael Krigsman: Okay, we're looking at an image. We're looking at two images side by side.

Tom Bishop: These are both images from a recent iPhone. On the left is the processing straight out of the Apple camera app, and on the right is our GlassAI neural network processing running on the device.

What you can see immediately and we can pan and zoom around here, but you can see there's better detail, sharpness, definition, color. Like if you zoom in on this roof, you can see the red color is much better represented. You can see the trees are clearer. The definition of the lines in these buildings is much better. The saturation's better. Many, many things to point out.

Let's look at another image here. This is one that, actually, Ziv, you captured. Let's zoom in.

Ziv Attar: Yeah, it's the one we were talking about earlier. Michael, you were asking about hallucination, and we talked about how it's a gray area, not binary. I think this is a good example of how we differ from other, let's say, AI processing.

The image on the left is captured by the default iPhone camera app. There's a lot of zooming in going on here; this object is pretty far from the camera. When you zoom in, and zoom in a lot, what you see in the processing done by the iPhone camera is that the text starts to break at some point and it starts to look very weird. This weirdness is actually hallucination.

Obviously, the photons that came from the scene through the lens onto the sensor did not look like that. The photons look correct, but there's so much noise and the optics are so blurry that it's very hard to separate information from noise and blur.

Because we're training neural networks specifically for this phone, in this case the iPhone 16 Pro Max telecamera, the 5X telecamera, we're able to achieve better reliability in terms of how this matches reality, the real text on this box or whatever this thing is. You could say both left and right are hallucination to some extent at the pixel level.

But I think this is a good example to show that the hallucinations on the left extend to maybe 4, 5, 6, 10 pixels, whereas on the right we're hallucinating, or interpolating, only maybe one or two pixels in size. Even the text on the right, I wouldn't say it looks good; it's a little bit blurry and smudgy, but you don't see things that should not be there.

I mean, we as humans look at this text and we know it's English letters, and we know what it should look like, right? We're not achieving perfect sharpness, but we're not inventing lines that were not there. That was to explain the hallucination, or rather the lack-of-hallucination, objective.

Tom Bishop: You might ask, how do we do this? These actually come from the same RAW files, processed differently in each case. We're actually, like I said, replacing all those blocks in the traditional image signal processing pipeline that are kind of lossy.

Every time you apply a sharpen, denoise, tone curve, HDR, various different steps, they all lose some information in the signal. Whereas when you do it end to end with a neural network, you can actually preserve those things.

There's demosaicing, where you have this Bayer pattern on the sensor: you only see green at every other pixel and blue and red at every one in four pixels, and you have to make up the rest. But the sensor is also noisy, so you're trying to guess the missing pixels in the face of actually extreme noise in these tiny cell phone sensors.

That's what leads to these kinds of artifacts. When it's done with a handcrafted algorithm, as has been done since digital cameras have been around, those noise artifacts tend to get amplified through the different steps. Whereas on the right, the neural network is able to make its best interpretation, using the characterization that we did in our lab, to untangle the effects of the noise, the demosaicing, and the sensor.

Michael Krigsman: When you talk about the characterization, now that we've seen some examples, and it's obviously very, very impressive, what is the characterization? Can you be more specific? You're talking about aberrations in the lens, limitations of the sensor. And how do you know where those aberrations exist? Is it relative to absolute perfection? Is it relative to nothing? Can you give us some sense of the calibration here?

Ziv Attar: We can give some more information, but not explain exactly how we do it, because that's part of our secret sauce. Before I get to what happens when the lens goes in the lab, I'll explain the methods used traditionally. Forget about the labs for a second. Let's say that you have a new camera, or a phone with a camera on it, and you want to determine what the optics look like, what the blur looks like.

You could do a few things, but maybe the most obvious one: you would take some lab with nice lighting and charts, and you would pick a chart. There's something called an SFR chart, which is basically just a bunch of black squares on a white background.

On these squares, you can go to the edge where it's supposed to go from black to white. You know that it's infinitely sharp. You know the ground truth because you printed and created that chart, or you bought it, but you know what it looks like.

Then you take a picture of that chart, in the case of a phone, with the phone, but you just take a raw picture. You don't do any processing on the image. Now you look at this raw image, and you look at, let's say, the black-and-white edge, and you see that, okay, it's not actually transitioning straight from black to white. It's actually blurry.

The transition takes maybe three or four or five pixels. You can learn the blur of the lens from that. Then you would also see that this edge, which is supposed to be either black or white or gray, has, for example, some greenish hue on the right side and some magenta color on the left side.

Then you know that you have a color aberration, and you can even use simple calculations to work out how large the color aberration is in pixels. You can manually characterize these things that our AI is doing, but it would take you years to characterize one camera, because you'd have to do the thing I just mentioned at every location on the sensor.

Going from corner to corner to the center, you have to do it for all light levels and light types, because the noise plays a role here. In low light, the effect of noise is much more dominant, and you have the temperature of the light, which also has an effect.

Now, just to explain things further, the reality is even more complicated, because these lenses are not static. They have OIS on them, so they move around. Everything I described is also dependent on the position of the lens during stabilization, during capture.

You also have an autofocus system on these cameras, which tries to focus on your subject, but that's never perfectly accurate. You have some variance there, some slight autofocus mismatch. And you also have object distance: everything I describe now, you need to repeat from a few centimeters all the way to several tens of feet.

This can be done manually, but my guess is it would take a full-time employee a year in a lab just to do all these measurements. And then there's also the question of what you do with all this data. You'll end up with just tons of data. What do you do with it?
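
A rough sketch of the manual measurements Ziv walks through, with simplified methods and thresholds of my own choosing: from 1D intensity scans taken across a black-to-white edge in a raw capture, estimate how many pixels the transition spans (lens blur) and how far the red and blue edges sit from the green edge (lateral chromatic aberration).

```python
# Simplified edge-based characterization: blur width and chromatic shift from edge scans.
import numpy as np

def edge_width(profile, lo=0.1, hi=0.9):
    """profile: 1D intensity scan across the edge. Returns the 10-90% transition width in pixels."""
    p = (profile - profile.min()) / (profile.max() - profile.min())
    rising = np.where((p[:-1] < lo) & (p[1:] >= lo))[0]
    done = np.where(p >= hi)[0]
    return float(done[0] - rising[0]) if len(rising) and len(done) else np.nan

def edge_position(profile):
    """Index where the normalized profile first crosses 50%, used to compare channels."""
    p = (profile - profile.min()) / (profile.max() - profile.min())
    return float(np.argmax(p >= 0.5))

def chromatic_shift(red_profile, green_profile, blue_profile):
    """Displacement of the red and blue edges relative to green, in pixels."""
    g = edge_position(green_profile)
    return edge_position(red_profile) - g, edge_position(blue_profile) - g
```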

Tom Bishop: Let's look at another example maybe. Michael, can you see the screen here?

Michael Krigsman: Yep. KPMG. Oh, look at that.

Tom Bishop: Yes, this is a building taken from reasonably far away. On the left, this is another phone, an Android phone, but it has some pretty extreme lens aberrations going on, exactly what Ziv was talking about.

Zoom in some more. You see the edge transition here is kind of blurry. It's sort of sharp, but sort of not. It's not just soft; you kind of have a double edge and you have some color fringing. You can see it also on these wires here on the roof and on these pipes: you've got red and green on either side of these white and black transitions.

It kind of looks like an old camera, actually, and the reason old cameras look like that is because they didn't have very well-controlled lenses. Nowadays you have much better manufacturing tolerances in producing lenses.

But still, this is actually an extreme case: you're trying to zoom in maybe 10x or 20x on the camera. Trying to fit that in a cell phone is really difficult. When you combine it with very tiny pixels, you start to see these lens aberrations, where the light is diverging in different ways on the sensor at different wavelengths. The red and the green focus differently, and you get chromatic aberrations and other things like that.

This is what you would get with typical processing. But after applying GlassAI, it learns to calibrate and correct for those things with that neural network under all those varied conditions that Ziv mentioned: different focus settings, different distances, different parts of the field of view; they all behave differently. It's a huge space of parameters the network has to learn. But the result is you can start to see those very fine details.

Just one more example here. Let's zoom in on this balcony. You can see these very small shiny objects here are actually well resolved. The edge of this towel hanging up here, it's much better.

Michael Krigsman: But Tom, in this image, with all due respect, there's also a plasticky kind of feeling to the towel that's hanging.

Tom Bishop: Sure. This is an extreme zoom case, or something pushing beyond the limits of the actual sensor. There is actually no physical texture detail here. What we're able to do is correct those edges in this case and give something that, when you zoom out a little bit, I think you'll agree is crisper and clearer. It's not just adding contrast; it's that clarity, where the mid-frequency details become a lot crisper and more natural when you see it.

That was a 100x zoom image viewed at the pixel level. When you view it on a normal screen, it just looks a lot better.

Michael Krigsman: Yes, it's very clear. We have another question from LinkedIn, and this is from Stef Bishop: What is the relative importance of correction for lens defects compared to coping with noise?

Ziv Attar: If you're outside on a sunny day, correcting the lens is more dominant than correcting the sensor noise. For low light, just to put things in perspective: anything indoors is low light for phones. It's going to be noisy unless you have some very fancy studio lights in your house; even a normal house with the lights on is already low light.

In that case, I would say that both are equally important, correcting sensor noise and correcting the lens effects. And just to emphasize, as we've said a few times, we're not correcting them sequentially. They're corrected in one pass with one network, which is very important.

I think when you go to extreme low light, like let's say 0.1 lux or something like that, then the noise becomes more dominant. Because if you have a 12-megapixel camera like you have on most phones, and you're now in extreme low light, you would be very thankful to get a good quality one-megapixel image, right?

What that means is that correcting a one-pixel-size aberration is not super important; it's more about correcting noise. But again, back to my statement: if you correct them separately, you're never going to achieve the optimal results you get when you correct them together. Even in this extreme low light, it's very important.

Maybe to help explain it: if you have a lot of noise and you have a fancy traditional denoising algorithm, what do you want to do? You basically want to average pixels out, right? But you don't want to average pixels out when you reach an edge. If you have an edge, you want to preserve the edge, right?

Now, the definition of an edge actually depends on the lens. You can define an edge as a big transition, but if you're not taking into account what your lens can and can't do, or the blur of the lens, you're not going to denoise perfectly. You'll denoise, but not as well as you could. Taking these things together as one, and using one piece of algorithm, a neural network in our case, to do them together gives you optimal results.
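
To illustrate the point about denoising and edges with a toy example: plain averaging smears edges, while an edge-aware, bilateral-style weighting preserves them. In the joint approach Ziv describes, what counts as an edge would additionally depend on the measured lens blur; this sketch ignores that coupling and is not Glass Imaging's method.

```python
# Naive averaging versus a crude edge-aware (bilateral-style) filter on a 2D image in [0, 1].
import numpy as np
from scipy.ndimage import uniform_filter

def naive_denoise(img, size=5):
    return uniform_filter(img, size=size)

def edge_aware_denoise(img, size=5, range_sigma=0.1):
    H, W = img.shape
    r = size // 2
    padded = np.pad(img, r, mode="reflect")
    out = np.zeros_like(img)
    weights = np.zeros_like(img)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            shifted = padded[r + dy:r + dy + H, r + dx:r + dx + W]
            # Down-weight neighbors whose intensity differs a lot (likely across an edge).
            w = np.exp(-((shifted - img) ** 2) / (2 * range_sigma ** 2))
            out += w * shifted
            weights += w
    return out / weights
```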

Tom Bishop: It depends on the character of the lens and the sensor as well: the pixel size versus the amount of aberration you have on the lens, and not just the size of the blur but its nature. Some lenses have blurs that are easy to correct; some have ones that are harder to correct.

The noise sort of operates at different spatial frequencies. Your low frequencies, your coarse details are easier to correct, and the high frequencies, the very fine details, are very hard to correct. Typically, that's where the noise tends to dominate.

One thing that we haven't talked about that I think is interesting to bring up is this idea of deep optics, which is something else we're working on, more in the realm of joint hardware and software design. Given that we have this powerful neural ISP, this GlassAI that we've developed, the software that can correct these lens aberrations, another thought is: how do you produce better hardware taking into account that software capability? That's where you can actually engineer and design the blur, the point spread function of the lens, such that the neural network can extract the maximum detail and overcome that trade-off between noise and deblurring or sharpening.

Michael Krigsman: I understand now that the model that you're building is so very thorough. Basically, you have this very deep model of the various characteristics of the entire system and that's why you're able to make the changes that you make when you process the image. Ultimately, yeah. We have another question now, more on the business side from Twitter.

We have a question from Twitter: If using your system lets a smartphone manufacturer use lower-quality, cheaper camera parts, is the combined cost of those cheaper parts plus your system less than the cost of buying a more expensive phone or a more expensive imaging system?

Ziv Attar: When smartphone cameras started 10 or 15 years ago, an expensive one was around $5, and the goal was to bring it down to $1. What actually happened is that the price went up like crazy, because there are more cameras now, and all of them have OIS and high-resolution sensors. Most phones, or at least flagship phones, have at least four or five cameras today.

The bill of materials for these cameras can easily reach $50, or $100 on some. There was even one device a few years ago speculated to be well above $100. But I would say most of them fall within several tens of dollars.

What that means, and it's a very good question, is that you could ask it the other way. Let's say that you don't have constraints on size. If you're a phone maker, you reach some quality today. Let's say you have a score of 70 and you want to reach a score of 90 or 100, whatever the hundred is; you want to improve a lot.

You have two options, right? One is to use our software, and I'm just using us as an example. The other option is to pay more for hardware: use bigger sensors, bigger and better lenses, better sensors. The amount of money you would have to pay to create the same gap in quality would be on the order of several tens of dollars.

I'm putting aside the fact that it's not going to fit in a phone, which is a big deal too, but it's going to be several tens of dollars. Whereas in our case, and I'm not discussing our pricing models, it's much lower than that, like an order of magnitude lower. The bottom line is, if you're building a camera and you're debating between putting in more expensive hardware or using software coupled with your camera, you'll save a lot of money by using the software and not the hardware.

Tom Bishop: Aside from money, the interesting thing to observe is the trend in smartphone thickness over the years. Apple tried to keep phones really thin for a long time, and then other companies added camera bumps that grew thicker and thicker. Apple eventually followed that trend, and now we've maybe reached a limit: the flagship Android phones are maybe 14, 15, 16 millimeters thick, and they barely fit in your pocket.

I think there's a desire to go back the other way. They've added better and better hardware over the last five, six, seven years. Now they're asking, well, what do we do? Adding a software solution like ours is actually a way they can make phones thin again. I think these things go in waves. That might be the next trend in smartphones.

Michael Krigsman: Where is all of this going? What is the future of smartphone photography? And, by the way, why doesn't Apple just build this in? They have as much money as any company, so why haven't they just invented this and done it?

Tom Bishop: Obviously, it's difficult. I've been working in this field for probably 20 years now, and obviously many smart people work on these kinds of topics. But back when I started doing this, it took something like a supercomputer running for weeks to run deconvolution algorithms; now we have it running in less than a second on a mobile processor.

Doing the amount of engineering to actually make things run efficiently is super difficult. There's a lot of getting everything just right. As we said, you have to capture a lot of data under very many different conditions. If you get something wrong, you can screw up the image, and obviously you don't want to do that.

But besides that, I think these big companies have, I don't know, a traditional way of thinking when it comes to how these image signal processing pipelines are integrated and tuned. They have a lot invested, thousands of people and many resources and momentum going in that direction. It's a little bit of the classic innovator's dilemma; it takes a new mindset to be able to deploy this.

I think now is the time. We have the processing power available, and we've been working for several years to get this working well on these edge devices. I think it's the right moment for this technology to be deployed.

Michael Krigsman: And where is this all going?

Ziv Attar: There's a short term and a long term to where it's going. Short term, I think we'll see more AI used in all phones, not just by GLASS Imaging but by all the phone makers, by the chip makers, by the whole industry: drones, security cameras, everything. And it's happening already.

We'll also specifically see more pixel-level AI, which is what we're doing, basically. Obviously it will get better over the years, and there'll be competition between companies, and one will do a little better than the other. I think that's the short term.

I think the more interesting answer is where this is going in the long term. In the long term, three years and beyond, and it might be sooner, the change is that you can drive changes in the hardware industry based on this AI, and I think that's super interesting.

There are lots of new optics technologies: metalenses, diffractive elements, various types of autofocus materials and mechanisms, things that work but that you can't use without this AI, without the kind of AI that we're developing with the deep optics Tom mentioned.

I'll give an example. There was an attempt to use under-display cameras for the front-facing camera on smartphones. Some phone makers actually launched it, and the cameras were just horrible. The reason is that putting a display on top of a camera acts kind of like a diffraction grating, and it creates a very messy PSF, point spread function. The optics just become very, very aberrated, and in a way that's very unpredictable for a traditional algorithm to fix.

That's something that, for us, is not an issue at all. We would just take that phone with the under-display camera, put it in the lab, and get a perfect image out of it. That's just a small example of a piece of hardware that's enabled by this technology we're developing, and I think you can extend it to cheaper lenses and thinner lenses as well.

In the coming years we will see phones with the same sensor size but much smaller lens thickness. All these bumps on the phones will be able to get much, much smaller, because if the camera is 1 or 2 millimeters thinner, at some point you don't need a bump.

It's also worth mentioning that foldable phones are trending up now. Tom and I just came back from Mobile World Congress in Barcelona last week, and we saw tons of foldable devices, you know, triple-folding, dual-folding, screens on all sides. These foldable phones are very cool and useful because they give you a big screen, but they have a big problem.

Any given section of the phone is actually very thin. When you design a camera to go in there, you now have half the space, or maybe a third of the space, that you have in a normal phone. That just puts more constraints on the camera, so they're just using worse cameras today.

But if we can give those cameras a big enough boost in quality, customers will say, okay, I'm going to buy this foldable device; even though it has a small camera, I'm still getting good image quality out of it. That's a kind of second-order driver of the technology. But long story short, being able to use this AI to drive changes in hardware will lead to very interesting things.

Tom Bishop: If you could show this image quickly. This is one we took from a drone. We've applied our technology not just to smartphones but to other platforms that have cameras. Zooming in here, these are some friends from a well-known photography website who came to visit our office, and you can see, again, compared with the one on the left that comes straight out of the drone, we can resolve a lot more detail. We're able to apply our technology anywhere there's a camera that needs processing on the edge, including wearables like smart glasses and things like that, which are a big up-and-coming trend.

Michael Krigsman: That is a very significant difference; it's pretty amazing, the difference between these two images. Very quickly: large camera manufacturers have also been playing with software-based corrections. For example, I believe Fujifilm has been doing that in some of their cameras, and others as well.

Any thoughts on this kind of technology showing up in large cameras? But very quickly, please.

Tom Bishop: I mean, I think they're behind in terms of the processors they have in there. Smartphones typically have better processors today because of higher shipping volumes, but as they start to integrate the same kinds of Qualcomm chips, for example, they will be able to run our software, and we can develop versions that run on those as well.

Michael Krigsman: Do we lose anything by applying AI to photographs as opposed to just light going through glass on a sensor?

Ziv Attar: You don't lose anything besides processing time and a little bit of battery power. And actually, even that is not completely true, because non-AI computational photography actually consumes more power than AI, just because Qualcomm and the chip makers spend a lot of effort making their chips very, very power efficient. We can run an insane amount of calculations on that chip for a whole second and it will consume less power than traditional algorithms running on the CPU or GPU for the same amount of time. So the answer is no.

Michael Krigsman: From your perspective, looking inside these smartphones, who's got the best camera?

Tom Bishop: The short answer is that some of the Chinese Android phones have much better hardware. There are flagship phones that are actually more expensive than the iPhone or any Pixel or anything like that. But on the software side, wait until we're shipping; then we'll have the best.

Michael Krigsman: But the image is a result of the combination of the hardware and the software. Would it be accurate to say that merely buying better hardware because you want a better image is an incomplete solution? Because you'd need to look at both.

Ziv Attar: Yeah, you need to look at the total. There's a website called DXO where they test cameras and rank them. If you go to the top of the chart, you will see some phones with more expensive hardware, not surprisingly, but some of the phones up there actually don't; they have much cheaper hardware, just with better software. Like you said, it's a combination of the two, and if you have the best hardware and the best software, you'll get the best image quality, obviously.

Michael Krigsman: And I know that DXO has rated your results very highly. So congratulations on that.

Ziv Attar: Thank you very much.

Michael Krigsman: And with that, we're out of time. A huge thank you to Ziv Attar and Tom Bishop. Thank you both so much for being here.

Ziv Attar: Thank you, it was great talking to you, and thanks to the audience for sending interesting questions as well.

Michael Krigsman: And thank you to everybody who watched, especially you folks who asked such great questions.

Now, before you go, check out CXOTalk.com. Subscribe to our newsletter so that you can join our community. We have really awesome shows coming up. This has been more of a technical deep dive than we usually do, but I thought it's such an important and interesting topic that I said, let's go for it.

Let me know in the comments or send a message, smoke signals, whatever you want: what kind of shows from CXOTalk do you want to see? Let me know, because we are very responsive as far as that goes. Thanks so much, everybody. Have a great day, and we'll see you again next time.

Published Date: Mar 14, 2025

Author: Michael Krigsman

Episode ID: 872