
It's all about the angle: Your photos, re-composed

April 22, 2026

Marcos Seefelder, Staff Software Engineer, Platforms & Devices, and Pedro Velez, Senior Research Engineer, Google DeepMind

We introduce a new approach for editing images, now live in the Auto frame feature in Google Photos, that allows users to re-imagine photos from a new perspective after they have been taken.

Have you ever looked back at your camera roll and wished you had captured a scene slightly differently? Maybe you wish you had caught a bit more of one side of a face, or positioned the camera slightly lower to get the perfect shot. Perhaps it’s a selfie with a perfect smile, but the wide-angle lens makes you look somewhat unfamiliar. Usually, these are the "almost perfect" shots we settle for, because the moment has passed, and it is not possible to retake the picture.

While cropping and zooming may help, classic image editing tools won’t fix the underlying problem: the image is still showing the scene from a fixed, imperfect perspective. Zooming in doesn't change the parallax, and cropping can't show you what was just outside the frame.

Today we are announcing a new approach to fix scene alignment after a photo has been taken. Our method, now available as part of the Auto frame feature in Google Photos, uses machine learning (ML) models to understand the scene and its spatial layout, and generative AI to imagine the photo from a new perspective. In contrast to classical photo editing, our method interprets a photo as a 3D scene (think of a real moment frozen in time) and changes the camera position automatically within that space. In doing so, it keeps what was originally visible and intelligently generates previously hidden content, forming an authentic new perspective of the original scene.


The new Auto frame feature interprets a standard 2D photo as a 3D scene. By inferring the original camera position from the image's spatial layout, it automatically modifies the angle to reveal a new, authentic perspective of a frozen moment in time.

A new perspective

In contrast to other generative image-editing solutions, our method consists of two stages: (1) 3D scene and camera estimation, and (2) generative inpainting and retouching. By decoupling 3D estimation from image formation, we can faithfully manipulate the scene in 3D and adjust both the camera intrinsics and extrinsics. Further, we use ML models to understand the scene's contents and suggest new camera parameters automatically.
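
Concretely, the camera parameters being adjusted are the two factors of a standard pinhole projection: the extrinsics place the camera in the scene, and the intrinsics describe the lens. The minimal sketch below (the function name and conventions are our own, not the production code) shows why decoupling them gives independent control over camera position and focal length.

```python
import numpy as np

def project(point_world, R, t, focal, cx, cy):
    """Pinhole projection of one 3D point into pixel coordinates.

    Extrinsics (R, t) move the point into the camera frame, i.e., where
    the camera stands and how it is oriented; intrinsics (focal, cx, cy)
    model the lens. Because the two act in sequence, the camera pose and
    the focal length can be edited independently.
    """
    p = R @ point_world + t          # world -> camera (extrinsics)
    u = focal * p[0] / p[2] + cx     # camera -> pixel (intrinsics)
    v = focal * p[1] / p[2] + cy
    return np.array([u, v])
```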

In the first step, we use an internal 3D point map estimation model specifically configured to faithfully reconstruct human bodies and faces, limiting reconstruction artifacts that could harm identity preservation. For every pixel of the original image, the model estimates a 3D point representing the visible surface patch, and additionally approximates the focal length of the original camera.
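
For intuition, the geometry of a point map can be illustrated by inverting the projection above: given a per-pixel depth and a focal length, every pixel lifts to a 3D point. The sketch below is illustrative only; the internal model regresses the point map and focal length directly from the photo, and the function name and centered principal point are assumptions.

```python
import numpy as np

def unproject_to_point_map(depth: np.ndarray, focal: float) -> np.ndarray:
    """Lift a per-pixel depth map (H, W) to a 3D point map (H, W, 3).

    Assumes a pinhole camera with the principal point at the image
    center; the production model instead predicts the 3D points (and
    the focal length) directly.
    """
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    x = (xs - cx) / focal * depth  # back-project each pixel along its camera ray
    y = (ys - cy) / focal * depth
    return np.stack([x, y, depth], axis=-1)
```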

Next, we use classical 3D rendering to generate an estimate of the image as if captured with the altered camera parameters. Importantly, we can modify both the camera pose (position and orientation) and focal length, giving us full control over the image formation process.
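
A minimal version of this rendering step is z-buffered point splatting: transform every 3D point into the new camera, project it with the new focal length, and keep the nearest point per pixel. The sketch below is a simplification under our own conventions (one-pixel splats, world-to-camera extrinsics); a production renderer would splat soft points and anti-alias.

```python
import numpy as np

def render_point_map(points, colors, R, t, focal, out_hw):
    """Project a 3D point map into a new camera with simple z-buffering.

    points: (H, W, 3) 3D point map; colors: (H, W, 3) source pixels.
    R (3, 3), t (3,): new camera extrinsics (world-to-camera).
    focal: new camera focal length in pixels (the adjustable intrinsic).
    Returns the rendered image and a mask of pixels no point landed on.
    """
    h, w = out_hw
    pts = points.reshape(-1, 3) @ R.T + t   # transform into the new camera frame
    rgb = colors.reshape(-1, 3)
    z = pts[:, 2]
    front = z > 1e-6                        # keep only points in front of the camera
    pts, rgb, z = pts[front], rgb[front], z[front]
    u = np.round(pts[:, 0] / z * focal + (w - 1) / 2).astype(int)
    v = np.round(pts[:, 1] / z * focal + (h - 1) / 2).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, rgb, z = u[inside], v[inside], rgb[inside], z[inside]
    order = np.argsort(-z)                  # paint far-to-near so nearest points win
    image = np.zeros((h, w, 3), dtype=np.float32)
    holes = np.ones((h, w), dtype=bool)
    image[v[order], u[order]] = rgb[order]
    holes[v[order], u[order]] = False
    return image, holes
```

The returned hole mask marks exactly the regions the next stage must fill.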

However, rendering a 3D point map alone is insufficient: when you move a virtual camera "around" an object, you reveal parts of the background that were never captured by the original lens. Essentially, the point map is an incomplete representation of the scene and rendering it from a new perspective always results in "holes." To fill these areas, we use a generative latent diffusion model to complete and correct the rendered estimate. This model was trained specifically for this task using an internal dataset of image pairs with known camera parameters. During training, we estimate the 3D point map of one image and project it into the camera of the second image. The model then learns to reconstruct the second image from the re-rendered first image. At inference time, we employ classifier guidance with regional scaling to faithfully preserve original content, while allowing the model creative freedom to fill in the blanks.
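
The preservation mechanism can be pictured as a guidance term injected into each denoising step: inside regions the render actually covered, the noise prediction is nudged toward reconstructing those pixels, while inside the holes it is left free. The sketch below uses a reconstruction-style gradient with a per-pixel scale as a stand-in for the actual classifier guidance with regional scaling; the DDPM parameterization and all names are assumptions.

```python
import numpy as np

def regionally_guided_eps(x_t, eps_pred, rendered, known_mask, alpha_bar, scale=1.0):
    """Hedged sketch: steer one denoising step toward the rendered pixels.

    x_t, eps_pred, rendered: (H, W, 3); known_mask: (H, W), True where the
    re-rendered point map produced valid content. Inside that region the
    noise prediction is pulled toward reconstructing `rendered`; in the
    holes the gradient is zero, leaving the model full generative freedom.
    """
    sqrt_ab = np.sqrt(alpha_bar)
    sqrt_1mab = np.sqrt(1.0 - alpha_bar)
    x0_hat = (x_t - sqrt_1mab * eps_pred) / sqrt_ab    # predicted clean image
    region = known_mask[..., None].astype(np.float32)  # regional guidance scale
    grad = region * (x0_hat - rendered) / sqrt_ab      # d/dx_t of the masked L2 loss
    return eps_pred + scale * sqrt_1mab * grad         # classifier-guidance-style update
```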


An overview of our two-stage editing method. First, a 3D point map estimation model estimates the scene's geometry as a 3D point map from monocular depth cues, while 2D semantic information is used to infer the target camera parameters. Second, a generative latent diffusion model completes the composition, filling the hidden background areas revealed by the new camera angle and making final adjustments to the novel view.

A better point of view

To support fully automatic editing, we use ML models to detect the position and 3D orientation of the faces of the main subjects. Together with the 3D point map, this semantic information allows us to compute the camera parameters for the ideal framing. This is particularly useful for portraits. Additionally, images captured with wide-angle front cameras often suffer from strong perspective distortion, which can make features closest to the lens appear unnaturally large. To correct this, our method automatically detects these distortions and adjusts the virtual camera intrinsics to restore natural, flattering proportions, effectively "stepping back" from the subject after the fact.
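
One way to picture this "stepping back" is a virtual dolly-zoom: move the camera away from the subject, which relaxes the perspective distortion, while scaling the focal length by the same factor so the subject stays the same size in the frame. The sketch below is illustrative; the function name, the fixed scale factor, and reading the subject depth off the point map are our assumptions.

```python
import numpy as np

def step_back_camera(focal, subject_depth, distance_scale=2.0):
    """Virtual dolly-zoom: trade camera distance against focal length.

    subject_depth would be read off the 3D point map around the detected
    face; distance_scale is an illustrative amount of "stepping back".
    Returns the new focal length and the camera translation to apply.
    """
    new_depth = subject_depth * distance_scale
    new_focal = focal * (new_depth / subject_depth)  # zoom to preserve subject size
    # Moving the camera back by (new_depth - subject_depth) along its view
    # axis adds that amount to every point's depth in the camera frame.
    t_offset = np.array([0.0, 0.0, new_depth - subject_depth])
    return new_focal, t_offset
```

Applied through the point-map renderer, nearby features (a nose, a chin) shrink toward natural proportions while the overall framing is preserved.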


Now available in Google Photos

This fully automatic solution is now live in Google Photos as part of Auto frame. It seamlessly enhances portraits by using our 3D-aware image editing tool to process eligible photos that contain people. Users can access the re-composed image, which has an automatically adjusted camera viewpoint, as the second rendition option within the Auto frame candidates, making it a single-action improvement to the photo.


Now live in Google Photos, this editing tool allows users to easily access automatically re-composed images as a seamless, single-action enhancement within the Auto frame feature.

Acknowledgments

This feature is the result of a collaboration between Google DeepMind and Google Platforms & Devices teams. Key contributors include: Thiemo Alldieck, Marcos Seefelder, Hannah Woods, Pedro Velez, Michael Milne, Bert Le, Navin Sarma, Jasmin Repenning, and Selena Shang. Advisors include: Steven Hickson, Claudio Martella, Irfan Essa, and Alex Rav Acha. Special thanks to: Mike Krainin, Jan Stria, Neal Wadhwa, Amit Raj, Mauro Rego, Kita Boice, Dennis Shtatnov, Yuan Qi, Julian Iseringhausen, Peter Zhizhin, Jiaping Zhao, Andre Araujo, Jana Ehmann, Keng-Sheng Lin, Isalo Montacute, Brandon Ruffin, Reginald Ballesteros, and Andy Radin.
