HDR+ Burst Processing Pipeline

Visual Computing Systems Final Project
Tim Brooks and Suhaas Reddy

Summary

For our final Visual Computing Systems project we implemented a burst photography pipeline based on Google’s HDR+. The technique combines multiple underexposed raw frames to reduce noise, then applies tone mapping to brighten shadows while maintaining local contrast. Underexposing the initial frames allows for more robust alignment and merging, and it reduces motion blur and blown highlights. Our code is available at github.com/timothybrooks/hdr-plus.

Our implementation applies the same high-level pipeline, with modified algorithms, to raw images from a Canon 5D III DSLR. While Google’s HDR+ pipeline has proven useful in mobile photography due to its ease (no user input) and robustness (used as a default setting), the main question we explored is whether this technique is also beneficial in a broader photography setting. The extreme low-light scene below of a car driving by in the snow is processed solely using our implementation of the HDR+ pipeline.

Example Output Image

The pipeline is broken into three main phases: the first aligns many raw, underexposed frames; the second merges the frames to remove noise; and the third finishes the image by applying tone mapping and standard image processing operations. We briefly explain our approach to each phase, examine sample results, and discuss what the results suggest about the potential use of HDR+ in a broader photography context.

Background

When a camera captures information off an image sensor, a series of steps (referred to as a pipeline) must be applied to the data before it is recognizable as a photograph. While it is common for cameras to apply an image processing pipeline immediately, producing a JPEG output image, some cameras can capture images in a "raw" format, deferring processing to a software application such as Adobe Photoshop or Lightroom. The pipeline that we describe processes images from start to finish, that is, from raw data to a recognizable image. The closeup of a toy below compares raw sensor data with its corresponding output image.

Raw Sensor Data | source: graphics.cs.cmu.edu

Output Image | source: graphics.cs.cmu.edu

Aligning

We align raw frames hierarchically via a Gaussian pyramid, moving from coarse to fine alignments. Each level of the pyramid is downsampled by a factor of 4. Using 16 x 16 tiles and a search region of 4 pixels, we find the tile offset that minimizes the sum of L1 distances. While Google uses slightly different downsample and search region sizes and a mix of L2 and L1 distances, the pixel-level alignment is otherwise identical. Google supplements the hierarchical alignment described above with a sub-pixel alignment method that fits a bivariate polynomial to the pixel-level alignment, allowing for alignment at a finer granularity. By aligning the frames prior to merging, we are able to combine them more robustly and without significant blurring or ghosting due to motion.

Six-frame Average

Six-frame Align and Merge
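The tile search described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than our actual implementation: the function names, the interpretation of the 4-pixel search region as offsets of -4 to +4 in each dimension, and the `guess` parameter (standing in for the offset passed down from the coarser pyramid level) are all assumptions made for the example.

```python
import random


def l1_distance(ref, alt, top, left, dy, dx, tile):
    """Sum of absolute differences between a reference tile and a shifted tile."""
    total = 0
    for y in range(tile):
        for x in range(tile):
            total += abs(ref[top + y][left + x] - alt[top + dy + y][left + dx + x])
    return total


def align_tile(ref, alt, top, left, tile=16, radius=4, guess=(0, 0)):
    """Return the (dy, dx) offset near `guess` that minimizes the L1 distance.

    `radius` and `guess` are hypothetical parameters: the search covers
    offsets within `radius` pixels of the guess in each dimension.
    """
    height, width = len(alt), len(alt[0])
    best_offset, best_cost = None, None
    for dy in range(guess[0] - radius, guess[0] + radius + 1):
        for dx in range(guess[1] - radius, guess[1] + radius + 1):
            # Skip candidates whose shifted tile would leave the frame.
            if not (0 <= top + dy and top + dy + tile <= height):
                continue
            if not (0 <= left + dx and left + dx + tile <= width):
                continue
            cost = l1_distance(ref, alt, top, left, dy, dx, tile)
            if best_cost is None or cost < best_cost:
                best_offset, best_cost = (dy, dx), cost
    return best_offset


# Synthetic test: cut two 32 x 32 frames from one random 48 x 48 scene so that
# content at (y, x) in `ref` appears at (y + 2, x - 3) in `alt`.
rng = random.Random(0)
scene = [[rng.randrange(256) for _ in range(48)] for _ in range(48)]
ref = [row[8:40] for row in scene[8:40]]
alt = [row[11:43] for row in scene[6:38]]

offset = align_tile(ref, alt, top=8, left=8)
print(offset)  # (2, -3)
```

In a coarse-to-fine scheme, the offset found at one pyramid level would be scaled by the downsample factor and passed as `guess` to the next finer level, so each level only needs a small search.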

Merging

Merging decreases noise in a reference frame by including similar information from the aligned tiles of other frames. Google applies a variant of the Wiener filter to each tile across the temporal dimension. Our algorithm is significantly simplified and uses the same intuition as non-local means, which weights pixels according to the similarity of their surrounding patches. We combine each tile based on the relative pairwise similarity between a reference tile and other tiles in the temporal stack. As the close crops below display, merging significantly improves image quality by removing noise, without the blurring side effect of most single-frame denoising algorithms. Merging also improves color accuracy, particularly in darker regions of the image.

Single Frame

Six-frame Align and Merge

Single Frame Closeup

Six-frame Align and Merge Closeup
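The idea of weighting tiles by their similarity to the reference can be sketched as follows. This is an illustrative simplification, not our exact merge: the weight function, its `lo` and `hi` thresholds, and all function names are hypothetical. Tiles whose mean absolute difference from the reference is small (plausibly just noise) get full weight, while tiles that differ strongly (plausibly misaligned) are ignored.

```python
def mean_abs_diff(a, b):
    """Mean absolute per-pixel difference between two equally sized tiles."""
    count = len(a) * len(a[0])
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / count


def similarity_weight(dist, lo=2.0, hi=10.0):
    """Hypothetical weight: 1 below `lo`, 0 above `hi`, linear in between."""
    if dist <= lo:
        return 1.0
    if dist >= hi:
        return 0.0
    return (hi - dist) / (hi - lo)


def merge_tiles(reference, alternates, lo=2.0, hi=10.0):
    """Weighted temporal average of a reference tile and its aligned alternates."""
    weights = [similarity_weight(mean_abs_diff(reference, t), lo, hi)
               for t in alternates]
    total = 1.0 + sum(weights)  # the reference tile always has weight 1
    merged = []
    for y in range(len(reference)):
        row = []
        for x in range(len(reference[0])):
            acc = reference[y][x]
            for w, t in zip(weights, alternates):
                acc += w * t[y][x]
            row.append(acc / total)
        merged.append(row)
    return merged, weights


reference = [[10, 12], [14, 16]]
similar = [[12, 10], [16, 14]]      # differs by noise only
misaligned = [[60, 62], [64, 66]]   # very different content

merged, weights = merge_tiles(reference, [similar, misaligned])
print(weights)  # [1.0, 0.0]
print(merged)   # [[11.0, 11.0], [15.0, 15.0]]
```

Because the misaligned tile receives zero weight, it cannot introduce ghosting, while the similar tile is averaged in to reduce noise.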

To avoid artifacts at the boundaries of tiles, we overlap tiles by one half in each dimension, and blend the overlapping tiles using a modified raised cosine window. This form of spatial merging is identical to that of Google’s implementation.

Raised Cosine Window on Checkerboard
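A short 1-D sketch shows why this window blends half-overlapped tiles without seams. It assumes the half-sample-shifted form w(x) = 0.5 - 0.5 cos(2π(x + 0.5) / n) for a tile of n samples; the function names are hypothetical. With tiles placed every n/2 samples, the two windows covering any interior sample sum to exactly 1, so a constant signal is reproduced unchanged.

```python
import math


def raised_cosine(n):
    """Assumed window form: 0.5 - 0.5 * cos(2 * pi * (x + 0.5) / n)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * (x + 0.5) / n) for x in range(n)]


def blend_1d(tiles, n):
    """Blend 1-D tiles of length n that are placed every n / 2 samples."""
    step = n // 2
    window = raised_cosine(n)
    out = [0.0] * (step * (len(tiles) + 1))
    for i, tile in enumerate(tiles):
        for x in range(n):
            out[i * step + x] += window[x] * tile[x]
    return out


n = 16
window = raised_cosine(n)

# Each sample in the overlap is covered by window[x] and window[x + n / 2].
overlap_sums = [window[x] + window[x + n // 2] for x in range(n // 2)]
print(all(abs(s - 1.0) < 1e-12 for s in overlap_sums))  # True

# Three constant tiles blend back to the same constant in the interior.
blended = blend_1d([[5.0] * n for _ in range(3)], n)
interior = blended[n // 2 : -(n // 2)]
print(all(abs(v - 5.0) < 1e-12 for v in interior))  # True
```

The same property carries over to two dimensions when the 2-D window is the product of two 1-D windows, since the product of two sets of weights that each sum to 1 also sums to 1.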

Finishing

We implemented black-level subtraction, white balancing, demosaicking with simplified gradient correction, bilinear chroma denoising, sRGB color correction, tone mapping, gamma correction, global contrast adjustment and unsharp mask sharpening. These steps include all of those in the Google pipeline except chromatic aberration correction, dehazing and dithering.
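Three of the pointwise steps listed above (black-level subtraction, white balancing and gamma correction) are simple enough to sketch directly. This is a minimal illustration and not our pipeline code: the black level, white level and white-balance gains are made-up values, and white balancing is applied to an RGB triple for brevity. The gamma curve is the standard sRGB transfer function.

```python
def subtract_black_level(raw, black, white):
    """Map a raw sensor value to [0, 1] given the sensor's black and white levels."""
    return min(1.0, max(0.0, (raw - black) / (white - black)))


def white_balance(rgb, gains):
    """Scale each channel by its gain, clamping to the valid range."""
    return tuple(min(1.0, c * g) for c, g in zip(rgb, gains))


def srgb_gamma(v):
    """Standard sRGB transfer function for a linear value in [0, 1]."""
    if v <= 0.0031308:
        return 12.92 * v
    return 1.055 * v ** (1 / 2.4) - 0.055


# Hypothetical sensor levels and gains, chosen only for illustration.
BLACK, WHITE = 2048, 16383
GAINS = (2.0, 1.0, 1.5)

raw_rgb = (3000, 5000, 3500)
linear = tuple(subtract_black_level(v, BLACK, WHITE) for v in raw_rgb)
balanced = white_balance(linear, GAINS)
display = tuple(srgb_gamma(v) for v in balanced)
print(display)
```

The gamma step brightens midtones noticeably (a linear value of 0.5 maps to roughly 0.735), which is why noise in underexposed shadows becomes so visible after finishing.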

To apply tone mapping, we simulate a more brightly exposed image and weight the pixels of the brighter and darker images according to a normal distribution, which here represents the ideal pixel value distribution of a well-exposed image. The weights are then applied to the two images using a Laplacian pyramid, which prevents hard edges and haloing around transitions between dark and bright portions of the scene. Differing slightly from Google’s approach, we apply this algorithm iteratively on high-contrast scenes in order to increase the amount of compression while minimizing blending artifacts.

No Tone Mapping

Tone Mapping
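The weighting step can be illustrated per pixel. The sketch below deliberately omits the Laplacian pyramid and blends each pixel independently, so it shows only how the normal-distribution weights favor well-exposed values, not how blending artifacts are avoided. The gain of 4, the mean of 0.5 and the standard deviation of 0.2 (the latter two borrowed from the well-exposedness measure in Exposure Fusion) are assumptions, as are the function names.

```python
import math


def well_exposedness(v, mean=0.5, sigma=0.2):
    """Gaussian weight that peaks for mid-gray pixel values."""
    return math.exp(-((v - mean) ** 2) / (2 * sigma ** 2))


def fuse_pixel(dark, gain=4.0):
    """Blend a pixel with its synthetically brightened version.

    `gain` is a hypothetical exposure multiplier used to simulate
    the brighter image from the darker one.
    """
    bright = min(1.0, dark * gain)
    w_dark = well_exposedness(dark)
    w_bright = well_exposedness(bright)
    return (w_dark * dark + w_bright * bright) / (w_dark + w_bright)


shadow = fuse_pixel(0.125)
highlight = fuse_pixel(0.9)
print(round(shadow / 0.125, 2), round(highlight / 0.9, 2))
```

Shadows are lifted by a large factor because their brightened version lands near mid-gray and receives most of the weight, while highlights change little because both versions are far from mid-gray. This is the dynamic range compression behavior described above.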

Performance

Our pipeline takes ~3.4 seconds to merge eight raw images on a quad-core 2.2 GHz Intel i7. This excludes an I/O overhead of ~1 second per raw image. Our runtimes vary slightly with image complexity (align and merge), amount of dynamic range (tone mapping iterations) and level of noise (chroma denoising strength). The runtimes meet our ~4 second usability goal. A direct comparison to Google's pipeline performance is not particularly telling, as the algorithms and underlying hardware vary greatly.

Ghosting Artifacts

In rare cases, we observed ghosting artifacts in our merged output. These artifacts occurred in severely underexposed raw images, and are likely due to a failure to differentiate between noise and misalignment in our simplified merge method. Please note that our testing is far from exhaustive, as we only ran the pipeline on a batch of ~35 bursts that we took.

Highly Underexposed Raw

Resulting Ghosting

Image Quality

Below, our pipeline output is compared with a single underexposed frame processed through dcraw, which is a barebones raw processing pipeline, and a single underexposed frame processed automatically by Adobe Lightroom.

Our Pipeline

Dcraw

Lightroom

Our Pipeline Closeup 1

Dcraw Closeup 1

Lightroom Closeup 1

Our Pipeline Closeup 2

Dcraw Closeup 2

Lightroom Closeup 2

Our Pipeline Closeup 3

Dcraw Closeup 3

Lightroom Closeup 3

We can see that the dynamic range compression of our pipeline produces favorable global lighting. Note that this is less relevant for scenes that do not have a high dynamic range. The detail is clearly superior to the basic processing of dcraw. The noise removal of Adobe Lightroom outperforms ours in extremely noisy, low-light cases. However, it comes at the cost of lower saturation in shadowed regions, and generally less color detail. Since our noise reduction is primarily accomplished by merging frames, we can apply spatial chroma denoising less aggressively, producing higher quality color information in nearly all of our results.

Discussion

Google has proven that the burst photography method in HDR+ works well in the mobile photography context. Their implementation is robust enough to be used as the default camera setting, and the resulting image quality has received rave reviews. While the technique has not yet been adopted by any mainstream image processing application or DSLR camera manufacturer, we believe that it has a promising future in photography more broadly.

Our implementation shows that the pipeline scales well to a DSLR on basic examples. While our results are less robust in terms of preventing ghosting, this may be due solely to our algorithmic simplifications. This does raise other questions about the potential robustness of an HDR+ pipeline for DSLR cameras. While mobile cameras have fixed hardware and image settings, a DSLR purposefully does not. The HDR+ pipeline may fail in some common DSLR use cases, such as studio strobe lighting and extreme action photography. In both cases, retrieving a burst of similar frames may be infeasible.

Another difficulty that may arise in a professional photography scenario is that HDR+ currently removes a level of control from the photographer. Some photographers may prefer absolute control over merging and tone mapping strengths. We therefore suspect that HDR+ is unlikely to become a default setting for high-end DSLRs, but see potential benefit for intermediate point-and-shoot cameras and as an optional feature in DSLRs.

There is also considerable potential in implementing this pipeline for raw processing in mainstream photo editing software, such as Adobe Lightroom and Photoshop. While it may be unintuitive at first for photographers to deliberately underexpose images for this use, HDR+ processing outside of the camera could have many benefits, such as greater user control and more powerful hardware, which would allow for more expensive, state-of-the-art algorithms for tasks such as demosaicking and chroma noise removal.

In combination with a DSLR, we could see HDR+ propelling image quality forward for specific professional photography use cases, such as extreme low-light or high-contrast scenes with subject motion.

Ideal HDR+ Use Case

Resources

Burst photography for high dynamic range and low-light imaging on mobile cameras [Hasinoff et al., 2016]
Local Laplacian Filters: Edge-aware Image Processing with a Laplacian Pyramid [Paris et al., 2011]
Exposure Fusion [Mertens et al., 2007]
High-Quality Linear Interpolation For Demosaicing of Bayer-Patterned Color Images [Malvar et al., 2004]
A Simple Yet Effective Improvement to the Bilateral Filter for Image Denoising [Rithwik and Chaudhury, 2015]
Dcraw 9.27 [Coffin, 2016]
Lightroom 6.7 [Adobe Inc., 2016]