Chroma Subsampling Explained: Giving a Little to Save a Lot
Have you ever experienced dropped frames or freeze-ups during the playback of a Full HD or 4K movie from a computer, media server, or internet streaming service? Welcome to the club! Those are common bandwidth-related issues that can usually be fixed by upgrading your computer, internet service, HDMI cables—or a mix of all three.
However, if chroma subsampling and data compression algorithms weren't widely used to store the movie data on your Blu-ray and DVD discs, hard drives, and streaming services' servers, the problem would be far more pervasive and difficult to fix. That's because both of these technologies work together to significantly reduce the bandwidth and processing power needed to show high-res video at fast frame rates of 24, 30, 60, and even 120 frames per second. Without them, only super-fast computers and graphics cards would be able to process and display the hundreds of megabytes per second of data generated by an uncompressed 4K UHD HDR movie without dropping frames or freezing the screen image. In addition, the extremely large file sizes associated with an uncompressed Full HD or 4K feature film would add extra hours or even days to movie downloads from streaming services, and would quickly fill up the storage space found on media servers and computer drives. Even if your hardware and streaming connection could handle uncompressed 4K material, you'd need several top-of-the-line HDMI 2.1 cables with matching display inputs to funnel that uncompressed data stream from player to display.
Chroma subsampling has been in use since the advent of analog color TV, enabling color and black & white TV signals to be transmitted simultaneously in the narrow bandwidth allocated for each TV channel, thus ensuring backwards compatibility with existing B&W TVs. For analog video, the term "Y'UV" refers to the color model used worldwide for broadcast TV, where the Y' sub-channel contains non-linear "luma" information corresponding to perceptual brightness and detail (most of which comes from the green signal), while the U and V sub-channels contain the chrominance (or color) information for red and blue. A second analog model, Y'PbPr, is used primarily for 3-wire component video, where once again the Y' channel contains the luma data, while the Pb and Pr components carry B-Y ("blue minus luma") and R-Y ("red minus luma") color-difference data. Using a separate luma component alongside these two color-difference signals ends up requiring less bandwidth than separate R, G, B analog signals that each contain their own luminance component.
The Y'PbPr analog model corresponds more closely to the current digital models YCbCr and Y'Cb'Cr' (luma, chroma: blue, chroma: red). The prime notation (which looks like an apostrophe) indicates that the values are derived from gamma-corrected, non-linear R', G', B' signals rather than linear light, an encoding that corresponds more closely to the brightness sensitivity of the human eye. Based on how the eye works, the luminance information is most critical to how we see and make out objects, while some color information can be reduced without detriment, resulting in a subsampled signal that saves space and is easier to transmit.
In this article, we'll concentrate on the three most widely used levels of chroma subsampling found in the digital Y'Cb'Cr' model, usually expressed as a three-part ratio. If you're a videophile you've probably seen these as 4:4:4, 4:2:2, and 4:2:0. The ratio describes sampling within a reference block of pixels four wide and two tall: the first number is the count of luma (Y') samples per row, the second is the number of chroma (C'b and C'r) samples in the first row, and the third is the number of chroma samples in the second row (a 0 means the second row reuses the first row's chroma values, not that a channel is dropped). The ratio 4:4:4 describes "unsampled" data in which every pixel retains its own luma and chroma values; 4:4:4 and RGB carry equivalent information, as shown by the graphic above.
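To make the block arithmetic concrete, here's a minimal Python sketch (the function name and loop are my own, not from any standard or library) that counts the samples stored per 4x2 reference block for each common scheme:

```python
# Samples stored per 4x2 pixel block for each common J:a:b scheme.
# j = luma samples per row (always 4 here), a = chroma sample sites in
# the first row, b = additional chroma sample sites in the second row.
def samples_per_block(j, a, b, rows=2):
    luma = j * rows
    chroma = (a + b) * 2   # each chroma site stores both a C'b and a C'r value
    return luma + chroma

full = samples_per_block(4, 4, 4)
for scheme in [(4, 4, 4), (4, 2, 2), (4, 2, 0)]:
    total = samples_per_block(*scheme)
    print(scheme, total, f"{100 * (1 - total / full):.0f}% saved")
# (4, 4, 4) 24 0% saved
# (4, 2, 2) 16 33% saved
# (4, 2, 0) 12 50% saved
```

The 33% and 50% savings printed here match the bandwidth reductions discussed throughout this article.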
Chroma subsampling algorithms were designed to take advantage of the human eye's higher sensitivity to details found in the green portion of the color spectrum than in the red and blue portions. (Possibly the result of humans evolving surrounded by green plants and forests?) That's why the luma component Y' (see the table above) contains a majority of data derived from the green channel, and why the green channel in most digital photos contains more grayscale information than the red or blue channels (photo below). If you turn down the color saturation on a display or projector, the black and white image you see on screen is generated almost entirely from the luma channel, which per the BT.709 coefficients is weighted roughly 71% green, 21% red, and only 7% blue. Because of this green detail bias in the way our eyes see, chroma subsampling can reduce the total data in a video frame by roughly a third (at the 4:2:2 level) without a noticeable difference in detail, color, or contrast for most viewers.
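The green bias can be seen directly in the luma equation. This short sketch uses the real BT.709 coefficients; the helper function is illustrative, not part of any library:

```python
# BT.709 luma coefficients: green dominates perceived brightness.
KR, KG, KB = 0.2126, 0.7152, 0.0722   # sum to 1.0

def luma(r, g, b):
    """Luma Y' from gamma-encoded R'G'B' values in the range [0, 1]."""
    return KR * r + KG * g + KB * b

# A pure green pixel contributes far more to brightness than pure blue:
print(f"green: {luma(0, 1, 0):.2%}  red: {luma(1, 0, 0):.2%}  blue: {luma(0, 0, 1):.2%}")
```

This is why a desaturated image built only from Y' still looks like a proper black and white photo: most of the detail was carried by the green-weighted luma all along.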
On its own, chroma subsampling can reduce the storage size and bandwidth requirements of uncompressed R, G, B video data by 33-50% without significant image quality degradation. Its highest quality 4:2:2 level is visually lossless; it's commonly applied to camera-original footage to speed up editing and processing, and it's the format recorded natively by many prosumer camcorders. In high-end professional pipelines, however, chroma subsampling is deferred until after raw RGB or 4:4:4 video is edited, color graded, and saved as a master. Chroma subsampling is then followed by the application of a global compression algorithm such as MJPEG, AVC, or HEVC, all of which require far less horsepower and time to perform on 4:2:2 video data than on 4:4:4 data. Together these different forms of data compression can reduce the size of a video file by 5 to 20 times before image quality reaches unacceptable levels.
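Some back-of-envelope math shows how the two stages stack. The frame size, bit depth, frame rate, and 10:1 codec ratio below are illustrative assumptions, not figures from any specification:

```python
# Rough bandwidth for hypothetical 4K, 10-bit, 24 fps video, and the
# effect of stacking 4:2:0 subsampling with an assumed 10:1 HEVC ratio.
width, height, bits, fps = 3840, 2160, 10, 24

rgb_bytes_per_sec = width * height * 3 * bits / 8 * fps
after_420 = rgb_bytes_per_sec * 0.5    # 4:2:0 halves the total sample count
after_hevc = after_420 / 10            # assumed 10:1 global compression ratio

print(f"uncompressed RGB: {rgb_bytes_per_sec / 1e6:.0f} MB/s")
print(f"after 4:2:0:      {after_420 / 1e6:.0f} MB/s")
print(f"after HEVC:       {after_hevc / 1e6:.0f} MB/s")
```

The uncompressed figure lands in the hundreds of megabytes per second mentioned earlier, while the combined pipeline brings it down by a factor of 20.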
How Does Chroma Subsampling Work?
4:4:4. Per the definitions in the BT.2100 standard illustrated above, the original RGB data found in every pixel of a video frame is first used to calculate separate Y'C'bC'r values for those pixels. The 4:4:4 illustration shows how that data might be stored in an 8x8 data array before actual subsampling occurs. At this point, there is no change in the space needed to store the video, or in the image quality, color accuracy, or color detail.
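The per-pixel conversion can be sketched in a few lines. This uses the real BT.2020 coefficients that BT.2100 references for its non-constant-luminance path; the function name is my own:

```python
# Non-constant-luminance R'G'B' -> Y'CbCr using the BT.2020 coefficients
# referenced by BT.2100. Inputs are gamma-encoded values in [0, 1].
KR, KG, KB = 0.2627, 0.6780, 0.0593

def rgb_to_ycbcr(r, g, b):
    y = KR * r + KG * g + KB * b      # luma
    cb = (b - y) / (2 * (1 - KB))     # blue-difference chroma, in [-0.5, 0.5]
    cr = (r - y) / (2 * (1 - KR))     # red-difference chroma, in [-0.5, 0.5]
    return y, cb, cr

# White maps to full luma and zero chroma (to within rounding):
print(tuple(round(v, 6) for v in rgb_to_ycbcr(1.0, 1.0, 1.0)))
```

Running this over every pixel produces the 4:4:4 array described above; no information has been discarded yet.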
4:2:2. Considered the highest quality level of subsampling, 4:2:2 maintains all of the information in the luma Y' channel. (The other levels described here also maintain full luma information; only a few recording devices subsample the luma channel to 3 or 2.) The C'b and C'r values, however, are sampled at half the horizontal rate of the luma channel, so every other pixel in each line of the array is stored without C'b and C'r information, saving approximately one-third of the bandwidth and storage space. When this video data is opened by a computer program, media player, or display that's able to process 4:2:2 video directly, all of the available data is first decompressed, and then the missing C'b and C'r values are filled in from adjacent horizontal pixels (a process called interpolation).
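A toy one-row example, using made-up C'b values, shows the drop-and-rebuild round trip a player might perform:

```python
# Minimal 4:2:2 sketch on one row of chroma samples: keep every other
# value, then reconstruct the dropped ones from horizontal neighbors.
row = [10, 12, 20, 22, 30, 32, 40, 42]   # hypothetical C'b values

kept = row[::2]                          # half the horizontal sample rate
rebuilt = []
for i, v in enumerate(kept):
    rebuilt.append(v)                    # kept sample passes through as-is
    nxt = kept[i + 1] if i + 1 < len(kept) else v
    rebuilt.append((v + nxt) // 2)       # dropped sample: average of neighbors

print(kept)     # [10, 20, 30, 40]
print(rebuilt)  # [10, 15, 20, 25, 30, 35, 40, 40]
```

The rebuilt row isn't identical to the original, but because chroma varies slowly across most natural images, the interpolated values land close enough to go unnoticed.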
4:2:0. Now it gets a bit trickier. At the 4:2:0 level, the luma component remains fully sampled as before, but C'b and C'r are sampled at half their horizontal and half their vertical rate, for approximately a 50% reduction in bandwidth and storage requirements. However, to improve quality, the single chroma value stored for each 2x2 block of pixels can be a composite of the values that were discarded (see illustration), using averaging or similar filtering methods that result in fewer artifacts and a smoother transition between line edges and fine details when the image is reformed by the player. Of course, smoother line edges mean that text on a contrasty background might look fuzzy or unfocused, but this is also an artifact that may be visible when comparing 4:2:2 to 4:4:4 video on a sharp computer monitor.
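The compositing step can be sketched as a simple block average. The values below are hypothetical, and real encoders use more sophisticated filters, but the idea is the same:

```python
# 4:2:0 sketch: one chroma value per 2x2 pixel block, stored as the
# average of the four original values (one common composite strategy).
chroma = [
    [10, 12, 20, 22],
    [14, 16, 24, 26],
]

subsampled = []
for x in range(0, 4, 2):                                  # step across in 2x2 blocks
    block = [chroma[r][c] for r in range(2) for c in range(x, x + 2)]
    subsampled.append(sum(block) // len(block))           # average the block

print(subsampled)  # [13, 23] -- one value per 2x2 block
```

Averaging rather than simply discarding three of the four values is what smooths the transitions at edges when the player later expands each stored value back over its 2x2 block.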
The Bottom Line on Chroma Subsampling
Does using 4:2:0 subsampling significantly degrade image quality for movie viewing versus using 4:2:2? Not according to most viewers who've enjoyed any number of 4K UHD SDR and HDR Blu-ray movies. That's because 4:2:0 is actually the chroma subsampling level required by the 4K UHD Blu-ray standard (along with the highly efficient HEVC global compression algorithm) and is used to store just about every movie you'll watch from your cable TV or streaming provider. Before playback, a typical 4K UHD Blu-ray player or media server checks the HDMI EDID information stored in the display or projector to determine whether that display can directly handle 4:2:2 video. Most displays can handle 4:2:0 or 4:2:2 video, so the player usually upsamples the disc's 4:2:0 data to 4:2:2 before sending it to the display. If the display isn't 4:2:2 compatible, the player may send 4:2:0 data or upsample to 4:4:4 before sending the video signal to the display.
Based on that little tidbit of info, most 4K UHD Blu-ray players are actually capable of displaying slightly higher quality or more color accurate movies than you can get from a 4K UHD Blu-ray disc! All that's needed is a 4K camcorder, DSLR, or graphics program that records 10- to 12-bit-per-pixel video using 4:2:2 (not 4:2:0) sampling, or even a raw 4:4:4 video format. Once these movies are edited and saved in a format such as Apple's ProRes 422 and then compressed using 10-bit HEVC, they can be stored on a USB 3.0 drive and inserted into the 4K UHD player's USB port. Or you might bypass the player altogether and stick the same USB drive into the USB 3.0 port on your display or projector (if it has one). Since the player or display doesn't have to interpolate from 4:2:0 to 4:2:2, more detail should be maintained in fine color gradations.
Either way, the fact remains that some form of chroma subsampling and data compression is necessary to allow current players and displays to handle high-res 4K video, and you can rest assured those technologies will be even more important when 8K content and displays are in vogue, especially considering storage and streaming limitations. Fortunately, due to the increased pixel density found in 8K video and displays, both 4:2:0 and 4:2:2 video will look even better than they do now, even when sharp text on a contrasting background is displayed.
Michael J. McNamara is the former Executive Technology Editor of Popular Photography magazine and a renowned expert on digital capture, storage, and display technologies. He is also an award-winning photographer and videographer, and the owner of In-Depth Focus Labs in Hopewell Junction, NY.