IMAGE/VIDEO COMPRESSION

As the demand for more life-like images increases, so does the demand for higher color depth and resolution. This means that devices, such as cameras and screens, and software, such as WATCHOUT, must handle ever larger amounts of data efficiently. One way to tackle this problem is image/video compression, which covers strategies for storing the pixels of an image or the frames of a video efficiently. When working with compression there are two important concepts:

  • Encoding, which refers to the process of transforming color data from its original form into a more storage-efficient form.
  • Decoding, which refers to the process of transforming encoded color data back into a form that can be displayed or processed.

Encoding may reduce the accuracy of the data, depending on which compression algorithm is used. A lossless algorithm sacrifices no accuracy: decoding reproduces the original data exactly. A lossy algorithm may discard data in order to pack the pixels more efficiently.
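The difference between lossless and lossy encoding can be illustrated with a minimal sketch. It assumes a made-up list of 8-bit sample values, uses zlib from the Python standard library as the lossless step, and uses simple bit truncation as a stand-in for a lossy step. Real image and video codecs are considerably more sophisticated.

```python
import zlib

# Hypothetical 8-bit sample values standing in for pixel data.
pixels = bytes([200, 201, 199, 200, 202, 200, 198, 200] * 32)

# Lossless: zlib (DEFLATE) restores every byte exactly.
packed = zlib.compress(pixels)
lossless_ok = zlib.decompress(packed) == pixels

# Lossy (illustrative only): drop the 3 least significant bits of each
# value before packing. The discarded bits cannot be recovered.
quantized = bytes(p & 0b11111000 for p in pixels)
restored = zlib.decompress(zlib.compress(quantized))

# Largest difference between an original value and its restored value.
max_error = max(abs(a - b) for a, b in zip(pixels, restored))

print("lossless round trip exact:", lossless_ok)
print("largest error after lossy step:", max_error)
```

Dropping three bits bounds the error at 7 per sample. In exchange, the data contains fewer distinct values, which usually makes it easier to compress.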

CHROMA SUBSAMPLING

The human eye is much better at perceiving variations in luminance than variations in chrominance. Chroma subsampling is a compression technique that takes advantage of this fact. It divides the color information into two distinct parts, one storing the luminance of the color and one storing its chrominance, and then stores the chrominance at a lower resolution than the luminance.

The luminance part is usually denoted Y', where the prime symbol indicates that the luma has been gamma encoded. The chrominance part, which actually consists of two components, is usually denoted CbCr: Cb stores the blue-difference and Cr the red-difference chroma component.
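As an illustration of the split into luma and chroma, the following sketch converts one gamma-encoded 8-bit RGB pixel to Y'CbCr. It assumes the full-range BT.601 coefficients used by JPEG/JFIF; other standards (for example BT.709) use different coefficients, and the function name rgb_to_ycbcr is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit gamma-encoded RGB to full-range Y'CbCr.

    Assumes BT.601 coefficients as used by JPEG/JFIF.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round and clamp each component to the 8-bit range.
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))


print(rgb_to_ycbcr(255, 255, 255))  # white
print(rgb_to_ycbcr(0, 0, 0))        # black
print(rgb_to_ycbcr(255, 0, 0))      # pure red
```

For any neutral gray, both chroma components sit at the midpoint value 128 and only Y' varies, which is why the chroma part can be stored more coarsely with little visible effect.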

There are different subsampling schemes and they are usually described by using a three-part ratio a:b:c (a fourth component may also exist if opacity is to be encoded), which defines how an image region is encoded.

  • a stands for the width in pixels of the region. It is usually 4.

  • b stands for the number of chrominance samples (CbCr) in the first row of the a pixels.

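  • c stands for the number of chrominance samples that change between the first and the second row, that is, the number of additional chrominance samples stored for the second row.

The chroma subsampling region always consists of two rows. Some common schemes are:

  • 4:4:4 indicates that each individual pixel in the region keeps its own chrominance sample, meaning that no chrominance data is lost.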

  • 4:2:2 indicates a region that is 4 pixels wide in which two chrominance samples are stored for the first row and two more for the second row. Horizontal chrominance resolution is halved while vertical resolution is kept.

  • 4:2:0 indicates a region that is 4 pixels wide in which two chrominance samples are stored for the first row. No additional samples are stored for the second row, so both rows share the same chrominance samples and chrominance resolution is halved both horizontally and vertically.

  • 4:1:1 indicates a region that is 4 pixels wide in which one chrominance sample is stored for the first row and one more for the second row. Horizontal chrominance resolution is reduced to a quarter while vertical resolution is kept.
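To get a feel for how much data each scheme saves, the sketch below counts the average number of stored samples per pixel for an a:b:c scheme. It assumes a region that is a pixels wide and two rows high, one luma sample per pixel, and reads c as the number of additional chrominance samples stored for the second row. The function name samples_per_pixel is hypothetical.

```python
def samples_per_pixel(a, b, c):
    """Average number of stored samples per pixel for an a:b:c scheme.

    The region is `a` pixels wide and two rows high. Every pixel keeps
    its own luma sample; Cb and Cr each keep `b` samples for the first
    row and `c` samples for the second row.
    """
    luma = 2 * a
    chroma = 2 * (b + c)  # factor 2: one set for Cb, one for Cr
    return (luma + chroma) / (2 * a)


for scheme in [(4, 4, 4), (4, 2, 2), (4, 2, 0), (4, 1, 1)]:
    label = ":".join(str(n) for n in scheme)
    print(label, samples_per_pixel(*scheme))
```

Under these assumptions 4:4:4 stores 3 samples per pixel, 4:2:2 stores 2, and both 4:2:0 and 4:1:1 store 1.5, which is half the data of 4:4:4.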

Widely used chroma subsampling formats


TEMPORAL COMPRESSION

Temporal compression exploits the fact that consecutive frames are often very similar to each other, especially in scenes with no movement or only gradual movement.

Video compression algorithms analyze consecutive frames to identify areas that have shifted or changed. Motion compensation then predicts the current frame from the previous frame(s), and temporal compression typically stores only the differences between frames rather than each frame in full.
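The idea of storing differences rather than whole frames can be sketched with a toy example. It assumes frames are flat lists of pixel values of equal length and records only the pixels that changed. It does not perform motion compensation, and the names encode_delta and decode_delta are hypothetical.

```python
def encode_delta(previous, current):
    """Return (index, new_value) pairs for pixels that differ."""
    return [
        (i, cur)
        for i, (prev, cur) in enumerate(zip(previous, current))
        if prev != cur
    ]


def decode_delta(previous, delta):
    """Rebuild the current frame from the previous frame and a delta."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame


# A bright two-pixel object moves one step to the right.
frame1 = [10, 10, 10, 10, 50, 50, 10, 10]
frame2 = [10, 10, 10, 10, 10, 50, 50, 10]

delta = encode_delta(frame1, frame2)
print(delta)
print(decode_delta(frame1, delta) == frame2)
```

Only two of the eight pixels need to be stored for the second frame. A real codec would go further and describe the change as a motion vector for the moving block.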

A compressed video contains different types of frames. I-frames are complete frames, meaning that all the pixel information is present. They are also called keyframes and are encoded independently, without reference to other frames. An I-frame is usually followed by one or more P-frames, which store only the pixels that have changed. There may also be B-frames, a special type of predicted frame that looks both forward and backward for changes.

The sequence of frames from one I-frame up to the next is called a Group of Pictures (GOP). A GOP consists of a keyframe (or intra-frame) followed by a series of predicted frames (inter-frames), and its length is the distance between I-frames. A video from YouTube might have a GOP of 300 frames.

A media server needs an I-frame before it can start decoding. This is why, when jumping into a video with a long GOP, it can take a while to get a full image, and you may even see the video step back to an earlier I-frame.
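The cost of jumping into a long GOP can be estimated with a small sketch. It assumes a fixed GOP length, an I-frame as the first frame of every GOP, and only P-frames in between (no B-frames). Real streams may differ on all three points, and the function name frames_to_decode is hypothetical.

```python
def frames_to_decode(target, gop_length):
    """Return (keyframe_index, frame_count) needed to show `target`.

    Assumes frame 0 is an I-frame and a new I-frame starts every
    `gop_length` frames, with only P-frames in between.
    """
    keyframe = (target // gop_length) * gop_length
    return keyframe, target - keyframe + 1


# Jumping to frame 450 in a video with a GOP of 300 frames.
keyframe, count = frames_to_decode(450, 300)
print("start decoding at frame", keyframe)
print("frames to decode before display:", count)
```

With these numbers the decoder has to go back to frame 300 and work through 151 frames before frame 450 can be shown. A shorter GOP reduces that delay at the cost of a larger file.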