Digital image and video stuff


Image memory representation

Image layout

  • Multi-planar vs single-planar (packed)
In the case of planar data, such as YUV420P, linesize[i] contains the stride (bytes per row) of the i-th plane.

For example, for a 640x480 frame, data[0] points to the Y plane, and data[1] and data[2] point to the U and V planes.
In this case, linesize[0] == 640 and linesize[1] == linesize[2] == 320, because the U and V planes are subsampled by 2 in each direction, so each is half the width (and half the height) of the Y plane. In practice linesize can be larger than these values when rows are padded for alignment.

In the case of packed pixel data (RGB24), there is only one plane (data[0]) and linesize[0] == width * channels (640 * 3 == 1920 for RGB24).
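
To make the stride arithmetic concrete, here is a minimal sketch that computes the per-plane linesize for the two formats mentioned above. The function names (align_up, plane_layout) and the align parameter are hypothetical and only for illustration; this is not how any particular library allocates its buffers.

```python
def align_up(value, align):
    # Round value up to the next multiple of align (align=1 means no padding).
    return (value + align - 1) // align * align


def plane_layout(pix_fmt, width, height, align=1):
    """Return a list of (linesize_in_bytes, rows), one entry per plane."""
    if pix_fmt == "yuv420p":
        # Chroma is subsampled by 2 in both directions; round up for odd sizes.
        chroma_w, chroma_h = (width + 1) // 2, (height + 1) // 2
        planes = [(width, height), (chroma_w, chroma_h), (chroma_w, chroma_h)]
    elif pix_fmt == "rgb24":
        # Packed format: a single plane with 3 bytes (R, G, B) per pixel.
        planes = [(width * 3, height)]
    else:
        raise ValueError("unsupported pixel format: " + pix_fmt)
    return [(align_up(row_bytes, align), rows) for row_bytes, rows in planes]


def total_bytes(layout):
    # Buffer size is the sum of linesize * rows over all planes.
    return sum(linesize * rows for linesize, rows in layout)


yuv = plane_layout("yuv420p", 640, 480)
rgb = plane_layout("rgb24", 640, 480)
print(yuv, total_bytes(yuv))  # [(640, 480), (320, 240), (320, 240)] 460800
print(rgb, total_bytes(rgb))  # [(1920, 480)] 921600
```

Passing a larger align (for example 32) shows how padding makes linesize exceed the visible row width, which is why code should step through rows by linesize rather than by width.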

Credit: Stack Overflow



Picture vs Frame vs Slice

Picture := Frame | Field
Frame := a complete image
Field := the set of odd-numbered or even-numbered scan lines of a frame, composing a partial image (used in interlaced video).

Slice := spatially distinct region of a frame that is encoded separately from any other region in the same frame.
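
As a small illustration of the frame/field relationship, the sketch below splits a frame (modelled as a plain list of rows) into its two fields and weaves them back together. The helper names are hypothetical. It assumes rows are numbered from 0, with the even rows forming the top field and the odd rows forming the bottom field.

```python
def split_fields(frame_rows):
    """Split a frame into (top_field, bottom_field)."""
    top = frame_rows[0::2]      # rows 0, 2, 4, ...
    bottom = frame_rows[1::2]   # rows 1, 3, 5, ...
    return top, bottom


def weave_fields(top, bottom):
    """Interleave two fields back into a full frame."""
    frame = [None] * (len(top) + len(bottom))
    frame[0::2] = top
    frame[1::2] = bottom
    return frame


frame = ["row0", "row1", "row2", "row3", "row4"]
top, bottom = split_fields(frame)
print(top)     # ['row0', 'row2', 'row4']
print(bottom)  # ['row1', 'row3']
print(weave_fields(top, bottom) == frame)  # True
```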

Credit: Wikipedia

H.264/AVC bitstream format / hierarchy of layers



  • VCL NAL unit vs non-VCL NAL unit
VCL (Video Coding Layer) NAL units := NAL units that contain the encoded data of video pictures (coded slices)
non-VCL NAL units := NAL units that contain associated additional information (such as SPS, PPS, SEI)
  • Parameter sets
sequence parameter set (SPS) := parameters that apply to a series of consecutive coded video pictures, called a coded video sequence.
picture parameter set (PPS) := parameters that apply to the decoding of one or more individual pictures within a coded video sequence.
  • Access unit
Access unit := a set of NAL units, in a specified order, that together make up one coded frame or field.
(Decoding each access unit results in one decoded picture.)


  • Coded Video Sequences
    • A series of access units that are sequential in the NAL unit stream and use only one SPS.
    • Starts with an instantaneous decoding refresh (IDR) access unit. All following video frames or fields are coded as slices.
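
The VCL / non-VCL distinction above can be read directly from the first byte of each NAL unit. The sketch below decodes that one-byte header and classifies the unit. The function names and the name table are hypothetical helpers, and only a few of the nal_unit_type values defined by the standard are listed.

```python
# A subset of H.264 nal_unit_type values; other values are left unnamed here.
NAL_TYPE_NAMES = {
    1: "coded slice (non-IDR)",
    5: "coded slice (IDR)",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "access unit delimiter",
}


def parse_nal_header(first_byte):
    """Decode the one-byte NAL unit header into its three fields."""
    return {
        "forbidden_zero_bit": first_byte >> 7,      # 1 bit, must be 0
        "nal_ref_idc": (first_byte >> 5) & 0x03,    # 2 bits, 0 = not used for reference
        "nal_unit_type": first_byte & 0x1F,         # 5 bits
    }


def is_vcl(nal_unit_type):
    # Types 1..5 carry coded slice data; the other types are non-VCL.
    return 1 <= nal_unit_type <= 5


def is_idr(nal_unit_type):
    # An IDR slice marks the start of a new coded video sequence.
    return nal_unit_type == 5


for byte in (0x67, 0x68, 0x65, 0x41, 0x06):
    header = parse_nal_header(byte)
    nal_type = header["nal_unit_type"]
    kind = "VCL" if is_vcl(nal_type) else "non-VCL"
    print(hex(byte), nal_type, NAL_TYPE_NAMES.get(nal_type, "other"), kind)
```

For instance, 0x67 decodes to type 7 (SPS, non-VCL) and 0x65 to type 5 (IDR slice, VCL).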


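Bitstream formats: Annex B (NAL units separated by start codes; FFmpeg's h264_mp4toannexb bitstream filter converts to this form), AVCC (each NAL unit prefixed by its length, as used in MP4)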
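
The difference between the two framings is easiest to see in code. The sketch below splits an Annex B byte stream on its start codes (00 00 01 or 00 00 00 01) and rewrites it with AVCC-style big-endian length prefixes, and back. The function names are hypothetical, and the 4-byte length size is an assumption: in a real MP4 file the size comes from the avcC configuration record. A real converter such as h264_mp4toannexb also has to deal with the SPS/PPS stored out of band, which this sketch ignores.

```python
import re

START_CODE = re.compile(b"\x00\x00\x01")


def split_annexb(stream):
    """Return the NAL units of an Annex B stream, without start codes."""
    matches = [(m.start(), m.end()) for m in START_CODE.finditer(stream)]
    nals = []
    for i, (_, payload_start) in enumerate(matches):
        payload_end = matches[i + 1][0] if i + 1 < len(matches) else len(stream)
        # A 4-byte start code leaves one extra 0x00 before the 3-byte pattern;
        # a NAL unit never ends in 0x00, so trailing zeros can be stripped.
        nals.append(stream[payload_start:payload_end].rstrip(b"\x00"))
    return nals


def annexb_to_avcc(stream, length_size=4):
    """Replace start codes with big-endian length prefixes."""
    out = bytearray()
    for nal in split_annexb(stream):
        out += len(nal).to_bytes(length_size, "big") + nal
    return bytes(out)


def avcc_to_annexb(sample, length_size=4):
    """Replace length prefixes with 4-byte start codes."""
    out = bytearray()
    pos = 0
    while pos + length_size <= len(sample):
        nal_len = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        out += b"\x00\x00\x00\x01" + sample[pos:pos + nal_len]
        pos += nal_len
    return bytes(out)


# Toy stream with made-up payload bytes: SPS, PPS, IDR slice headers only.
annexb = b"\x00\x00\x00\x01\x67\x42" + b"\x00\x00\x01\x68\xce" + b"\x00\x00\x00\x01\x65\x88"
print(split_annexb(annexb))          # [b'gB', b'h\xce', b'e\x88']
print(annexb_to_avcc(annexb).hex())  # 0000000267420000000268ce000000026588
```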

RTP Payload Format for H.264 Video (RFC 6184)