How Big is Video
By Chris Pirazzi.
Here are some handy pre-computed statistics to give you a sense of how big video is.
Conventions for This Summary
- Kilobytes: x KB is x*1024 bytes
- Megabytes: x MB is x*1024*1024 bytes
- Here are the four major video signal formats which we will deal
with here, with the shorthand names we will use in this summary.
Don't try to use these shorthand names outside this summary; you'll
get in trouble:
|lines||horizontal sampling||specification||shorthand|
|525-line||square-pixel||ANSI/SMPTE 170M-1994||NTSC|
|525-line||non-square-pixel (Rec. 601 Digital)||ANSI/SMPTE 125M, 259M||525-dig|
|625-line||square-pixel||ITU-R BT.470-3||PAL|
|625-line||non-square-pixel (Rec. 601 Digital)||ITU-R BT.656||625-dig|
See below for some notes about the specs.
How Many Bytes Per Pixel?
There are tons of possible ways of packing video data into memory, described in detail in The Pixel Rosetta Stone: Packings and Colorspaces. These
are the most common:
- If you represent video data with 4:2:2 sampled 8-bit-per-component
YCrCb data, then you get one 8-bit luminance sample per pixel, one
8-bit U chrominance sample per two pixels, and one 8-bit V chrominance
sample per two pixels. This gives you 2 bytes per pixel. In the VL,
this is called VL_PACKING_YVYU_422_8. For more information on what
YCrCb and 4:2:2 mean, see the Rosetta Stone page or "A Technical
Introduction to Digital Video" by Charles A. Poynton (New York: Wiley, 1996).
- If you represent video data with 32-bit RGBA or ABGR
quantities (where the A may be a don't care or it may be an alpha
channel, synthesized on the computer), then you get 4 bytes per pixel.
In the VL, this is called VL_PACKING_RGBA_8, VL_PACKING_RGB_8, and
- A Rec. 601 digital video stream actually has 10 bits in each
Y, U, or V sample, not 8. Often our software will only deal with 8 of
the ten bits, as is assumed by VL_PACKING_YVYU_422_8. This sometimes
works ok for video data, but in order to parse some forms of ancillary
data (such as embedded audio data) out of a video stream, you must
bring in all 10 bits of each component. 10 bits is an obnoxious
quantity for computers, so the most common technique is to left-shift
each 10-bit quantity out to 16 bits, resulting in a YCrCb-style capture
with 2 bytes per component and 4 bytes per pixel, VL_PACKING_YVYU_422_10. This is obviously
quite wasteful of memory (and perhaps disk) bandwidth, but these
disadvantages must be weighed against the cost of the bit twiddling
that would be necessary to manipulate 10-bit packed data on the
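As a rough sketch of the arithmetic in the list above (this is not VL code, and the Python names are made up for illustration), the bytes-per-pixel figures fall out of counting samples per pixel and bytes per sample. The `pad_10_to_16` helper assumes the 10-bit value sits in the top of the 16-bit word with the low six bits zero, which is our reading of "left-shift" above.

```python
from fractions import Fraction

# 4:2:2 sampling: one Y per pixel, plus one U and one V per two pixels.
SAMPLES_PER_PIXEL_422 = 1 + Fraction(1, 2) + Fraction(1, 2)  # 2 samples per pixel
# RGBA / ABGR: four components for every pixel.
SAMPLES_PER_PIXEL_RGBA = 4


def bytes_per_pixel(samples_per_pixel, bytes_per_sample):
    """Average number of bytes needed to store one pixel."""
    return samples_per_pixel * bytes_per_sample


ycrcb_422_8 = bytes_per_pixel(SAMPLES_PER_PIXEL_422, 1)   # 8-bit samples
rgba_8 = bytes_per_pixel(SAMPLES_PER_PIXEL_RGBA, 1)       # 8-bit samples
ycrcb_422_10 = bytes_per_pixel(SAMPLES_PER_PIXEL_422, 2)  # 10 bits padded to 16


def pad_10_to_16(sample):
    """Left-shift a 10-bit sample into a 16-bit word (assumed layout)."""
    if not 0 <= sample <= 0x3FF:
        raise ValueError("sample does not fit in 10 bits")
    return sample << 6


print(ycrcb_422_8, rgba_8, ycrcb_422_10)
```

The 8-bit 4:2:2 case comes to 2 bytes per pixel, and both the RGBA and the padded 10-bit cases come to 4 bytes per pixel.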
How Many Pixels In a Field or Frame?
That question depends on what part of the field or frame you want.
If you want just the visible picture, no VITC, not all of the closed captioning, and no other ancillary data of any kind, then you want the "active" region. For NTSC and PAL, the part of the signal which is "active" is a little ill-defined. We use the same definition adopted by all SGI VL devices (see Definitions: F1/F2, Interleave, Field Dominance, and More for the vertical definitions of active region). For the 525-line 601 digital format, the concept of active region is well-defined: we choose the "Active Video" region from 125M, not including the "Optional Blanking." For the 625-line digital format, we choose the same set of lines as for PAL.
If you want VITC or other data which lives in the vertical blanking interval (that data is called VANC (vertical ancillary data) when dealing with a 601 digital signal), you have to capture more lines of data than active video. If you have a digital signal, then ancillary data such as audio can also be stuck in the horizontal blank (this is called HANC), so to get this you will have to capture more pixels per line.
What if you want all the data? For the digital formats, this is well-defined: an image which represents every single bit of data transferred over a Rec. 601 digital video link (including the timing reference signals (called EAV and SAV)) is a "full-raster" image. If you really want all the data, you'll also have to capture at 10 bits (which ends up being 4 bytes per pixel due to padding) rather than 8 bits (which is 2 bytes per pixel). Nitpick: for the digital formats, the size of the two fields is not the same: one field has one more line than the other, and lasts one line time longer than the other. We use the average size for the "field" quantities below.
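To make the full-raster case concrete, here is a small sketch that works out frame sizes and average field sizes from the full-raster dimensions (858x525 and 864x625, the same figures used in the 13,500,000 pixels-per-second check later in this summary). The function names are hypothetical, and the sketch deliberately does not say which of the two fields gets the extra line.

```python
from fractions import Fraction

# Full-raster (width in pixels, lines per frame) for the two digital formats.
FULL_RASTER = {"525-dig": (858, 525), "625-dig": (864, 625)}

BYTES_PER_PIXEL_8BIT = 2   # 4:2:2, 8 bits per component
BYTES_PER_PIXEL_10BIT = 4  # 4:2:2, 10 bits padded out to 16


def frame_bytes(fmt, bytes_per_pixel):
    """Bytes in one full-raster frame."""
    width, lines = FULL_RASTER[fmt]
    return width * lines * bytes_per_pixel


def field_line_counts(fmt):
    """Line counts of the shorter and longer field (they differ by one)."""
    _, lines = FULL_RASTER[fmt]
    return lines // 2, lines - lines // 2


def average_field_bytes(fmt, bytes_per_pixel):
    """Average field size: exactly half a frame."""
    return Fraction(frame_bytes(fmt, bytes_per_pixel), 2)


for fmt in FULL_RASTER:
    print(fmt, field_line_counts(fmt),
          frame_bytes(fmt, BYTES_PER_PIXEL_10BIT),
          average_field_bytes(fmt, BYTES_PER_PIXEL_10BIT))
```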
Sorted by video standard:
|x size||y size||total pixels|
How Many Bytes In a Field/Frame?
In order of decreasing size:
How Many Fields/Frames per Second?
The exact field rate of NTSC and 525-line digital video is (60000.0/1001.0) fields per second. This oddity is explained in "A Technical Introduction to Digital Video" by Charles A. Poynton (New York: Wiley, 1996). The chart below shows rounded figures.
The exact field rate of drop-frame timecode (which is a hack that was invented to get around the bizarre field rate of 525-line video) is 59.94 fields per second, which is not equal to (60000.0/1001.0). This oddity is explained in "Time Code: A User's Guide" by John Ratcliff (Oxford: Butterworth-Heinemann, 1993).
The exact field rate of PAL and 625-line digital video is 50 fields per second.
|format||frames per second||time per frame||fields per second||time per field|
|NTSC and 525-dig||29.97||33.3667ms||59.9401||16.6833ms|
|PAL and 625-dig||25||40ms||50||20ms|
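The rounded figures in the chart can be re-derived from the exact field rates. The sketch below uses Python's `fractions` module so that nothing is rounded until display; the dictionary and function names are ours, chosen for illustration.

```python
from fractions import Fraction

# Exact field rates, in fields per second.
FIELD_RATE = {
    "NTSC and 525-dig": Fraction(60000, 1001),
    "PAL and 625-dig": Fraction(50),
}


def timing(field_rate):
    """Frame/field rates and durations, given two fields per frame."""
    frame_rate = field_rate / 2
    return {
        "frames_per_second": float(frame_rate),
        "ms_per_frame": float(1000 / frame_rate),
        "fields_per_second": float(field_rate),
        "ms_per_field": float(1000 / field_rate),
    }


for name, rate in FIELD_RATE.items():
    t = timing(rate)
    print(f"{name}: {t['frames_per_second']:.4f} frames/s, "
          f"{t['ms_per_frame']:.4f} ms/frame, "
          f"{t['fields_per_second']:.4f} fields/s, "
          f"{t['ms_per_field']:.4f} ms/field")

# The drop-frame timecode rate is close to, but not the same as, the video rate.
drop_frame_differs = Fraction("59.94") != Fraction(60000, 1001)
```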
How Many Bytes Per Second in Full-Rate Video?
In order of decreasing size:
|total bytes per second|
You may have heard the figure "27 Million Per Second" associated with digital video. All forms of Rec. 601 video (whether 525- or 625-line) are 10-bit signals with a rate of 27,000,000 10-bit words per second (27 MHz). Since it takes four 10-bit words to encode two pixels (two Y samples, one U sample, and one V sample), this means that a full-raster 601 signal has 13,500,000 pixels per second. You can even verify this by multiplying out the full-raster sizes times the frame rates:
- (858*525*30000/1001) == (864*625*25) == 13,500,000 pixels per second
A few of these "pixels" are reserved for use as timing reference signals
(EAV and SAV).
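The same check can be done in exact arithmetic. This is only a sketch of the multiplication above, with variable names of our own choosing:

```python
from fractions import Fraction

WORDS_PER_SECOND = 27_000_000
WORDS_PER_PIXEL = Fraction(4, 2)  # four 10-bit words encode two pixels

pixels_from_word_rate = WORDS_PER_SECOND / WORDS_PER_PIXEL


def full_raster_pixel_rate(width, lines, frames_per_second):
    """Pixels per second for a full-raster image at the given frame rate."""
    return width * lines * frames_per_second


rate_525 = full_raster_pixel_rate(858, 525, Fraction(30000, 1001))
rate_625 = full_raster_pixel_rate(864, 625, Fraction(25))

rates_agree = rate_525 == rate_625 == pixels_from_word_rate
print(rate_525, rate_625, pixels_from_word_rate, rates_agree)
```

Using `Fraction` for the 30000/1001 frame rate avoids any floating-point fuzz, so the three figures compare exactly equal.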
The reason you don't see "27MB/sec" as the data rate for 2-bytes-per-pixel full-raster digital video above is that we have defined a MB as (1024*1024) bytes (as the computer memory geeks do) rather than (1000*1000) bytes, as the communications people do. Gotta love standards!
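Here is the unit conversion spelled out, as a quick sketch (the constant names are ours):

```python
PIXELS_PER_SECOND = 13_500_000
BYTES_PER_PIXEL = 2                 # 8-bit 4:2:2

BYTES_PER_MB_MEMORY = 1024 * 1024   # the convention used in this summary
BYTES_PER_MB_COMMS = 1000 * 1000    # the communications convention

bytes_per_second = PIXELS_PER_SECOND * BYTES_PER_PIXEL

mb_memory = bytes_per_second / BYTES_PER_MB_MEMORY
mb_comms = bytes_per_second / BYTES_PER_MB_COMMS

print(f"{bytes_per_second} bytes/sec = {mb_memory:.2f} MB/sec (1024*1024) "
      f"or {mb_comms:.2f} MB/sec (1000*1000)")
```

With the 1024*1024 convention the figure comes out a little under 26 MB/sec rather than 27.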
I Want More Detail!
The best place to go for more detail is the original specs, which we named above.
Some notes about those specifications:
- the organization formerly known as the CCIR is now called the ITU.
- for the digital video formats, ITU-R BT.601-4 (commonly referred to
as Rec. 601) defines some basic properties common to both 525- and
625- line digital, regardless of how it is transmitted. Examples of
these properties include pixel sampling rate and color space. Then,
the more specific documents (125M, 259M, 656) define how the data
format defined by Rec. 601 is to be transmitted over various kinds of
links (serial, parallel) with various numbers of lines (525,625).
Rec. 601 was formerly referred to as CCIR 601.
- ITU-R BT.470-3 is the ITU spec formerly known as CCIR Report
624-1. There was a CCIR Recommendation 470-1, but it just said to
read CCIR Report 624-1. Now we just refer to 470.
The next best place to go is "A Technical Introduction to Digital
Video" by Charles A. Poynton (New York: Wiley, 1996).
"Video Demystified" by Keith Jack (United States: Brooktree, 1993) comes in a very distant third.