I’m about to date myself with this post. Back in the 90’s, when Video for Windows was just starting to become popular and QuickTime was still in its infancy, there was a common problem. I started learning how to work with video in Adobe Premiere 4.0 for Windows, and I quickly noticed an issue: a video might start out with the audio in sync but eventually drift out of sync. This and similar problems plagued the video production industry for years, and this blog post is about why it happened.
What was happening was that NTSC video (not PAL) was broadcast at 29.97 frames per second, while Video for Windows typically timed audio against an assumed 30 frames per second. As a video progressed, the slightly faster audio clock would gradually lose sync with the video, causing the discrepancy.
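The arithmetic is easy to sketch. The exact NTSC frame rate is the fraction 30000/1001 (which 29.97 approximates), so for a nominal one-hour clip you can compute how far apart the two clocks end up:

```python
from fractions import Fraction

NTSC_FPS = Fraction(30000, 1001)   # exact NTSC frame rate, ~29.97 fps
ASSUMED_FPS = 30                   # rate the software assumed for audio timing

frames = 30 * 3600                 # frames in a nominal one-hour clip
video_seconds = frames / NTSC_FPS              # how long the video actually plays
audio_seconds = Fraction(frames, ASSUMED_FPS)  # how long the audio track lasts

drift = video_seconds - audio_seconds
print(float(drift))  # 3.6 seconds of lip-sync error per hour
```

A 3.6-second error is far past the point where lip sync is visibly broken, which is why the drift was so noticeable on longer clips.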
The interesting part is not the design flaw in Microsoft’s software – it is that our video standard was such an odd number. Why not 30 frames per second?
The actual origins of this can be traced back to the Tesla/Edison era but we’ll start at the birth of television with Philo Farnsworth. (Incidentally, he is the namesake of the Futurama character, Professor Farnsworth).
Old-style CRT televisions work by scanning a beam of electrons left-to-right, top-to-bottom in a zigzag pattern to cover the rectangle of the screen. The beam scanned vertically and horizontally at a fixed rate, and the image was encoded as the intensity of the beam at any given point. A whole picture could be unwound much like a sweater: instead of seeing a Christmas tree, you would see just a bunch of seemingly-random colored yarn.
Philo invented the core component of the first video camera, the image dissector (an image-scanning tube), in the 1920’s. This is what took a projected image and sliced it into the yarn-like output needed to create a picture.
In the 20’s, however, CRT technology was exceedingly primitive and the phosphors (the coating on the screen that glows when hit by electrons) faded quickly. In order to produce a non-flickering image on the screen, you needed to scan the image top-to-bottom many times per second so that while the bottom of the screen was being hit with electrons, the top part hadn’t yet faded away.
This next part is the genius of the NTSC design.
Remember earlier how I said that the horizontal and vertical scanning rates were constant? Well, semiconductors didn’t exist yet, so building a low-cost, portable, and stable frequency reference to scan with was hard. NTSC brilliantly got around this by using the AC power source as the timer for the vertical scanning rate. In the US, AC power runs at 60 Hz.
The NTSC system used this to drive the vertical deflection of the beam. Transmitting 60 full frames per second was a lot of data, so the engineers decided to scan the even-numbered lines first, then the odd-numbered lines. (These half-frames are called “fields” in the industry.) By splitting each video frame into two interlaced fields, they could 1) use the AC power reference and 2) scan the screen fast enough to prevent flicker.
Old black and white television ran at 30 frames / 60 fields per second. In Europe, power runs at 50 Hz, and their television system (PAL) ran at 25 frames / 50 fields per second.
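The relationship above is simple enough to write down: the field rate is locked to the mains frequency, and a frame is two interlaced fields. A minimal sketch:

```python
def scan_rates(mains_hz: int) -> tuple[int, float]:
    """Field and frame rates for a B&W system locked to AC mains."""
    fields_per_second = mains_hz        # one field per mains cycle
    frames_per_second = mains_hz / 2    # two interlaced fields make one frame
    return fields_per_second, frames_per_second

print(scan_rates(60))  # US B&W NTSC: (60, 30.0)
print(scan_rates(50))  # European PAL: (50, 25.0)
```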
Stay with me – we’re about halfway through explaining this. This system was great until people wanted color television.
The Institute of Radio Engineers (IRE) devised a positively ingenious system for encoding color video. To explain it, we need to talk about your eyes.
Your eyes are more sensitive to contrast and brightness than they are to color. This means that you are more likely to notice changes in brightness than shifts in color. The system the IRE came up with was called YIQ. YIQ has three channels of image information: one brightness channel (Y) and two color channels (I and Q). This allows the brightness information to be processed separately from the color information.
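The RGB-to-YIQ split is just a matrix transform. The coefficients below are the classic approximate values commonly quoted for NTSC (exact figures vary slightly between references), so treat this as a sketch rather than the broadcast spec:

```python
def rgb_to_yiq(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Approximate NTSC RGB -> YIQ transform."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance (brightness)
    i = 0.596 * r - 0.274 * g - 0.322 * b   # first chroma axis
    q = 0.211 * r - 0.523 * g + 0.312 * b   # second chroma axis
    return y, i, q

# A pure gray (r == g == b) has zero chroma: a B&W set sees only Y.
print(rgb_to_yiq(1.0, 1.0, 1.0))  # I and Q come out ~0 for white
```

The key property is visible in the gray case: when there is no color, I and Q vanish and the entire signal lives in the brightness channel.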
The goal of color television was to create a standard that was backwards compatible with B&W televisions. Televisions were expensive luxury items and not supporting the already small base would be a death-blow to the nascent standard.
Here is where the NTSC color system really shines. An NTSC waveform in B&W was just brightness information. The new system carried brightness plus two new color channels in the same waveform. This is a complex approach that had some massive advantages.
This new system was backwards compatible; old televisions would read the brightness (luminance) channel as they always had. The color information was encoded in a way that would be ignored. So, a color broadcast would be viewable on a black and white television.
There was one caveat to the backwards compatibility – on some sets, the viewer had to adjust the vertical hold knob on their television. You see, to fit the color information into the signal without it interfering with the existing sound carrier, the field rate had to be slowed slightly, from 60 fields per second to 59.94. That fractional change let the color subcarrier slot cleanly into the waveform, and the picture was still viewable on B&W televisions, at worst with a tweak of the vertical hold.
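The usually-cited derivation of the magic number goes like this: the line rate was set to the 4.5 MHz sound carrier divided by 286, which in turn fixes the field rate (262.5 lines per field), the frame rate, and the color subcarrier. A sketch with exact fractions:

```python
from fractions import Fraction

sound_carrier = Fraction(4_500_000)        # 4.5 MHz audio carrier, left unchanged
line_rate = sound_carrier / 286            # ~15,734.27 lines per second
field_rate = line_rate / Fraction(525, 2)  # 262.5 lines per field
frame_rate = field_rate / 2                # two fields per frame

assert frame_rate == Fraction(30000, 1001)  # the famous ~29.97 fps
print(float(field_rate))        # ~59.94 fields per second

color_subcarrier = line_rate * Fraction(455, 2)
print(float(color_subcarrier))  # ~3.579545 MHz
```

Note that nothing here is arbitrary: every rate is a simple ratio of the others, which is what let the color subcarrier interleave with the luminance signal instead of fighting it.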
And that, folks, is why our television (pre-digital) ran at 29.97 frames per second, and why the design flaw in early computer video systems caused the video and the audio to play at different rates.
On a related note, this caused headaches for video editors. Keeping track of where you were all of a sudden became a problem: was position being tracked using drop-frame timecode (which periodically skips frame numbers so the displayed time stays in step with the real clock) or non-drop-frame timecode (which counts every actual frame and so drifts away from the clock)? This brought a whole new world of pain. Here is a great document explaining what that entailed: http://www.connect.ecuad.ca/~mrose/pdf_documents/timecode.pdf
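Drop-frame counting can be sketched in a few lines. The standard NTSC scheme skips frame numbers 00 and 01 at the start of every minute, except minutes divisible by ten (no frames are ever dropped, only numbers):

```python
def df_timecode(frame: int) -> str:
    """Convert a raw frame count to NTSC drop-frame timecode (HH:MM:SS;FF)."""
    per_10min = 17982   # frames in 10 minutes: 10*60*30 minus 9 minutes * 2 dropped
    per_min = 1798      # frames in a "dropping" minute: 60*30 - 2
    tens, rem = divmod(frame, per_10min)
    count = frame + 18 * tens          # re-add the 18 numbers dropped per 10 minutes
    if rem > 1:
        count += 2 * ((rem - 2) // per_min)  # plus 2 per subsequent minute
    ff = count % 30
    ss = (count // 30) % 60
    mm = (count // 1800) % 60
    hh = count // 108000
    return f"{hh:02d}:{mm:02d}:{ss:02d};{ff:02d}"

print(df_timecode(1800))   # "00:01:00;02" -- frames ;00 and ;01 were skipped
print(df_timecode(17982))  # "00:10:00;00" -- back in step at the 10-minute mark
```

The payoff is that over an hour, drop-frame timecode stays within a fraction of a second of wall-clock time, while a plain frame count would read about 3.6 seconds fast.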
Of course, our European counterparts didn’t have to worry – their video format didn’t have to change speed when color was introduced so they continued to operate at 25 frames per second.
Image from Wikipedia user Grm wnr https://commons.wikimedia.org/wiki/File:Waveform_monitor.jpg