Whenever you deal with video on a computer, it is absolutely crucial that you understand a few basic facts about fields. Correctly dealing with fields in software is tricky; it is fundamentally different than dealing with plain ol' graphics images. This document explains many of these basic concepts.
To simplify our presentation, this document will use only 60M fields per second systems as examples (i.e. the systems popular in the USA and Japan, which have 60M=60/1.001 fields per second, which by the way is not exactly equal to 59.94). But everything we say here applies equally to the 50 field per second signals popular in Europe and most of the rest of the world.
In this document, we use the following terminology:
Alas, most video systems are field-based (aka interlaced or interleaved).
Briefly put, instead of storing, say, 30 640x480 images per second, a field-based video system stores 60 640x240 images per second (actually it's 60M but we will omit the M for this section only to help our explanation), and each image has only half of the lines of the complete picture. These "half-height" images are called fields. The first field has one set of picture lines (say, lines 0, 2, 4, ...) and the second field has the other set of lines (say, lines 1, 3, 5, ...).
Many computer types think of fields as simply a weird way to lay out the lines of a picture in memory. They are more than that.
A field is a set of image data all of which was sampled at the same instant of time.* Each field in a video sequence is sampled at a different time, determined by the video signal's field rate. This temporal difference between all fields, not just fields of different frames, is what makes dealing with fields so tricky.
An illustration. Imagine you have a film camera that can take 60 pictures per second. Say you use that camera to take a picture of a ball whizzing by the field of view. Here are 10 pictures from that sequence:
To make it easier to visualize, we're also assuming that this magical ball changes color as it moves.
The time delay between each picture is a 60th of a second, so this sequence lasts 1/6th of a second.
Now say you take a modern* NTSC (or DV, or D1, or US-standard 1080i HD) video camera and shoot the same sequence. We all know that NTSC video is 60 (actually 60M) fields a second, so you might think that the video camera would record the same as the above. This is incorrect. The video camera does record 60 images per second, but each image consists of only half of the scanlines of the complete picture at a given time, like this:
The data captured by the video camera does not look like this:
and it does not look like this:
In reality,
The harsh reality of video is that in any video sequence, you are missing half of the spatial information for every temporal instant. This is what we mean when we say "video is not frames." In fact, the notion of video as "frames" is something we computer people made up so as not to go insane—but sooner or later, we have to face the fact...
Say you want to take a video sequence which you have recorded (perhaps as uncompressed, or perhaps as JPEG-compressed data) and you want to show a still frame of this sequence. A still frame would require a complete set of spatial information at a single instant of time, and so the data is simply not available to do a still frame correctly. One thing that much of our software does today to deal with this problem (often without knowledge of the real issue at hand) is to choose two adjacent fields and grab one set of lines from each. This technique has the rather ugly problem shown here:
No matter which pair of fields you choose, the resulting still frame looks quite bad. This artifact, known as "tearing" or "fingering," is an inevitable consequence of putting together an image from bits of images snapped at different times. You wouldn't notice the artifact if the fields whizzed past your eye at field rate, but as soon as you try and do a freeze frame, the effect is highly visible and bad. You also wouldn't notice the artifact if the objects being captured were not moving between fields.
There's another thing about these fingering artifacts which we've often ignored in our software—they are terrible for most compressors. If you are making still frames so that you can pass frame-sized images on to a compressor, you definitely want to avoid tearing at all costs. The compressor will waste lots of bits trying to encode the high-frequency information in the tearing artifacts and fewer bits encoding your actual picture. Depending on what size and quality of compressed image you will end up with, you might even consider just sending every other field to the compressor, rather than trying to create frames that will compress well.
Another possible technique for producing still-frames is to choose some field and double the lines in that field:
As you can see, this looks a little better, but there is an obvious loss of spatial resolution (ie, there are now lots of jaggies and vertical blockiness visible). To some extent, this can be reduced by interpolating adjacent lines in one field to get the lines of the other field:
But there is also a more subtle problem with any technique that uses one field only, which we'll see later.
There are an endless variety of more elaborate tricks you can use to come up with good still frames, all of which come under the heading of "de-interlacing methods." Some of these tricks attempt to use data from both fields in areas of the image that are not moving (so you get high spatial resolution), and double or interpolate lines of one field in areas of the image that are moving (so you get high temporal resolution). Many of the tricks take more than two fields as input. Since the data is simply not available to produce a spatially complete picture for one instant, there is no perfect solution. But depending on why you want the still frame, the extra effort may well be worth it.
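To make the simple techniques above concrete, here is a minimal sketch in Python (my own illustration, not code from any particular product). Each field is just a list of scanlines, and each scanline is a list of pixel values; which field is spatially on top depends on your video system, as discussed later in this document.

```python
def weave(field_a, field_b):
    """Interleave two temporally adjacent fields into one full frame.
    Produces the "tearing"/"fingering" artifacts wherever there is motion."""
    frame = []
    for line_a, line_b in zip(field_a, field_b):
        frame.append(line_a)
        frame.append(line_b)
    return frame

def line_double(field):
    """Repeat each field line twice: no tearing, but half the vertical detail."""
    frame = []
    for line in field:
        frame.append(line)
        frame.append(list(line))   # copy so the two rows are independent
    return frame

def interpolate(field):
    """Fill the missing lines by averaging the field lines above and below."""
    frame = []
    for i, line in enumerate(field):
        frame.append(line)
        nxt = field[i + 1] if i + 1 < len(field) else line
        frame.append([(a + b) // 2 for a, b in zip(line, nxt)])
    return frame

# Example: two 2-line fields from a 4-line picture.
f1 = [[10, 10], [30, 30]]
f2 = [[20, 20], [40, 40]]
print(weave(f1, f2))        # [[10, 10], [20, 20], [30, 30], [40, 40]]
print(line_double(f1))      # [[10, 10], [10, 10], [30, 30], [30, 30]]
print(interpolate(f1))      # [[10, 10], [20, 20], [30, 30], [30, 30]]
```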
LCD video monitors behave similarly: they only update half the pixels of the screen for each field period.
So if that's true, then how come images on a video monitor don't flicker hideously or jump up and down as alternate fields are refreshed?
This is partially explained by the persistence of the phosphors on a CRT screen (or the continuous glowing of pixels on an LCD monitor). Once refreshed, the lines of a given field start to fade out slowly on a CRT (or not at all on an LCD monitor), and so the monitor is still emitting some light from those lines when the lines of the other field are being refreshed. The lack of flicker is also partially explained by a similar persistence in your visual system.
Unfortunately though, these are not the only factors. Much of the reason why you do not perceive flicker on a video screen is that good-looking interlaced video signals themselves have built-in characteristics (most of which came from the camera that captured the video) that reduce the visibility of flicker.
It is important for you, as a video software writer, to understand these characteristics, because when you synthesize images on a computer or process digitized images, you must produce an image that also has these characteristics. An image which looks good on a non-interlaced graphics monitor can easily look abysmal on an interlaced video monitor.
Disclaimer: a complete understanding of when flicker is likely to be perceivable and how to get rid of it requires an in-depth analysis of the properties of the phosphors of a particular monitor (not only their persistence but also their size and overlap) and of the average viewing distance; it requires more knowledge of the human visual system; and it may also require an in-depth analysis of the source of the video (for example, the persistence, size, and overlap of the CCD elements used in the camera, the shape of the camera's aperture, etc.). This description is only intended to give a general sense of the issues.
Disclaimer 2: standard definition analog video (NTSC and PAL) is fraught with design "features" (bandwidth limitations, etc.) which can introduce artifacts similar to the ones we are describing here into the final result of video output from a computer. These artifacts are beyond the scope of this document, but are also important to consider when creating data to be converted to an analog video signal. An example would be antialiasing (blurring!) data in a computer to avoid chroma aliasing when the data is converted to analog video.
Here are some of the major gotchas to worry about when creating data for video output:
Since the non-black data is contained on only one line, it will appear in only one field. A video monitor will only refresh the image of that line 30M times a second, so the line will flicker on and off quite visibly.
You can see this yourself if your computer is equipped with a video output that can display the contents of your desktop. Plug in a video monitor, make sure this webpage is visible on the video monitor, and then go into your video hardware's control panel and disable all forms of "flicker filter" (in Windows, this option is often buried inside the Display control panel, Settings tab, Advanced button, inside one of the tabs specific to your hardware vendor, but you may also find it in other Windows control panels).
You do not have to have a long line for this effect to be visible: thin, non-antialiased text exhibits the same objectionable flicker.
Typical video images are more vertically blurry; even where there is a sharp vertical transition (the bottom of an object in sharp focus, for example), the method typical cameras use to capture the image will cause the transition to blur over more than one line. It is often necessary to simulate this blurring when creating synthetic images for video.
These lines would include data in both fields, so part of the line is updated each 60th (technically, 60M) of a second. Unfortunately, when you actually look at the image of this line on a video monitor (again without any "flicker filter"), the line appears to be solid in time, but it appears to jump up and down, as the top and bottom line alternate between being brighter and darker.
There is no particular magic method that will produce flicker-free video. The more you understand about the display devices you care about, and about when the human vision system perceives flicker and when it does not, the better a job you can do at producing a good image.
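For example, here is a minimal sketch of one common flavor of "flicker filter" (an assumption on my part about how such filters typically work, not a description of any specific vendor's hardware): a small vertical low-pass so that no picture detail lives on only one line, and therefore in only one field.

```python
def flicker_filter(image, weights=(0.25, 0.5, 0.25)):
    """Vertically low-pass an image (a list of scanlines, each a list of
    pixel values) so that sharp one-line details are spread across
    adjacent lines and thus across both fields."""
    out = []
    last = len(image) - 1
    for y, line in enumerate(image):
        above = image[max(y - 1, 0)]
        below = image[min(y + 1, last)]
        out.append([
            int(weights[0] * a + weights[1] * c + weights[2] * b)
            for a, c, b in zip(above, line, below)
        ])
    return out

# A one-line-high white line on black flickers badly on an interlaced
# monitor; after filtering, its energy is spread over adjacent lines.
img = [[0, 0, 0], [255, 255, 255], [0, 0, 0], [0, 0, 0]]
print(flicker_filter(img))
```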
You might think that you could generate synthetic video by taking the output of a frame-based renderer at 30M frames per second and pulling two fields out of each frame image. This will not work well: the motion in the resulting sequence on an interlaced video monitor will noticeably stutter, due to the fact that the two fields are scanned out at different times, yet represent an image from a single time.
It's exactly parallel to the correct and incorrect interpretations of how a camera records interlaced video from our section above. You should generate 60M temporally distinct fields per second, like this:
You should not try to pull fields out of frames rendered at only 30M temporally distinct moments per second, like this:
Your renderer must know that it is rendering 60M temporally distinct images per second.
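As a concrete illustration, here is a minimal sketch of the right and wrong approaches, assuming a hypothetical render(t) function that returns a full-height image (a list of scanlines) of the scene at time t seconds, and assuming a particular assignment of fields to even/odd lines (check your video system for the real assignment):

```python
FIELD_RATE = 60000.0 / 1001.0          # "60M" fields per second

def render_fields_correctly(render, num_fields, first_field_has_top_line=True):
    """Render each field at its own instant and keep only that field's lines."""
    fields = []
    for n in range(num_fields):
        image = render(n / FIELD_RATE)             # temporally distinct image
        even_field = (n % 2 == 0)
        first_line = 0 if even_field == first_field_has_top_line else 1
        fields.append(image[first_line::2])        # every other line
    return fields

def render_fields_incorrectly(render, num_fields):
    """The broken shortcut: render whole frames 30M times a second and split
    each into two fields that share a single instant of time."""
    fields = []
    for n in range(0, num_fields, 2):
        image = render(n / FIELD_RATE)             # one time serves both fields!
        fields.append(image[0::2])
        fields.append(image[1::2])
    return fields

# Tiny demo: a 4-line "scene" whose pixel value is the render time in ms.
scene = lambda t: [[round(t * 1000)]] * 4
print(render_fields_correctly(scene, 4))    # 4 fields, 4 distinct times
print(render_fields_incorrectly(scene, 4))  # 4 fields, only 2 distinct times
```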
When fields enter the picture, things get ugly. Say you are playing a video sequence, and run up against a missing field (the issues we are discussing also come up when you wish to play back video slowly). You wish to keep the playback rate of the video sequence constant, so you have to put some video data in that slot:
which field do you choose? Say you chose to duplicate the previous field, field 2:
You could also try duplicating field 4 or interpolating between 2 and 4. But with all of these methods there is a crucial problem: those fields contain data from a different spatial location than the missing field. If you viewed the resulting video, you would immediately notice that the image visually jumps up and down at this point. This is a large-scale version of the same problem that made the two-pixel-high line jump up and down: your eye is very good at picking up on the vertical "motion" caused by an image being drawn to the lines of one field, then being drawn again one picture line higher, into the lines of the other field. Note that you would see this even if the ball was not in motion.
Ok, so you respond to this by instead choosing to fill in the missing field with the last non-missing field that occupies the same spatial locations:
Now you have a more obvious problem: you are displaying the images temporally out of order. The ball appears to fly down, fly up again for a bit, and then fly down. Clearly, this method is no good for video which contains motion. But for video containing little or no motion, it would work pretty well, and would not suffer the up-and-down jittering of the above approach.
Which of these two methods is best thus depends on the video being used. For general-purpose video where motion is common, you'd be better off using the first technique, the "temporally correct" technique. For certain situations such as computer screen capture or video footage of still scenes, however, you can often get guarantees that the underlying image is not changing, and the second technique, the "spatially correct" technique, is a win.
As with de-interlacing methods, there are tons of more elaborate methods for interpolating fields which use more of the input data. For example, you could interpolate 2 and 4 and then interpolate the result of that vertically to guess at the content of the other field's lines. Depending on the situation, these techniques may or may not be worth the effort.
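Here is a minimal sketch of the two simple strategies just described (my own illustration). fields is a list of field images in temporal order, with None marking a dropped field; the first two fields are assumed to be present.

```python
def fill_temporally_correct(fields):
    """Duplicate the most recent field: time order is preserved, but the
    copy lives on the other field's lines, so the image jumps vertically."""
    out = list(fields)
    for i, f in enumerate(out):
        if f is None:
            out[i] = out[i - 1]           # previous field (other line set)
    return out

def fill_spatially_correct(fields):
    """Duplicate the last field on the same lines: no vertical jump, but
    any motion is displayed out of order."""
    out = list(fields)
    for i, f in enumerate(out):
        if f is None:
            out[i] = out[i - 2]           # two fields back (same line set)
    return out

# Example: the field at index 3 was dropped.
fields = ["f0", "f1", "f2", None, "f4", "f5"]
print(fill_temporally_correct(fields))   # ['f0', 'f1', 'f2', 'f2', 'f4', 'f5']
print(fill_spatially_correct(fields))    # ['f0', 'f1', 'f2', 'f1', 'f4', 'f5']
```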
If you choose the "temporally correct" method and repeatedly output one field (effectively giving you the "line-doubled" look described above), then you get an image with reduced vertical resolution. But you also get another problem: at the instant you pause, the image appears to jump up or down, because your eye picks up on an image being drawn into the lines of one field, and then being drawn one picture line higher or lower, into the lines of another field. Depending on the monitor and other factors, the paused image may appear to jump up and down constantly, or it may only appear to jump when you enter and exit pause.
If you choose the "spatially correct" method and repeatedly output a pair of fields, then if there happened to be any motion at the instant where you paused, you will see that motion happening back and forth, 60M times a second. This can be very distracting.
There are, of course, more elaborate heuristics that can be used to produce good looking pauses. For example, vertically interpolating one field to make another works well for slow-motion, pause, and vari-speed play. In addition, it can be combined with inter-field interpolation for "super slow-mo" effects.
The simplest method is to use your favorite video capture library to capture already-interleaved frames (pairs of fields merged together into frames), and then display each frame on the screen at 30M frames per second using your favorite graphics library (e.g. glDrawPixels() for OpenGL, a DirectDraw blit, etc.).
While this looks okay, it does not look like a video monitor does. A video monitor is interlaced. It scans across the entire screen, refreshing one field at a time, every 60th (actually 60M) of a second. A typical graphics monitor is progressive scan. It scans across the entire screen, refreshing every line of the picture, generally 50, 60, 60M, 72, or 76 times a second. Because graphics monitors are designed to refresh more often, their phosphors have a much shorter persistence than those of a video monitor.
If you viewed a CRT video monitor in slow motion, you'd see a two-part pattern repeating 30M times a second: first you'd see one field's lines light up brightly while the other field is fading out, then a 60th (60M) of a second later, you'd see the other field's lines light up brightly while the first field's lines were fading out, as seen in this diagram:*
If you viewed a CRT graphics monitor running at 60M Hz in slow motion, and ran playback software that used the simple frame-based technique described above, you'd see a full-screen pattern repeating 60M times a second. The entire video image (the lines from both fields) lights up and fades out uniformly, as in:*
These differences in the slow-motion view can lead to noticeable differences when viewed at full-rate (keep in mind that our sample animations on the right are just crude approximations to what you'd really see). Some applications demand that preview on the graphics monitor look as much like the actual view on a video monitor as possible, including (especially) the jitter effects associated with using fields incorrectly. In some markets, customers want to avoid having to buy an external video monitor to verify whether or not their images will look ok on an interlaced video device.
Making video on a graphics monitor look like video is no easy task. Essentially, you have to create some software or hardware which will simulate the light which a video monitor would emit using the pixels of a graphics monitor.
Existing solutions to this problem fall into two general categories:
So how do these graphics adapters "convert" the data? It varies by device, but typically the image which is actually displayed on the graphics monitor contains the image data from the field which just came in, on the proper lines for that field. What is displayed on the lines of the other field might be black, or the result of evenly interpolating the adjacent lines from the field which just came in, or a previous field, or some fancier interpolation of these. These options provide various rough approximations to the appearance of an actual video monitor. You would choose black on the theory that the graphics monitor's persistence was close enough to that of a video monitor. You would choose an interpolated signal or the previous field on the theory that the graphics monitor's persistence was not close enough.
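Here is a minimal sketch of that kind of conversion (the details vary widely from device to device, so treat this as an assumption-laden illustration rather than any vendor's algorithm): given the field that just arrived, build the full-height image to show on the progressive graphics monitor, filling the other field's lines with black, an interpolation, or the previous field.

```python
def display_image_for_field(field, field_has_top_line, other_lines="black",
                            previous_field=None):
    """Build the progressive image shown for one incoming field.
    field: list of scanlines for the field that just arrived.
    other_lines: "black", "interpolate", or "previous" (what to put on the
    lines that belong to the other field)."""
    width = len(field[0])
    out = []
    for i, line in enumerate(field):
        if other_lines == "interpolate":
            # the missing line sits between two lines of the arriving field
            if field_has_top_line:
                other = field[i + 1] if i + 1 < len(field) else line
            else:
                other = field[i - 1] if i > 0 else line
            filler = [(a + b) // 2 for a, b in zip(line, other)]
        elif other_lines == "previous" and previous_field is not None:
            filler = previous_field[i]
        else:
            filler = [0] * width                     # black
        out.extend([line, filler] if field_has_top_line else [filler, line])
    return out
```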
Nowadays, nearly every graphics adapter has this video insert "feature," even if the adapter has no video inputs, because it is used to "accelerate" the ubiquitous consumer application of DVD playback.
But, as you might guess from all the quotation marks I used above, this specialized hardware path comes at a very stiff development cost. The main cost is that you must generally engage very picky, vendor-specific, barely documented, specialized APIs to use the feature, and this wastes untold hours of development time. For example, in some cases there are arcane and arbitrary limits on the position of the video window that wreak havoc on GUI design and coding. If you just want to write code to play a DVD or forward live video to the screen unmodified, then there are high-level APIs to make this "easy," but if you want to actually manipulate the video, get ready for intense pain.
Another annoying side effect of the video insert "feature" is that, for many devices, the video pixels that appear on the graphics monitor are not really present in the framebuffer. This means you cannot read them out of the framebuffer with GDI/OpenGL/DirectDraw calls, and screen grab apps don't work. More seriously, it means you can't draw on top of the video (for example, when you pop another window on top, or overlay subtitles). Adapter vendors have come up with a myriad of hacks to work around this limitation, and these hacks add yet another layer of pain for the developer.
The hardware-based video insert "feature" is the de facto solution for live video monitoring and DVD playback, but it is often not worth the pain for more elaborate video applications such as those which edit and process video.
Of course, this method does not simulate the decay of the phosphors that are not being updated on each draw. You can also choose to clear the framebuffer between draws, which would be similar to the black option of the video "insert" feature described above.
There's an old page from the SGI Lurker's Guide called Displaying In-Memory Video Using OpenGL which may still be useful for modern OpenGL graphics adapters.
The answer to this depends on the kind of camera. Modern cameras use CCD arrays, some of which produce a field of data by sampling the light incident on a grid of sensors (throughout a certain exposure time) simultaneously. Therefore all of the pixels of a field are coincident: each pixel is a sampling of the light from a particular part of the scene during the same, finite period of time.
Older tube-based cameras (which were distinguished by crusty old names like vidicon and plumbicon) would sample a scene by scanning through that scene in much the same way a video monitor scans across its tube. Therefore, in the fields created by such a camera, none of the data is coincident! Each "pixel" is from a different moment in time. Instead of capturing the crispy images which we presented to you above:
A tube camera would capture an image more like this:
Tube cameras are dinosaurs and are being replaced by CCD-based cameras everywhere. But it is still quite possible that you'd run into one or possibly even be asked to write software to deal with video data from one.
More importantly, it turns out that many modern CCD cameras, including those in smartphones, capture all the pixels on a line simultaneously (unlike scanning tubes) but capture lines separated in time (like scanning tubes). This generates cool rolling shutter / slit scan effects which can be artistic, annoying, or just confusing.
So even in modern data you are likely to see a version of the effects above and have to deal with it in your software.
When considering questions like how to photograph or videotape a computer monitor using a camera, this harsh reality can come into play.
However, because most of the flickering effects in interlaced video are due to local phenomena (ie, the appearance of data on adjacent picture lines), and because the temporal difference between samples on adjacent picture lines is so close to the field period, it is often the case that you don't have to worry about this harsh reality.
Other video systems, such as 720p and 1080/24p, are clearly progressive-scan. A video camera that uses these systems captures all the lines of the image simultaneously.* If you look at the electrical signal being sent over a video cable, you'll see all the lines being transmitted in one chunk, not segmented into two fields.
So far so good.
But it turns out that there is a third class of so-called "segmented field" systems, such as 1080pSF/24. These systems are really progressive scan, because again an entire frame is captured at one instant*, but when the frame is sent over the video cable, the data is segmented into two fields. That is, a piece of digital electronics in the camera buffers up the progressive-scan frame and slices it into two chunks of data for transmission over the wire. An innocent observer may be fooled into thinking that the data in each field is temporally distinct, but it is not.
Why on earth would someone make a video system like this? Ah, compatibility. It turns out that, early in the development of HDTV when VTRs cost USD $100,000, someone figured out that they could re-use a bunch of their existing field-based hardware (e.g. video switchers, VTRs, etc.) with the new progressive-scan camera by tricking the equipment into thinking that it was manipulating the old system.
While such chicanery will hopefully go the way of the dodo, it is quite possible that you may someday have to write software to deal with these "mislabeled" signals.
For example, so-called "field sequential" video signals have one field for each color basis vector (red field, green field, then blue field, for example) and those fields may or may not be temporally offset from one another.
Basically, if you can imagine it, somebody (probably spooks) has implemented it! These systems are outside the scope of this document.
Say you have two images in memory, representing two temporally adjacent fields from a video signal, and you want to know how the lines of those fields will weave together when the fields are played on a video monitor. Or vice versa: say you have an already-woven frame in picture line order in memory, and you want to figure out which lines will go out in which field and in what order.
This is the kind of thing you have to figure out every time you synthesize, manipulate, or display field-based video data.
In this diagram, one field is blue and the other field is pink. Each line of the diagram represents one video line of data. The vertical axis (n) represents the address of each line of video in your program's computer memory. So line n=0 is first in memory, followed by line n=1, and so on.
Columns 1a and 2a show your two fields separate (but abutted) in memory. For sanity, we assume that when you store whole fields in memory like this, you always store them in temporal order relative to each other. The T column tells you the temporal position of each line of video relative to the others. Notice how T goes from 0 to 4 in these columns, since each line of a field is already in temporal order, and the two fields are also in temporal order relative to each other.
Columns 1b and 2b show your two fields woven together into picture line order in memory. The S column tells you the spatial position of each line of video, as it gets displayed on a television monitor, relative to the others, where S=0 gets displayed on the top line of a television monitor, S=1 on the next line down, and so on. Notice how S goes from 0 to 4 in these columns, since the lines are in spatial order.
So, by reading the value of S in column 1a/2a, or T in column 1b/2b, you can figure out how to weave or un-weave your data.
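In code, the weave and un-weave operations are tiny; the only piece of information you need from your video system (see the next section) is which of the two temporally ordered fields carries the spatially topmost line. A minimal sketch:

```python
def weave_fields(first_field, second_field, first_field_is_top):
    """Combine two temporally ordered fields into picture-line order
    (columns 1b/2b above)."""
    top, bottom = ((first_field, second_field) if first_field_is_top
                   else (second_field, first_field))
    frame = []
    for t, b in zip(top, bottom):
        frame.extend([t, b])
    return frame

def unweave_frame(frame, first_field_is_top):
    """Split a picture-line-ordered frame back into temporal order
    (columns 1a/2a above)."""
    top, bottom = frame[0::2], frame[1::2]
    return (top, bottom) if first_field_is_top else (bottom, top)

# Example: a 4-line frame whose lines are labeled by spatial position S.
frame = ["S0", "S1", "S2", "S3"]
f_first, f_second = unweave_frame(frame, first_field_is_top=False)
print(f_first, f_second)   # ['S1', 'S3'] ['S0', 'S2']
print(weave_fields(f_first, f_second, first_field_is_top=False) == frame)  # True
```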
That's, well, the tricky bit...
If the video devices or libraries you rely on are properly designed, their API or documentation will just tell you directly. For example, if you're dealing with uncompressed data in QuickTime files, you can read one of the required ImageDescription extensions (which I designed for Apple) to get the answer.
If the answer isn't immediately forthcoming from the API or documentation, this is the point where people tend to give up understanding, and instead they "try it one way" and if that fails, "try it the other way." That method might work if you can guarantee that your video system (e.g. 480i NTSC or 576i PAL), video hardware, drivers, library software, and OS software are going to remain unchanged. I know a lot of people who've burned unbelievable numbers of hours putzing their code back and forth without knowing why the "field sense" keeps flipping around.
But it turns out that it's often not that hard to really get a grasp on what's going on, in such a way that you can predict and control for changes in the software that you rely on, so that your application always interleaves fields correctly.
You can achieve this in two steps:
This is not hard, and it is the exact purpose of this document, which I recommend you check out:
Programmer's Guide to Video Systems
Field dominance is relevant when transferring data in such a way that frame boundaries must be known and preserved. Examples:
Field dominance can be "F1 Dominant" or "F2 Dominant." It determines the meaning of a "frame."
For "F1 Dominant," a "frame" is an F1 field followed by an F2 field. This is the protocol recommended by all of the industry video specifications.
For "F2 Dominant," a "frame" is an F2 field followed by an F1 field. This is the protocol followed by several production houses in Manhattan and elsewhere for the 480i formats only. These production houses are numerous enough, and the miles of old footage captured by them are voluminous enough, that one day you may very well be called upon to handle F2-dominant materials.
What are the cases where the frame boundaries must be known, and thus where field dominance is relevant?
In the next sections, we'll explain what has field dominance, where it begins, who has to worry about it, and why it exists at all.
Put another way,
| pre-existing tape material | ... | F2 | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 | ... |
|---|---|---|---|---|---|---|---|---|---|---|---|
| new material: F1 dominant edit | | | F1 | F2 | F1 | F2 | F1 | F2 | | | |
| new material: F2 dominant edit | | | | F2 | F1 | F2 | F1 | F2 | F1 | | |
Once you make that choice, then the material on the tape acquires a field dominance. All subsequent edits to that material need to begin on the same field type.
To see why this is so, assume we have edited together material A, B, and C following F1 dominance:
| field type | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| material | A | A | A | A | B | B | B | B | B | B | C | C | C | C |
Now say we try and edit in material D to replace material B following F2 dominance:
| field type | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| before | A | A | A | A | B | B | B | B | B | B | C | C | C | C |
| material D | | | | | | D | D | D | D | D | D | | | |
| after | A | A | A | A | B | D | D | D | D | D | D | C | C | C |
We have created an unpleasant edit where one field of material B still pops up at the edit point. Field dominance is the protocol which video engineers invented to prevent this problem.
When capturing, do you set your device/library to F1 dominance so that it will group an F1 field followed by an F2 field into a frame:

| video signal | ... | F2 | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 | ... |
|---|---|---|---|---|---|---|---|---|---|---|---|
| frames | ... | ... | frame a | frame a | frame b | frame b | frame c | frame c | frame d | frame d | ... |
Or do you set your device/library to F2 dominance so that it will group an F2 field followed by an F1 field into a frame:
| video signal | ... | F2 | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 | ... |
|---|---|---|---|---|---|---|---|---|---|---|---|
| frames | ... | frame a | frame a | frame b | frame b | frame c | frame c | frame d | frame d | ... | ... |
Once you have made this choice, you have determined the boundaries on which the material may be edited. You have given the material a field dominance.
Even if your device/library is capturing separate fields (e.g. M-JPEG field images), its behavior may be affected by its setting for field dominance. For example, the first buffer you get from the device/library after beginning the capture may be guaranteed to be the dominant field.
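Here is a minimal sketch of what the dominance setting means during capture (my own illustration; real devices and libraries may buffer or report leading unpaired fields differently): group an incoming stream of tagged fields into frames that begin on the dominant field type.

```python
def group_into_frames(tagged_fields, dominance="F1"):
    """tagged_fields: list of (tag, field_data) tuples with tag "F1" or "F2".
    Returns frames as (dominant field, following field) pairs; any leading
    fields before the first dominant field are discarded in this sketch."""
    frames = []
    i = 0
    # skip ahead until we see the dominant field type
    while i < len(tagged_fields) and tagged_fields[i][0] != dominance:
        i += 1
    while i + 1 < len(tagged_fields):
        frames.append((tagged_fields[i], tagged_fields[i + 1]))
        i += 2
    return frames

stream = [("F2", "b0"), ("F1", "a1"), ("F2", "b1"), ("F1", "a2"), ("F2", "b2")]
print(group_into_frames(stream, "F1"))  # [((F1,a1),(F2,b1)), ((F1,a2),(F2,b2))]
print(group_into_frames(stream, "F2"))  # [((F2,b0),(F1,a1)), ((F2,b1),(F1,a2))]
```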
Modern VTRs and switchers offer this option.
Many computer video input/output devices assume F1 dominance, meaning:
Many computer video storage libraries/file formats, such as QuickTime and AVI when used with certain data formats, effectively assume F1 dominance, because they hard-code the spatial/temporal relationships of the lines of an F1 and F2 field (i.e., case 1 or case 2 above) to one possibility for each video system. This could be considered a feature or a bug :)
Other storage libraries/file formats give you a choice of dominance, although it's sometimes not obvious that the choice you're given relates to field dominance. For example,
Equipment which does not perform edits, such as a video monitor, a waveform monitor, or a vectorscope, does not care about field dominance.
Well, welcome to video! Presented with an arbitrary choice, video engineers are incapable of making the same decision. Some engineers decided that F1 should be the dominant field because the number 1 comes before the number 2. As we will see below, LTC and VITC timecodes are defined so that a new hh:mm:ss:ff occurs at an F1, so this meshes.
Some engineers decided that the dominant field should be the field that includes the top line of a picture (which, in turn, depends on how they sample the video picture!). In some cases this works out to be F2.
Who was the first to make a dominance decision? The Ampex VR1000B 2-inch quad video deck from 1962, on which people did edits by "developing" the tape's magnetic control track into visible marks using a chemical and then splicing the tape at those marks with a razor blade, placed its control track marks every thirtieth of a second at—you guessed it—the beginnings of F2 fields. This greatly predates timecode formats like LTC and VITC. So in some sense F2 dominance is right because it was first.
You might also ask: "Why doesn't the industry just pick one dominance now and stick with it?" The original culprit decks from the sixties created a legacy, in the form of reels and reels of archival material, that was passed on to each new generation of VTR technology as studios transitioned to "the next" equipment. This legacy is still alive; all decks sold today have switchable field dominance, and studios still have material from "the last" equipment with edits on a certain field boundary.
Another, even more grotesque idiosyncrasy of analog video tends to dwarf the field dominance issue anyway: color framing. Edits on older VTRs which did not fall on a 2-frame (NTSC) or 4-frame (PAL) boundary relative to the analog signal's color subcarrier would generate unattractive pops and instabilities in the image at the edit point. Therefore, people were too busy worrying about which 2- or 4- frame boundary they had to edit on to worry about which field they had to edit on.
Modern component digital decks and non-linear editing applications have a small chance of breaking the cycle: they have the ability to edit on arbitrary field boundaries, and they have no color framing idiosyncrasies. Studios might actually start editing on field boundaries, and the dominance issue will finally be dead. Editors will still have to maintain the alternation between field types though, since the fields are spatially distinct.
Regardless of one's choice of field dominance, video generated through 3:2 pulldown can have video frames whose fields are from different film frames, possibly even different scenes of a film. So in a sense, the material has neither dominance, but instead has a repeating 10-field pattern. If 3:2 pulldown generated material is then edited without consideration of the 3:2 sequence, you get a sequence where scene changes occur at completely unpredictable field boundaries. This was a major issue for constant-angular-velocity consumer videodisks mastered from 24 frame per second footage, where customers wanted to achieve rock-solid F1/F2 still frames no matter where they paused. CAV videodisks actually include a "white flag" marker in the vertical interval on all F1/F2 pairs that may contain scene changes, as a hint to the videodisk player not to pause there!
An LTC codeword is one video frame time long. As specified in ANSI/SMPTE 12M-1986, the start of an LTC codeword always coincides with a certain line near the beginning of an F1 field, regardless of the field dominance of the video material.
A VITC codeword (also defined in 12M) tags along with each video field. The VITC codeword has a "Field Mark" bit which specifies if the field is F1 or F2. A VITC codeword's hh:mm:ss:ff value always increments at a transition from F2 to F1, regardless of the field dominance of the video material.
So if your material is F1 dominant, then LTC and VITC behave as you would expect:
| field type | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 |
|---|---|---|---|---|---|---|---|---|---|---|
| edits on tape | E | E | E | E | F | F | G | G | G | G |
| LTC on tape | 01:00:00:00 | 01:00:00:00 | 01:00:00:01 | 01:00:00:01 | 01:00:00:02 | 01:00:00:02 | 01:00:00:03 | 01:00:00:03 | 01:00:00:04 | 01:00:00:04 |
| VITC on tape (field mark) | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 |
| VITC on tape (timecode) | 01:00:00:00 | 01:00:00:00 | 01:00:00:01 | 01:00:00:01 | 01:00:00:02 | 01:00:00:02 | 01:00:00:03 | 01:00:00:03 | 01:00:00:04 | 01:00:00:04 |
But if your material is F2 dominant, then LTC and VITC straddle your edit points like this:
| field type | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 |
|---|---|---|---|---|---|---|---|---|---|---|
| edits on tape | ... | H | H | I | I | I | I | J | J | ... |
| LTC on tape | 01:00:00:00 | 01:00:00:00 | 01:00:00:01 | 01:00:00:01 | 01:00:00:02 | 01:00:00:02 | 01:00:00:03 | 01:00:00:03 | 01:00:00:04 | 01:00:00:04 |
| VITC on tape (field mark) | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 |
| VITC on tape (timecode) | 01:00:00:00 | 01:00:00:00 | 01:00:00:01 | 01:00:00:01 | 01:00:00:02 | 01:00:00:02 | 01:00:00:03 | 01:00:00:03 | 01:00:00:04 | 01:00:00:04 |
This means that one frame of F2 dominant material consists of two fields with different timecodes!
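A minimal sketch makes this concrete (drop-frame counting and real timecode math are ignored to keep it short): the timecode value increments at each F1 no matter what, so F1-dominant frames share one timecode while F2-dominant frames straddle two.

```python
def assign_timecodes(field_tags, start=(1, 0, 0, 0), fps=30):
    """field_tags: list of "F1"/"F2" tags, starting on an F1.
    Returns one hh:mm:ss:ff string per field; the frame count increments
    at each F2-to-F1 transition."""
    hh, mm, ss, ff = start
    codes = []
    for i, tag in enumerate(field_tags):
        if tag == "F1" and i > 0:                 # new codeword at each F1
            ff += 1
            if ff == fps:
                ff, ss = 0, ss + 1
                if ss == 60:
                    ss, mm = 0, mm + 1
                    if mm == 60:
                        mm, hh = 0, hh + 1
        codes.append("%02d:%02d:%02d:%02d" % (hh, mm, ss, ff))
    return codes

tags = ["F1", "F2"] * 3
codes = assign_timecodes(tags)
# F1-dominant frames: (F1, F2) pairs share one timecode.
print(list(zip(codes[0::2], codes[1::2])))
# F2-dominant frames: (F2, F1) pairs straddle two timecodes.
print(list(zip(codes[1::2], codes[2::2])))
```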
| Film Frames | frame A | frame A | frame A | frame B | frame B | frame C | frame C | frame C | frame D | frame D |
|---|---|---|---|---|---|---|---|---|---|---|
| Video Fields | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 | F1 | F2 |
| Video Frames | frame 1 | frame 1 | frame 2 | frame 2 | frame 3 | frame 3 | frame 4 | frame 4 | frame 5 | frame 5 |
This chart tells you which film image to use in order to produce each video field. The resulting video will then contain many fields which are duplicates of other fields in the sequence. It is often very interesting to tag the video stream with information indicating which video fields are redundant, so that agents which operate on that data such as compressors or video processors can avoid wasted effort.
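Here is a minimal sketch of the pattern in the chart above (the redundant-field tagging shown is just an illustration of the idea, not any particular stream format):

```python
def pulldown_3_2(film_frames):
    """film_frames: list of film images (or labels).  Returns a list of
    (film_frame, field_parity, is_redundant) tuples at video field rate,
    where each film frame contributes alternately 3 and 2 fields."""
    fields = []
    for i, frame in enumerate(film_frames):
        count = 3 if i % 2 == 0 else 2
        for k in range(count):
            parity = "F1" if len(fields) % 2 == 0 else "F2"
            # the 3rd field of a 3-field frame repeats that frame's 1st field
            fields.append((frame, parity, k == 2))
    return fields

for f in pulldown_3_2(["A", "B", "C", "D"]):
    print(f)
# -> 10 fields: A,A,A, B,B, C,C,C, D,D with the third A and third C
#    flagged as redundant repeats
```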
The lurkers guess that it's called 3:2 pulldown because the pattern of fields you get contains sequences of 3 fields followed by 2. Or perhaps it's called that because 3 of the 5 video frames do not end up coinciding with the start of a film frame and 2 do.
"Even and odd" could refer to whether a field's active lines end up as the even scanlines of a picture or the odd scanlines of a picture. In this case, one needs to additionally specify how the scanlines of the picture are numbered (zero-based or one-based), and one must also say which lines of the video signal are considered active, which is not at all something you can assume.
"Even and odd" could refer to the number 1 or 2 in F1 and F2, which is of course a totally different concept that only sometimes maps to the above. This definition seems somewhat more popular.