3D aspects of the 3D stereoscopic TestVid clips
What are the intended uses of the 3D stereoscopic TestVid test sequences?
It is anticipated that the 3D TestVid clips will be used for various applications, principally:
- Technical testing of 3D encoders - e.g. efficiency and speed with encoding Left and Right separately or differentially; effects on the encoder of differences Left to Right (differences not only of viewpoint caused by different amounts of 3D effect but also of color, geometry, artefacts)
- Investigation of 3D - e.g. understanding what will cause problems in different encoding, transmission and usage scenarios, user perceptions, limits on acceptability of both the source material and encoded material
Also, although mainly generated for use in a 'testing environment' with the screen sizes and viewer distances as given in the user manual for each set, it is intended that at least some of these sequences are usable with larger or smaller screen sizes / viewer distances (which in the nominal 'test environment' may give either an excessive or a very minor 3D effect).
Why not just shoot my own 3D stereoscopic test video?
Time and money are the short answers.
3D filming really is 10x harder, more time-consuming and more costly than 'normal' 2D filming - there are simply many more dimensions to consider.
Yes, you could go out and do all the filming we did - with a minimum team of 4, but more realistically 6 or more, then the editing and documentation. Day time, evening, night time, spring, summer, winter, lots of locations.
Spend many days planning where to film, then planning each shot to get good coverage of lots of different subjects, global and subject motion, colours, features, etc. Plus get the permissions and pay the fees to film in the various locations.
Then either spend 18 months making a good quality 3D set-up, or spend a fortune to hire the larger cameras and rigs (and have a 10-man team to run it all) and travel to the locations.
Then spend weeks and weeks editing the clips to get a good representative set and range of clips and features as per our list. Then produce them in the correct final uncompressed formats.
Then document them all so you can easily select them. (see extract from the user manual T3D003_Europe).
In all, it took 4-6 man months per clip 3D set, plus the time to write & test the software applications provided. So without doubt buying TestVid clip sets is a lot more cost-effective - and you can get them right now.
What do the 3D stereoscopic TestVid clips test?
The TestVid clip sets comprise a range of subjects, motion, colors, light levels designed to test and stress 3D video encoders by providing a varied set of conditions:
- Subject types such as people, traffic, buildings, sky, water, trees, text, etc.
- Movement types such as panning, tracking, hand-held camera, zooming in/out
- Subject motion such as into, out of or across the picture, in front of and partially behind objects, fast and slow
- Lighting conditions, from bright sunlight, dull daylight and shaded areas to night-time
- Hard-to-encode items such as reflections, fine lines, patterns, round objects, etc.
- Varying camera properties, such as depth of field and in/out-of-focus areas
- And with sound associated with the clips
The '3D effect' varies from mild to excessive, and in particular tests the 3D aspects of:
- Different amounts of continual negative disparity (i.e. out of the screen towards the viewer, in front of the screen plane) and positive disparity (i.e. into the screen away from the viewer, behind the screen plane); sometimes varying within a scene
- Different amounts of temporary (short-term) negative and positive disparity
- Matched and unmatched color between Left & Right streams
- Matched and unmatched geometric properties between Left & Right streams (e.g. due to lens differences)
- Edge of frame artifacts
- Different interocular (interaxial) distances
- Various depths and roundnesses
- Variations in brightness, color and contrast
- Rotational movement, e.g. with hand-held camera
- Panning and horizontal motion (tracking)
- Slight differences on zoom
- Slightly different focus left to right
- Different elements left to right e.g. lens flare or water highlights which only appear in one eye
- Slightly different sync (fully or not fully genlocked)
Due to the above, in many cases the video is harder to encode than might normally be expected, as the lighting conditions are not ideal or there is significant camera movement, or the focus varies, or the disparity is larger than is normally comfortable. These features are deliberately used as they often cause the most difficulty to 3D video encoders and represent the worst case that the encoder should encounter in 'normal / real' use.
What determined the subject choices used in the 3D filming?
The subject choices differ from those normally made for 2D TestVid clip sets, where the choice is based upon varied content which tests encoders.
For 3D, the primary motive has been to select subjects where:
- The 3D effects are clear (although ranging from subtle to very pronounced)
- It is considered that the sequences would be a good test of a 3D encoder, either due to the detail/nature of the subjects (e.g. fine lines, water) or due to the differences between Left and Right
- Particular aspects or problems of 3D are illustrated, e.g. objects which appear in one side but not the other at the screen edge; specular highlights in one side but not the other; grain which will be different Left to Right
- In some cases, the difficulties of 3D filming and viewing are illustrated by example, such as with zoom or hand-held camera action (encompassing angled views)
Consequently several of the sequences are filmed in the same general locations, where clear 3D depth effects could be demonstrated.
There are scenes with fades but why are there no scenes with cuts / composite sequences?
Although some sequences have fades/transitions within them, fast scene changes (i.e. scene cuts) are not provided within the set of clips as they are easy to do simply by adding two of the YUV files together.
One way to do this is using the DOS command window:
copy /b file1.yuv+file2.yuv file12.yuv
(where file1.yuv and file2.yuv are the two files to be added together, and file12.yuv is the result)
This makes a combined file 'file12.yuv' with a scene cut at the join between the two. (This works as there are no headers on the YUV files.)
The YUV files being added together must be the same resolution, although they can be different frame rates.
The advantages with adding files together in this manner are that:
- It allows composite sequences which either contain fairly similar scenes, so that the resulting scene cut is more 'gentle', or completely different scenes, depending upon how radical a scene cut you wish to have;
- Several scenes can be added together to make composite sequences with multiple different levels of scene cuts (from gentle to radical);
- And looping or very long composite sequences can be generated if required, e.g. to play continuously for an hour or more.
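The same byte-wise concatenation can be done on any platform; below is a minimal Python sketch (the function and file names are illustrative, not part of the clip sets):

```python
import shutil

def concat_yuv(inputs, output):
    """Concatenate headerless raw YUV files byte for byte.

    This works because the files carry no headers: the join point
    simply becomes a hard scene cut. All inputs must be the same
    resolution (frame rates may differ).
    """
    with open(output, "wb") as out:
        for name in inputs:
            with open(name, "rb") as src:
                shutil.copyfileobj(src, out)

# e.g. concat_yuv(["file1.yuv", "file2.yuv"], "file12.yuv")
```

Passing the same file name several times in the input list produces a looping composite sequence, as described above.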
Were the scenes all filmed with the same cameras?
All filming was done with pairs of cameras and configurations that were nominally identical (camera sensors, processors and acquisition systems where serial numbers were very close in sequence), although different camera pairs were used for filming various scenes.
What cameras were used?
TestVid has spent more than 18 months developing hardware and software for filming and processing 3D video; this has included custom modifications of off-the-shelf cameras and use of camera sensors with our own custom hardware for video capture and compression at high bit rates. Due to the investment involved and custom nature of the hardware/software, TestVid is unable to provide a straightforward answer, other than:
- Most sequences are filmed using camera sensors of resolution 3.8k x 2k pixels
- There is not a direct comparison, but the 'equivalent' data rate for recording video is approximately 500Mbits/sec I-frame for each of Left and Right
How well were the cameras mechanically aligned?
The cameras were mechanically aligned (X, Y, Z and rotationally) at the centers and as far as possible at the edges. Some 3D is produced where the cameras are not well aligned but, unless otherwise stated in the user manual concerned, in general the 3D test sets do not include any examples of basic alignment errors: firstly, this problem is rapidly becoming much less common; secondly, if the cameras were not well aligned during filming, this is very easily corrected in post-production.
Were the scenes filmed with or without convergence?
All filming was done with the cameras parallel: no convergence was used in any of the filming. This was done to avoid differential trapezoidal views Left to Right, in order to avoid the subsequent post-production corrections that would otherwise be required.
Correct alignment and use of 'identical' cameras and lenses in general resulted in good geometric matching between Left and Right. However, each of the lenses exhibited minor inconsistencies between Left and Right at each zoom level (as is normally the case and likely to continue for some time); on some of the video sequences this may be observable and where this is the case this is indicated in section 3EV.07 for the sequence concerned.
What interocular spacing was used to film the sequences?
For each sequence the interocular spacing of the cameras is stated in the user manual. In some cases this was relatively small compared to the subjects/field of view, leading to a slight 3D effect; in many cases it was larger, leading to a very distinct 3D effect. In most cases the interocular spacing was maintained at a distance such that the 'average' negative and positive disparity was within the limits considered reasonable by Sky [see below], as this produces acceptable 3D given the anticipated screen size and viewer distance.
What negative and positive disparity does each scene have?
Negative disparity is the Left/Right difference that makes objects appear closer to the viewer than the screen plane, i.e. out of the screen. It is given as a negative number below.
Positive disparity is the Left/Right difference that makes objects appear to the viewer to be farther away than the screen plane, i.e. into the screen. It is given as a positive number below.
For each clip a figure is given in the user manual for:
- The average/typical negative and positive disparity, (respectively sections 3DN.01 and 3DN.02 for each sequence) and
- The peak (transitory) negative and positive disparity (respectively sections 3DN.04 and 3DN.05 for each sequence)
...as a percentage of the screen width.
For some clips with significant movement it is necessary to make a judgement about average/typical values: this will usually be the most obvious elements of the foreground and background.
Many clips have very short-term large disparities (particularly negative): in many cases although the disparity is 'excessive' it is likely to be tolerated by a viewer, due to its short-term nature and context.
In any event as this is intended to be a test set for 3D, the 'rules' of acceptability are sometimes deliberately broken to allow the user to explore these limits and applicability of these rules in a user's context.
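As an illustration of how the percentage figures in the user manual relate to actual pixel offsets, here is a minimal sketch (the function name is illustrative), using the sign convention above:

```python
def disparity_pixels(percent_of_width, frame_width):
    """Convert a disparity given as % of screen width into pixels.

    Sign convention as in the user manual: negative values are out of
    the screen (towards the viewer), positive values into the screen.
    """
    return frame_width * percent_of_width / 100.0

# e.g. a -1% average negative disparity on a 1920-wide frame is
# disparity_pixels(-1.0, 1920), i.e. about 19 pixels of offset
```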
What location was set for the screen plane?
In many cases the screen plane has been set in post-production at the main subject; however, sometimes this is not the case, in order to give the desired effect.
In most cases the screen plane does not move; however some sequences have the screen plane changing during the sequence. When this is done, the change is generally gradual and either for aesthetic reasons or in order to reduce excessive negative disparity, and is indicated by a change in the disparity percentages.
As the Left and Right sequences are provided separately, most stereo viewers allow the user to adjust the screen plane (by moving the sequences left/right), so these can be adjusted to experiment with different locations of the screen plane.
What screen size and viewer distance were the sets filmed for?
The user manual for each test set states this; in general as these are test sets of video sequences it has been assumed that they will be more often viewed in a test environment, i.e. where:
- A typical large screen TV is used for viewing (approximately in the range 36"/1.0m to 60"/1.5m) at a distance of approximately 3m
- And/or a computer monitor, 22" (0.6m) or above in size is used for viewing at a distance of approximately 1m
Consequently most sequences have been filmed with the appropriate subject choice, interocular spacing and lens choice to suit this. However, there are a number of sequences where the disparities are relatively low or relatively high, making these sequences more suitable for viewing respectively on larger screens (e.g. cinema-size) at greater distance or smaller screens (e.g. mobile devices) at closer distances.
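What ultimately determines comfort is the angle a given disparity subtends at the viewer's eye, which is why the same percentage disparity reads differently on different screen sizes and viewing distances. A simple sketch of that relationship (the helper name is illustrative):

```python
import math

def disparity_angle_deg(percent_of_width, screen_width_m, distance_m):
    """Angle (degrees) subtended at the viewer's eye by a disparity
    given as % of screen width, for a given screen width and distance.

    At the same viewing distance, a larger screen turns the same
    percentage disparity into a larger (stronger) angular offset.
    """
    offset_m = screen_width_m * percent_of_width / 100.0
    return math.degrees(math.atan2(offset_m, distance_m))
```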
Have floating windows been applied?
In general, no floating windows have been applied to the sequences (although the user manual may state differently, as noted in 3DN.10 for each sequence), so some sequences have obvious/discomfiting window violations (i.e. where an object is visible in one eye but is completely or partially off-screen for the other eye, making 3D resolution impossible for the viewer). Where this is particularly the case this is stated in the 3D notes (section GN.08) for that particular sequence.
The user is of course free to apply floating windows if desired.
Have color corrections between left and right been applied?
Most sequences have been color corrected; for the majority of sequences the correction required was limited, generally only due to a slight color cast caused by the optics of the filming rig.
However, despite identical camera and storage settings between Left and Right, in some cases there is a distinct color cast difference between the Left and Right cameras. The reasons for the color cast differences were:
- Specular and diffuse reflection differences within the scene between Left and Right. As the angle is slightly different between Left and Right, some objects can produce substantially different reflections (the most obvious example is a partially shiny surface, which from one angle gives a much stronger reflection of sunlight, but from a slightly different angle simply shows its surface color);
- Light differences between Left and Right causing different camera responses. The same lenses and cameras were used Left and Right (with serial numbers very close together); however, despite this the different light entering each side would sometimes cause significantly different responses, giving a large color cast between the Left and Right (sometimes varying within the time of a sequence)
- Stray light/highlights/lens flare. Despite use of matte boxes, there were occasions when stray light impinged on the lens for one side and not the other, causing internal lens reflections or color shifts, or significantly different responses
In these circumstances the sequences have been only partially color corrected, or not color corrected at all. The purpose of these sequences is to allow the user to explore the effects (encoding and visual) under these circumstances. However, for the sequences where color correction has been partially done or not done, it has been checked that the color differences do not detract from the 3D aspects of the clips concerned.
Whether a sequence has been color corrected or not is stated in section 3DN.08 for each sequence.
Were the cameras synchronized? ('genlocked')
Yes, for the vast majority of the sequences (although deliberately not for a few sequences: see below).
One of the challenges of 3D filming is to ensure that the camera shutters are synchronized, i.e. the cameras are 'genlocked' together.
The term 'shutter' refers to film cameras and does not really apply to digital cameras where there is no mechanical shutter (such timing is done electronically), but the term is still used and can be applied as the effect is very similar.
If the cameras are not synchronized (genlocked), a moving object recorded at one position by one camera can appear at a different position in the other camera. For example, if the cameras are not genlocked, an object falling vertically may appear near the top of the frame in the Left camera - as this is the time when the Left camera shutter was 'open' - but more towards the middle of the frame in the Right camera. Clearly this will give some difference between Left and Right, which will therefore appear as a 3D effect - but it is not one. In many cases this 'false' 3D effect is not noticeable; in some cases it is.
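The magnitude of this 'false' disparity is simply the object's on-screen speed multiplied by the synchronization error; as a sketch (names and figures are illustrative):

```python
def false_disparity_px(speed_px_per_s, sync_offset_s):
    """Apparent Left/Right offset (pixels) caused purely by the two
    cameras capturing at slightly different instants, for an object
    moving at the given on-screen speed.
    """
    return speed_px_per_s * sync_offset_s

# e.g. an object moving at 200 px/s with a 10 ms genlock error
# appears offset by about 2 pixels between Left and Right
```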
Some 3D is still being made where the cameras are not genlocked, e.g. because lower-cost cameras were used which do not have this capability, or because there was an error during production. Therefore a few of each of the 1080p and 2K sequences have the cameras not genlocked. However, care has been taken to ensure that the synchronization difference in these cases is relatively small and there is no overall effect from the lack of genlock, so the sequences concerned are still entirely usable. (In most cases the timing difference is small and it is hard to tell that the sequences are not genlocked, even with frame-by-frame examination.) Essentially, the lack of genlock is a very minor factor, perceivable only in very small movement differences on some of the small scene elements of the sequences concerned.
Where a sequence has the cameras not genlocked this is indicated in the user manual in section 3EV.07 for the sequence concerned, as 'Not genlocked'.
What post-production was done on the sequences?
Post-production has been limited to only that required: generally only that needed to set the 3D disparity. All post-production was done either in floating point or at a minimum of 16 bits per component, 4:4:4, and each operation done on the video was checked to ensure that the original could be reproduced with zero change of data at 12-bit resolution (by applying the operation forwards then in reverse and checking that there was no difference from the original camera data input).
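The reversibility check described above can be sketched as follows, under the simplifying assumption that the operation acts per sample on values normalized to [0, 1] (all names are illustrative; this is not the actual tool used):

```python
def roundtrip_lossless_at_12_bits(samples, forward, inverse):
    """Check that inverse(forward(x)) reproduces the original data with
    zero change when quantized to 12-bit resolution.

    `samples` are floats in [0, 1]; `forward`/`inverse` stand in for a
    post-production operation and its reverse.
    """
    quantize = lambda x: round(x * 4095)  # 12-bit levels: 0..4095
    return all(quantize(v) == quantize(inverse(forward(v)))
               for v in samples)
```

For example, a pure gain operation passes this check, while a destructive operation (such as quantizing to fewer levels) fails it.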
Do the sequences meet the Sky Television recommendations for 3D content?
BSkyB Television in the UK has recently launched a 3D TV channel. For content providers wishing to submit content, BSkyB has produced a specification of requirements.
Note: the information which is provided below has been paraphrased from the BSkyB document and inclusion of comments in the document below is for reference and convenience only; the original document from BSkyB should be referred to.
- Negative disparity should not exceed 1% for majority of the time
- Positive disparity should not exceed 2% for majority of the time
- Peak (transitory) negative disparity should not exceed 2.5%
- Peak (transitory) positive disparity should not exceed 4%
These values are given for a screen size of 46" to 70" diagonal (1.2m to 1.8m); recommended viewer distance is not stated.
Where the description of a sequence states whether it is 'Within the Sky spec' (in sections 3DN.03 and 3DN.06 for each sequence), it is the above limits which are referenced.
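For convenience, the paraphrased limits above can be expressed as a simple check against a clip's stated disparity figures (a sketch only, with illustrative names; the original BSkyB document remains authoritative):

```python
# Paraphrased BSkyB limits, as % of screen width
# (negative = out of the screen, positive = into the screen)
SKY_LIMITS = {
    "avg_negative": -1.0,   # average/typical negative disparity
    "avg_positive": 2.0,    # average/typical positive disparity
    "peak_negative": -2.5,  # peak (transitory) negative disparity
    "peak_positive": 4.0,   # peak (transitory) positive disparity
}

def within_sky_spec(avg_neg, avg_pos, peak_neg, peak_pos):
    """True if a clip's stated disparity figures stay inside the limits."""
    return (avg_neg >= SKY_LIMITS["avg_negative"] and
            avg_pos <= SKY_LIMITS["avg_positive"] and
            peak_neg >= SKY_LIMITS["peak_negative"] and
            peak_pos <= SKY_LIMITS["peak_positive"])
```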
What about the T3vid logo?
The T3vid logo has deliberately not been made 3D, in order to have as little impact as possible on encoders which do differential encoding (i.e. encode the difference in the Right from the Left).
It is also aligned on a 16-pixel macroblock boundary, is static throughout the sequence and is of a dark color, designed to be unobtrusive: when viewing the video it can in practice easily be ignored (although it is generally not at the apparent depth of the nearby video or at the screen plane depth).