IVTC trivial fixes 2

Signed-off-by: Jean-Baptiste Kempf <jb@videolan.org>

b48b2a5c · Juha Jeronen · Jean-Baptiste Kempf · f7c77e92 · b48b2a5c
Commit b48b2a5c, authored Apr 19, 2011 by Juha Jeronen; committed by Jean-Baptiste Kempf, Apr 19, 2011

Showing with 566 additions and 567 deletions

modules/video_filter/deinterlace.c +566 -567

--- a/modules/video_filter/deinterlace.c
+++ b/modules/video_filter/deinterlace.c
@@ -2454,14 +2454,13 @@ static int RenderPhosphor( filter_t *p_filter,
 * Transcode, some from TVTime, and some original.
 *
 * If the input material is pure NTSC telecined film, inverse telecine
- *   (also known as "film mode") will (ideally) exactly recover the original
+ * will (ideally) exactly recover the original progressive film frames.
- *   (progressive film frames. The output will run at 4/5 of the original
+ * The output will run at 4/5 of the original framerate with no loss of
- *   (framerate with no loss of information. Interlacing artifacts are removed,
+ * information. Interlacing artifacts are removed, and motion becomes
- *   and motion becomes as smooth as it was on the original film.
+ * as smooth as it was on the original film. For soft-telecined material,
- *   For soft-telecined material, on the other hand, the progressive frames
+ * on the other hand, the progressive frames already exist, so only the
- *   already exist, so only the timings are changed such that the output
+ * timings are changed such that the output becomes smooth 24fps (or would,
- *   becomes smooth 24fps (or would, if the output device had an infinite
+ * if the output device had an infinite framerate).
- *   framerate).
 *
 * Put in simple terms, this filter is targeted for NTSC movies and
 * especially anime. Virtually all 1990s and early 2000s anime is
@@ -2507,8 +2506,8 @@ static int RenderPhosphor( filter_t *p_filter,
 * Finally, note also that IVTC is the only correct way to deinterlace NTSC
 * telecined material. Simply applying an interpolating deinterlacing filter
 * (with no framerate doubling) is harmful for two reasons. First, even if
- *   (the filter does not damage already progressive frames, it will lose half
+ * the filter does not damage already progressive frames, it will lose half
- *   (of the available vertical resolution of those frames that are judged
+ * of the available vertical resolution of those frames that are judged
 * interlaced. Some algorithms combining data from multiple frames may be
 * able to counter this to an extent, effectively performing something akin
 * to the frame reconstruction part of IVTC. A more serious problem is that
@@ -2584,7 +2583,7 @@ static int RenderPhosphor( filter_t *p_filter,
 * field renderer displays the material (one field at a time, dominant
 * field first).
 *
- *   Note that the VFD may, *correctly*, flip mid-stream, if soft field repeats
+ * The VFD may, *correctly*, flip mid-stream, if soft field repeats
 * (repeat_pict) have been used. They are commonly used in soft telecine
 * (see below), but also occasional lone field repeats exist in some streams,
 * e.g., Sol Bianca.
@@ -2597,7 +2596,7 @@ static int RenderPhosphor( filter_t *p_filter,
 * The reason for the words "classical telecine" above, when field
 * duplication was first mentioned, is that there exists a
 * "full field blended" version, where the added fields are not exact
- *   "duplicates, but are blends of the original film frames. This is rare
+ * duplicates, but are blends of the original film frames. This is rare
 * in NTSC, but some material like this reportedly exists. See
 * http://www.animemusicvideos.org/guides/avtech/videogetb2a.html
 * In these cases, the additional fields are a (probably 50%) blend of the
@@ -2638,7 +2637,7 @@ static int RenderPhosphor( filter_t *p_filter,
 * Finally, note that telecined video is often edited directly in interlaced
 * form, disregarding safe cut positions as pertains to the telecine sequence
 * (there are only two: between "d" and "e", or between "e" and the
- *   (next "a"). Thus, the telecine sequence will in practice jump erratically
+ * next "a"). Thus, the telecine sequence will in practice jump erratically
 * at cuts [**]. An aggressive detection strategy is needed to cope with
 * this.
 *
@@ -2651,8 +2650,8 @@ static int RenderPhosphor( filter_t *p_filter,
 * if the interlaced picture is viewed as-is, the luma alternates every line,
 * while the chroma alternates only every two lines of the picture.
 *
- *   That is, an interlaced frame from a 4:2:0 telecine looks like this
+ * That is, an interlaced frame in a 4:2:0 telecine looks like this
- *   (numbers indicate which frame the data comes from):
+ * (numbers indicate which film frame the data comes from):
 *
 * luma  stored 4:2:0 chroma  displayed chroma
 * 1111  1111                 1111
@@ -2661,10 +2660,9 @@ static int RenderPhosphor( filter_t *p_filter,
 * 2222                       2222
 * ...   ...                  ...
 *
- *   The deinterlace filter sees the stored 4:2:0 chroma.
+ * The deinterlace filter sees the stored 4:2:0 chroma. The "displayed chroma"
- *   The "displayed chroma" is only generated later in the filter chain
+ * is only generated later in the filter chain (probably when YUV is converted
- *   (probably when YUV is converted to the display format, if the display
+ * to the display format, if the display does not accept YUV 4:2:0 directly).
- *   does not accept YUV 4:2:0 directly).
 *
 *
 * Next, how NTSC soft telecine works:
@@ -2721,7 +2719,7 @@ static int RenderPhosphor( filter_t *p_filter,
 *
 * Finally, note also that a stream may also request a lone field repeat
 * (a sudden "3" surrounded by "2"s). Fortunately, these can be handled as
- *   (a two-frame soft telecine, as they match the first and third
+ * a two-frame soft telecine, as they match the first and third
 * flag patterns above.
 *
 * Combinations with several "3"s in a row are not valid for soft or hard
@@ -2783,16 +2781,15 @@ static int RenderPhosphor( filter_t *p_filter,
 * From these cadence tables we can extract two strategies for
 * cadence detection. We use both.
 *
- *   Strategy 1: duplicated fields.
+ * Strategy 1: duplicated fields ("vektor").
 *
 * Consider that each stencil position has a unique duplicate field
 * condition. In one unique position, "dea", there is no match; in all
 * other positions, exactly one. By conservatively filtering the
 * possibilities based on detected hard field repeats (identical fields
 * in successive input frames), it is possible to gradually lock on
- *   to the cadence. This kind of strategy is used by Vektor's classic
+ * to the cadence. This kind of strategy is used by the classic IVTC filter
- *   IVTC filter from TVTime (although there are some implementation
+ * in TVTime/Xine by Billy Biggs (Vektor), hence the name.
- *   differences when compared to ours).
 *
 * "Conservative" here means that we do not rule anything out, but start at
 * each stencil position by suggesting the position "dea", and then only add
@@ -2807,7 +2804,7 @@ static int RenderPhosphor( filter_t *p_filter,
 * duplicate field detection against the input. It is very good at staying
 * locked on once it acquires the cadence, and it does so correctly very
 * often. These are indeed characteristics that can be observed in the
- *   behaviour of Vektor's classic filter.
+ * behaviour of the TVTime/Xine filter.
 *
 * Note especially that 8fps/12fps animation, common in anime, will cause
 * spurious hard-repeated fields. The conservative nature of the method
@@ -2835,10 +2832,10 @@ static int RenderPhosphor( filter_t *p_filter,
 * is detected.
 *
 *
- *   Strategy 2: progressive/interlaced field combinations.
+ * Strategy 2: progressive/interlaced field combinations ("scores").
 *
 * We can also form a second strategy, which is not as reliable in practice,
- *   but which locks on faster. This is original to this filter.
+ * but which locks on faster when it does. This is original to this filter.
 *
 * Consider all possible field pairs from two successive frames: TCBC, TCBN,
 * TNBC, TNBN. After one frame, these become TPBP, TPBC, TCBP, TCBC.
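
The renaming of field pairs described in the comment above (TCBC, TCBN, TNBC, TNBN becoming TPBP, TPBC, TCBP, TCBC after one frame) can be sketched in C as follows. This is only an illustrative sketch: the names `frame_slot_t`, `field_pair_t`, `age_pair` and `pair_name` are hypothetical and do not come from deinterlace.c.

```c
/* Hypothetical sketch, not taken from deinterlace.c. */

/* Slots of the three-frame stencil: previous, current, next. */
typedef enum { FRAME_P = 0, FRAME_C = 1, FRAME_N = 2 } frame_slot_t;

/* A field pair: the frame each of the top and bottom fields comes from. */
typedef struct {
    frame_slot_t top;
    frame_slot_t bottom;
} field_pair_t;

/* Rename a field pair after one new input frame has arrived:
 * N becomes C, and C becomes P. Returns 1 on success, or 0 if the
 * pair uses a field from P, which then leaves the stencil. */
static int age_pair(field_pair_t in, field_pair_t *out)
{
    if (in.top == FRAME_P || in.bottom == FRAME_P)
        return 0;
    out->top = (frame_slot_t)(in.top - 1);
    out->bottom = (frame_slot_t)(in.bottom - 1);
    return 1;
}

/* Write the four-letter name of a pair (e.g. "TCBN") into name. */
static void pair_name(field_pair_t p, char name[5])
{
    static const char letters[] = "PCN";
    name[0] = 'T';
    name[1] = letters[p.top];
    name[2] = 'B';
    name[3] = letters[p.bottom];
    name[4] = '\0';
}
```

For example, passing the pair TCBN (top field from C, bottom field from N) through `age_pair` gives the pair that `pair_name` prints as TPBC, matching the list in the comment.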
@@ -2846,18 +2843,20 @@ static int RenderPhosphor( filter_t *p_filter,
 * are the exhaustive list of possible field pairs from two successive
 * frames in the three-frame PCN stencil.
 *
- *   The field pairs can be used for cadence position detection. The above
+ * The above tables list triplets of field pair combinations for each cadence
- *   tables list triplets of field pair combinations for each cadence position,
+ * position, which should produce progressive frames. All the given triplets
- *   which should produce progressive frames. All the given triplets are unique
+ * are unique in each table alone, although the one at "dea" is
- *   in each table alone, although the one at "dea" is indistinguishable from
+ * indistinguishable from the case of pure progressive material. It is also
- *   the case of pure progressive material. It is also the only one which is
+ * the only one which is not unique across both tables.
- *   not unique across both tables.
 *
 * Thus, all sequences of two neighboring triplets are unique across both
 * tables. (For "neighboring", each table is considered to wrap around from
 * "eab" back to "abc", i.e. from the last row back to the first row.)
 * Furthermore, each sequence of three neighboring triplets is redundantly
 * unique (i.e. is unique, and reduces the chance of false positives).
+ * (In practice, though, we already know which table to consider, from the fact
+ * that TFD and VFD must match. Checking only the relevant table makes the
+ * strategy slightly more robust.)
 *
 * The important idea is: *all other* field pair combinations should produce
 * frames that look interlaced. This includes those combinations present in
@@ -2866,27 +2865,26 @@ static int RenderPhosphor( filter_t *p_filter,
 * uniqueness property, *every* "wrong" row will always contain at least one
 * combination that differs from those in the "correct" row).
 *
- *   As for how we use these observations, we generate the artificial frames
+ * We generate the artificial frames TCBC, TCBN, TNBC and TNBN (virtually;
- *   TCBC, TCBN, TNBC and TNBN (virtually; no data is actually moved).
+ * no data is actually moved). Two of these are just the frames C and N,
- *   Two of these are just the frames C and N, which already exist; the two
+ * which already exist; the two others correspond to composing the given
- *   others correspond to composing the given field pairs. We then compute
+ * field pairs. We then compute the interlace score for each of these frames.
- *   the interlace score for each of these frames. The interlace scores
+ * The interlace scores of what are now TPBP, TPBC and TCBP, also needed,
- *   of what are now TPBP, TPBC and TCBP, also needed, were computed by
+ * were computed by this same mechanism during the previous input frame.
- *   this same mechanism during the previous input frame. These can be slid
+ * These can be slid in history and reused.
- *   in history and reused.
 *
 * We then check, using the computed interlace scores, and taking into
- *   account the video field dominance information (to only check valid
+ * account the video field dominance information, which field combination
- *   combinations), which field combination triplet given in the tables
+ * triplet given in the appropriate table produces the smallest sum of
- *   produces the smallest sum of interlace scores. Unless we are at
+ * interlace scores. Unless we are at PCN = "dea" (which could also be pure
- *   PCN = "dea" (which could also be pure progressive!), this immediately
+ * progressive!), this immediately gives us the most likely current cadence
- *   gives us the most likely current cadence position. Combined with a
+ * position. Combined with a two-step history, the sequence of three most
- *   two-step history, the sequence of three most likely positions found this
+ * likely positions found this way always allows us to make a more or less
- *   way always allows us to make a more or less reliable detection. (That is,
+ * reliable detection. (That is, when a reliable detection is possible; if the
- *   when a reliable detection is possible; note that if the video has no
+ * video has no motion at all, every detection will report the position "dea".
- *   motion at all, every detection will report the position "dea". In anime,
+ * In anime, still shots are common. Thus we must augment this with a
- *   still shots are common. Thus we must augment this with a full-frame motion
+ * full-frame motion detection that switches the detector off if no motion
- *   detection that switches the detector off if no motion was detected.)
+ * was detected.)
 *
 * The detection seems to need four full-frame interlace analyses per frame.
 * Actually, three are enough, because the previous N is the new C, so we can
@@ -2923,11 +2921,11 @@ static int RenderPhosphor( filter_t *p_filter,
 * reliably on a valid cadence.
 *
 * When the cadence fails (we detect this from a sudden upward jump in the
- *   interlace scores of the constructed frames), we reset the "TVTime"
+ * interlace scores of the constructed frames), we reset the "vektor"
 * detector strategy and fall back to an emergency frame composer, where we
 * use ideas from Transcode's IVTC.
 *
- *   In the emergency mode, we simply output the least interlaced frame out of
+ * In this emergency mode, we simply output the least interlaced frame out of
 * the combinations TNBN, TNBC and TCBN (where only one of the last two is
 * tested, based on the stream TFF/BFF information). In this mode, we do not 
 * touch the timestamps, and just pass all five frames from each group right
@@ -2944,7 +2942,8 @@ static int RenderPhosphor( filter_t *p_filter,
 *
 * To make five into four we need to extend frame durations by 25%.
 * Consider the following diagram (times given in 90kHz ticks, rounded to
- *   integers; this is just for illustration):
+ * integers; this is just for illustration, and for comparison with the
+ * "scratch paper" comments in pulldown.c of TVTime/Xine):
 *
 * NTSC input (29.97 fps)
 * a       b       c       d        e        a (from next group) ...
@@ -2955,7 +2954,7 @@ static int RenderPhosphor( filter_t *p_filter,
 *
 * Three of the film frames have length 3754, and one has 3753
 * (it is 1/90000 sec shorter). This rounding was chosen so that the lengths
- *   (of the group of four sum to the original 15015.
+ * of the group of four sum to the original 15015.
 *
 * From the diagram we get these deltas for presentation timestamp adjustment
 * (in 90 kHz ticks, for illustration):
@@ -2979,9 +2978,9 @@ static int RenderPhosphor( filter_t *p_filter,
 * position "d". (Alternatively, upon lock-on, we could wait until we are
 * at "a" before switching on IVTC, but this makes the maximal delay
 * [max. detection + max. wait] = 3 + 4 = 7 input frames, which comes to
- *   [7/30 ~ 0.23 seconds instead of the 3/30 = 0.10 seconds from purely
+ * 7/30 ~ 0.23 seconds instead of the 3/30 = 0.10 seconds from purely
- *   the detection. I prefer the one-time jerk, which also happens to be
+ * the detection. The one-time jerk is simpler to implement and gives the
- *   simpler to implement.)
+ * faster lock-on.)
 *
 * It is clear that "e" is a safe choice for the dropped frame. This can be
 * seen from the timings and the cadence tables. First, consider the timings.
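
The "five into four" arithmetic from the timing comment above can be checked with a short C sketch. It assumes that each film frame's start time is the exact value (a multiple of 15015/4 ticks) rounded to the nearest 90 kHz tick; that rounding rule, and the names `film_frame_start` and `film_frame_duration`, are assumptions made for illustration and are not taken from deinterlace.c. Which of the four frames ends up one tick shorter depends on this choice.

```c
#include <stdint.h>

/* Hypothetical sketch, not taken from deinterlace.c. */

/* One telecine group: five NTSC frames of 3003 ticks each (90 kHz clock). */
#define TICKS_PER_GROUP 15015
/* The same time span holds four film frames after IVTC. */
#define FILM_FRAMES 4

/* Start of film frame k (0..4) within the group, in 90 kHz ticks.
 * Assumption: the exact value k * 15015 / 4 is rounded to nearest. */
static int64_t film_frame_start(int k)
{
    return ((int64_t)k * TICKS_PER_GROUP + FILM_FRAMES / 2) / FILM_FRAMES;
}

/* Duration of film frame k (0..3), in 90 kHz ticks. */
static int64_t film_frame_duration(int k)
{
    return film_frame_start(k + 1) - film_frame_start(k);
}
```

Under this rounding, three of the four durations come out as 3754 ticks and one as 3753, and together they sum to 15015, in line with the comment. Each is roughly 25% longer than the 3003-tick input frame.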