Register or Login To Download This Patent As A PDF
United States Patent Application 
20170150156

Kind Code

A1

Zhang; Li
; et al.

May 25, 2017

ILLUMINATION COMPENSATION WITH NONSQUARE PREDICTIVE BLOCKS IN VIDEO
CODING
Abstract
A video coder uses illumination compensation (IC) to generate a
nonsquare predictive block of a current prediction unit (PU) of a
current coding unit (CU) of a current picture of the video data. In doing
so, the video coder subsamples a first set of reference samples such
that a total number of reference samples in the first subsampled set of
reference samples is equal to 2.sup.m. Additionally, the video coder
subsamples a second set of view reference samples such that a total
number of reference samples in the second subsampled set of reference
samples is equal to 2.sup.m. The video coder determines a first IC
parameter based on the first subsampled set of reference samples and the
second subsampled set of reference samples. The video coder uses the
first IC parameter to determine a sample of the nonsquare predictive
block.
Inventors: 
Zhang; Li; (San Diego, CA)
; Chen; Jianle; (San Diego, CA)
; Chien; WeiJung; (San Diego, CA)
; Zhao; Xin; (San Diego, CA)
; Li; Xiang; (San Diego, CA)
; Karczewicz; Marta; (San Diego, CA)

Applicant:  Name  City  State  Country  Type  QUALCOMM Incorporated  San Diego  CA  US
  
Family ID:

1000002308919

Appl. No.:

15/359572

Filed:

November 22, 2016 
Related U.S. Patent Documents
       
 Application Number  Filing Date  Patent Number 

 62260103  Nov 25, 2015  
 62310271  Mar 18, 2016  

Current U.S. Class: 
1/1 
Current CPC Class: 
H04N 19/587 20141101; H04N 19/176 20141101 
International Class: 
H04N 19/176 20060101 H04N019/176; H04N 19/587 20060101 H04N019/587 
Claims
1. A method of decoding video data, the method comprising: receiving, by
a video decoder, a bitstream that comprises an encoded representation of
the video data; using, by the video decoder, illumination compensation
(IC) to generate a nonsquare predictive block of a current prediction
unit (PU) of a current coding unit (CU) of a current picture of the video
data, wherein using IC to generate the nonsquare predictive block
comprises: determining, by the video decoder, based on a vector of the
current PU, a reference block in a reference picture, the reference block
and the nonsquare predictive block being the same size and shape;
subsampling, by the video decoder, a first set of reference samples to
generate a first subsampled set of reference samples with a first
subsampling ratio, a total number of reference samples in the first set
of reference samples not being equal to 2.sup.m and a total number of
reference samples in the first subsampled set of reference samples being
equal to 2.sup.m, wherein the first set of reference samples comprises
samples outside the nonsquare predictive block along a left side and a
top side of the nonsquare predictive block, and m is an integer;
subsampling, by the video decoder, a second set of reference samples to
generate a second subsampled set of reference samples with a second
subsampling ratio, a total number of reference samples in the second set
of reference samples not being equal to 2.sup.m and a total number of
reference samples in the second subsampled set of reference samples
being equal to 2.sup.m, wherein the second set of reference samples
comprises samples outside the reference block along a left side and a top
side of the reference block; determining, by the video decoder, a first
IC parameter based on the total number of reference samples in the first
subsampled set of reference samples and the total number of reference
samples in the second subsampled set of reference samples, the first
subsampled set of reference samples, and the second subsampled set of
reference samples; and determining, by the video decoder, a sample of the
nonsquare predictive block based on the first IC parameter; and
reconstructing, by the video decoder, based on the nonsquare predictive
block, a coding block of the current CU.
2. The method of claim 1, wherein the vector of the current PU is a
disparity vector of the current PU and the reference picture is an
interview reference picture.
3. The method of claim 1, wherein determining the first IC parameter
comprises determining, by the video decoder, the first IC parameter such
that the first IC parameter is based on: i = 0 2 N  1
Rec neigh ( i )  a i = 0 2 N  1 Rec refneigh
( i ) 2 N ##EQU00033## wherein 2N denotes the total number of
reference samples in the first subsampled set of reference samples and
the total number of reference samples in the second subsampled set of
reference samples, Rec.sub.neigh(i) denotes an ith reference sample in
the first subsampled set of reference samples, and Rec.sub.refneigh(i)
denotes an ith reference sample in the second subsampled set of
reference samples; and wherein the sample of the nonsquare predictive
block is at coordinates (i,j) relative to a topleft corner of the
current picture and determining the sample of the nonsquare predictive
block comprises determining, by the video decoder, the sample of the
nonsquare predictive block at such that the sample is based on:
a*r(i+dv.sub.x,j+dv.sub.y+b) where b is the first IC parameter, a is a
second IC parameter, r is the reference picture, dv.sub.x is a horizontal
component of the vector of the current PU, and dv.sub.y is a vertical
component of the vector of the current PU.
4. The method of claim 3, wherein the vector of the current PU is a
disparity vector of the current PU and the reference picture is an
interview reference picture.
5. The method of claim 1, wherein using IC to generate the nonsquare
predictive block further comprises: determining, by the video decoder,
the second IC parameter such that the second IC parameter is based on:
2 N i = 0 2 N  1 Rec neigh ( i ) Rec
refneigh ( i )  i = 0 2 N  1 Rec neigh
( i ) i = 0 2 N  1 Rec refneigh ( i )
2 N i = 0 2 N  1 Rec refneigh ( i )
Rec neigh ( i )  ( i = 0 2 N  1 Rec
refneigh ( i ) ) 2 . ##EQU00034##
6. The method of claim 1, wherein at least one of: subsampling the first
set of reference samples comprises decimating, by the video decoder, the
first set of reference samples to generate the first subsampled set of
reference samples; and subsampling the second set of reference samples
comprises decimating, by the video decoder, the second set of reference
samples to generate the second subsampled set of reference samples.
7. The method of claim 1, wherein at least one of: subsampling the first
set of reference samples comprises performing, by the video decoder,
interpolated sampling of the first set of reference samples to generate
the first subsampled set of reference samples; and subsampling the
second set of reference samples comprises performing, by the video
decoder, interpolated sampling of the second set of reference samples to
generate the second subsampled set of reference samples.
8. The method of claim 1, wherein: the method further comprises
obtaining, by the video decoder, from the bitstream, a syntax element
indicating a subsampling method, and wherein at least one of:
subsampling the first set of reference samples comprises using, by the
video decoder, the indicated subsampling method to subsample the first
set of reference samples to generate the first subsampled set of
reference samples; or subsampling the second set of reference samples
comprises using, by the video decoder, the indicated subsampling method
to subsample the second set of reference samples to generate the second
subsampled set of reference samples.
9. The method of claim 1, wherein: the method further comprises
determining, by the video decoder, based on a size of the current PU, a
subsampling method, and wherein at least one of: subsampling the first
set of reference samples comprises using, by the video decoder, the
determined subsampling method to subsample the first set of reference
samples to generate the first subsampled set of reference samples; or
subsampling the second set of reference samples comprises using, by the
video decoder, the determined subsampling method to subsample the
second set of reference samples to generate the second subsampled set of
reference samples.
10. A method of encoding video data, the method comprising: receiving, by
a video encoder, the video data; using, by the video encoder,
illumination compensation (IC) to generate a nonsquare predictive block
of a current prediction unit (PU) of a current coding unit (CU) of a
current picture of the video data, wherein using IC to generate the
nonsquare predictive block comprises: determining, by the video encoder,
based on a vector of the current PU, a reference block in a reference
picture, the reference block and the nonsquare predictive block being
the same size and shape; subsampling, by the video encoder, a first set
of reference samples to generate a first subsampled set of reference
samples with a first subsampling ratio, a total number of reference
samples in the first set of reference samples not being equal to 2.sup.m
and a total number of reference samples in the first subsampled set of
reference samples being equal to 2.sup.m, wherein the first set of
reference samples comprises samples outside the nonsquare predictive
block along a left side and a top side of the nonsquare predictive
block, and m is an integer; subsampling, by the video encoder, a second
set of reference samples to generate a second subsampled set of
reference samples with a second subsampling ratio, a total number of
reference samples in the second set of reference samples not being equal
to 2.sup.m and a total number of reference samples in the second
subsampled set of reference samples being equal to 2.sup.m, wherein the
second set of reference samples comprises samples outside the reference
block along a left side and a top side of the reference block;
determining, by the video encoder, a first IC parameter based on the
total number of reference samples in the first subsampled set of
reference samples and the total number of reference samples in the second
subsampled set of reference samples, the first subsampled set of
reference samples, and the second subsampled set of reference samples;
and generating, by the video encoder, residual data based on the
nonsquare predictive block; and outputting, by the video encoder, a
bitstream that includes data based on the residual data.
11. The method of claim 10, wherein the vector of the current PU is a
disparity vector of the current PU and the reference picture is an
interview reference picture.
12. The method of claim 10, wherein determining the first IC parameter
comprises determining, by the video decoder, the first IC parameter such
that the first IC parameter is based on: i = 0 2 N  1
Rec neigh ( i )  a i = 0 2 N  1 Rec refneigh
( i ) 2 N ##EQU00035## wherein 2N denotes the total number of
reference samples in the first subsampled set of reference samples and
the total number of reference samples in the second subsampled set of
reference samples, Rec.sub.neigh(i) denotes an ith reference sample in
the first subsampled set of reference samples, and Rec.sub.refneigh(i)
denotes an ith reference sample in the second subsampled set of
reference samples; and wherein the sample of the nonsquare predictive
block is at coordinates (i,j) relative to a topleft corner of the
current picture and determining the sample of the nonsquare predictive
block comprises determining, by the video decoder, the sample of the
nonsquare predictive block at such that the sample is based on:
a*r(i+dv.sub.x,j+dv.sub.y+b) where b is the first IC parameter, a is a
second IC parameter, r is the reference picture, dv.sub.x is a horizontal
component of the vector of the current PU, and dv.sub.y is a vertical
component of the vector of the current PU.
13. The method of claim 12, wherein the vector of the current PU is a
disparity vector of the current PU and the reference picture is an
interview reference picture.
14. The method of claim 10, wherein using IC to generate the nonsquare
predictive block further comprises: determining, by the video encoder,
the second IC parameter such that the second IC parameter is based on:
2 N i = 0 2 N  1 Rec neigh ( i ) Rec
refneigh ( i )  i = 0 2 N  1 Rec neigh
( i ) i = 0 2 N  1 Rec refneigh ( i )
2 N i = 0 2 N  1 Rec refneigh ( i )
Rec neigh ( i )  ( i = 0 2 N  1 Rec
refneigh ( i ) ) 2 . ##EQU00036##
15. The method of claim 10, wherein at least one of: subsampling the set
of reference samples comprises decimating, by the video encoder, the
first set of reference samples to generate the first subsampled set of
reference samples; and subsampling the second set of reference samples
comprises decimating, by the video encoder, the second set of reference
samples to generate the second subsampled set of reference samples.
16. The method of claim 10, wherein at least one of: subsampling the
first set of reference samples comprises performing, by the video
encoder, interpolated sampling of the first set of reference samples to
generate the first subsampled set of reference samples; and subsampling
the second set of reference samples comprises performing, by the video
encoder, interpolated sampling of the second set of reference samples to
generate the second subsampled set of reference samples.
17. The method of claim 10, wherein: the method further comprises
including, by the video encoder, in the bitstream, a syntax element
indicating a subsampling method, and wherein at least one of:
subsampling the first set of reference samples comprises using, by the
video encoder, the indicated subsampling method to subsample the first
set of reference samples to generate the first subsampled set of
reference samples; or subsampling the second set of reference samples
comprises using, by the video encoder, the indicated subsampling method
to subsample the second set of reference samples to generate the second
subsampled set of reference samples.
18. The method of claim 10, wherein: the method further comprises
determining, by the video encoder, based on a size of the current PU, a
subsampling method, and wherein at least one of: subsampling the first
set of reference samples comprises using, by the video encoder, the
determined subsampling method to subsample the first set of reference
samples to generate the first subsampled set of reference samples; and
subsampling the second set of reference samples comprises using, by the
video encoder, the determined subsampling method to subsample the
second set of reference samples to generate the second subsampled set of
reference samples.
19. An apparatus for decoding video data, the apparatus comprising: one
or more storage media configured to store the video data; and a video
decoder configured to: receive a bitstream that comprises an encoded
representation of the video data; use illumination compensation (IC) to
generate a nonsquare predictive block of a current prediction unit (PU)
of a current coding unit (CU) of a current picture of the video data,
wherein the video decoder is configured such that, as part of using IC to
generate the nonsquare predictive block, the video decoder: determines,
based on a vector of the current PU, a reference block in a reference
picture, the reference block and the nonsquare predictive block being
the same size and shape; subsamples a first set of reference samples to
generate a first subsampled set of reference samples with a first
subsampling ratio, a total number of reference samples in the first set
of reference samples not being equal to 2.sup.m and a total number of
reference samples in the first subsampled set of reference samples being
equal to 2.sup.m, wherein the first set of reference samples comprises
samples outside the nonsquare predictive block along a left side and a
top side of the nonsquare predictive block, and m is an integer;
subsamples a second set of reference samples to generate a second
subsampled set of reference samples with a second subsampling ratio, a
total number of reference samples in the second set of reference samples
not being equal to 2.sup.m and a total number of reference samples in the
second subsampled set of reference samples being equal to 2.sup.m,
wherein the second set of reference samples comprises samples outside the
reference block along a left side and a top side of the reference block;
determines a first IC parameter based on the total number of reference
samples in the first subsampled set of reference samples and the total
number of reference samples in the second subsampled set of reference
samples, the first subsampled set of reference samples, and the second
subsampled set of reference samples; and determines a sample of the
nonsquare predictive block based on the first IC parameter; and
reconstruct, based on the nonsquare predictive block, a coding block of
the current CU.
20. The apparatus of claim 19, wherein the vector of the current PU is a
disparity vector of the current PU and the reference picture is an
interview reference picture.
21. The apparatus of claim 19, wherein the video decoder is configured to
determine the first IC parameter such that the first IC parameter is
based on: i = 0 2 N  1 Rec neigh ( i )  a
i = 0 2 N  1 Rec refneigh ( i ) 2 N
##EQU00037## wherein 2N denotes the total number of reference samples in
the first subsampled set of reference samples and the total number of
reference samples in the second subsampled set of reference samples,
Rec.sub.neigh(i) denotes an ith reference sample in the first
subsampled set of reference samples, and Rec.sub.refneigh(i) denotes an
ith reference sample in the second subsampled set of reference samples;
and wherein the sample of the nonsquare predictive block is at
coordinates (i,j) relative to a topleft corner of the current picture
and the video decoder is configured to determine the sample of the
nonsquare predictive block at such that the sample is based on:
a*r(i+dv.sub.x,j+dv.sub.y+b) where b is the first IC parameter, a is a
second IC parameter, r is the reference picture, dv.sub.x is a horizontal
component of the vector of the current PU, and dv.sub.y is a vertical
component of the vector of the current PU.
22. The apparatus of claim 21, wherein the vector of the current PU is a
disparity vector of the current PU and the reference picture is an
interview reference picture.
23. The apparatus of claim 19, wherein the video decoder is configured
such that, as part of using IC to generate the nonsquare predictive
block, the video decoder: determines the second IC parameter such that
the second IC parameter is based on: 2 N i = 0 2 N 
1 Rec neigh ( i ) Rec refneigh ( i ) 
i = 0 2 N  1 Rec neigh ( i ) i = 0 2 N  1
Rec refneigh ( i ) 2 N i = 0 2 N  1
Rec refneigh ( i ) Rec neigh ( i )  (
i = 0 2 N  1 Rec refneigh ( i ) ) 2 .
##EQU00038##
24. The apparatus of claim 19, wherein at least one of: the video decoder
is configured such that, as part of subsampling the first set of
reference samples, the video decoder decimates the first set of reference
samples to generate the first subsampled set of reference samples; and
the video decoder is configured such that, as part of subsampling the
second set of reference samples, the video decoder decimates the second
set of reference samples to generate the second subsampled set of
reference samples.
25. The apparatus of claim 19, wherein at least one of: the video decoder
is configured such that, as part of subsampling the first set of
reference samples, the video decoder performs interpolated sampling of
the first set of reference samples to generate the first subsampled set
of reference samples; and the video decoder is configured such that, as
part of subsampling the second set of reference samples, the video
decoder performs interpolated sampling of the second set of reference
samples to generate the second subsampled set of reference samples.
26. The apparatus of claim 19, wherein: the video decoder is further
configured to obtain, from the bitstream, a syntax element indicating a
subsampling method, and wherein at least one of: the video decoder is
configured such that, as part of subsampling the first set of reference
samples, the video decoder uses the indicated subsampling method to
subsample the first set of reference samples to generate the first
subsampled set of reference samples; or the video decoder is configured
such that, as part of subsampling the second set of reference samples,
the video decoder uses the indicated subsampling method to subsample
the second set of reference samples to generate the second subsampled
set of reference samples.
27. The apparatus of claim 19, wherein: the video decoder is further
configured to determine, based on a size of the current PU, a
subsampling method, and wherein at least one of: the video decoder is
configured such that, as part of subsampling the first set of reference
samples, the video decoder uses the determined subsampling method to
subsample the first set of reference samples to generate the first
subsampled set of reference samples; or the video decoder is configured
such that, as part of subsampling the second set of reference samples,
the video decoder uses the determined subsampling method to subsample
the second set of reference samples to generate the second subsampled
set of reference samples.
28. An apparatus for encoding video data, the apparatus comprising: one
or more storage media configured to store video data; and a video encoder
configured to: use illumination compensation (IC) to generate a
nonsquare predictive block of a current prediction unit (PU) of a
current coding unit (CU) of a current picture of the video data, wherein
the video encoder is configured such that, as part of using IC to
generate the nonsquare predictive block, the video encoder: determines,
based on a vector of the current PU, a reference block in a reference
picture, the reference block and the nonsquare predictive block being
the same size and shape; subsamples a first set of reference samples to
generate a first subsampled set of reference samples with a first
subsampling ratio, a total number of reference samples in the first set
of reference samples not being equal to 2.sup.m and a total number of
reference samples in the first subsampled set of reference samples being
equal to 2.sup.m, wherein the first set of reference samples comprises
samples outside the nonsquare predictive block along a left side and a
top side of the nonsquare predictive block, and m is an integer;
subsamples a second set of reference samples to generate a second
subsampled set of reference samples with a second subsampling ratio, a
total number of reference samples in the second set of reference samples
not being equal to 2.sup.m and a total number of reference samples in the
second subsampled set of reference samples being equal to 2.sup.m,
wherein the second set of reference samples comprises samples outside the
reference block along a left side and a top side of the reference block;
determines a first IC parameter based on the total number of reference
samples in the first subsampled set of reference samples and the total
number of reference samples in the second subsampled set of reference
samples, the first subsampled set of reference samples, and the second
subsampled set of reference samples; and determines a sample of the
nonsquare predictive block based on the first IC parameter; and generate
residual data based on the nonsquare predictive block; and output a
bitstream that includes data based on the residual data.
29. The apparatus of claim 28, wherein the vector of the current PU is a
disparity vector of the current PU and the reference picture is an
interview reference picture.
30. The apparatus of claim 28, wherein the video encoder is configured to
determine the first IC parameter such that the first IC parameter is
based on: i = 0 2 N  1 Rec neigh ( i )  a
i = 0 2 N  1 Rec refneigh ( i ) 2 N
##EQU00039## wherein 2N denotes the total number of reference samples in
the first subsampled set of reference samples and the total number of
reference samples in the second subsampled set of reference samples,
Rec.sub.neigh(i) denotes an ith reference sample in the first
subsampled set of reference samples, and Rec.sub.refneigh(i) denotes an
ith reference sample in the second subsampled set of reference samples;
and wherein the sample of the nonsquare predictive block is at
coordinates (i,j) relative to a topleft corner of the current picture
and the video encoder is configured to determine the sample of the
nonsquare predictive block at such that the sample is based on:
a*r(i+dv.sub.x,j+dv.sub.y+b) where b is the first IC parameter, a is a
second IC parameter, r is the reference picture, dv.sub.x is a horizontal
component of the vector of the current PU, and dv.sub.y is a vertical
component of the vector of the current PU.
31. The apparatus of claim 30, wherein the vector of the current PU is a
disparity vector of the current PU and the reference picture is an
interview reference picture.
32. The apparatus of claim 28, wherein the video encoder is configured
such that, as part of using IC to generate the nonsquare predictive
block, the video encoder: determines the second IC parameter such that
the second IC parameter is based on: 2 N i = 0 2 N 
1 Rec neigh ( i ) Rec refneigh ( i ) 
i = 0 2 N  1 Rec neigh ( i ) i = 0 2 N  1
Rec refneigh ( i ) 2 N i = 0 2 N  1
Rec refneigh ( i ) Rec neigh ( i )  (
i = 0 2 N  1 Rec refneigh ( i ) ) 2 .
##EQU00040##
33. The apparatus of claim 28, wherein at least one of: the video encoder
is configured such that, as part of subsampling the first set of
reference samples, the video encoder decimates the first set of reference
samples to generate the first subsampled set of reference samples; and
the video encoder is configured such that, as part of subsampling the
second set of reference samples, the video encoder decimates the second
set of reference samples to generate the second subsampled set of
reference samples.
34. The apparatus of claim 28, wherein at least one of: the video encoder
is configured such that, as part of subsampling the first set of
reference samples, the video encoder performs interpolated sampling of
the first set of reference samples to generate the first subsampled set
of reference samples; and the video encoder is configured such that, as
part of subsampling the second set of reference samples, the video
encoder performs interpolated sampling of the second set of reference
samples to generate the second subsampled set of reference samples.
35. The apparatus of claim 28, wherein: the video encoder is further
configured to obtain, from the bitstream, a syntax element indicating a
subsampling method, and wherein at least one of: the video encoder is
configured such that, as part of subsampling the first set of reference
samples, the video encoder uses the indicated subsampling method to
subsample the first set of reference samples to generate the first
subsampled set of reference samples; or the video encoder is configured
such that, as part of subsampling the second set of reference samples,
the video encoder uses the indicated subsampling method to subsample
the second set of reference samples to generate the second subsampled
set of reference samples.
36. The apparatus of claim 28, wherein: the video encoder is further
configured to determine, based on a size of the current PU, a
subsampling method, and wherein at least one of: the video encoder is
configured such that, as part of subsampling the first set of reference
samples, the video encoder uses the determined subsampling method to
subsample the first set of reference samples to generate the first
subsampled set of reference samples; or the video encoder is configured
such that, as part of subsampling the second set of reference samples,
the video encoder uses the determined subsampling method to subsample
the second set of reference samples to generate the second subsampled
set of reference samples.
37. An apparatus for decoding video data, the apparatus comprising: means
for receiving a bitstream that comprises an encoded representation of the
video data; means for using illumination compensation (IC) to generate a
nonsquare predictive block of a current prediction unit (PU) of a
current coding unit (CU) of a current picture of the video data, wherein
the means for using IC to generate the nonsquare predictive block
comprises: means for determining, based on a vector of the current PU, a
reference block in a reference picture, the reference block and the
nonsquare predictive block being the same size and shape; means for
subsampling a first set of reference samples to generate a first
subsampled set of reference samples with a first subsampling ratio, a
total number of reference samples in the first set of reference samples
not being equal to 2.sup.m and a total number of reference samples in the
first subsampled set of reference samples being equal to 2.sup.m,
wherein the first set of reference samples comprises samples outside the
nonsquare predictive block along a left side and a top side of the
nonsquare predictive block, and m is an integer; means for subsampling
a second set of reference samples to generate a second subsampled set of
reference samples with a second subsampling ratio, a total number of
reference samples in the second set of reference samples not being equal
to 2.sup.m and a total number of reference samples in the second
subsampled set of reference samples being equal to 2.sup.m, wherein the
second set of reference samples comprises samples outside the reference
block along a left side and a top side of the reference block; means for
determining a first IC parameter based on the total number of reference
samples in the first subsampled set of reference samples and the total
number of reference samples in the second subsampled set of reference
samples, the first subsampled set of reference samples, and the second
subsampled set of reference samples; and means for determining a sample
of the nonsquare predictive block based on the first IC parameter; and
means for reconstructing, by the video decoder, based on the nonsquare
predictive block, a coding block of the current CU.
38. An apparatus for encoding video data, the apparatus comprising: means
for receiving the video data; means for using illumination compensation
(IC) to generate a nonsquare predictive block of a current prediction
unit (PU) of a current coding unit (CU) of a current picture of the video
data, wherein the means for using IC to generate the nonsquare
predictive block comprises: means for determining, based on a vector of
the current PU, a reference block in a reference picture, the reference
block and the nonsquare predictive block being the same size and shape;
means for subsampling a first set of reference samples to generate a
first subsampled set of reference samples with a first subsampling
ratio, a total number of reference samples in the first set of reference
samples not being equal to 2.sup.m and a total number of reference
samples in the first subsampled set of reference samples being equal to
2.sup.m, wherein the first set of reference samples comprises samples
outside the nonsquare predictive block along a left side and a top side
of the nonsquare predictive block, and m is an integer; means for
subsampling a second set of reference samples to generate a second
subsampled set of reference samples with a second subsampling ratio, a
total number of reference samples in the second set of reference samples
not being equal to 2.sup.m and a total number of reference samples in the
second subsampled set of reference samples being equal to 2.sup.m,
wherein the second set of reference samples comprises samples outside the
reference block along a left side and a top side of the reference block;
means for determining a first IC parameter based on the total number of
reference samples in the first subsampled set of reference samples and
the total number of reference samples in the second subsampled set of
reference samples, the first subsampled set of reference samples, and
the second subsampled set of reference samples; and means for
determining a sample of the nonsquare predictive block based on the
first IC parameter; and means for generating residual data based on the
nonsquare predictive block; and means for outputting a bitstream that
includes data based on the residual data.
39. A computerreadable medium having instructions stored thereon that,
when executed, configure an apparatus for decoding video data to: receive
a bitstream that comprises an encoded representation of the video data;
use illumination compensation (IC) to generate a nonsquare predictive
block of a current prediction unit (PU) of a current coding unit (CU) of
a current picture of the video data, wherein the instructions configure
the apparatus such that, as part of using IC to generate the nonsquare
predictive block, the apparatus: determines, based on a vector of the
current PU, a reference block in a reference picture, the reference block
and the nonsquare predictive block being the same size and shape;
subsamples a first set of reference samples to generate a first
subsampled set of reference samples with a first subsampling ratio, a
total number of reference samples in the first set of reference samples
not being equal to 2.sup.m and a total number of reference samples in the
first subsampled set of reference samples being equal to 2.sup.m,
wherein the first set of reference samples comprises samples outside the
nonsquare predictive block along a left side and a top side of the
nonsquare predictive block, and m is an integer; subsamples a second
set of reference samples to generate a second subsampled set of
reference samples with a second subsampling ratio, a total number of
reference samples in the second set of reference samples not being equal
to 2.sup.m and a total number of reference samples in the second
subsampled set of reference samples being equal to 2.sup.m, wherein the
second set of reference samples comprises samples outside the reference
block along a left side and a top side of the reference block; determines
a first IC parameter based on the total number of reference samples in
the first subsampled set of reference samples and the total number of
reference samples in the second subsampled set of reference samples, the
first subsampled set of reference samples, and the second subsampled
set of reference samples; and determines a sample of the nonsquare
predictive block based on the first IC parameter; and reconstruct, based
on the nonsquare predictive block, a coding block of the current CU.
40. A computerreadable medium having instructions stored thereon that,
when executed, configure an apparatus for encoding video data to: receive
the video data; use illumination compensation (IC) to generate a
nonsquare predictive block of a current prediction unit (PU) of a
current coding unit (CU) of a current picture of the video data, wherein
the instructions, when executed, configure the apparatus such that, as
part of using IC to generate the nonsquare predictive block, the
apparatus: determines, based on a vector of the current PU, a reference
block in a reference picture, the reference block and the nonsquare
predictive block being the same size and shape; subsamples a first set
of reference samples to generate a first subsampled set of reference
samples with a first subsampling ratio, a total number of reference
samples in the first set of reference samples not being equal to 2.sup.m
and a total number of reference samples in the first subsampled set of
reference samples being equal to 2.sup.m, wherein the first set of
reference samples comprises samples outside the nonsquare predictive
block along a left side and a top side of the nonsquare predictive
block, and m is an integer; subsamples a second set of reference samples
to generate a second subsampled set of reference samples with a second
subsampling ratio, a total number of reference samples in the second set
of reference samples not being equal to 2.sup.m and a total number of
reference samples in the second subsampled set of reference samples
being equal to 2.sup.m, wherein the second set of reference samples
comprises samples outside the reference block along a left side and a top
side of the reference block; determines a first IC parameter based on the
total number of reference samples in the first subsampled set of
reference samples and the total number of reference samples in the second
subsampled set of reference samples, the first subsampled set of
reference samples, and the second subsampled set of reference samples;
and determines a sample of the nonsquare predictive block based on the
first IC parameter; and generates residual data based on the nonsquare
predictive block; and outputs a bitstream that includes data based on the
residual data.
Description
[0001] This application claims the benefit of U.S. Provisional Patent
Application 62/260,103, filed Nov. 25, 2015, and U.S. Provisional Patent
Application 62/310,271, filed Mar. 18, 2016, the entire content of each
of which is incorporated herein by reference.
TECHNICAL FIELD
[0002] This disclosure relates to video encoding and video decoding.
BACKGROUND
[0003] Digital video capabilities can be incorporated into a wide range of
devices, including digital televisions, digital direct broadcast systems,
wireless broadcast systems, personal digital assistants (PDAs), laptop or
desktop computers, tablet computers, ebook readers, digital cameras,
digital recording devices, digital media players, video gaming devices,
video game consoles, cellular or satellite radio telephones, socalled
"smart phones," video teleconferencing devices, video streaming devices,
and the like. Digital video devices implement video coding techniques,
such as those described in the standards defined by MPEG2, MPEG4, ITUT
H.263, ITUT H.264/MPEG4, Part 10, Advanced Video Coding (AVC), the High
Efficiency Video Coding (HEVC) standard, and extensions of such
standards. The video devices may transmit, receive, encode, decode,
and/or store digital video information more efficiently by implementing
such video coding techniques.
[0004] Video coding techniques include spatial (intrapicture) prediction
and/or temporal (interpicture) prediction to reduce or remove redundancy
inherent in video sequences. For blockbased video coding, a video slice
(e.g., a video frame or a portion of a video frame) may be partitioned
into video blocks, which may also be referred to as treeblocks, coding
units (CUs) and/or coding nodes. Pictures may be referred to as frames,
and reference pictures may be referred to as reference frames.
[0005] Spatial or temporal prediction results in a predictive block for a
block to be coded. Residual data represents pixel differences between the
original block to be coded and the predictive block. For further
compression, the residual data may be transformed from the pixel domain
to a transform domain, resulting in residual transform coefficients,
which then may be quantized. Entropy coding may be applied to achieve
even more compression.
SUMMARY
[0006] This disclosure is related to intra and inter prediction
partitions, nonsquare transforms, intra and inter coding modes for
nonsquare blocks, and associated entropy coding. Techniques of this
disclosure may be used in the context of advanced video codecs, such as
extensions of HEVC or the next generation of video coding standards. In
one example, an Illumination Compensation process is adapted for use with
nonsquare prediction units (PUs). Particularly, techniques are described
for determining parameters used for IC in the presence of nonsquare
prediction blocks.
[0007] In one example, this disclosure describes a method of decoding
video data, the method comprising: receiving, by a video decoder, a
bitstream that comprises an encoded representation of the video data;
using, by the video decoder, illumination compensation (IC) to generate a
nonsquare predictive block of a current prediction unit (PU) of a
current coding unit (CU) of a current picture of the video data, wherein
using IC to generate the nonsquare predictive block comprises:
determining, by the video decoder, based on a vector of the current PU, a
reference block in a reference picture, the reference block and the
nonsquare predictive block being the same size and shape; subsampling,
by the video decoder, a first set of reference samples to generate a
first subsampled set of reference samples with a first subsampling
ratio, a total number of reference samples in the first set of reference
samples not being equal to 2.sup.m and a total number of reference
samples in the first subsampled set of reference samples being equal to
2.sup.m, wherein the first set of reference samples comprises samples
outside the nonsquare predictive block along a left side and a top side
of the nonsquare predictive block, and m is an integer; subsampling, by
the video decoder, a second set of reference samples to generate a second
subsampled set of reference samples with a second subsampling ratio, a
total number of reference samples in the second set of reference samples
not being equal to 2.sup.m and a total number of reference samples in the
second subsampled set of reference samples being equal to 2.sup.m,
wherein the second set of reference samples comprises samples outside the
reference block along a left side and a top side of the reference block;
determining, by the video decoder, a first IC parameter based on the
total number of reference samples in the first subsampled set of
reference samples and the total number of reference samples in the second
subsampled set of reference samples, the first subsampled set of
reference samples, and the second subsampled set of reference samples;
and determining, by the video decoder, a sample of the nonsquare
predictive block based on the first IC parameter; and reconstructing, by
the video decoder, based on the nonsquare predictive block, a coding
block of the current CU.
[0008] In another example, this disclosure describes a method of encoding
video data, the method comprising: receiving, by a video encoder, the
video data; using, by the video encoder, illumination compensation (IC)
to generate a nonsquare predictive block of a current prediction unit
(PU) of a current coding unit (CU) of a current picture of the video
data, wherein using IC to generate the nonsquare predictive block
comprises: determining, by the video encoder, based on a vector of the
current PU, a reference block in a reference picture, the reference block
and the nonsquare predictive block being the same size and shape;
subsampling, by the video encoder, a first set of reference samples to
generate a first subsampled set of reference samples with a first
subsampling ratio, a total number of reference samples in the first set
of reference samples not being equal to 2.sup.m and a total number of
reference samples in the first subsampled set of reference samples being
equal to 2.sup.m, wherein the first set of reference samples comprises
samples outside the nonsquare predictive block along a left side and a
top side of the nonsquare predictive block, and m is an integer;
subsampling, by the video encoder, a second set of reference samples to
generate a second subsampled set of reference samples with a second
subsampling ratio, a total number of reference samples in the second set
of reference samples not being equal to 2.sup.m and a total number of
reference samples in the second subsampled set of reference samples
being equal to 2.sup.m, wherein the second set of reference samples
comprises samples outside the reference block along a left side and a top
side of the reference block; determining, by the video encoder, a first
IC parameter based on the total number of reference samples in the first
subsampled set of reference samples and the total number of reference
samples in the second subsampled set of reference samples, the first
subsampled set of reference samples, and the second subsampled set of
reference samples; and generating, by the video encoder, residual data
based on the nonsquare predictive block; and outputting, by the video
encoder, a bitstream that includes data based on the residual data.
[0009] In another example, this disclosure describes an apparatus for
decoding video data, the apparatus comprising: one or more storage media
configured to store the video data; and a video decoder configured to:
receive a bitstream that comprises an encoded representation of the video
data; use illumination compensation (IC) to generate a nonsquare
predictive block of a current prediction unit (PU) of a current coding
unit (CU) of a current picture of the video data, wherein the video
decoder is configured such that, as part of using IC to generate the
nonsquare predictive block, the video decoder: determines, based on a
vector of the current PU, a reference block in a reference picture, the
reference block and the nonsquare predictive block being the same size
and shape; subsamples a first set of reference samples to generate a
first subsampled set of reference samples with a first subsampling
ratio, a total number of reference samples in the first set of reference
samples not being equal to 2.sup.m and a total number of reference
samples in the first subsampled set of reference samples being equal to
2.sup.m, wherein the first set of reference samples comprises samples
outside the nonsquare predictive block along a left side and a top side
of the nonsquare predictive block, and m is an integer; subsamples a
second set of reference samples to generate a second subsampled set of
reference samples with a second subsampling ratio, a total number of
reference samples in the second set of reference samples not being equal
to 2.sup.m and a total number of reference samples in the second
subsampled set of reference samples being equal to 2.sup.m, wherein the
second set of reference samples comprises samples outside the reference
block along a left side and a top side of the reference block; determines
a first IC parameter based on the total number of reference samples in
the first subsampled set of reference samples and the total number of
reference samples in the second subsampled set of reference samples, the
first subsampled set of reference samples, and the second subsampled
set of reference samples; and determines a sample of the nonsquare
predictive block based on the first IC parameter; and reconstruct, based
on the nonsquare predictive block, a coding block of the current CU.
[0010] In another example, this disclosure describes an apparatus for
encoding video data, the apparatus comprising: one or more storage media
configured to store video data; and a video encoder configured to: use
illumination compensation (IC) to generate a nonsquare predictive block
of a current prediction unit (PU) of a current coding unit (CU) of a
current picture of the video data, wherein the video encoder is
configured such that, as part of using IC to generate the nonsquare
predictive block, the video encoder: determines, based on a vector of the
current PU, a reference block in a reference picture, the reference block
and the nonsquare predictive block being the same size and shape;
subsamples a first set of reference samples to generate a first
subsampled set of reference samples with a first subsampling ratio, a
total number of reference samples in the first set of reference samples
not being equal to 2.sup.m and a total number of reference samples in the
first subsampled set of reference samples being equal to 2.sup.m,
wherein the first set of reference samples comprises samples outside the
nonsquare predictive block along a left side and a top side of the
nonsquare predictive block, and m is an integer; subsamples a second
set of reference samples to generate a second subsampled set of
reference samples with a second subsampling ratio, a total number of
reference samples in the second set of reference samples not being equal
to 2.sup.m and a total number of reference samples in the second
subsampled set of reference samples being equal to 2.sup.m, wherein the
second set of reference samples comprises samples outside the reference
block along a left side and a top side of the reference block; determines
a first IC parameter based on the total number of reference samples in
the first subsampled set of reference samples and the total number of
reference samples in the second subsampled set of reference samples, the
first subsampled set of reference samples, and the second subsampled
set of reference samples; and determines a sample of the nonsquare
predictive block based on the first IC parameter; and generate residual
data based on the nonsquare predictive block; and output a bitstream
that includes data based on the residual data.
[0011] In another example, this disclosure describes an apparatus for
decoding video data, the apparatus comprising: means for receiving a
bitstream that comprises an encoded representation of the video data;
means for using illumination compensation (IC) to generate a nonsquare
predictive block of a current prediction unit (PU) of a current coding
unit (CU) of a current picture of the video data, wherein the means for
using IC to generate the nonsquare predictive block comprises: means for
determining, based on a vector of the current PU, a reference block in a
reference picture, the reference block and the nonsquare predictive
block being the same size and shape; means for subsampling a first set
of reference samples to generate a first subsampled set of reference
samples with a first subsampling ratio, a total number of reference
samples in the first set of reference samples not being equal to 2.sup.m
and a total number of reference samples in the first subsampled set of
reference samples being equal to 2.sup.m, wherein the first set of
reference samples comprises samples outside the nonsquare predictive
block along a left side and a top side of the nonsquare predictive
block, and m is an integer; means for subsampling a second set of
reference samples to generate a second subsampled set of reference
samples with a second subsampling ratio, a total number of reference
samples in the second set of reference samples not being equal to 2.sup.m
and a total number of reference samples in the second subsampled set of
reference samples being equal to 2.sup.m, wherein the second set of
reference samples comprises samples outside the reference block along a
left side and a top side of the reference block; means for determining a
first IC parameter based on the total number of reference samples in the
first subsampled set of reference samples and the total number of
reference samples in the second subsampled set of reference samples, the
first subsampled set of reference samples, and the second subsampled
set of reference samples; and means for determining a sample of the
nonsquare predictive block based on the first IC parameter; and means
for reconstructing, by the video decoder, based on the nonsquare
predictive block, a coding block of the current CU.
[0012] In another example, this disclosure describes an apparatus for
encoding video data, the apparatus comprising: means for receiving the
video data; means for using illumination compensation (IC) to generate a
nonsquare predictive block of a current prediction unit (PU) of a
current coding unit (CU) of a current picture of the video data, wherein
the means for using IC to generate the nonsquare predictive block
comprises: means for determining, based on a vector of the current PU, a
reference block in a reference picture, the reference block and the
nonsquare predictive block being the same size and shape; means for
subsampling a first set of reference samples to generate a first
subsampled set of reference samples with a first subsampling ratio, a
total number of reference samples in the first set of reference samples
not being equal to 2.sup.m and a total number of reference samples in the
first subsampled set of reference samples being equal to 2.sup.m,
wherein the first set of reference samples comprises samples outside the
nonsquare predictive block along a left side and a top side of the
nonsquare predictive block, and m is an integer; means for subsampling
a second set of reference samples to generate a second subsampled set of
reference samples with a second subsampling ratio, a total number of
reference samples in the second set of reference samples not being equal
to 2.sup.m and a total number of reference samples in the second
subsampled set of reference samples being equal to 2.sup.m, wherein the
second set of reference samples comprises samples outside the reference
block along a left side and a top side of the reference block; means for
determining a first IC parameter based on the total number of reference
samples in the first subsampled set of reference samples and the total
number of reference samples in the second subsampled set of reference
samples, the first subsampled set of reference samples, and the second
subsampled set of reference samples; and means for determining a sample
of the nonsquare predictive block based on the first IC parameter; and
means for generating residual data based on the nonsquare predictive
block; and means for outputting a bitstream that includes data based on
the residual data.
[0013] In another example, this disclosure describes a computerreadable
medium having instructions stored thereon that, when executed, configure
an apparatus for decoding video data to: receive a bitstream that
comprises an encoded representation of the video data; use illumination
compensation (IC) to generate a nonsquare predictive block of a current
prediction unit (PU) of a current coding unit (CU) of a current picture
of the video data, wherein the instructions configure the apparatus such
that, as part of using IC to generate the nonsquare predictive block,
the apparatus: determines, based on a vector of the current PU, a
reference block in a reference picture, the reference block and the
nonsquare predictive block being the same size and shape; subsamples a
first set of reference samples to generate a first subsampled set of
reference samples with a first subsampling ratio, a total number of
reference samples in the first set of reference samples not being equal
to 2.sup.m and a total number of reference samples in the first
subsampled set of reference samples being equal to 2.sup.m, wherein the
first set of reference samples comprises samples outside the nonsquare
predictive block along a left side and a top side of the nonsquare
predictive block, and m is an integer; subsamples a second set of
reference samples to generate a second subsampled set of reference
samples with a second subsampling ratio, a total number of reference
samples in the second set of reference samples not being equal to 2.sup.m
and a total number of reference samples in the second subsampled set of
reference samples being equal to 2.sup.m, wherein the second set of
reference samples comprises samples outside the reference block along a
left side and a top side of the reference block; determines a first IC
parameter based on the total number of reference samples in the first
subsampled set of reference samples and the total number of reference
samples in the second subsampled set of reference samples, the first
subsampled set of reference samples, and the second subsampled set of
reference samples; and determines a sample of the nonsquare predictive
block based on the first IC parameter; and reconstruct, based on the
nonsquare predictive block, a coding block of the current CU.
[0014] In another example, this disclosure describes a computerreadable
medium having instructions stored thereon that, when executed, configure
an apparatus for encoding video data to: receive the video data; use
illumination compensation (IC) to generate a nonsquare predictive block
of a current prediction unit (PU) of a current coding unit (CU) of a
current picture of the video data, wherein the instructions, when
executed, configure the apparatus such that, as part of using IC to
generate the nonsquare predictive block, the apparatus: determines,
based on a vector of the current PU, a reference block in a reference
picture, the reference block and the nonsquare predictive block being
the same size and shape; subsamples a first set of reference samples to
generate a first subsampled set of reference samples with a first
subsampling ratio, a total number of reference samples in the first set
of reference samples not being equal to 2.sup.m and a total number of
reference samples in the first subsampled set of reference samples being
equal to 2.sup.m, wherein the first set of reference samples comprises
samples outside the nonsquare predictive block along a left side and a
top side of the nonsquare predictive block, and m is an integer;
subsamples a second set of reference samples to generate a second
subsampled set of reference samples with a second subsampling ratio, a
total number of reference samples in the second set of reference samples
not being equal to 2.sup.m and a total number of reference samples in the
second subsampled set of reference samples being equal to 2.sup.m,
wherein the second set of reference samples comprises samples outside the
reference block along a left side and a top side of the reference block;
determines a first IC parameter based on the total number of reference
samples in the first subsampled set of reference samples and the total
number of reference samples in the second subsampled set of reference
samples, the first subsampled set of reference samples, and the second
subsampled set of reference samples; and determines a sample of the
nonsquare predictive block based on the first IC parameter; and
generates residual data based on the nonsquare predictive block; and
outputs a bitstream that includes data based on the residual data.
[0015] The details of one or more aspects of the disclosure are set forth
in the accompanying drawings and the description below. Other features,
objects, and advantages of the techniques described in this disclosure
will be apparent from the description, drawings, and claims.
BRIEF DESCRIPTION OF DRAWINGS
[0016] FIG. 1 is a block diagram illustrating an example video encoding
and decoding system configured to implement techniques of the disclosure.
[0017] FIG. 2A is a conceptual diagram illustrating an example transform
scheme based on a residual quadtree in High Efficiency Video Coding
(HEVC).
[0018] FIG. 2B is a conceptual diagram illustrating a residual quadtree
for the coding unit of FIG. 2A.
[0019] FIG. 3 is a conceptual diagram illustrating an example coefficient
scan based on coefficient groups in HEVC.
[0020] FIG. 4 is a conceptual diagram illustrating an example of intra
prediction for a 16.times.16 block.
[0021] FIG. 5 is a conceptual diagram illustrating an example of 35 intra
prediction modes defined in HEVC.
[0022] FIG. 6 is a conceptual diagram illustrating a planar mode defined
in HEVC.
[0023] FIG. 7 is a conceptual diagram of an example angular mode defined
in HEVC.
[0024] FIG. 8 is a conceptual diagram of partition modes for splitting a
coding unit for inter prediction in HEVC.
[0025] FIG. 9 is a conceptual diagram of short distance intra prediction
(SDIP) unit partitions.
[0026] FIG. 10 is a conceptual diagram of a 16.times.4 coefficient matrix
scanned and reorganized into an 8.times.8 matrix.
[0027] FIG. 11 is a conceptual diagram of 64 intra prediction modes.
[0028] FIG. 12A is a conceptual diagram of boundary filters for intra mode
34.
[0029] FIG. 12B is a conceptual diagram of boundary filters for intra mode
3033.
[0030] FIG. 13 is a conceptual diagram illustrating example locations of
samples used for derivation of Linear Model (LM) parameters .alpha. and
.beta..
[0031] FIG. 14 is a conceptual diagram illustrating an example of luma
positions and chroma positions for downsampling samples of a
reconstructed luma block of a current prediction unit (PU).
[0032] FIG. 15 is a conceptual diagram illustrating an example of luma
positions and chroma positions for downsampling samples of a luma block
for generating a predictive block.
[0033] FIG. 16 is a conceptual diagram illustrating an nR.times.2N
prediction mode with an N.times.N transform.
[0034] FIG. 17 is a conceptual diagram illustrating a nonsquare quadtree
(NSQT) for 2N.times.N, 2N.times.nD, and 2N.times.nU prediction modes.
[0035] FIG. 18 is a conceptual diagram illustrating a NSQT for N.times.2N,
nR.times.2N, and nL.times.2N prediction modes.
[0036] FIG. 19 illustrates neighboring pixels used for estimating
parameters in an Illumination Compensation (IC) model.
[0037] FIG. 20 is a conceptual diagram illustrating example neighboring
pixels used to estimate parameters in an IC model, in which a reference
block of a current coding unit is found by using a current prediction
unit's disparity or motion vector.
[0038] FIG. 21 is a conceptual diagram illustrating an example transform
structure for a partition size equal to 2N.times.N.
[0039] FIG. 22 is a conceptual diagram illustrating a transform structure
for a partition size equal to N.times.N/4(U), in accordance with a
technique of this disclosure.
[0040] FIG. 23 is a conceptual diagram illustrating a transform structure
for a partition size equal to N.times.N/4(U), in accordance with a
technique of this disclosure.
[0041] FIG. 24 is a block diagram illustrating an example video encoder
that may implement the techniques of this disclosure.
[0042] FIG. 25 is a block diagram illustrating an example video decoder
that is configured to implement the techniques of this disclosure.
[0043] FIG. 26 is a block diagram illustrating an example video encoder
that supports LMbased encoding in accordance with a technique of this
disclosure.
[0044] FIG. 27 is a block diagram illustrating an example video decoder
that supports LMbased decoding in accordance with a technique of this
disclosure.
[0045] FIG. 28 is a flowchart illustrating an example operation of a video
encoder in accordance with a LMbased coding technique of this
disclosure.
[0046] FIG. 29 is a flowchart illustrating an example operation of a video
decoder in accordance with a LMbased coding technique of this
disclosure.
[0047] FIG. 30 is a flowchart illustrating an example operation of a video
encoder, in accordance with a quantization technique of this disclosure.
[0048] FIG. 31 is a flowchart illustrating an example operation of a video
decoder, in accordance with a quantization technique of this disclosure.
[0049] FIG. 32 is a flowchart illustrating an example operation of a video
encoder, in accordance with a technique of this disclosure that uses IC.
[0050] FIG. 33 is a flowchart illustrating an example operation of a video
decoder, in accordance with a technique of this disclosure that uses IC.
[0051] FIG. 34 is a flowchart illustrating an example operation of a video
encoder, in accordance with a technique of this disclosure that uses a
flexible residual tree.
[0052] FIG. 35 is a flowchart illustrating an example operation of a video
decoder, in accordance with a technique of this disclosure that uses a
flexible residual tree.
DETAILED DESCRIPTION
[0053] In general, this disclosure is related to intra and inter
prediction partitions, nonsquare transforms, intra and inter coding
modes for nonsquare blocks, and associated entropy coding. Techniques of
this disclosure may be used in the context of advanced video codecs, such
as extensions of High Efficiency Video Coding (HEVC) or the next
generation of video coding standards.
[0054] In HEVC, a video coder (i.e., a video encoder or a video decoder)
partitions a coding unit (CU) of a picture into one or more prediction
units (PUs). The video coder uses intra prediction or inter prediction to
generate predictive blocks for each PU of the CU. The residual data of
the CU represents differences between the predictive blocks for the PUs
of the CU and an original coding block of the CU. In instances where the
CU is intra predicted (i.e., the predictive blocks for the PUs of the CU
are generated using intra prediction), the residual data of the CU may be
partitioned into one or more squareshaped transform units (TUs).
However, in instances where the CU is inter predicted (i.e., the
predictive blocks for the PUs of the CU are generated using inter
prediction), the residual data of the CU may be partitioned into one or
more square or nonsquare TUs. In this disclosure, references to shapes
of units (e.g., CUs, PUs, TUs) may refer to the shapes of corresponding
blocks. Thus, a nonsquare PU may be interpreted as referring to a
nonsquare prediction block, a nonsquare TU may be interpreted as
referring to a nonsquare transform block, and vice versa. Furthermore,
it is noted that a prediction block need not be tied to the concept of a
PU as PU is defined in HEVC, but rather have the meaning of a block of
samples on which a prediction (e.g., inter prediction, intra prediction)
is performed. Similarly, a transform block need not be tied to the
concept of a TU as TU is defined in HEVC, but rather have the meaning of
a block of samples on which a transform is applied.
[0055] As described below, the introduction of nonsquare TUs may
introduce certain problems when used with particular coding tools.
[0056] For example, linear modeling (LM) prediction mode is a technique
for reducing crosscomponent correlation that was studied during
development of HEVC. When a video coder uses the LM prediction mode, the
video coder predicts chroma samples of a PU based on reconstructed luma
samples of a PU of a CU. The chroma samples of a PU are chroma samples of
a chroma predictive block of the PU. Example types of chroma samples
include Cb samples and Cr samples. The video coder may generate the
reconstructed luma samples of the PU by summing samples of a luma
predictive block of the PU with corresponding luma residual samples of
the PU.
[0057] In particular, when a video coder uses the LM prediction mode, the
video coder may determine a predicted chroma sample of the PU at position
(i,j) as .alpha.rec.sub.L(i,j)+.beta., where rec.sub.L(i,j) is a
reconstructed luma sample of the PU at position (i,j) and .alpha. and
.beta. are parameters. In some cases, such as in the 4:2:0 color format,
one M.times.K chroma block corresponds to an 2M.times.2K luma block, in
this case, rec.sub.L(i,j) indicates the value located at (i,j) of a
downsampled version (with M.times.K) of the 2M.times.2K luma block. The
video coder determines the value of .alpha. and .beta. based on the
values of reconstructed luma reference samples and reconstructed chroma
reference samples. The reconstructed luma reference samples and the
reconstructed chroma reference samples are samples along the top and left
sides of the PU. The formulas for determining .beta. involve a division
operation by the total number of reference samples (denoted I, which is
equal to the summation of M and K). In typical cases, M and K are equal
and can be represented by 2.sup.l, in HEVC, l is a positive integer
value. So long as the prediction block is square, I is equal to 2.sup.m,
where m may vary for different prediction block sizes. Thus, instead of
performing a division operation to divide by I, the video coder may
perform a right shift operation when calculating the value of .beta..
Right shift operations are significantly faster and less complex to
implement than division operations. In this disclosure, references to
sizes of various types of blocks, such as CUs, TUs, and PUs, refer to the
sizes of coding blocks, transform blocks, and prediction blocks of the
CUs, TUs, and PUs, respectively. Furthermore, in this disclosure,
references to the sides of various types of video coding units, such as
CUs, TUs, and PUs, refer to sides of blocks (e.g., coding blocks,
transform blocks, prediction/predictive blocks) corresponding to the
various types of blocks.
[0058] However, if a luma block (e.g., a luma prediction block of a PU) is
not square (e.g., M is equal to 12 and K is equal to 16), I is not always
equal to 2.sup.m. Hence, if the luma block is not square, it may not be
possible to use a right shift operation in place of the division
operation when calculating the value of .beta.. Thus, the video coder may
need to implement a costly division operation to calculate the value of
.beta..
[0059] This disclosure describes a technique that may eliminate the need
to implement a division operation when calculating the value of .beta.
when using the LM prediction mode for a nonsquare blocks. In some cases,
even for a square PU wherein M is equal to K but M is not a power of 2,
the technique described here may also be applicable. In accordance with
an example of this technique, a video coder may reconstruct a set of luma
reference samples and a set of chroma reference samples. The set of luma
reference samples may comprise luma samples neighboring a top side of a
nonsquare luma block of a current picture of the video data and luma
samples neighboring a left side of the nonsquare luma block. The
nonsquare luma block may be a luma prediction block of a PU. Hence, the
PU may be a considered a nonsquare PU. The set of chroma reference
samples may comprise chroma samples neighboring the top side of a
nonsquare chroma block and chroma samples neighboring the left side of
the nonsquare chroma block. The nonsquare chroma block may be a chroma
prediction block of the PU. Additionally, the video coder may reconstruct
luma samples of the nonsquare prediction block. Furthermore, the video
coder may subsample the set of luma reference samples such that a total
number of luma reference samples in the set of luma reference samples
that neighbor a longer side of the nonsquare luma block is the same as a
total number of luma reference samples of the set of luma reference
samples that neighbor a shorter side of the nonsquare luma block. The
video coder may determine a first parameter equal to:
.alpha. = I x i y i  x i y i I x
i x i  x i x i ##EQU00001##
where I is a total number of reference samples in the set of the luma
reference samples, x.sub.i is a luma reference sample in the set of luma
reference samples, and y.sub.i is a chroma reference sample in the set of
chroma reference samples. Additionally, the video coder may determine a
second parameter equal to:
.beta.=(.SIGMA.y.sub.i.alpha..SIGMA.x.sub.i)/I
For each respective chroma sample of a predictive chroma block, the video
coder may determine a value of the respective chroma sample such that the
value of the respective chroma sample is equal to a multiplied by a
respective reconstructed luma sample corresponding to the respective
chroma sample, plus .beta.. The predictive chroma block may be a
predictive chroma block for the nonsquare PU. The video coder may
reconstruct, based in part on the predictive chroma block, a coding
block.
[0060] In the example above, by subsampling the set of luma reference
samples such that the total number of luma reference samples that
neighbor the longer side of the nonsquare luma block is the same as the
total number of luma reference samples that neighbor a shorter side of
the nonsquare luma block, the video coder may ensure that the total
number of reference samples in the set of luma reference samples is a
power of 2. Hence, the video coder may be able to use a right shift
operation instead of a division operation when calculating the value of
.beta.. Therefore, a video coder implementing the example above may be
less complex and/or faster than a video decoder forced to use a division
operation when calculating the value of .beta.. It is noted however that
a video coder may perform the actions described in the example above
using a division operation instead of a shift operation, although such a
video coder may not have the advantages of using the shift operation
instead of the division operation. In some examples, the reference
samples that neighbor a short or long side of the nonsquare prediction
block may be unavailable, in this case, there may be no need to perform
the subsampling process to available reference samples located at the
other side.
[0061] In HEVC, a video encoder applies a transform to blocks of residual
data (i.e., transform blocks) to convert the blocks of residual data into
blocks of transform coefficients. At a high level, the video encoder may
generate a block of transform coefficients (i.e., a transform coefficient
block), by first generating a block of intermediate values by applying a
Npoint 1dimension DCT transform to columns of the transform block. N is
equal to the height and width of the transform block. The video encoder
may then generate the block of transform coefficients by applying the
same Npoint 1dimensional DCT transform to the rows of the block of
intermediate values. A video decoder inverses the transform in a similar
way to recover the transform block.
[0062] As one can see from the discussion above, the process of applying
the transform in HEVC is reliant on transform blocks being square.
However, it may be desirable to have nonsquare transform blocks. For
instance, compression performance may be reduced when the boundaries of
transform blocks cross boundaries of inter or intra predictive blocks.
The use of nonsquare predictive blocks may be valuable to capture
objects that do not fall into square areas. Therefore, nonsquare
predictive blocks and/or nonsquare transforms may be useful in terms of
coding performance improvement. A transform matrix coefficient is defined
with a denominator equal to {square root over (N)} if the transform
matrix coefficient is a Npoint 1dimension DCT transform. Previous to
this disclosure, the denominator {square root over (N)} was considered
as the normalization factor and implemented by a right shift in a
quantization process. Taking a 2dimension DCT transform into
consideration, for example, a K.times.L transform, the normalization
factor would be ( {square root over (K)}* {square root over (L)}). If N
is defined by the one satisfying the equation log 2(N*N)=((log 2(K)+log
2(L))>>1)<<1), the ratio of utilized normalization factor
(sqrt(N)*sqrt(N)) and the real normalization factor ( {square root over
(K)}* {square root over (T)}) would be 1/ {square root over (2)} Directly
applying a square transform (e.g., a Npoint transform applied to both
columns and rows) to a nonsquare transform block may change the total
energy (i.e., the sum of squares of all transformed coefficients after
quantization) in the resulting transform coefficient block due to the
increased normalization factor, which results in reduced compression
performance.
[0063] As described in detail elsewhere in this disclosure, for a
transform block of size K.times.L, the video encoder multiplying the
transform coefficients by {square root over (2)} when (log 2(K)+log
2(L)) is odd, and the video decoder dividing the transform coefficients
by {square root over (2)} when (log 2(K)+log 2(L)) is odd may address
this problem.
[0064] 3DHEVC is an extension of HEVC for 3dimensional (3D) video data.
3DHEVC provides for multiple views of the same scene from different
viewpoints. The standardization efforts for 3DHEVC include the
standardization of a multiview video codec based on HEVC. In 3DHEVC,
interview prediction based on reconstructed view components (i.e.,
reconstructed pictures) from different views is enabled. Furthermore,
3DHEVC implements interview motion prediction and interview residual
prediction.
[0065] The pictures of each view that represent the same time instance of
video include similar video content. However, the video content of views
may be displaced spatially relative to one another. In particular, the
video content of the views may represent different perspectives on the
same scene. For example, a video block in a picture in a first view may
include video content that is similar to a video block in a picture in a
second view. In this example, the location of the video block in the
picture in the first view and the location of the video block in the
picture in the second view may be different. For example, there may be
some displacement between the locations of the video blocks in the
different views.
[0066] A disparity vector for a video block provides a measure of this
displacement. For example, a video block of a picture in a first view may
be associated with a disparity vector that indicates the displacement of
a corresponding video block in a picture in a second view.
[0067] Because of different camera settings or different distances from
light sources, pictures corresponding to the same time instance, but in
different views, may contain nearly the same image, but objects in one of
the pictures may be brighter than corresponding objects in the other
picture. Illumination compensation (IC) is a technique implemented in
3DHEVC for compensating for such differences in illumination between
views when performing interview prediction. In 3DHEVC, a video coder
determines a disparity vector for a current PU of a current CU of a
current picture. In addition, the video coder may calculate two IC
parameters for the current CU. This disclosure denotes the IC parameters
as a and b. Additionally, for each respective sample of a luma predictive
block of the current PU, the video coder calculates:
p(i,j)=a*r(i+dv.sub.x,j+dv.sub.y+b)
[0068] In the equation above, p(i,j) is the respective sample of the luma
predictive block of the current PU, (i,j) are coordinates indicating a
location of the respective sample relative to a topleft corner of the
current picture, dv.sub.x is a horizontal component of the disparity
vector for the current PU, dv.sub.y is a vertical component of the
disparity vector for the current PU, and a and b are the IC parameters.
[0069] As described in greater detail elsewhere in this disclosure, the
formula defining the IC parameter b in 3DHEVC involves a division
operation by the number of reference samples neighboring the current CU's
top and left sides. In 3DHEVC, the number of reference samples
neighboring the current CU's top and left sides is always a power of 2.
Consequently, the division operation in the formula defining the IC
parameter b may be implemented using a rightshift operation. As
described elsewhere in this disclosure, a rightshift operation may be
significantly less complicated to implement than a division operation and
may be significantly faster than implementing a division operation.
[0070] For 2dimension video coding, as described in H. Liu, "Local
Illumination Compensation," ITUTelecommunications Standardization
Sector, Study Group 16 Question 6, Video Coding Experts Group (VCEG),
52.sup.nd Meeting, 1926 Jun. 2015, Warsaw, Poland, document VCEGAZ06
(hereinafter, "VCEGAZ06"), Local Illumination Compensation (LIC) is
enabled or disabled adaptively for each intermode coded coding unit
(CU), and LIC is based on a linear model for illumination changes, using
a scaling factor a and an offset b. When LIC applies for a CU, for each
PU/subPU belonging to the CU, LIC parameters are derived in a way that
uses subsampled (2:1 subsampling) neighboring samples of the CU and the
corresponding pixels (identified by motion information of the current
PU/subPU) in the reference picture. For a CU with size equal to
N.times.N, the total number of boundary pixels used in parameter
calculation is N instead of 2N. An example is illustrated in FIG. 20. The
LIC parameters are derived and applied for each prediction direction
separately. A least square error method is employed to derive the
parameters a and b based on the abovementioned neighboring samples.
[0071] IC was only used when a CU only has a single PU. However, it may be
desirable to use IC in instances where a CU has multiple PUs, including
instances where the CU is partitioned into 2 or 3 PUs and/or the CU is
partitioned asymmetrically. In such instances, the number of reference
samples neighboring the current CU's top and left sides may no longer be
a power of 2. Therefore, it may not be possible to calculate the IC
parameter b using a rightshift operation. Rather, the video coder may
need to use a slower and more complicated division operation to calculate
the IC parameter b.
[0072] To address this issue, a video coder may subsample a first set of
reference samples to generate a first subsampled set of reference
samples that includes a total of 2.sup.m reference samples, where m is an
integer. In this disclosure, the terms subsampling indicates selection
of one or more samples from a set of samples and downsampling indicates
a filtering process wherein several reference samples may be used
together to derive a filtered sample. The set of reference samples may
comprise samples outside the nonsquare predictive block of a PU along a
left side and a top side of the nonsquare predictive block. Hence, the
reference samples may also be referred to herein as neighbor samples or
neighboring samples. Additionally, the video coder may subsample a
second set of reference samples to generate a second subsampled set of
reference samples that includes a total of 2.sup.m reference samples,
where m is an integer. The second set of reference samples may comprise
samples outside a reference block (e.g., an interview reference block or
temporal reference block) along a left side and a top side of the
reference block. The video coder may then determine at least the IC
parameter b based on the first subsampled set of reference samples and
the second subsampled set of reference samples. Because the first
subsampled set of reference samples and the second subsampled set of
reference samples each include 2.sup.m samples, the video coder may use
the right shift operation to calculate the IC parameter b instead of a
division operation. In this way, the techniques of this disclosure may
decrease complexity of the video coder and/or accelerate video coding.
[0073] As mentioned above, the introduction of nonsquare TUs may
introduce certain problems. For example, previous techniques of
partitioning a CU into TUs followed a quadtree splitting pattern, even
if the PUs of the CU are not square. In this disclosure, a quadtree may
also be referred to as a quartertree. Always using a quadtree splitting
pattern may result in suboptimal video data compression performance,
especially if the quadtree splitting pattern does not align the sides of
the TUs with sides of the PUs of the CU.
[0074] Hence, in accordance with a technique of this disclosure, a
transform tree of a CU is not restricted to the quadtree splitting
pattern. Rather, a node in the transform tree may have two child nodes.
Thus, in one example, a video decoder may determine a CU is partitioned
into TUs based on a tree structure. In this example, the video decoder
may determine that a node in the tree structure has exactly two child
nodes in the tree structure. In this example, a root node of the tree
structure corresponds to a coding block of the CU, each respective
nonroot node of the tree structure corresponds to a respective block
that is a partition of a block that corresponds to a parent node of the
respective nonroot node, and leaf nodes of the tree structure correspond
to the TUs of the CU. In some examples, nodes in the transform tree may
have 2 or 4 child nodes. The flexibility of a node to have 2 or 4 child
nodes may increase video coding compression performance.
[0075] FIG. 1 is a block diagram illustrating an example video encoding
and decoding system 10 that may utilize techniques of this disclosure. As
shown in FIG. 1, system 10 includes a source device 12 that provides
encoded video data to be decoded at a later time by a destination device
14. In particular, source device 12 provides the video data to
destination device 14 via a computerreadable medium 16. Source device 12
and destination device 14 may comprise any of a wide range of devices,
including desktop computers, notebook (i.e., laptop) computers, tablet
computers, settop boxes, telephone handsets such as socalled "smart"
phones, tablet computers, televisions, cameras, display devices, digital
media players, video gaming consoles, video streaming device, or the
like. In some cases, source device 12 and destination device 14 may be
equipped for wireless communication.
[0076] In the example of FIG. 1, source device 12 includes a video source
18, a storage medium 19 configured to store video data, a video encoder
20, and an output interface 22. Destination device 14 includes an input
interface 28, a storage medium 29 configured to store video data, a video
decoder 30, and a display device 32. In other examples, a source device
and a destination device may include other components or arrangements.
For example, source device 12 may receive video data from an external
video source, such as an external camera. Likewise, destination device 14
may interface with an external display device, rather than including an
integrated display device.
[0077] The illustrated system 10 of FIG. 1 is merely one example.
Techniques for processing video data may be performed by any digital
video encoding and/or decoding device. Although generally the techniques
of this disclosure are performed by a video encoding device, the
techniques may also be performed by a video encoder/decoder, typically
referred to as a "CODEC." Source device 12 and destination device 14 are
merely examples of such coding devices in which source device 12
generates coded video data for transmission to destination device 14. In
some examples, devices 12, 14 may operate in a substantially symmetrical
manner such that each of devices 12, 14 include video encoding and
decoding components. Hence, system 10 may support oneway or twoway
video transmission between video devices 12, 14, e.g., for video
streaming, video playback, video broadcasting, or video telephony.
[0078] Video source 18 of source device 12 may include a video capture
device, such as a video camera, a video archive containing previously
captured video, and/or a video feed interface to receive video data from
a video content provider. As a further alternative, video source 18 may
generate computer graphicsbased data as the source video, or a
combination of live video, archived video, and computergenerated video.
In some cases, source device 12 and destination device 14 may form
socalled camera phones or video phones. Source device 12 may comprise
one or more data storage media (e.g., storage media 19) configured to
store the video data. The techniques described in this disclosure may be
applicable to video coding in general, and may be applied to wireless
and/or wired applications. In each case, the captured, precaptured, or
computergenerated video may be encoded by video encoder 20. Output
interface 22 may then output the encoded video information onto
computerreadable medium 16.
[0079] Output interface 22 may comprise various types of components or
devices. For example, output interface 22 may comprise a wireless
transmitter, a modem, a wired networking component (e.g., an Ethernet
card), or another physical component. In examples where output interface
22 comprises a wireless receiver, output interface 22 may be configured
to receive data, such as the bitstream, modulated according to a cellular
communication standard, such as 4G, 4GLTE, LTE Advanced, 5G, and the
like. In some examples where output interface 22 comprises a wireless
receiver, output interface 22 may be configured to receive data, such as
the bitstream, modulated according to other wireless standards, such as
an IEEE 802.11 specification, an IEEE 802.15 specification (e.g.,
ZigBee.TM.), a Bluetooth.TM. standard, and the like. In some examples,
circuitry of output interface 22 may be integrated into circuitry of
video encoder 20 and/or other components of source device 12. For
example, video encoder 20 and output interface 22 may be parts of a
system on a chip (SoC). The SoC may also include other components, such
as a general purpose microprocessor, a graphics processing unit, and so
on.
[0080] Destination device 14 may receive the encoded video data to be
decoded via computerreadable medium 16. Computerreadable medium 16 may
comprise any type of medium or device capable of moving the encoded video
data from source device 12 to destination device 14. In one example,
computerreadable medium 16 may comprise a communication medium to enable
source device 12 to transmit encoded video data directly to destination
device 14 in realtime. The encoded video data may be modulated according
to a communication standard, such as a wireless communication protocol,
and transmitted to destination device 14. The communication medium may
comprise any wireless or wired communication medium, such as a radio
frequency (RF) spectrum or one or more physical transmission lines. The
communication medium may form part of a packetbased network, such as a
local area network, a widearea network, or a global network such as the
Internet. The communication medium may include routers, switches, base
stations, or any other equipment that may be useful to facilitate
communication from source device 12 to destination device 14. Destination
device 14 may comprise one or more data storage media configured to store
encoded video data and/or decoded video data.
[0081] In some examples, output interface 22 may output encoded data to a
storage device. Similarly, input interface 28 may access encoded data
from the storage device. The storage device may include any of a variety
of distributed or locally accessed data storage media such as a hard
drive, Bluray discs, DVDs, CDROMs, flash memory, volatile or
nonvolatile memory, or any other suitable digital storage media for
storing encoded video data. In a further example, the storage device may
correspond to a file server or another intermediate storage device that
may store the encoded video generated by source device 12. Destination
device 14 may access stored video data from the storage device via
streaming or download. The file server may be any type of server capable
of storing encoded video data and transmitting that encoded video data to
the destination device 14. Example file servers include a web server
(e.g., for a website), a file transfer protocol (FTP) server, network
attached storage (NAS) devices, or a local disk drive. Destination device
14 may access the encoded video data through any standard data
connection, including an Internet connection. This may include a wireless
channel (e.g., a WiFi connection), a wired connection (e.g., DSL, cable
modem, etc.), or a combination of both that is suitable for accessing
encoded video data stored on a file server. The transmission of encoded
video data from the storage device may be a streaming transmission, a
download transmission, or a combination thereof.
[0082] The techniques may be applied to video coding in support of any of
a variety of multimedia applications, such as overtheair television
broadcasts, cable television transmissions, satellite television
transmissions, Internet streaming video transmissions, such as dynamic
adaptive streaming over HTTP (DASH), digital video that is encoded onto a
data storage medium, decoding of digital video stored on a data storage
medium, or other applications. In some examples, system 10 may be
configured to support oneway or twoway video transmission to support
applications such as video streaming, video playback, video broadcasting,
and/or video telephony.
[0083] Computerreadable medium 16 may include transient media, such as a
wireless broadcast or wired network transmission, or storage media (that
is, nontransitory storage media), such as a hard disk, flash drive,
compact disc, digital video disc, Bluray disc, or other
computerreadable media. In some examples, a network server (not shown)
may receive encoded video data from source device 12 and provide the
encoded video data to destination device 14, e.g., via network
transmission. Similarly, a computing device of a medium production
facility, such as a disc stamping facility, may receive encoded video
data from source device 12 and produce a disc containing the encoded
video data. Therefore, computerreadable medium 16 may be understood to
include one or more computerreadable media of various forms.
[0084] Input interface 28 of destination device 14 receives information
from computerreadable medium 16. Input interface 28 may comprise various
types of components or devices. For example, input interface 28 may
comprise a wireless receiver, a modem, a wired networking component
(e.g., an Ethernet card), or another physical component. In examples
where input interface 28 comprises a wireless receiver, input interface
28 may be configured to receive data, such as the bitstream, modulated
according to a cellular communication standard, such as 4G, 4GLTE, LTE
Advanced, 5G, and the like. In some examples where input interface 28
comprises a wireless receiver, input interface 28 may be configured to
receive data, such as the bitstream, modulated according to other
wireless standards, such as an IEEE 802.11 specification, an IEEE 802.15
specification (e.g., ZigBee.TM.), a Bluetooth.TM. standard, and the like.
In some examples, circuitry of input interface 28 may be integrated into
circuitry of video decoder 30 and/or other components of destination
device 14. For example, video decoder 30 and input interface 28 may be
parts of a system on a chip (SoC). The SoC may also include other
components, such as a general purpose microprocessor, a graphics
processing unit, and so on.
[0085] The information of computerreadable medium 16 may include syntax
information defined by video encoder 20, which is also used by video
decoder 30, that includes syntax elements that describe characteristics
and/or processing of blocks and other coded units, e.g., groups of
pictures (GOPs). Display device 32 may display the decoded video data to
a user. For instance, destination device 14 or video decoder 30 may
output, for display by display device 32, reconstructed pictures of the
video data. Such reconstructed pictures may comprise reconstructed
blocks. Display device 32 may comprise any of a variety of display
devices such as a cathode ray tube (CRT), a liquid crystal display (LCD),
a plasma display, an organic light emitting diode (OLED) display, or
another type of display device.
[0086] Video encoder 20 and video decoder unit 30 each may be implemented
as any of a variety of suitable encoder circuitry, such as one or more
microprocessors, digital signal processors (DSPs), application specific
integrated circuits (ASICs), field programmable gate arrays (FPGAs),
discrete logic, software, hardware, firmware or any combinations thereof.
When the techniques are implemented partially in software, a device may
store instructions for the software in a suitable, nontransitory
computerreadable medium and execute the instructions in hardware using
one or more processors to perform the techniques of this disclosure. Each
of video encoder 20 and video decoder 30 may be included in one or more
encoders or decoders, either of which may be integrated as part of a
combined encoder/decoder (CODEC) in a respective device.
[0087] In some examples, video encoder 20 and video decoder 30 may operate
according to a video coding standard. Example video coding standards
include, but are not limited to, ITUT H.261, ISO/IEC MPEG1 Visual,
ITUT H.262 or ISO/IEC MPEG2 Visual, ITUT H.263, ISO/IEC MPEG4 Visual
and ITUT H.264 (also known as ISO/IEC MPEG4 AVC), including its
Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions.
In addition, a new video coding standard, namely High Efficiency Video
Coding (HEVC), has recently been developed by the Joint Collaboration
Team on Video Coding (JCTVC) of ITUT Video Coding Experts Group (VCEG)
and ISO/IEC Motion Picture Experts Group (MPEG). Wang et al, "High
Efficiency Video Coding (HEVC) Defect Report," Joint Collaborative Team
in Video Coding (JCTVC) of ITUT SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG
11, 14.sup.th Meeting, Vienna, AT, 25 Jul.2 Aug. 2013, document
JCTVCN1003_v1 (hereinafter, "JCTVCN1003") is a draft of the HEVC
standard. JCTVCN1003 is available from
http://phenix.intevry.fr/jct/doc_end_user/documents/14_Vienna/wg11/JCTVC
N1003v1.zip.
[0088] In HEVC and other video coding specifications, a video sequence
typically includes a series of pictures. Pictures may also be referred to
as "frames." A picture may include one or more sample arrays. Each
respective sample array of a picture may comprise an array of samples for
a respective color component. In HEVC, a picture may include three sample
arrays, denoted S.sub.L, S.sub.Cb, and S.sub.Cr. S.sub.L is a
twodimensional array (i.e., a block) of luma samples. S.sub.Cb is a
twodimensional array of Cb chroma samples. S.sub.Cr is a twodimensional
array of Cr chroma samples. In other instances, a picture may be
monochrome and may only include an array of luma samples.
[0089] As part of encoding video data, video encoder 20 may encode
pictures of the video data. In other words, video encoder 20 may generate
encoded representations of the pictures of the video data. An encoded
representation of a picture may be referred to as a "coded picture" or an
"encoded picture."
[0090] To generate an encoded representation of a picture, video encoder
20 may generate a set of coding tree units (CTUs). Each of the CTUs may
comprise a CTB of luma samples, two corresponding CTBs of chroma samples,
and syntax structures used to code the samples of the CTBs. In monochrome
pictures or pictures having three separate color planes, a CTU may
comprise a single CTB and syntax structures used to code the samples of
the CTB. A CTB may be an N.times.N block of samples. A CTU may also be
referred to as a "tree block" or a "largest coding unit" (LCU). A syntax
structure may be defined as zero or more syntax elements present together
in the bitstream in a specified order. A slice may include an integer
number of CTUs ordered consecutively in a raster scan order.
[0091] This disclosure may use the term "video unit" or "video block" or
"block" to refer to one or more sample blocks and syntax structures used
to code samples of the one or more blocks of samples. Example types of
video units may include CTUs, CUs, PUs, transform units (TUs),
macroblocks, macroblock partitions, and so on. In some contexts,
discussion of PUs may be interchanged with discussion of macroblocks or
macroblock partitions.
[0092] In HEVC, to generate a coded CTU, video encoder 20 may recursively
perform quadtree partitioning on the coding tree blocks of a CTU to
divide the coding tree blocks into coding blocks, hence the name "coding
tree units." A coding block is an N.times.N block of samples. A CU may
comprise a coding block of luma samples and two corresponding coding
blocks of chroma samples of a picture that has a luma sample array, a Cb
sample array, and a Cr sample array, and syntax structures used to code
the samples of the coding blocks. In monochrome pictures or pictures
having three separate color planes, a CU may comprise a single coding
block and syntax structures used to code the samples of the coding block.
[0093] Video encoder 20 may encode CUs of a picture of the video data. As
part of encoding a CU, video encoder 20 may partition a coding block of
the CU into one or more prediction blocks. A prediction block is a
rectangular (i.e., square or nonsquare) block of samples on which the
same prediction is applied. A prediction unit (PU) may comprise a
prediction block of luma samples, two corresponding prediction blocks of
chroma samples, and syntax structures used to predict the prediction
blocks. In monochrome pictures or pictures having three separate color
planes, a PU may comprise a single prediction block and syntax structures
used to predict the prediction block. Video encoder 20 may generate
predictive blocks (e.g., luma, Cb, and Cr predictive blocks) for
prediction blocks (e.g., luma, Cb, and Cr prediction blocks) of each PU
of the CU.
[0094] Thus, in general, a PU may comprise one or more prediction blocks
of samples and syntax structures used to predict the prediction blocks.
In some example codecs, such as HEVC, a PU may be a subunit of a CU. In
other example codecs, there may be no distinction between a CU and a PU.
In some examples, other terms may be used for PU.
[0095] Video encoder 20 may use intra prediction or inter prediction to
generate a predictive block of a PU. If video encoder 20 uses intra
prediction to generate a predictive block of a PU, video encoder 20 may
generate the predictive block of the PU based on decoded samples of the
picture that includes the PU. If video encoder 20 uses inter prediction
to generate a predictive block of a PU of a current picture, video
encoder 20 may generate the predictive block of the PU based on decoded
samples of a reference picture (i.e., a picture other than the current
picture).
[0096] After video encoder 20 generates predictive blocks (e.g., luma, Cb,
and Cr predictive blocks) for one or more PUs of a CU, video encoder 20
may generate one or more residual blocks for the CU. For instance, video
encoder 20 may generate a luma residual block for the CU. Each sample in
the CU's luma residual block indicates a difference between a luma sample
in one of the CU's predictive luma blocks and a corresponding sample in
the CU's original luma coding block. In addition, video encoder 20 may
generate a Cb residual block for the CU. Each sample in the Cb residual
block of a CU may indicate a difference between a Cb sample in one of the
CU's predictive Cb blocks and a corresponding sample in the CU's original
Cb coding block. Video encoder 20 may also generate a Cr residual block
for the CU. Each sample in the CU's Cr residual block may indicate a
difference between a Cr sample in one of the CU's predictive Cr blocks
and a corresponding sample in the CU's original Cr coding block.
[0097] To reiterate, in HEVC, the largest coding unit in a slice is called
a CTU. Each picture is divided into CTUs, which may be coded in raster
scan order for a specific tile or slice. A CTU is a square block and
represents the root of a quadtree, i.e., the coding tree. A CTU contains
a quadtree, the nodes of which are CUs. In some instances, the size of a
CTU can range from 16.times.16 to 64.times.64 in the HEVC main profile
(although technically 8.times.8 CTU sizes can be supported). In some
instances, the CTU size may range from 8.times.8 to 64.times.64 luma
samples, but typically 64.times.64 is used. Each CTU can be further split
into smaller square blocks called CUs. A CU can be the same size of a
CTU, although a CU can be as small as 8.times.8. Each CU is coded with
one mode. For instance, a CU may be inter coded or intra coded. When a CU
is inter coded, the CU may be further partitioned into 2 or 4 PUs or may
become just one PU when further partition does not apply. When two PUs
are present in one CU, the two PUs can be half size rectangles or two
rectangle sizes with 1/4 or 3/4 size of the CU. When a CU is inter coded,
one set of motion information is present for each PU. In addition, each
PU is coded with a unique interprediction mode to derive the set of
motion information. In other words, each PU may have its own set of
motion information.
[0098] Furthermore, video encoder 20 may decompose the residual blocks of
a CU into one or more transform blocks. For instance, video encoder 20
may use quadtree partitioning to decompose the residual blocks (e.g.,
the luma, Cb, and Cr residual blocks) of a CU into one or more transform
blocks (e.g., luma, Cb, and Cr transform blocks). A transform block is a
rectangular (e.g., square or nonsquare) block of samples on which the
same transform is applied.
[0099] A transform unit (TU) of a CU may comprise a transform block of
luma samples, two corresponding transform blocks of chroma samples, and
syntax structures used to transform the transform block samples. Thus,
each TU of a CU may have a luma transform block, a Cb transform block,
and a Cr transform block. The luma transform block of the TU may be a
subblock of the CU's luma residual block. The Cb transform block may be
a subblock of the CU's Cb residual block. The Cr transform block may be
a subblock of the CU's Cr residual block. In monochrome pictures or
pictures having three separate color planes, a TU may comprise a single
transform block and syntax structures used to transform the samples of
the transform block.
[0100] Video encoder 20 may apply one or more transforms to a transform
block of a TU to generate a coefficient block for the TU. For instance,
video encoder 20 may apply one or more transforms to a luma transform
block of a TU to generate a luma coefficient block for the TU. A
coefficient block may be a twodimensional array of transform
coefficients. A transform coefficient may be a scalar quantity. In some
examples, the one or more transforms convert the transform block from a
pixel domain to a frequency domain.
[0101] In some examples, video encoder 20 does not apply the transform to
the transform block. In other words, video encoder 20 skips application
of the transforms to the transform block. In such examples, video encoder
20 may treat residual sample values in the same way as transform
coefficients. Thus, in examples where video encoder 20 skips application
of the transforms, the following discussion of transform coefficients and
coefficient blocks may be applicable to transform blocks of residual
samples.
[0102] After generating a coefficient block (e.g., a luma coefficient
block, a Cb coefficient block or a Cr coefficient block), video encoder
20 may quantize the coefficient block. In some examples, video encoder 20
does not quantize the coefficient block. In examples where video encoder
20 does not apply the transform to the transform block, video encoder 20
may or may not quantize residual samples of the transform block.
Quantization generally refers to a process in which transform
coefficients are quantized to possibly reduce the amount of data used to
represent the transform coefficients, providing further compression.
After video encoder 20 quantizes a coefficient block, video encoder 20
may entropy encode syntax elements indicating the quantized transform
coefficients or residual samples. For example, video encoder 20 may
perform ContextAdaptive Binary Arithmetic Coding (CABAC) on the syntax
elements indicating the quantized transform coefficients or residual
samples. In some examples, video encoder 20 uses palettebased coding to
encode CUs. Thus, an encoded block (e.g., an encoded CU) may include the
entropy encoded syntax elements indicating the quantized transform
coefficients.
[0103] Video encoder 20 may output a bitstream that includes a sequence of
bits that forms a representation of encoded pictures of the video data
and associated data (i.e., data associated with the encoded pictures).
Thus, the bitstream comprises an encoded representation of the video
data. The bitstream may comprise a sequence of network abstraction layer
(NAL) units. A NAL unit is a syntax structure containing an indication of
the type of data in the NAL unit and bytes containing that data in the
form of a raw byte sequence payload (RBSP) interspersed as necessary with
emulation prevention bits. Each of the NAL units may include a NAL unit
header and encapsulates a RBSP. The NAL unit header may include a syntax
element indicating a NAL unit type code. The NAL unit type code specified
by the NAL unit header of a NAL unit indicates the type of the NAL unit.
A RBSP may be a syntax structure containing an integer number of bytes
that are encapsulated within a NAL unit. In some instances, an RBSP
includes zero bits.
[0104] Video decoder 30 may receive a bitstream generated by video encoder
20. In addition, video decoder 30 may parse the bitstream to obtain
syntax elements from the bitstream. Video decoder 30 may reconstruct
pictures of the video data based at least in part on the syntax elements
obtained from the bitstream. The process to reconstruct pictures of the
video data may be generally reciprocal to the process performed by video
encoder 20 to encode the pictures. For instance, to reconstruct a picture
of the video data, video decoder 30 may decode blocks, such as CUs, of
the picture based on syntax elements obtained from the bitstream and/or
data from external sources.
[0105] In some examples, as part of decoding a current CU of the picture,
video decoder 30 may use inter prediction or intra prediction to generate
one or more predictive blocks for each PU of the current CU. When using
inter prediction, video decoder 30 may use motion vectors of PUs to
determine predictive blocks for the PUs of a current CU. In addition,
video decoder 30 may, in some examples, inverse quantize coefficient
blocks of TUs of the current CU. Video decoder 30 may, in some examples,
perform inverse transforms on the coefficient blocks to reconstruct
transform blocks of the TUs of the current CU. Video decoder 30 may
reconstruct the coding blocks of the current CU by adding the samples of
the predictive blocks for PUs of the current CU to corresponding decoded
samples (e.g., residual samples) of the transform blocks of the TUs of
the current CU. By reconstructing the coding blocks for each CU of a
picture, video decoder 30 may reconstruct the picture.
[0106] Moreover, in HEVC, the option to partition a picture into
rectangular regions called tiles has been specified. The main purpose of
tiles is to increase the capability for parallel processing rather than
provide error resilience. Tiles are independently decodable regions of a
picture that are encoded with some shared header information. Tiles can
additionally be used for the purpose of spatial random access to local
regions of video pictures. A typical tile configuration of a picture
consists of segmenting the picture into rectangular regions with
approximately equal numbers of CTUs in each tile. Tiles provide
parallelism at a more coarse level of granularity (picture/subpicture),
and no sophisticated synchronization of threads is necessary for their
use.
[0107] To adapt the various characteristics of the residual blocks, a
transform coding structure using the residual quadtree (RQT) is applied
in HEVC, which is briefly described in Marpe et al., "Transform Coding
Using the Residual Quadtree (RQT)," Fraunhofer Heinrich Hertz Institute,
available at
http://www.hhi.fraunhofer.de/fieldsofcompetence/imageprocessing/resear
chgroups/imagevideocoding/hevchighefficiencyvideocoding/transformco
dingusingtheresidual quadtreerqt.html. After the CTU is split
recursively into CUs, each CU is further divided into PUs and TUs. The
partitioning of a CU into TUs is carried out recursively based on a
quadtree approach, therefore the residual signal of each CU is coded by a
tree structure, namely, the residual quadtree (RQT). The RQT allows TU
sizes from 4.times.4 up to 32.times.32 luma samples. FIG. 2A and FIG. 2B
are conceptual diagrams illustrating an example transform scheme based on
a residual quadtree in HEVC. Particularly, FIG. 2A shows an example where
a CU 40 includes ten TUs, labeled with the letters a to j, and the
corresponding block partitioning. FIG. 2B is a conceptual diagram
illustrating an RQT for the CU of FIG. 2A.
[0108] A video coder may process the individual TUs in a depthfirst tree
traversal order, which is illustrated in FIG. 2A in alphabetical order,
which follows a recursive Zscan with depthfirst traversal. The quadtree
approach enables the adaptation of the transform to the varying
spacefrequency characteristics of the residual signal. Typically, larger
transform block sizes, which have larger spatial support, provide better
frequency resolution. However, smaller transform block sizes, which have
smaller spatial support, provide better spatial resolution. The tradeoff
between the two, spatial and frequency resolutions, is chosen by an
encoder mode decision, for example based on ratedistortion optimization
technique. The ratedistortion optimization technique calculates a
weighted sum of coding bits and reconstruction distortion, i.e., the
ratedistortion cost, for each coding mode (e.g., a specific RQT
splitting structure), and selects the coding mode with least
ratedistortion cost as the best mode.
[0109] Three parameters are defined in the RQT: the maximum depth of the
tree, the minimum allowed transform size and the maximum allowed
transform size. In HEVC, the minimum and maximum transform sizes can vary
within the range from 4.times.4 to 32.times.32 samples, which correspond
to the supported block transforms mentioned in the previous paragraph.
The maximum allowed depth of the RQT restricts the number of TUs. A
maximum depth equal to zero means that a CTU cannot be split any further
if each included TU reaches the maximum allowed transform size, e.g.,
32.times.32. In HEVC, larger size transforms, e.g., 64.times.64 transform
were not adopted mainly due to their limited benefit considering and
relatively high complexity for relatively smaller resolution videos.
[0110] In HEVC, regardless of the size of a TU, the residual of the TU
(e.g., a coefficient block of the TU) is coded with nonoverlapped
coefficient groups (CG). Each of the CGs contains the coefficients of a
4.times.4 block of the TU. For example, a 32.times.32 TU has a total of
64 CGs, and a 16.times.16 TU has a total of 16 CGs. The CGs of a TU are
coded according to a certain predefined scan order. When coding each CG,
the coefficients inside the current CG are scanned and coded according to
a certain predefined scan order for 4.times.4 block. FIG. 3 is a
conceptual diagram illustrating an example coefficient scan based on
coefficient groups in HEVC. Particularly, FIG. 3 illustrates the
coefficient scan for an 8.times.8 TU containing four 4.times.4 CGs.
[0111] As noted above, video encoder 20 and video decoder 30 may perform
intra prediction to generate a predictive block. Intra prediction
performs image block prediction using its spatially neighboring
reconstructed image samples. FIG. 4 is a conceptual diagram illustrating
an example of intra prediction for a 16.times.16 block. In FIG. 4, a
block square contains a 16.times.16 block 50. In FIG. 4, block 50 is
predicted by the above and left neighboring reconstructed samples 52, 54
(i.e., reference samples) along a selected prediction direction. In FIG.
4, the samples outside the black box are the reference samples. The white
arrow in FIG. 4 indicates the selected prediction direction.
[0112] FIG. 5 is a conceptual diagram illustrating an example of 35 intra
prediction modes defined in HEVC. As indicated in FIG. 5, HEVC defines 35
modes (including the Planar mode, DC mode and 33 angular modes) for the
intra prediction of a luma block. The 35 modes of the intra prediction
defined in HEVC are indexed as shown in the table below:
TABLEUS00001
TABLE 1
Specification of intra prediction mode and associated names
Intra prediction mode Associated name
0 INTRA_PLANAR
1 INTRA_DC
2 . . . 34 INTRA_ANGULAR2..INTRA_ANGULAR34
[0113] HEVC intra coding supports two types of PU division, 2N.times.2N
and N.times.N. 2N.times.2N splits a CU into one PU. In other words, the
CU has one PU with the same size as the CU. N.times.N splits a CU into
four equalsize PUs. However, the four regions specified by the
partitioning type PART N.times.N can be also represented by four smaller
CUs with the partitioning type PART 2N.times.2N. Due to this, HEVC allows
an intra CU to be split into four PUs only at the minimum CU size.
[0114] FIG. 6 is a conceptual diagram illustrating a planar mode defined
in HEVC. Planar mode is typically the most frequently used intra
prediction mode. To perform Planar prediction for an N.times.N block, for
each sample p.sub.xy located at (x,y), the prediction value is calculated
using four specific neighboring reconstructed samples, i.e., reference
samples, with a bilinear filter. The four reference samples include a
topright reconstructed sample TR, a bottomleft reconstructed sample BL,
and two reconstructed samples 60, 62 located in the same column
(r.sub.x,1) and row (r.sub.1,y) as the current sample. The planar mode
can be formulated as below:
p.sub.xy=((N1x)L+(x+1)TR+(N1y)T+(y+1)BL+N)>>(Log 2(N)+1 (1)
In formula (1) above, L corresponds to reconstructed sample 60 and T
corresponds to reconstructed sample 62. For DC mode, the prediction block
is simply filled with the average value of the neighboring reconstructed
samples. Generally, both Planar and DC modes are applied for modeling
smoothly varying and constant image regions.
[0115] FIG. 7 is a conceptual diagram of an example angular mode defined
in HEVC. The intra prediction process for angular intra prediction modes
in HEVC is described as follows. For each given angular intra prediction
mode, the intra prediction direction can be identified accordingly. For
example, the given angular intra prediction mode may be identified
according to FIG. 5. As shown in FIG. 5, intra mode 18 corresponds to a
pure horizontal prediction direction, and intra mode 26 corresponds to a
pure vertical prediction direction. Given a specific intra prediction
direction, for each respective sample of a prediction block, coordinates
(x,y) of the respective sample are first projected to a row or column of
neighboring reconstructed samples along the prediction direction. For
instance, as shown in the example of FIG. 7, coordinates (x,y) of a
sample 70 of a prediction block 72 are projected along a specific intra
prediction direction 74. Suppose (x,y) is projected to the fractional
position a between two neighboring reconstructed samples L and R. Then, a
prediction value for (x,y) is calculated using a twotap bilinear
interpolation filter, formulated as follows:
p.sub.xy=(1.alpha.)L+.alpha.R.
[0116] In HEVC, to avoid floating point operations, the above calculation
is approximated using integer arithmetic as:
p.sub.xy=((32a)L+aR+16)>>5,
where a is an integer equal to 32*.alpha..
[0117] FIG. 8 is a conceptual diagram of partition modes for splitting a
CU for inter prediction in HEVC. As shown in FIG. 8, in HEVC, an
intercoded CU can be split into one, two, or four partitions and various
types of this splitting are possible. The partitioning possibilities for
interpredicted coding blocks are depicted in FIG. 8. The upper four
partition types illustrate the cases of not splitting the CU of size
N.times.N, of splitting the CU into two partitions of size N.times.N/2 or
N/2.times.N, and of splitting the CU into four partitions of size
N/2.times.N/2, respectively. The lower four partition types in FIG. 8 are
referred to as asymmetric motion partitioning (AMP). One partition of the
AMP mode has the height or width N/4 and width or height N, respectively,
and the other partition fills the rest of the CU by having a height or
width of 3N/4 and width or height N. Each intercoded partition is
assigned one or two motion vectors and reference picture indices.
[0118] For intra slices, only intra prediction mode is allowed. Therefore,
there is no need to signal the prediction mode. However, for inter slices
(P or B slice), both intra and inter prediction mode are allowed. Thus,
in HEVC, for each CU, one flag pred_mode_flag is signaled for nonskip
mode. A partial listing of the syntax and semantics defined in HEVC for a
CU are presented below:
TABLEUS00002
7.3.8.5 Coding unit syntax
coding_unit( x0, y0, log2CbSize ) { Descriptor
if( transquant_bypass_enabled_flag )
cu_transquant_bypass_flag ae(v)
if( slice_type != I )
cu_skip_flag[ x0 ][ y0 ] ae(v)
nCbS = ( 1 << log2CbSize )
if( cu_skip_flag[ x0 ][ y0 ] )
prediction_unit( x0, y0, nCbS, nCbS )
else {
if( slice_type != I )
pred_mode_flag ae(v)
if( CuPredMode[ x0 ][ y0 ] != MODE_INTRA 
log2CbSize == MinCbLog2SizeY )
part_mode ae(v)
...
}
[0119] cu_skip_flag[x0][y0] equal to 1 specifies that for the current
coding unit, when decoding a P or B slice, no more syntax elements except
the merging candidate index merge_idx[x0][y0] are parsed after
cu_skip_flag[x0][y0]. cu_skip_flag[x0][y0] equal to 0 specifies that the
coding unit is not skipped. The array indices x0, y0 specify the location
(x0,y0) of the topleft luma sample of the considered coding block
relative to the topleft luma sample of the picture. When
cu_skip_flag[x0][y0] is not present, cu_skip_flag[x0][y0] is inferred to
be equal to 0.
[0120] pred_mode_flag equal to 0 specifies that the current coding unit is
coded in inter prediction mode. pred_mode_flag equal to 1 specifies that
the current coding unit is coded in intra prediction mode. The variable
CuPredMode[x][y] is derived as follows for x=x0 . . . x0+nCbS1 and y=y0
. . . y0+nCbS1: [0121] If pred_mode_flag is equal to 0,
CuPredMode[x][y] is set equal to MODE INTER. [0122] Otherwise
(pred_mode_flag is equal to 1), CuPredMode[x][y] is set equal to
MODE_INTRA.
[0123] When pred_mode_flag is not present, the variable CuPredMode[x][y]
is derived as follows for x=x0 . . . x0+nCbS1 and y=y0 . . . y0+nCbS1:
[0124] If slice_type is equal to I, CuPredMode[x][y] is inferred to be
equal to MODE_INTRA. [0125] Otherwise (slice_type is equal to P or B),
when cu_skip_flag[x0][y0] is equal to 1, CuPredMode[x][y] is inferred to
be equal to MODE_SKIP.
[0126] Various proposals have been made to enhance HEVC during and after
the process of developing HEVC. For example, Jianle Chen et al., "Further
improvements to HMKTA1.0", Document: VCEGAZ07_v2, 52.sup.nd Meeting:
1926 Jun. 2015, Warsaw, Poland, (hereinafter, "VCEGAZ07"), describes a
short distance intra coding scheme. Unlike traditional block partition
methods which always produce square blocks for intra prediction, the
short distance intra prediction (SDIP) scheme of VCEGAZ07 employs
nonsquare block splitting under the quadtree based block structure of
HEVC. As described in VCEGAZ07, a block is split into four nonsquare
blocks with quarter width or height, and each nonsquare block is treated
as a basic unit for prediction. The nonsquare blocks are coded and
reconstructed in order, and can provide reference pixels for intra
prediction for the next neighboring block. Therefore, the distance
between reference pixels and local pixels can be reduced, and the
precision of intra prediction can be much improved.
[0127] FIG. 9 is a conceptual diagram of SDIP unit partitions. In the SDIP
scheme, a CU that is smaller than 64.times.64 can be split into four
vertical or horizontal rectangular PUs with sizes N/2.times.2N or
2N.times.N/2 (these partition modes may be referred to in this disclosure
as hN.times.2N and 2N.times.hN, where h means half). The four PUs are
coded and reconstructed in order, from left to right in the hN.times.2N
mode, and top to bottom in the 2N.times.hN mode. In FIG. 9, dotted lines
represent PU/TU splitting, and the shaded region 78 denotes a 32.times.8
TU to be split into 4 32.times.2 TUs in the RQT structure. The upper
right 32.times.32 CU with partition mode 2N.times.hN is split into four
32.times.8 PUs, the lower left 16.times.16 CU with mode hN.times.2N is
split into four 4.times.16 PUs, the lower right 8.times.8 CU is split
into 8.times.2 PUs, and so on. The square splitting of CUs in HEVC may
also exist, such as the lower left 16.times.16 CU in the 2N.times.2N mode
and the lower right 8.times.8 CU in the N.times.N mode.
[0128] Furthermore, in an SDIP scheme, a M.times.N (M>N) TU can be
split into four TUs with size M.times.N/4, or M/4.times.N when M<N. In
other words, a split in the SDIP scheme should always be carried out
along the same direction (vertical or horizontal) in a CU. Table 2,
below, lists all the nonsquare units existing in the SDIP scheme and the
corresponding RQT depth.
TABLEUS00003
TABLE 2
List of units in SDIP scheme and the corresponding RQT depth
Unit Size
CU Size when depth = 1 (equal to PU size) Unit Size when depth = 2
32 .times. 32 32 .times. 8 32 .times. 2
8 .times. 32 2 .times. 32
16 .times. 16 16 .times. 4 
4 .times. 16 
8 .times. 8 8 .times. 2 
2 .times. 8 
[0129] In Xiaoran Cao et al "CE6.b1 Report on Short Distance Intra
Prediction Method," Doc. JCTVCE0278, Joint Collaborative Team on Video
Coding (JCTVC) of ITUT SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 5th
Meeting: Geneva, 1623 Mar. 2011 (hereinafter, "Cao 1"), partitions like
1.times.N and N.times.1 are further included and the corresponding RQT
depth and transform sizes are listed in Table 3, below:
TABLEUS00004
TABLE 3
List of transform sizes in SDIP in Cao 1
Unit Size
CU Size when depth = 1 (equal to PU size) Unit Size when depth = 2
32 .times. 32 32 .times. 8 32 .times. 2
8 .times. 32 2 .times. 32
16 .times. 16 16 .times. 4 16 .times. 1
4 .times. 16 1 .times. 16
8 .times. 8 8 .times. 2 
(2N .times. 2N) 2 .times. 8 
8 .times. 8 4 .times. 4 4 .times. 1
(N .times. N) 1 .times. 4
[0130] Furthermore, some SDIP schemes use nonsquare transform and entropy
coding. For example, an n.times.m transform is used for a nonsquare
block. For an n.times.m (n>m) block, the forward transform is
described as follows:
C.sub.n.times.m=T.sub.mB.sub.n.times.mT.sub.n.sup.T (1).
[0131] In the equation above, B.sub.n.times.m denotes a block with n rows
and m columns, T.sub.n and T.sub.m are the transform matrices of size
n.times.n and m.times.m respectively, and C.sub.n.times.m denotes the
transformed block. T.sub.n and T.sub.m are the same as the transform
matrices in HEVC. Thus, for a hardware implementation, the transform part
can be reused for nonsquare blocks. For an n.times.m (n<m) block, the
block is transposed into an m.times.n (m>n) block first and then
transformed as in equation (1). For entropy coding, to avoid duplicate
implementations, the coefficient coding of a square block is also reused.
For example, FIG. 10 is a conceptual diagram of a 16.times.4 coefficient
matrix 80 scanned and reorganized into an 8.times.8 matrix 82. In this
example, the coefficients of coefficient matrix 80 are first scanned from
high frequency to low frequency into a 1D buffer 84, as shown in FIG. 10,
and then reorganized into an 8.times.8 matrix 82 in zigzag order, which
is coded using the existing method in HEVC.
[0132] In another example of a proposed enhancement to HEVC, Liu et al.,
"Rectangular (2N.times.N, N.times.2N) Intra Prediction," Doc. JCTVCG135,
Joint Collaborative Team on Video Coding (JCTVC) of ITUT SG16 WP3 and
ISO/IEC JTC1/SC29/WG11, 7th Meeting: Geneva, 2130 Nov. 2011,
(hereinafter, "JCTVCG135"), describes extending the use of the
2N.times.N and N.times.2N partition sizes for inter coding to intra
coding. The additional PU sizes and corresponding TUs are given in Table
2, below. In JCTVCG135, the conventional transform quadtree structure is
employed.
TABLEUS00005
TABLE 2
List of transform sizes in JCTVCG135
CU Size PU size Unit Size when depth = 1 Unit Size when depth = 2
32 .times. 32 32 .times. 16 32 .times. 8 32 .times. 2
16 .times. 32 8 .times. 32 2 .times. 32
16 .times. 16 16 .times. 8 16 .times. 4 
8 .times. 16 4 .times. 16 
8 .times. 8 8 .times. 4 8 .times. 2 
4 .times. 8 2 .times. 8 
[0133] The following techniques were described in VCEGAZ07. To capture
finer edge directions presented in natural videos, VCEGAZ07 proposed
extending the directional intra modes from 33, as defined in HEVC, to 65.
FIG. 11 is a conceptual diagram illustrating proposed 67 intra prediction
modes. The directional modes described in VCEGAZ07 are indicated as
dotted arrows in FIG. 11, and the Planar and DC modes remain the same.
The denser directional intra prediction modes proposed in VCEGAZ07 apply
for all PU sizes and both luma and chroma intra predictions.
[0134] To accommodate the increased number of directional intra modes,
VCEGAZ07 proposed an improved Intra mode coding method, using 6 Most
Probable Modes (MPMs). Two major technical aspects are involved: 1) the
derivation of 6 MPMs, and 2) entropy coding of 6 MPMs. When deriving the
set of 6 MPMs, VCEGAZ06 changed the definition of the left and above
neighboring intra modes. Instead of using the intra modes from top and
left neighboring blocks directly as in HEVC, the most frequently used
intra mode along the top neighboring row and along the left neighboring
column are computed, and then used as the left and above neighboring
modes, respectively.
[0135] Furthermore, as described in VCEGAZ07, fourtap intra
interpolation filters are utilized to improve the accuracy of directional
intra prediction. For instance, as described above with respect to FIG.
7, HEVC uses a twotap linear interpolation filter to generate an intra
prediction block in the directional prediction modes (i.e., intra
prediction modes excluding Planar and DC predictors). Particularly, in
the example of FIG. 7, a video coder applies a twotap filter to samples
L and R to determine a predictive value for sample 50. In contrast to the
approach of HEVC, which applies a filter to two reference samples to
determine a predictive value for a sample of a prediction block,
VCEGAZ07 applies a filter to four reference samples to determine a
predictive value for a sample of a prediction block. In VCEGAZ07, two
types of fourtap interpolation filters are used: Cubic interpolation
filters for 4.times.4 and 8.times.8 blocks, and Gaussian interpolation
filters for 16.times.16 and larger blocks. In VCEGAZ07, the parameters
of the filters are fixed according to block size, and the same filter is
used for all predicted pixels, in all directional modes.
[0136] In HEVC, after an intra prediction block has been generated for
vertical and horizontal intra modes, a leftmost column and a topmost
row of prediction samples (i.e., samples of a predictive block) are
further adjusted, respectively. Boundary samples up to four columns or
rows are further adjusted using a twotap (for intra modes 2 and 34) or a
threetap filter (for intra modes 36 and 3033).
[0137] FIG. 12A and FIG. 12B is a conceptual diagram of boundary filters
for intra modes 3034. Particularly, FIG. 12A is a conceptual diagram of
boundary filters for intra mode 34. FIG. 12B is a conceptual diagram of
boundary filters for intra mode 3033. In FIG. 12A and FIG. 12B, the
leftmost column of blocks is a set of reference samples and the rest of
the blocks are samples of an intra predicted block. A video coder may
generate the samples of the intra predictive block in the conventional
manner. However, for intra prediction modes 3034, the video encoder may
apply one or more additional filters to the shaded pixels. Thus, examples
of the boundary prediction filters for intra mode 34 and 3033 are shown
in FIG. 12A and FIG. 12B, and the boundary prediction filters for intra
modes 2 and 36 are similar.
[0138] Particularly, in FIG. 12A, for intra mode 34, the video coder
generates each respective sample of the intra predictive block based on
reference samples above and right of the respective sample. However, this
may ignore information available from the left reference samples.
Accordingly, the video coder may apply four different filters to the four
leftmost columns. For each respective sample in lines 14 of the intra
predictive block, the video coder applies a filter based on the
respective sample and a reference sample in the opposite direction of
intra mode 34 (i.e., left and down). For line 1, the resulting sample may
be calculated as (8*a+8*b)/16, where a is the respective sample and b is
the reference sample. For line 2, the resulting sample may be calculated
as (12*a+4*b)/16, where a is the respective sample and b is the reference
sample. In the example of FIG. 12B, the directions for intra prediction
modes 3033 do not align with full integer position pixels. Rather, for
each sample of the predictive block, the directions for intra prediction
modes 3033 intersect with the reference samples at fractional positions
between two of the reference samples. Hence, when applying the boundary
filter for intra prediction modes 3033, there are two reference samples
for each sample of the leftmost column of the predictive block. In the
example of FIG. 12B, for each respective sample of the leftmost column of
the predictive block, when the intra prediction mode is 33, a video coder
may calculate the value of the respective sample as (8*a+8*b+2*c), where
a is the respective sample, b is one of the reference samples and c is
the other of the reference samples.
[0139] Video coding may be performed based on color space and color
format. For example, color video plays an essential role in multimedia
systems, where various color spaces are used to efficiently represent
color. A color space specifies color with numerical values using multiple
components. A popular color space is the RGB color space, where color is
represented as a combination of three primary color component values
(i.e., red, green and blue). For color video compression, the YCbCr color
space has been widely used, as described in A. Ford and A. Roberts,
"Colour space conversions," University of Westminster, London, Tech.
Rep., August 1998.
[0140] YCbCr can be easily converted from RGB color space via a linear
transformation and the redundancy between different components, namely
the crosscomponent redundancy, is significantly reduced in the YCbCr
color space. One advantage of YCbCr is the backward compatibility with
black and white TV as Y signal conveys the luminance information. In
addition, chrominance bandwidth can be reduced by subsampling the Cb and
Cr components in 4:2:0 chroma sampling format with significantly less
subjective impact than subsampling in RGB. Because of these advantages,
YCbCr has been the major color space in video compression. There are also
other color spaces, such as YCoCg, used in video compression. In this
disclosure, regardless of the actual color space used, the YCbCr color
space is used to represent the three color components in the video
compression scheme.
[0141] Although the crosscomplement redundancy is significantly reduced
in the YCbCr color space, correlation between the threecolor components
still exists. Various techniques have been studied to improve video
coding performance by further reducing the correlation between the three
color components.
[0142] For example. Xiaoran Cao et al., "Short distance intra coding
scheme for HEVC", 2012 Picture Coding Symposium (PCS), pp. 501504, May
79, 2012, Krakow, Poland, (hereinafter, "Cao 2") describes a short
distance intra coding scheme. Cao 2 describes a method in 4:2:0 chroma
video coding named Linear Model (LM) prediction mode, which was studied
during development of the HEVC standard. See e.g., J. Chen et al.,
"CE6.a.4: Chroma intra prediction by reconstructed luma samples", Joint
Collaborative Team on Video Coding (JCTVC) of ITUT SG16 WP3 and ISO/IEC
JTC1/SC29/WG11, JCTVCE266, 5th Meeting: Geneva, 1623 Mar. 2011, and
referred as JCTVCE266 hereafter. In 4:2:0 sampling, each of the two
chroma arrays has half the height and half the width of the luma array.
With the LM prediction mode, chroma samples are predicted based on
reconstructed luma samples of the same block by using a linear model as
follows:
pred.sub.c(i,j)=.alpha.*rec.sub.L(i,j)+.beta. (2)
where pred.sub.C(i,j) represents a prediction of chroma samples in a
current block and rec.sub.L(i,j) represents a downsampled reconstructed
luma samples of the current block. Parameters .alpha. and .beta. are
derived from causal reconstructed samples around the current block.
Causal samples of a block are samples that occur prior to the block in a
decoding order. If the chroma block size is denoted by N.times.N, then
both i and j are within the range [0, N).
[0143] Parameters .alpha. and .beta. in equation (2) are derived by
minimizing regression error between the neighboring reconstructed luma
and chroma samples around the current block.
E ( .alpha. , .beta. ) = i ( y i  ( .alpha.
x i + .beta. ) ) 2 ( 3 ) ##EQU00002##
The parameters .alpha. and .beta. are solved as follows
.alpha. = I x i y i  x i y i I
x i x i  x i x i ( 4 ) .beta. = (
y i  .alpha. x i ) / I ( 5 ) ##EQU00003##
In the equations above, x.sub.i is a downsampled reconstructed luma
reference sample where the color format is not 4:4:4 (i.e., the color
format is one in which one chroma sample corresponds to multiple luma
samples), y.sub.i is reconstructed chroma reference samples without
downsampling, and I is the number of reference samples. In other words,
the video coder may downsample the reconstructed luma reference samples
based on the color format not being 4:4:4, but refrain from downsampling
the reconstructed luma reference samples based on the color format being
4:4:4. For a target N.times.N chroma block, when both left and above
causal samples are available, the total number of involved samples I is
equal to 2N. When only left or above causal samples are available, the
total number of involved samples I is equal to N. Here, N is always equal
to 2.sup.m (wherein m may be different for different CU sizes).
Therefore, to reduce the complexity, shifting operations can be used to
implement the division operations in equations (4) and (5).
[0144] FIG. 13 is a conceptual diagram illustrating example locations of
samples used for derivation of .alpha. and .beta.. Particularly, FIG. 13
illustrates a chroma predictive block 90 of a PU and a luma predictive
block 92 of the same PU. Because chroma samples are downsampled relative
to luma samples, the width and height of chroma predictive block 90
(i.e., N) is half the width and height of luma predictive block 92 (i.e.,
2N). In the example of FIG. 13, the small squares outside the large dark
square are reference samples. In the example of FIG. 13, the small
circles indicate sample values used for determining the LM parameters
.alpha. and .beta.. As shown in the example of FIG. 13, the chroma sample
values used for determining the LM parameters .alpha. and .beta. are the
same as the reference samples for chroma predictive block 90. However,
the luma sample values used for determining the LM parameters .alpha. and
.beta. are interpolated from luma reference samples. The total number of
the resulting set of luma samples used for determining the LM parameters
.alpha. and .beta. is the same as the number of chroma samples used for
determining the LM parameters .alpha. and .beta..
[0145] In general, when the LM prediction mode is applied for a current
PU, a video coder may perform the following steps. First, the video coder
may reconstruct a luma block for the current PU. As part of
reconstructing the luma block for the current PU, the video coder may
perform intra prediction to determine a luma predictive block of the
current PU. Furthermore, as part of reconstructing the luma block for the
current PU, the video coder may add residual data to the luma predictive
block of the current PU to reconstruct the luma block for the current PU.
Second, the video coder may downsample reference luma samples that
neighbor the top and left sides of the current PU. Third, the video coder
may use equations (4) and (5) above to derive linear parameters (i.e.,
.alpha. and .beta.) based on chroma reference samples that neighbor the
top and left sides of the current PU and the downsampled luma reference
samples. This disclosure may also refer to the linear parameter as
"scaling factors." Fourth, the video coder may downsample the
reconstructed luma block for the current PU. Fifth, the video coder may
use equation (2) above to predict chroma samples (e.g., derive a
predictive chroma block) from the downsampled luma block for the current
PU and the linear parameters.
[0146] As noted above, a video coder may downsample a reconstructed luma
block of a current PU. The video coder may downsample the reconstructed
luma block of the current PU in various ways. For example, since the
typical sampling ratio of chroma components is half of that of luma
components and has 0.5 sample phase difference in vertical direction in
4:2:0 sampling, reconstructed luma samples of the current PU are
downsampled in the vertical direction and subsampled in the horizontal
direction to match the size and phase of the chroma signal. For instance,
for each value i from 0 to the width of the predictive chroma block of
the current PU minus 1 and each value j from 0 to the height of the
predictive chroma block of the current PU minus 1, a video coder may
calculate:
rec.sub.L(i,j)=(Rec.sub.LOrig[2i,2j]+Rec.sub.LOrig[2i,2j+1])>>1
(6)
[0147] In the equation above, rec.sub.L(i,j) is a luma sample
corresponding to position (i,j) relative to a topleft corner of the
downsampled reconstructed luma block of the current PU.
Rec.sub.Orig[2i,2j] and Rec.sub.Orig[2i,2j+1] are reconstructed luma
samples at positions (2i,2j) and (2i,2j+1) relative to a topleft corner
of the original reconstructed luma block of the current PU. Thus, in
equation (6), a luma sample at position (i,j) of the downsampled
reconstructed luma block of the current PU is the mean of a luma sample
at position (2i,2j) of the original reconstructed luma block of the
current PU and a luma sample at position (2i,2j+1) of the original
reconstructed luma block of the current PU.
[0148] FIG. 14 is a conceptual diagram illustrating an example of luma
positions and chroma positions for downsampling samples of a
reconstructed luma block of a current PU. FIG. 14 depicts chroma samples
as triangles and luma samples as circles. A video coder predicts the
value of a current chroma sample (represented in FIG. 14 by the filledin
triangle) from two luma samples (represented in FIG. 14 by the two
filledin circles), by applying a [1, 1] filter. The [1, 1] filter is one
example of a 2tap filter. In a [1, 1] filter the two taps are equally
weighted. For each respective triangle in FIG. 14, a video coder may
apply equation (6) to samples represented by the circles above and below
the respective triangle to determine a respective luma value for the
sample represented by respective triangle.
[0149] Furthermore, as noted above, a video coder may downsample luma
reference samples. The video coder may downsample the luma reference
samples in various ways. As shown in FIG. 14, the columns of a predictive
chroma block of a current PU are aligned with columns of the predictive
luma block of the current PU. In one example using the 4:2:0 color
format, the downsampled luma reference samples that neighbor a top side
of the current luma block may consist of each luma reference sample at an
even indexed position in the set of luma reference samples. Thus, for
each respective value of i ranging from 0 to the width of the predictive
chroma block of the current PU minus 1, the downsampling process may be
defined as:
rec.sub.L(i,1)=Rec.sub.LOrig[2i,1] (7)
In the equation above, rec.sub.L(i,1) is a downsampled luma reference
sample at position (i,1) relative to a topleft corner of the chroma
predictive block of the current PU. Rec.sub.Orig[2i,1] is a luma
reference sample at position (2i,1) relative to a topleft corner of the
original predictive luma block of the current PU.
[0150] As shown in FIG. 14, the rows of a predictive chroma block of a
current PU are not aligned with rows of a predictive luma block of the
current PU in the 4:2:0 color format. However, equations (4) and (5) for
calculating the parameters .alpha. and .beta. for LMbased prediction are
predicated on there being one luma reference sample for each chroma
reference sample. Accordingly, for a respective row of the predictive
chroma block for the current PU, the video coder may calculate an average
of the luma reference sample in a row of the predictive luma block of the
current PU above and the luma reference sample in a row of the predictive
luma block of the current PU below the row of the predictive chroma
block. For instance, for each value j ranging from 0 to the number of
rows in the predictive chroma block minus 1, the video coder may
calculate the value of a leftneighboring luma reference sample as:
rec.sub.L(1,j)=(Rec.sub.LOrig[2,2j]+Rec.sub.LOrig[2,2j+1])>>1
(8)
[0151] In the equation above, rec.sub.L(1,j) is a downsampled luma
reference sample at position (1,j) relative to a topleft corner of the
predictive chroma block of the current PU. Rec.sub.LOrig[2,2j] and
Rec.sub.LOrig[2,2j+1] are original luma samples at positions (2,2j) and
(2,2j+1) relative to a topleft corner of the original predictive luma
block of the current PU.
[0152] Other downsampling techniques have also been proposed. For
instance, in YiJen Chiu et al., "Crosschannel techniques to improve
intra chroma prediction", Joint Collaborative Team on Video Coding
(JCTVC) of ITUT SG16 WP3 and ISO/IEC JTC1/SC29/WG11, JCTVCF502, 6th
Meeting: Torino, IT, 1422 Jul. 2011 (referred to herein as
"JCTVCF502"), instead of using a twotap filter, a video coder applies
2dimensional 6tap filtering to both a current luma block and a
neighboring luma block. The 2dimensional filter coefficient set is:
[ 1 2 1 1 2 1 ] / 8 ( 9 ) ##EQU00004##
The downsampled luma samples are derived by equation (10):
rec.sub.L(i,j)=(Rec.sub.LOrig[2i,2j]*2+Rec.sub.LOrig[2i,2j+1]+Rec.sub.LO
rig[2i,2j1]+Rec.sub.LOrig[2i+1,2j]*2+Rec.sub.LOrig[2i+1,2j+1]+Rec.sub.LOr
ig[2i+1,2j1])>>3 (10)
In the equation above, rec.sub.L(i,j) is a reconstructed luma sample at
position (i,j) relative to a topleft corner of the downsampled
reconstructed luma block of the current PU and Rec.sub.LOrig[ . . . ] are
reconstructed luma samples of the original reconstructed luma block of
the current PU at positions relative to a topleft corner of the original
reconstructed luma block of the current PU.
[0153] For instance, a video coder may perform the operations of equation
(10) to determine the downsampled luma block. Equation (10) includes a
built in 6tap filter, as represented by [1, 2, 1; 1, 2, 1] with
Rec.sub.Orig[2i,2j], Rec.sub.LOrig[2i,2j+1], Rec.sub.LOrig[2i,2j1],
Rec.sub.LOrig[2i+1,2j], Rec.sub.LOrig[2i+1,2j+1], and
Rec.sub.Orig[2i+1,2j1] as 6 input samples. A tap number of a filter
indicates how many input samples are used for applying the filter. For
instance, in equation (10), the video coder uses six values from the
reconstructed luma block to generate the downsampled luma block.
[0154] FIG. 15 is a conceptual diagram illustrating an example of luma
positions and chroma positions for downsampling samples of a luma block
for generating a predictive block. As depicted in FIG. 15, a video coder
predicts a chroma sample, represented by the filledin triangle, from six
luma samples, represented by the six filledin circles, by applying a
6tap filter. Since a predictor of a chroma sample is derived using a
linear function, as defined in equation (2), it could be seen that when
the 6tap filter is applied, the predictor of one chroma sample relies on
the six neighboring luma samples. When combining equations (2) and (10),
the result is the following equation (11):
pred C ( i , j ) = .alpha. ( Rec LOrig [ 2
i , 2 j ] * 2 + Rec LOrig [ 2 i , 2 j + 1 ]
+ Rec LOrig [ 2 i , 2 j  1 ] + Rec LOrig [
2 i + 1 , 2 j ] * 2 + Rec LOrig [ 2 i + 1 , 2
j + 1 ] + Rec LOrig [ 2 i + 1 , 2 j  1 ] )
>> 3 ) + .beta. ( 11 ) ##EQU00005##
[0155] The following text refers to the downsampled reconstructed luma
sample rec.sub.L(i,j) as the corresponding downsampled luma sample for
the chroma sample located at (i,j). For example, because of 4:2:0
sampling, a 2N.times.2N luma block corresponds to an N.times.N chroma
block. With downsampling, the 2N.times.2N luma blocks becomes an
N.times.N downsampled luma block. This N.times.N downsampled luma block
is referred to as rec.sub.L(i,j) and corresponds to the N.times.N chroma
block.
[0156] Furthermore, although the above examples are described with respect
to 4:2:0 sampling, the techniques described in this disclosure are not so
limited. For instance, the techniques described in this disclosure may
also be applicable to 4:2:2 sampling. Accordingly, the examples with
respect to 4:2:0 are provided merely to assist with understanding.
[0157] Furthermore, in some examples, the techniques described in this
disclosure may be applicable to 4:4:4 sampling as well. For example, in
4:4:4 sampling, the chroma block is not subsampled relative to the luma
block. However, it may be possible to determine a predictive block for
the chroma block in such examples as well. For example, the luma block
may be filtered and the filtered block may be used as a predictive block
for the chroma block. In these examples, downsampling of the luma block
may not be needed. As explained in more detail, the example techniques
describe selection of a filter applied to samples of the luma block based
on a location of the chroma block. The techniques for selecting a filter
applied to samples of the luma block may be extended to examples where
downsampling is not needed for LM prediction, such as for 4:4:4
sampling. In such examples, the filter may not include any downsampling
so that the 4:4:4 sampling is preserved. Accordingly, the description for
4:2:0 sampling is an example, and the techniques are applicable to 4:4:4
sampling as well.
[0158] For example, rather than being limited to using only a twotap
filter or a sixtap filter to downsample the luma block, a video coder
(e.g., video encoder 20 or video decoder 30) may determine a filter from
a set of filters that is used for downsampling the luma block. As an
example, there may be a number X of different filters that the video
coder can use for downsampling. For instance, there may be a onetap
filter, a twotap filter, a threetap filter, and so forth. Moreover, for
each filter the specific taps might be different (e.g., the luma samples
used for a first twotap filter are different than the luma samples used
for a second twotap filter). In some of the examples described in this
disclosure, the set of filters includes two filters; however, more than
two filters from which the video coder determines which filter to apply
for downsampling the luma block are possible.
[0159] The video coder may use various criteria to determine which filter
to apply. As one example, the video coder determines which filter from
the set of filters to apply based on a location of the chroma block. If
the chroma block borders a left boundary of the picture, CU, PU, or TU
(e.g., the left boundary of the picture, CU, PU, or TU is the same as
chroma block edge), the video coder may use a first filter for
downsampling luma samples of the luma block that correspond to the
chroma samples of the chroma block that are on the left boundary. Samples
of the chroma block that are on the left boundary refer to the samples of
the chroma block that are closest to the left boundary including samples
that are directly on the boundary. The first filter may be applied to the
N samples closest to the boundary (e.g., sample closest to the boundary,
one next to that sample, and N such samples).
[0160] In some cases, the video coder may apply the first filter for all
luma samples of the luma block, rather than just those samples that
correspond to chroma samples that neighbor the left boundary. However,
the techniques described in this disclosure are not so limited. For all
other cases, the video coder may use a second, different filter for
downsampling the luma block.
[0161] For instance, in 4:2:0 sampling, four luma samples correspond to
one chroma sample. Accordingly, the video coder may determine which
chroma sample corresponds to which luma samples. When filters with larger
tap numbers are used, one chroma sample may correspond to more than four
luma samples. For the luma samples that correspond to a chroma sample on
a left boundary (immediately adjacent or within a number of samples), the
video coder may apply a first filter to the corresponding luma samples to
downsample the luma block, and for the luma samples that correspond to a
chroma sample that is not on a left boundary (not immediately adjacent or
not within a number of samples), the video coder may apply a second
filter to corresponding luma samples to downsample the luma block.
[0162] In some examples, the first filter may include fewer taps (e.g.,
number of samples that the filter extends over) than the second filter.
As one example, the first filter is the twotap filter and the second
filter is the sixtap filter. In this example, the video coder may
perform the operations of equation (6) to determine the downsampled luma
samples of a luma block in the case that the corresponding chroma samples
of the chroma block are on the left boundary, and may perform the
operations of equation (10) to determine the downsampled luma samples of
the luma block in the case that the corresponding chroma samples of the
chroma block are not on the left boundary. Accordingly, during the
derivation process of corresponding downsampled luma samples of chroma
samples, the video coder may apply a different filter to the luma samples
of a luma block that correspond to chroma samples of a chroma block
located at the left picture boundary, or left boundary (i.e., side) of
CU/PU/TU, compared to the filter applied to other samples of the luma
block that correspond to chroma samples that are not at the left picture
boundary or left boundary of CU, PU, or TU. Chroma samples that are at
the left boundary refer to chroma samples immediately adjacent to the
left boundary or within a certain number of samples from the left
boundary.
[0163] Using different filters allows the video coder to properly use
available sample values. For instance, using a sixtap filter for luma
samples that correspond to chroma samples at the left boundary of
picture, CU, PU, or TU may result in requiring the video coder to use
luma sample values that are not part of the luma block for downsampling
and may result in the video coder having to perform some additional
processing to address the lack of luma samples (e.g., padding luma sample
values to generate values for samples that are not part of a luma block).
However, using a twotap filter at the left boundary may not require the
video coder to use luma sample values that are not part of the luma block
for downsampling. Accordingly, although twotap and sixtap filters are
described, other sized filters for downsampling may be possible with
consideration to avoid needing to require luma samples that are not part
of the luma block (e.g., to avoid the need to pad luma samples on the
left boundary).
[0164] As one example, during the derivation process of corresponding
downsampled luma samples of chroma samples, the video coder applies a
different filter to luma samples that correspond to chroma samples
located at the left picture boundary compared to the filter applied to
other luma samples that correspond to chroma samples not located at the
left picture boundary. In one example, the length (e.g., tap) of the
filter (i.e., the number of samples that the filter extends over) for
deriving the corresponding downsampled luma samples of chroma samples at
the left picture boundary is smaller than the length of the filter for
deriving the corresponding downsampled luma samples of chroma samples
not at the left picture boundary (e.g., twotap for the left boundary and
sixtap for all others).
[0165] As one example, during the derivation process of corresponding
downsampled luma samples of chroma samples, the video coder applies a
different filter for luma samples of chroma samples located at the left
CU boundary compared to the filter applied to other luma samples within
current CU. In one example, the length (e.g., taps) of the filter (i.e.,
number of samples that the filter extends over) for deriving the
corresponding downsampled luma samples of chroma samples at the left CU
boundary is smaller than the length of the filter for deriving the
corresponding downsampled luma samples of chroma samples not at the left
CU boundary (e.g., twotap for the left boundary and sixtap for all
others).
[0166] As one example, during the derivation process of corresponding
downsampled luma samples of chroma samples, the video coder applies a
different filter for chroma samples located at the left PU boundary
compared to the filter applied to other samples within current PU. In one
example, the length (e.g., taps) of the filter (i.e., the number of
samples that the filter extends over) for deriving the corresponding
downsampled luma samples of chroma samples at the left PU boundary is
smaller than the length of the filter for deriving the corresponding
downsampled luma samples of chroma samples not at the left PU boundary
(e.g., twotap for the left boundary and sixtap for all others).
[0167] As one example, during the derivation process of corresponding
downsampled luma samples of chroma samples, the video coder may apply a
different filter for chroma samples located at the left TU boundary
compared to the filter applied to other samples within current TU. In one
example, the length (e.g., taps) of the filter (i.e., the number of
samples that the filter extends over) for deriving the corresponding
downsampled luma samples of chroma samples at the left TU boundary is
smaller than the length of the filter for deriving the corresponding
downsampled luma samples of chroma samples not at the left TU boundary
(e.g., twotap for the left boundary and sixtap for all others).
[0168] In some cases, there may not be corresponding luma samples in the
same picture. The following describes some example techniques to address
such situations. For instance, although avoiding padding may be
beneficial in some cases, in some instances, it may not be possible to
avoid padding. For example, because some luma samples are unavailable
(e.g., because off picture), the video coder may substitute padding
sample values for these unavailable samples and perform downsampling
with these padding sample values (e.g., downsample using the actual luma
sample values for the available luma samples and padding sample values
for the unavailable luma samples). The padding sample values may be
default values (e.g., 2.sup.bitdepth wherein bitdepth indicates the bit
depth of luma component), values determined by video encoder 20 and
signaled to video decoder 30, or values determined based on some implicit
technique that does not require signaling of information. Adding padding
sample values may reduce complexity because there may not be a need for
separate filters.
[0169] During the derivation process of corresponding downsampled luma
samples of chroma samples, when the luma samples are outside of a
picture, or a CU/PU/TU needs to be involved in the downsampling process,
the video coder may first apply a padding operation, followed by a
downsampling process. In the padding of samples, the video coder may
substitute those samples that are off screen with padding sample values.
[0170] As one example, during the derivation process of corresponding
downsampled luma samples of chroma samples, the video coder may pad the
luma samples (e.g., only the luma samples) which are located outside of
the current picture. For all other positions, the reconstructed samples
are used. As one example, during the derivation process of corresponding
downsampled luma samples of chroma samples, the video coder may pad the
luma samples which are located outside of the current CU. For all other
positions, the reconstructed samples are used. As one example, during the
derivation process of corresponding downsampled luma samples of chroma
samples, the video coder may pad the luma samples which are located
outside of the current PU. For all other positions, the reconstructed
samples are used. As one example, during the derivation process of
corresponding downsampled luma samples of chroma samples, the video
coder may pad the luma samples which are located outside of the current
TU. For all other positions, the reconstructed samples are used. In above
examples for padding, the same downsampling process is applied to all
positions.
[0171] When the position of luma reconstructed samples used in LM
prediction mode is located outside the current slice or current tile, the
video coder may mark such samples as unavailable (e.g., the video coder
may determine such samples as unavailable). When the sample is marked as
unavailable, the video coder may perform one or more of the following.
[0172] The unavailable samples, if used in a downsampling process for a
neighboring luma block, are not used in the downsampling process for a
neighboring luma block. Alternatively or additionally, the filter may be
different from the filter used for other samples. The unavailable
samples, if used in a downsampling process for a current luma block, are
not used in the downsampling process for a current luma block.
Alternatively or additionally, the filter may be different from the
filter used for other samples. The unavailable samples are remarked as
available; however, the sample value is modified to be the padded sample
value or a default value. Alternatively or additionally, the filter is
kept the same as the filter used for other samples. In one example, the
default value is dependent on the bitdepth. In another example, the
padding could be from the left/right/above/below sample which is marked
as available.
[0173] In general, for luma samples that are in another tile, the video
coder may mark pixels outside the tile boundary as unavailable and not
include them in the downsampling process. In some examples, the video
coder may mark the luma samples in another tile as available but use
padded pixels for such luma samples in another tile. As another example,
the video coder may use padded "extended" values (e.g., one half possible
value based on bit depth, so 8 bit, use 128) for luma samples in another
tile, rather than marking the samples as unavailable.
[0174] In some examples, the video coder may apply different filters to
different chroma color components (Cb or Cr). In some examples, when LM
prediction mode is enabled, one or more sets of the downsampling filter
may be further signaled in either a sequence parameter set (SPS), picture
parameter set (PPS), or slice header. Alternatively or additionally, a
Supplemental Enhancement Information (SEI) message syntax is introduced
to describe the downsampling filter. Alternatively or additionally,
furthermore, a default downsampling filter is defined, e.g., the 6tap
filter [1, 2, 1; 1, 2, 1] without signaling. Alternative or additionally,
one PU/CU/largest CU may signal an index of the filter that is used in LM
prediction mode. Alternatively or additionally, the usage of the filter
tap may be derived onthefly by video decoder 30 without signaling.
There may be other ways to provide filter support as well.
[0175] In one example, furthermore, a constraint is applied that
.alpha..sub.i is equal to .alpha..sub.(i+3). In one example, furthermore,
a constraint is applied that .alpha..sub.i is equal to .alpha..sub.(i+2)
with i being equal to 0 or 3. In one example, this example technique may
only be enabled for larger coded CUs, e.g., CU size larger than
16.times.16. In one example, one or more of the parameters is restricted
to be 0.
[0176] Moreover, the video coder may apply one or more of the above
techniques also for cross component residual prediction, in which the
downsampled luma residual is used to predict the chroma residual. In
this case, the downsampling process is applied to reconstructed luma
residual, as one example.
[0177] The following is an example manner in which techniques described in
this disclosure may be implemented by a video coder. The example
implementation technique should not be considered limiting.
[0178] Below is an example for applying different downsampling processes
for samples at the left picture boundary. The downsampling process for a
current luma block is defined as follows: [0179] if the chroma sample
is not located at the left boundary of picture, 6tap filter, e.g. [1 2
1; 1 2 1] is applied to derive the corresponding downsampled luma
sample:
[0179] rec.sub.L(i,j)=(Rec.sub.LOrig[2i,2j]*2+Rec.sub.LOrig[2i,2j+1]+Rec
.sub.LOrig[2i,2j1]+Rec.sub.LOrig[2i+1,2j]*2+Rec.sub.LOrig[2i+1,2j+1]+Rec.
sub.LOrig[2i+1,2j1]+offset0)>>3 (13) [0180] Otherwise, if the
chroma sample is located at the left boundary of the picture, 2tap
filter, e.g., [1; 1] is applied to derive the corresponding downsampled
luma sample:
[0180] rec.sub.L(i,j)=(Rec.sub.LOrig[2i,2j]+Rec.sub.LOrig[2i,2j+1]+offse
t1)>>1 (14)
[0181] In one example, offset0 and offset1 are both set equal to 0. In
another example, offset0 is set equal to 4 and offset1 is set equal to 1.
[0182] In HEVC, a square transform is always applied, even for rectangular
PUs. For example, FIG. 16 is a conceptual diagram illustrating
nR.times.2N prediction mode with N.times.N transform. In the example of
FIG. 16, the nR.times.2N prediction mode partitions a coding block 100
with a 2N.times.2N block size into two prediction blocks with sizes of
0.5N.times.2N and 1.5N.times.2N, respectively. However, in the example of
FIG. 16, the transform block size is N.times.N.
[0183] FIG. 17 is a conceptual diagram illustrating a nonsquare quadtree
(NSQT) for 2N.times.N, 2N.times.nD and 2N.times.nU prediction modes. In
FIG. 17, a 2N.times.2N block at level 0 is split into four 2N.times.0.5N
blocks located at level 1; the block at level 1 is further split into
four N.times.0.25N blocks locate at level 2. FIG. 18 is a conceptual
diagram illustrating a NSQT for N.times.2N, nR.times.2N and nL.times.2N
prediction modes. In FIG. 18, a 2N.times.2N block at level 0 is split
into four 0.5N.times.2N blocks located at level 1; the block at level 1
is further split into four 0.25N.times.N blocks locate at level 2.
[0184] Considering that residuals might be discontinuous at the boundaries
of two connective prediction blocks, high frequency transform
coefficients will likely be produced and the coding performance will be
affected. In this disclosure, connective predictive blocks are predictive
blocks that share at least one of the four boundaries. Therefore, in Yuan
et al., "NonSquare Quadtree Transform Structure for HEVC," 2012 Picture
Coding Symposium (PCS), pp. 505508, May 79, 2012, Krakow, Poland
(hereinafter, "Yuan"), a nonsquare quadtree transform (NSQT) structure
is described.
[0185] In NSQT, two additional transform block sizes are added:
2N.times.0.5N and 0.5N.times.2N. In this structure, a transform block is
split into 2N.times.0.5N and 0.5N.times.2N and transform matrix can be
obtained by reusing 0.5N.times.0.5N and 2N.times.2N transform matrixes.
In this disclosure, a transform matrix may also be referred to as a
transform core. In Yuan, the N.times.N quantization table of HEVC is
reused to quantize the transform coefficients of 2N.times.0.5N and
0.5N.times.2N transform blocks.
[0186] As mentioned above, a video coder may apply a transform to convert
samples to a frequency domain, or vice versa. The specific types of
transforms applied in HEVC are two types of discrete cosine transforms,
namely DCTII and 4.times.4 DSTVII. Xin Zhao et al., U.S. Patent
Publication 2016/0219290 A1 proposed an Enhanced Multiple Transform (EMT)
scheme in addition to DCTII and 4.times.4 DSTVII for both inter and
intra coded blocks. The EMT scheme utilizes multiple selected transforms
from the DCT/discrete sine transform (DST) families other than the
current transforms in HEVC. The newly introduced transform matrices in
U.S. Patent Publication 2016/0219290 are DSTVII, DCTVIII, DSTI and
DCTV.
[0187] The proposed EMT in U.S. Patent Publication 2016/0219290 A1 applies
to CUs smaller than 64.times.64, and whether EMT applies or not is
controlled at the CU level using a flag, namely an EMT flag, for all TUs
within a CU. For each TU within an EMTenabled CU, the horizontal or
vertical transform to be used is signaled by an index to a selected
transform set, namely an EMT index. Each transform set is formed by
selecting two transforms from the aforementioned transform matrices.
[0188] For intra prediction residual, the transform sets are predefined
based on the intra prediction mode, as described in X. Zhao et al.,
"Video coding with ratedistortion optimized transform," IEEE Trans.
Circuits Syst. Video Technol., vol. 22, no. 1, pp. 138151, January 2012;
thus each intra prediction mode has its own transform set. For example,
one transform set can be {DCTVIII, DSTVII}. Note that the transform set
for the horizontal transform may be different from the transform set for
the vertical transform, even for a same intra prediction mode. However,
the total number of different transform sets for all intra prediction
modes as well as the number of newly introduced transforms is limited.
However, for inter prediction residual, only one transform set is used
for all inter modes and for both horizontal and vertical transforms.
[0189] Illumination compensation (IC) in the multiview video coding is
used for compensating illumination discrepancies between different views
because each camera may have different exposure to a light source.
Typically, a weight factor and/or an offset are used to compensate the
differences between a coded block and a prediction block in a different
view. Illumination compensation was introduced to improve the coding
efficiency for blocks predicted from interview reference pictures.
Therefore, illumination compensation may only apply to blocks predicted
by an interview reference picture.
[0190] Liu et al., "3DCE1.h related: Illumination Compensation for
InterView Prediction," Joint Collaborative Team on 3D Video Coding
Extension Development of ITUT SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11,
1.sup.st Meeting, Stockholm, SE, 1620 Jul. 2012, document JCT3VA0086
(hereinafter, JCT3VA0086), describes illumination compensation (IC). In
JCT3VA0086, IC is enabled for interview prediction. Furthermore, as
described in JCT3VA0086, a IC process derives IC parameters based on
neighboring samples of a current CU and neighboring samples of a
reference block. In JCT3VA0086, IC only applies to a 2N.times.2N
partition mode. Furthermore, in JCT3VA0086, for AMVP mode, one IC flag
is signaled for each CU that is predicted from an interview reference
picture. For merge mode, to save bits, an IC flag is signaled only when a
merge index of the PU is not equal to 0. The IC flag indicates whether IC
is used for a CU. IC does not apply to CUs that are only predicted from
temporal reference pictures.
[0191] As described in JCT3VA0086, a linear IC model used in interview
prediction is shown in Eq. (6):
p(i,j)=a*r(i+dv.sub.x,j+dv.sub.y+b), where (i,j).epsilon.PU.sub.c (15)
Here, PU.sub.c is a current PU, (i,j) are the coordinates of pixels in
PU.sub.c, (dv.sub.x,dv.sub.y) is a disparity vector of PU.sub.c, p(i,j)
is the prediction of PU.sub.c, and r is the current PU's reference
picture from a neighboring view. a and b are parameters of the linear IC
model.
[0192] In JCT3VA0086, two sets of pixels as shown in FIG. 19 are used to
estimate parameters a and b for a current PU. The first set of pixels
includes available reconstructed neighboring pixels in a left column and
an above row of a current CU (i.e., a CU that contains the current PU).
The second set of pixels includes corresponding neighboring pixels of a
reference block of the current CU. The disparity vector of the current PU
is used to find the reference block of the current CU.
[0193] FIG. 19 illustrates neighboring pixels used to estimate parameters
in the IC model. Particularly, FIG. 19 includes a current CU 110 and
reference block 112. Each respective square of FIG. 19 corresponds to a
respective sample. Thus, current CU 110 and reference block 112 each
include 64 samples. The squares enclosing circles adjacent to current CU
110 correspond to the neighboring samples of current CU 110 (i.e.,
Rec.sub.neigh). The squares enclosing circles adjacent to reference block
CU 112 correspond to the neighboring samples of neighboring block 112
(i.e., Rec.sub.refneigh). As described elsewhere in this disclosure, a
video coder may use Rec.sub.neigh and Ref.sub.refneigh to estimate
parameters for IC.
[0194] Furthermore, as described in JCT3VA0086, let Rec.sub.neig denote a
neighboring pixel set used by the current CU. Let Rec.sub.refneigh denote
a neighboring pixel set used by the reference block of the current CU.
Let the size of the current CU and the size of the reference block of the
current CU both be equal to N.times.N. Let 2N denote the number of pixels
in Rec.sub.neig and Rec.sub.refneig. Then, a and b can be calculated as:
a = 2 N i = 0 2 N  1 Rec neig ( i )
Rec refneig ( i )  i = 0 2 N  1 Rec
neig ( i ) i = 0 2 N  1 Rec refneig ( i )
2 N i = 0 2 N  1 Rec refneig ( i )
Rec refneig ( i )  ( i = 0 2 N  1 Rec
refneig ( i ) ) 2 ( 16 ) b = i = 0 2
N  1 Rec neig ( i )  a i = 0 2 N  1
Rec refneig ( i ) 2 N ( 17 ) ##EQU00006##
In some cases, only a is used in linear model and b is always set equal
to 0, or only b is used and a is always set equal to 1.
[0195] In VCEGAZ06, Local Illumination Compensation (LIC) is enabled or
disabled adaptively for each intermode coded CU. In VCEGAZ06, LIC is
based on a linear model for illumination changes, using a scaling factor
a and an offset b. FIG. 20 is a conceptual diagram illustrating example
neighboring samples used for deriving IC parameters as described in
VCEGAZ06.
[0196] In VCEGAZ06, when LIC applies for a CU, for each PU/subPU
belonging to the CU, a video coder derives LIC parameters in a way that
using subsampled (2:1 subsampling) neighboring samples of the CU and
the corresponding pixels (identified by motion information of the current
PU/subPU) in the reference picture. For a CU with size equal to
N.times.N, the total number of boundary pixels used in equations (16) and
(17) is N instead of 2N. An example is illustrated in FIG. 20. Thus, FIG.
20 is a conceptual diagram illustrating example neighboring pixels used
to estimate parameters in an illumination compensation model, in which a
reference block 114 of a current CU 116 is found by using a disparity
vector of a current PU. In VCEGAZ06, the IC parameters are derived and
applied for each prediction direction separately. A video coder may
employ a least square error method to derive the parameters a and b based
on the abovementioned neighboring samples.
[0197] The current RQT design in HEVC, and other techniques such as NSQT
and IC, may have the following shortcomings. For instance, regardless of
whether the NSQT or the transform tree of HEVC is used, a quadtree
structure is always employed which may be suboptimal without considering
PU information. However, HEVC only supports square PUs for intra
prediction modes.
[0198] Introducing 2N.times.N and N.times.2N partitions to intra modes, as
is done in JCTVCG135, may have the following problems. First, AMP is not
allowed. Second, how to define the transform tree structure to achieve
high coding efficiency has not been studied. Third, the LM prediction
mode has only been used with square PUs and it is unknown how to derive
the parameters .alpha. and .beta. used in the LM prediction mode with
nonsquare PUs. Fourth, in prior techniques, the coefficients must be
reorganized to be in a square form, which may reduce the correlation
among neighboring coefficients. Furthermore, the current EMT design has a
problem in that EMT is controlled at the CU level. However, controlling
EMT at the CU level is not efficient if the residual characteristics
(e.g., distributions) of each PU in a CU are different.
[0199] To resolve the problems mentioned above, this disclosure proposes
the following techniques. The following itemized techniques may be
applied individually. Alternatively, any combination of them may be
applied. In the following description, the CU size is denoted by
M.times.M and PU size is denoted by K.times.L, wherein both K and L are
no larger than M.
[0200] In accordance with a first example technique of this disclosure, it
is proposed that a transform tree is not restricted to be a quarter tree.
For example, a transform quadtree and a transform binary tree may be
combined. That is, for at least a certain transform depth, one TU may be
split into two smaller TUs or four smaller TUs. In this disclosure, for
each respective node of a transform tree, the respective transform depth
of the respective node refers to the number of nodes in the transform
tree between the respective node and the root node of the transform tree.
The flexibility to split a TU into two TUs or four TUs may enhance the
ability of video encoder 20 to structure the transform tree in a way that
aligns TU boundaries with PU boundaries. Aligning TU boundaries with PU
boundaries may increase compression performance.
[0201] Thus, in this example, video encoder 20 may partition a CU of video
data into TUs of the CU based on a tree structure. In this example, a
root node of the tree structure corresponds to a coding block of the CU.
Furthermore, in this example, each respective nonroot node of the tree
structure corresponds to a respective block that is a partition of a
block that corresponds to a parent node of the respective nonroot node.
In this example, leaf nodes of the tree structure correspond to the TUs
of the CU. In this example, at least one node in the tree structure has
exactly two child nodes in the tree structure. In some instances, at
least one node in the tree structure may have exactly four child nodes in
the tree structure. In this example, video encoder 20 may include, in a
bitstream that comprises an encoded representation of the video data,
data representing one or more of the TUs of the CU.
[0202] In a corresponding example, video decoder 30 may determine a CU is
partitioned into TUs of the CU based on a tree structure. In this
example, a root node of the tree structure corresponds to a coding block
of the CU. Furthermore, in this example, each respective nonroot node of
the tree structure corresponds to a respective block that is a partition
of a block that corresponds to a parent node of the respective nonroot
node. In this example, leaf nodes of the tree structure correspond to the
TUs of the CU. In this example, at least one node in the tree structure
has exactly two child nodes in the tree structure, and at least one node
in the tree structure has exactly four child nodes in the tree structure.
In this example, video decoder 30 may reconstruct, based on data for at
least one of the TUs of the CU, the coding block of the CU.
[0203] Furthermore, in an example where a transform tree is not restricted
to be a quarter tree (i.e., not required to be a tree in which all
nonleaf nodes have 4 child nodes), for transform depth equal to 0, the
square transform with size equal to M.times.M is applied. For transform
depth equal to 1, the transform is split into two or four (depending on
the number of PUs) and transform size is equal to K.times.L. For
remaining transform depths, the quadtree structure is still applied
wherein one TU is split into four smaller ones, i.e., for transform depth
equal to 2, the transform size is set to K/2.times.L/2. An example is
given in FIG. 21. One reason for limiting splitting of transform into two
or four at transform depth 1 is to align transform sizes with PU sizes,
e.g., if a PU size is 2N.times.N or N.times.2N, splitting into 2 may be
preferred. If a PU is an N.times.N partition, 4way transform splitting
may yield better results. In another reason, if the corresponding
transform matrix was unknown, e.g., if AMP is used, one 16.times.16 CU
may be split into 4.times.16 and 12.times.16 PUs, while 12.times.12
transform is not defined, therefore, splitting to 4 may be used for this
case.
[0204] FIG. 21 is a conceptual diagram illustrating an example transform
structure for partition size equal to 2N.times.N. In FIG. 21 and the
following figures, the dash lines indicate the splitting information for
the next transform depth. Particularly, in FIG. 21, a transform block 130
has the same size as a coding block of a CU. Transform block 130 is
partitioned into transform blocks 132 and 134. Furthermore, in the
example of FIG. 21, transform block 132 is partitioned into transform
blocks 136, 137, 138, and 139. Transform block 134 is partitioned into
transform blocks 140, 141, 142 and 143. Thus, as shown in FIG. 21, a root
node may have 2 child nodes, but nodes at other transform depths may be
required to have 0 or 4 child nodes.
[0205] In some examples where a transform tree of a CU is not restricted
to being a quarter tree, either a binary tree or a quarter tree is
applied. A video coder may determine whether a binary tree or a quarter
tree is applied based on the number of PUs in the CU. For example, when
there are two PUs, the video coder utilizes a binary transform tree. If
the CU has four PUs, the video coder may use a quarter tree structure to
partition the CU into TUs. In one example, the method of selecting either
binary tree or quarter tree is only applied to certain transform depths,
such as 1.
[0206] Thus, in this example, video encoder 20 may partition a CU of the
video data into TUs of the CU based on a tree structure. In this example,
a root node of the tree structure corresponds to a coding block of the
CU, each respective nonroot node of the tree structure corresponds to a
respective block that is a partition of a block that corresponds to a
parent node of the respective nonroot node, leaf nodes of the tree
structure correspond to the TUs of the CUs, and the CU has one or more
PUs. Furthermore, in this example, depending on the number of PUs of the
CU, exactly one of the following applies: each node in the tree structure
has exactly two child nodes in the tree structure, or each node in the
tree structure has exactly four child nodes in the tree structure. In
this example, video encoder 20 may include, in a bitstream that comprises
an encoded representation of the video data, data representing one or
more of the TUs of the CU.
[0207] In a corresponding example, video decoder 30 may determine a CU of
the video data is partitioned into TUs of the CU based on a tree
structure. In this example, a root node of the tree structure corresponds
to a coding block of the CU, each respective nonroot node of the tree
structure corresponds to a respective block that is a partition of a
block that corresponds to a parent node of the respective nonroot node,
leaf nodes of the tree structure correspond to the TUs of the CUs, and
the CU has one or more PUs. In this example, depending on the number of
PUs of the CU, exactly one of the following applies: each node in the
tree structure has exactly two child nodes in the tree structure, or each
node in the tree structure has exactly four child nodes in the tree
structure. Furthermore, in this example, video decoder 30 may
reconstruct, based on data for at least one of the TUs of the CU, the
coding block of the CU.
[0208] In some examples where the transform tree is not restricted to be a
quarter tree, the splitting method of either binary or quarter tree is
signaled. For example, video encoder 20 may include, in the bitstream
data representing a syntax element that indicates whether a CU is
partitioned into TUs according to a binary tree or according to quarter
tree. In this example, video decoder 30 may determine, based on data in
the bitstream, a value of the syntax element. Furthermore, in this
example, video decoder 30 may determine, based on the value of the syntax
element, whether the CU is partitioned into TUs according to a binary
tree or according to a quarter tree.
[0209] Alternatively, furthermore, the signaling may be skipped for
certain PU partitions. In other words, video encoder 20 may skip
signaling of transform tree splitting for a block based on how the block
is split into PUs. For instance, in one example, for PU partitions equal
to 2N.times.N, a binary tree is always used and therefore, there is no
need to signal that binary or quarter tree splitting is used in the
corresponding transform tree.
[0210] In accordance with a second technique of this disclosure, it is
proposed that at a certain transform depth, the transform size is equal
to the PU size for the rectangular PUs. In this disclosure, transform
size refers to a size of a transform block of a TU. Thus, in this
example, the transform blocks corresponding to nodes at a particular
depth of a transform tree of a CU have the same sizes as prediction
blocks of PUs of the CU. As previously discussed, aligning TU boundaries
with PU boundaries may improve compression performance.
[0211] In one example of the second technique, the above method is only
applied to inter coded CUs. In other words, a video coding standard may
require video encoder 20 to ensure that transform sizes are equal to PUs
for inter coded CUs, but this requirement does not apply to intra coded
CUs.
[0212] Furthermore, in some examples, video encoder 20 may signal one flag
for each PU to indicate whether there exists at least one nonzero
coefficient for the three color components (e.g., Y, Cb, and Cr). As
mentioned above, the second technique of this disclosure requires the
sizes of TUs of a CU to be equal to the sizes of PUs of the CU at a
particular depth in the transform tree. Hence, at the particular depth,
the transform tree includes a respective transform tree node for each
respective PU of the CU. For each respective transform tree node of the
transform tree at the particular depth, the respective transform tree
node corresponds to a luma transform block and chroma transform blocks
having the same sizes and shapes as a luma prediction block and chroma
prediction blocks of the corresponding PU. Hence, encoder 20 may signal
information about a transform tree node at the particular depth (and
descendant transform tree nodes of the transform tree node at the
particular depth) by signaling information in the corresponding PU. For
example, video encoder 20 may signal, in a bitstream, a first syntax
element for a PU, a second syntax element for the PU, and a third syntax
element for the PU. In this example, the first syntax element for the PU
indicates whether there exists a nonzero transform coefficient in a luma
coefficient block of the corresponding transform tree node or descendant
transform tree node thereof, the second syntax element for the PU
indicates whether there exists a nonzero transform coefficient in a Cb
coefficient block of the corresponding transform tree node or descendant
transform tree node thereof, and the third syntax element for the PU
indicates whether there exists a nonzero transform coefficient in a Cr
coefficient block of the corresponding transform tree node or descendant
transform tree node thereof.
[0213] In prior techniques, rectangular PUs were only permitted for inter
predicted CUs. However, in some examples of the second technique of this
disclosure, when the rectangular PUs (such as 2N.times.N, N.times.2N) are
introduced to the intra coded CUs, the above method (i.e., requiring the
transform size to be equal to the PU size at a particular transform
depth) is also applied.
[0214] In accordance with a third technique, one TU may be split into
multiple smaller TUs while the sizes of the smaller TUs may be different.
In other words, a video coder may split a TU into two differentlysized
child TUs. In some instances, splitting a TU into two or more
differentlysized child TUs may improve video coding performance
instances where AMP is enabled because splitting a TU into two or more
differentlysized child TUs may better align the boundaries of the child
TUs with PU boundaries. As discussed elsewhere in this disclosure,
aligning boundaries of TUs with boundaries of PUs may reduce the
occurrence of high frequency transform coefficients associated with
discontinuities at boundaries between predictive blocks and therefore
increase compression efficiency. For example, if a block (e.g., a CU) has
a 12.times.16 PU, a portion of the block corresponding to the 12.times.16
PU may be split into two 8.times.8 TUs plus two 4.times.4 TUs, or two
8.times.8 TUs plus one 4.times.16 TU.
[0215] In one example of the third technique, when AMP mode is enabled for
one CU, the transform tree for the larger PU may be split to two parts
with one equal to the smaller PU and the rest as another TU. An example
is given in FIG. 22. FIG. 22 is a conceptual diagram illustrating a
transform structure for partition size equal to N.times.N/4(U), in
accordance with a technique of this disclosure. In the example of FIG.
22, a transform block 150 is partitioned into transform blocks 152 and
154. Furthermore, in the example of FIG. 22, transform block 152 is
partitioned into transform blocks 156, 158, 160, and 162. Transform block
154 is partitioned into transform blocks 164 and 166. The right branch at
transform depth 2 shows that the two split transform sizes are different.
That is, transform block 164 and transform block 166 have different
sizes.
[0216] In some examples of the third technique, asymmetric splitting of a
TU is only applicable to the AMP case wherein the two PU sizes are
different or one CU contains multiple PUs with at least two of the PUs
having different sizes. In other words, a video coder may only split a TU
into child TUs of different sizes if a CU containing the TU is split into
PUs of different sizes.
[0217] Thus, in an example where a TU may be split into multiple
differentlysized TUs, video encoder 20 may partition a CU into TUs of
the CU based on a tree structure. In this example, a root node of the
tree structure corresponds to a coding block of the CU. Furthermore, in
this example, each respective nonroot node of the tree structure
corresponds to a respective block that is a partition of a block that
corresponds to a parent node of the respective nonroot node. In this
example, leaf nodes of the tree structure correspond to the TUs of the
CUs. In this example, child nodes of at least one node of the tree
structure correspond to blocks of different sizes. Furthermore, in this
example, video encoder 20 may include, in a bitstream that comprises an
encoded representation of video data, data representing one or more of
the TUs of the CU.
[0218] In a corresponding example, video decoder 30 may determine a CU is
partitioned into TUs of the CU based on a tree structure. In this
example, a root node of the tree structure corresponds to a coding block
of the CU. Furthermore, in this example, each respective nonroot node of
the tree structure corresponds to a respective block that is a partition
of a block that corresponds to a parent node of the respective nonroot
node. In this example, leaf nodes of the tree structure correspond to the
TUs of the CUs. In this example, child nodes of at least one node of the
tree structure correspond to blocks of different sizes. Furthermore, in
this example, video decoder 30 may reconstruct, based on data for at
least one of the TUs of the CU, the coding block of the CU.
[0219] In accordance with a fourth technique of this disclosure, it is
allowed that the split of transform is carried out not along the same
direction (vertical or horizontal) in a CU. In other words, a transform
tree for a CU may include transform blocks that are split horizontally
and transform blocks that are split vertically. Allowing both horizontal
and vertical splitting of transform blocks may better align the
boundaries of the TUs of the CU with boundaries of the PUs of the CU. As
discussed elsewhere in this disclosure, aligning the boundaries of the
TUs of a CU with boundaries of the PUs of the CU may reduce the
occurrence of high frequency transform coefficients associated with
discontinuities at boundaries between predictive blocks and therefore
increase compression efficiency. In one example of the fourth technique,
the use of both horizontal and vertical splitting of TUs of a CU is only
applicable to certain partition modes, e.g., AMP.
[0220] FIG. 23 is a conceptual diagram illustrating a transform structure
for a partition size equal to N.times.N/4(U), in accordance with a
technique of this disclosure. In the example of FIG. 23, the CU partition
is along the horizontal direction and TU partition could be from either
horizontal and/or vertical directions. Particularly, a TU 180 is split
horizontally into a TU 182 and a TU 184. TU 182 is split into TUs 186,
188, 190, and 192. TU 184 is split horizontally and vertically into TUs
194, 196, and 198.
[0221] In an example in which splitting of transform blocks along
different directions in a CU is allowed, video encoder 20 may partition a
CU into TUs of the CU based on a tree structure. In this example, a root
node of the tree structure corresponds to a coding block of the CU.
Furthermore, in this example, each respective nonroot node of the tree
structure corresponds to a respective block that is a partition of a
block that corresponds to a parent node of the respective nonroot node.
In this example, leaf nodes of the tree structure correspond to the TUs
of the CUs. In this example, a first node in the tree structure has
exactly two child nodes and a boundary between blocks corresponding to
the child nodes of the first node is vertical. Additionally, in this
example, a second node in the tree structure has exactly two child nodes
and a boundary between blocks corresponding to the child nodes of the
second node is horizontal. In this example, video encoder 20 may include,
in a bitstream that comprises an encoded representation of the video
data, data representing one or more of the TUs of the CU.
[0222] Similarly, video decoder 30 may determine a CU is partitioned into
TUs of the CU based on a tree structure. In this example, a root node of
the tree structure corresponds to a coding block of the CU, each
respective nonroot node of the tree structure corresponds to a
respective block that is a partition of a block that corresponds to a
parent node of the respective nonroot node, and leaf nodes of the tree
structure correspond to the TUs of the CUs. Furthermore, in this example,
a first node in the tree structure has exactly two child nodes and a
boundary between blocks corresponding to the child nodes of the first
node is vertical. In this example, a second node in the tree structure
has exactly two child nodes and a boundary between blocks corresponding
to the child nodes of the second node is horizontal. In this example,
video decoder 30 may reconstruct, based on data for at least one of the
TUs of the CU, the coding block of the CU.
[0223] In accordance with a fifth technique of this disclosure, one CU may
contain both intra and inter PUs which is referred to comb_mode in the
following descriptions. In some instances, use of comb_mode may increase
the accuracy of predictive blocks of a CU and therefore may ultimately
lead to increased compression performance. The accuracy of a predictive
block of a PU of a CU is a measure of differences between corresponding
samples of the predictive block of the PU and samples of a coding block
of the CU.
[0224] Thus, in accordance with the fifth technique, video encoder 20 may
perform intra prediction to obtain a first predictive block for a first
PU of a CU. Additionally, in this example, video encoder 20 may perform
inter prediction to obtain a second predictive block for a second PU of
the same CU. In this example, video encoder 20 may obtain, based on the
first predictive block and the second predictive block, residual data for
the CU. Furthermore, in this example, video encoder 20 may include, in a
bitstream comprising an encoded representation of the video data, data
representing the residual data for the CU.
[0225] Similarly, in accordance with the fifth technique, video decoder 30
may perform intra prediction to obtain a first predictive block for a
first PU of a CU. In this example, video decoder 30 may perform inter
prediction to obtain a second predictive block for a second PU of the
same CU. Furthermore, in this example, video decoder 30 may reconstruct,
based on the first predictive block and the second predictive block, a
coding block of the CU.
[0226] When one CU is coded with comb_mode, and the CU is split into two
PUs in the vertical direction, such as in an N.times.2N partitioning
mode, a video coder may determine the transform depth of a transform tree
of the CU as follows: If the left PU is an intracoded PU, transform
depth can be from 0. In other words, based on the left PU being
intracoded, the depth of the transform tree of the CU is allowed to be 0
or greater. Therefore, in instances where the depth of the transform tree
of the CU is equal to 0, the TU size could be equal to the CU size and
one TU may cover two PUs, i.e., cross PU boundaries. Otherwise (the left
PU is an intercoded PU), the transform depth is restricted to be from 1.
In other words, the depth of the transform tree of the CU may be 1 or
greater, but not equal to 0. In this example, when the left PU is an
intercoded PU, the TU size should be no larger than the PU size.
[0227] Thus, in this example, video encoder 20 may generate a bitstream
that conforms to a video coding standard. In this example, based on the
CU being split into the first PU and the second PU along a vertical
boundary and the left PU of the CU being an intracoded PU, the video
coding standard allows a TU of the CU to cover both the first and second
PUs. In a similar example, video decoder 30 may obtain a bitstream
comprising an encoded representation of the video data. In this example,
the bitstream may conform to a video coding standard that, based on the
CU being split into the first PU and the second PU along a vertical
boundary and the left PU of the CU being an intracoded PU, allows a TU
of the CU to cover both the first and second PUs. In both the example of
video encoder 20 and video decoder 30, the video coding standard may
provide a restriction requiring that, based on the CU being split into
the first PU and the second PU along a vertical boundary and the left PU
of the CU being an intercoded PU, a TU size of a TU of the CU is no
larger than a size of the first PU or the second PU.
[0228] Furthermore, when one CU is split into two PUs in the horizontal
direction, such as when the 2N.times.N partition mode is used, a video
coder may determine the transform depth used when a CU contains both
intra and inter PUs as follows: If the above PU is an intracoded PU,
transform depth can be from 0. In other words, the depth of transform
tree of the CU is 0 or more. In this example, the above PU is the upper
PU of the horizontally divided CU. In instances where the transform depth
is 0, the TU size is equal to the CU size and one TU covers two PUs.
Otherwise (i.e., the above PU is an intercoded PU), transform depth is
restricted to be from 1. In other words, the depth of the transform tree
of the CU is 1 or more, but cannot be 0. In this example, when the above
PU is an intercoded PU, the TU size should be no larger than the PU
size.
[0229] Thus, in this example, video encoder 20 may generate a bitstream
that conforms to a video coding standard. In this example, based on the
CU being split into the first PU and the second PU along a horizontal
boundary and the above PU of the CU being an intracoded PU, the video
coding standard allows a TU of the CU to cover both the first and second
PUs. Moreover, in some examples, the bitstream conforms to a video coding
standard that provides a restriction requiring that, based on the CU
being split into the first PU and the second PU along a horizontal
boundary and the above PU of the CU being an intercoded PU, a TU size of
a TU of the CU is no larger than a size of the first PU or the second PU.
[0230] In a similar example, video decoder 30 may obtain a bitstream
comprising an encoded representation of the video data. In this example,
the bitstream conforms to a video coding standard that, when the CU is
split into the first PU and the second PU along a horizontal boundary and
the above PU of the CU is an intracoded PU, allows a TU of the CU to
cover both the first and second PUs. Moreover, in some examples, video
decoder 30 obtains a bitstream conforming to a video coding standard that
provides a restriction requiring that, when the CU is split into the
first PU and the second PU along a horizontal boundary and the above PU
of the CU is an intercoded PU, a TU size of a TU of the CU is no larger
than a size of the first PU or the second PU.
[0231] In some examples, when one CU is coded with comb_mode, a
restriction is added such that a TU should not cross PU boundaries. This
restriction may reduce the encoder complexity since there is no need to
check the ratedistortion cost of the case wherein TUs could cross PU
boundaries. Thus, in this example, video encoder 20 may generate a
bitstream that conforms to a video coding standard that provides a
restriction requiring that based on the CU having an intracoded PU and
an intercoded PU, no TU of the CU crosses PU boundaries of the CU. In a
similar example, video decoder 30 may obtain a bitstream comprising an
encoded representation of the video data. In this example, the bitstream
conforms to a video coding standard that provides a restriction requiring
that based on the CU having an intracoded PU and an intercoded PU, no
TU of the CU crosses PU boundaries of the CU.
[0232] Furthermore, in some examples involving the comb_mode, it is
restricted that the comb_mode is only applied for a CU larger than (not
including) a certain size such as 8.times.8. It is noted that for smaller
blocks, increasing the bits of signaling whether comb_mode is applied to
a CU may not compensate the saved ratedistortion cost introduced by the
comb_mode. Therefore, for certain small sizes, comb_mode may be always
disabled without additional signaling. In this disclosure, a restriction
may prevent a video encoder from performing some action or generating a
bitstream in some way. For example, video encoder 20 may generate a
bitstream that conforms to a video coding standard that provides a
restriction requiring that no CU smaller than a particular size is
allowed to have both an intracoded PU and an intercoded PU. In a
similar example, video decoder 30 may obtain a bitstream comprising an
encoded representation of the video data. In this example, the bitstream
conforms to a video coding standard that provides a restriction requiring
that no CU smaller than a particular size is allowed to have both an
intracoded PU and an intercoded PU.
[0233] In some examples involving the comb_mode, one 8.times.8 CU can be
coded with comb_mode with one 8.times.4 intra and one 8.times.4 inter PU,
or one 4.times.8 Intra and one 4.times.8 Inter PU. In this example, the
corresponding 4.times.4 chroma block of this 8.times.8 CU (in 4:2:0 color
format) can be coded using only the inter prediction mode of the inter
coded luma PU, or using only the intra prediction mode of the intra coded
luma PU, or the 4.times.4 chroma block is further partitioned
correspondingly as two 4.times.2 or 2.times.4 blocks based on the luma PU
partition, and each of the two 4.times.2 or 2.times.4 is predicted by the
corresponding luma prediction mode, and a 4.times.4 residual block is
generated and a 4.times.4 transform is performed on the generated 44
residual block to avoid the introduction of a 2.times.2 transform.
Introduction of a 2.times.2 transform may unnecessarily increase
complexity.
[0234] Thus, in the example above, video encoder 20 may perform intra
prediction to obtain a first predictive block for a first PU of a CU of
the video data. Additionally, in this example, video encoder 20 may
perform inter prediction to obtain a second predictive block for a second
PU of the same CU. In this example, the size of the CU is 2N.times.2N,
the size of the first PU is 2N.times.N and the size of the second PU is
N.times.2N or the size of the first PU is N.times.2N and the size of the
second PU is 2N.times.N. Furthermore, in this example, the CU is coded
using a 4:2:0 color format. In this example, the first predictive block
for the first PU is a luma predictive block for the first PU. In this
example, the second predictive block for the second PU is a luma
predictive block for the second PU. In this example, video encoder 20
uses only inter prediction to obtain a third predictive block, the third
predictive block being a chroma predictive block of size N.times.N. In
this example, video encoder 20 obtains residual data for the CU based on
the first, second, and third predictive blocks. A similar example
substitutes, instead of video encoder 20 that obtains residual data for
the CU based on the first, second, and third predictive blocks, a video
encoder 20 that uses only intra prediction to obtain the third predictive
block instead of intra prediction.
[0235] Moreover, in a corresponding example, video decoder 30 may perform
intra prediction to obtain a first predictive block for a first PU of a
CU of the video data. In this example, video decoder 30 may perform inter
prediction to obtain a second predictive block for a second PU of the
same CU. In this example, the size of the CU is 2N.times.2N, the size of
the first PU is 2N.times.N and the size of the second PU is N.times.2N or
the size of the first PU is N.times.2N and the size of the second PU is
2N.times.N, and the CU is coded using a 4:2:0 color format. Furthermore,
in this example, the first predictive block for the first PU is a luma
predictive block for the first PU and the second predictive block for the
second PU is a luma predictive block for the second PU. In this example,
video decoder 30 uses only inter prediction to obtain a third predictive
block, the third predictive block being a chroma predictive block of size
N.times.N. Furthermore, in this example, video decoder 30 may
reconstruct, based on the first, second, and third predictive blocks, the
coding block of the CU. A similar example substitutes a different
configuration of video decoder 30 that uses only intra prediction to
obtain the third predictive block instead of intra prediction.
[0236] As mentioned above, in some examples, one 8.times.8 CU can be coded
with comb_mode with one 8.times.4 intra and one 8.times.4 inter PU, or
one 4.times.8 intra and one 4.times.8 inter PU, and the corresponding
4.times.4 chroma block of this 8.times.8 CU, can be coded using only the
inter prediction mode of the inter coded luma PU, a video coder may
partition the 4.times.4 chroma block correspondingly as two 4.times.2 or
2.times.4 blocks based on the luma PU partition, the video coder predicts
each of the two 4.times.2 or 2.times.4 by the corresponding luma
prediction mode, and the video coder generates a 4.times.4 residual block
and performs a 4.times.4 transform on the generated 4.times.4 residual
block.
[0237] Thus, in such examples, video encoder 20 may perform intra
prediction to obtain a first predictive block for a first PU of a CU of
the video data. Additionally, in this example, video encoder 20 may
perform inter prediction to obtain a second predictive block for a second
PU of the same CU. In this example, the size of the CU is 2N.times.2N,
the size of the first PU is 2N.times.N and the size of the second PU is
N.times.2N or the size of the first PU is N.times.2N and the size of the
second PU is 2N.times.N, and the CU is coded using a 4:2:0 color format.
Furthermore, the first predictive block for the first PU is a luma
predictive block for the first PU and the second predictive block for the
second PU is a luma predictive block for the second PU. In this example,
video encoder 20 may use an intra prediction mode of the first PU to
generate a chroma predictive block for the first PU. Furthermore, in this
example, video encoder 20 may use inter prediction to generate a chroma
predictive block for the second PU. In this example, video encoder 20 may
obtain residual data for the CU based on the first predictive block, the
second predictive blocks, the chroma predictive block for the first PU
and the chroma predictive block for the second PU.
[0238] In a similar example, video decoder 30 may perform intra prediction
to obtain a first predictive block for a first PU of a CU of the video
data. In this example, video decoder 30 may perform inter prediction to
obtain a second predictive block for a second PU of the same CU. In this
example, the size of the CU is 2N.times.2N, the size of the first PU is
2N.times.N and the size of the second PU is N.times.2N or the size of the
first PU is N.times.2N and the size of the second PU is 2N.times.N, and
the CU is coded using a 4:2:0 color format. Furthermore, in this example,
the first predictive block for the first PU is a luma predictive block
for the first PU and the second predictive block for the second PU is a
luma predictive block for the second PU. In this example, video decoder
30 may use an intra prediction mode of the first PU to generate a chroma
predictive block for the first PU. Video decoder 30 may use inter
prediction to generate a chroma predictive block for the second PU.
Furthermore, video decoder 30 may reconstruct, based on the first
predictive block, the second predictive blocks, the chroma predictive
block for the first PU and the chroma predictive block for the second PU,
the coding block of the CU.
[0239] Additionally, in some examples involving the comb_mode, when one CU
is coded using two or more PUs and both Inter and Intra prediction modes
are used, for intercoded PUs, the CU is treated in the same way as the
current HEVC design. That is, the reconstruction is defined as the sum of
decoded residual after possible inverse quantization/transform and the
motioncompensated prediction block using its motion information. In
addition, for intracoded PUs, a video coder uses a process involving two
predictors, i.e., the reconstruction is defined as the sum of decoded
residual after possible inverse quantization/transform and the
motioncompensated prediction block using the motion information from its
neighbor intercoded PU and the intra prediction block using the intra
prediction modes associated with the current PU.
[0240] Thus, in this example, video encoder 20 may perform intra
prediction to obtain a first predictive block for a first PU of a CU of
the video data. In this example, video encoder 20 may perform inter
prediction to obtain a second predictive block for a second PU of the
same CU. Furthermore, in this example, as part of obtaining residual data
for the CU, video encoder 20 may, for each respective sample of the
residual data corresponding to the first PU, obtain the respective sample
such that the respective sample is equal to a respective sample of a
coding block of the CU minus a predictive sample obtained using motion
information of the second PU and minus a sample of the first predictive
block. The predictive sample obtained using motion information may be a
sample of a predictive block of an interpredictive PU.
[0241] In a corresponding example, video decoder 30 may perform intra
prediction to obtain a first predictive block for a first PU of a CU of
the video data. In this example, video decoder 30 may perform inter
prediction to obtain a second predictive block for a second PU of the
same CU. Furthermore, in this example, as part of reconstructing a coding
block of the CU, video decoder 30 may, for each respective sample of the
coding block corresponding to the first PU, obtain the respective sample
such that the respective sample is equal to a sum of a respective decoded
residual sample, a predictive sample obtained using motion information of
the second PU, and a sample of the first predictive block.
[0242] Alternatively, in some examples involving comb_mode, when one CU is
coded using two or more PUs and both Inter and Intra prediction modes are
used, for intracoded PUs, the process for reconstructing coding blocks
of the CU is the same as the current HEVC design, i.e., the
reconstruction is defined as the sum of decoded residual after possible
inverse quantization/transform and the intra prediction block using its
intra prediction mode. In addition, for an intercoded PU of a CU, the
process for reconstructing portions of the coding blocks corresponding to
the intercoded PU is different from the reconstruction process in HEVC
in that two predictors are defined for the intercoded PU. Furthermore,
for each sample of a coding block of the CU that corresponds to a sample
of the intercoded PU, the sample is defined as a sum of a decoded
residual sample (e.g., after possible inverse quantization/transform) and
a sample of the motioncompensated prediction block of the intercoded PU
generated using motion information of the intercoded PU and a sample of
an intra prediction block generated using an intra prediction mode
associated with an intracoded PU that neighbors the intercoded PU.
[0243] Thus, in this example, video encoder 20 may perform intra
prediction to obtain a first predictive block for a first PU of a CU of
the video data. In this example, video encoder 20 may perform inter
prediction to obtain a second predictive block for a second PU of the
same CU. Furthermore, in this example, as part of obtaining residual data
for the CU, video encoder 20 may, for each respective sample of the
residual data corresponding to the second PU, obtain the respective
sample such that the respective sample is equal to a respective sample of
a coding block of the CU minus a predictive sample obtained using an
intra prediction mode of the first PU and minus a sample of the second
predictive block.
[0244] In a corresponding example, video decoder 30 may perform intra
prediction to obtain a first predictive block for a first PU of a CU of
the video data. In this example, video decoder 30 may perform inter
prediction to obtain a second predictive block for a second PU of the
same CU. Furthermore, in this example, as part of reconstructing a coding
block of the CU, video decoder 30 may, for each respective sample of the
coding block corresponding to the second PU, obtain the respective sample
such that the respective sample is equal to a sum of a respective decoded
residual sample, a predictive sample obtained using an intra prediction
mode of the first PU, and a sample of the second predictive block.
[0245] In one example, when allowing the two predictors, the two
prediction blocks are combined with a linear weighting function, e.g., an
average of the two. For example, a video coder such as video encoder 20
or video decoder 30 may use intra prediction to generate a first
predictive block of a PU and may use inter prediction to generate a
second predictive block for the PU. In this example, the video coder may
determine a final predictive block for the PU by determining, for each
respective sample of the final predictive block, a weighted average of
the samples of the first and second predictive blocks that correspond to
the respective sample of the final predictive block. In this example,
weights used in the weighted average may favor the intra predicted
predictive block over the inter predicted predictive block, or vice
versa. In some instances, using such a linear weighting function may lead
to a more accurate final predictive block, which may ultimately increase
compression performance. The linear weighting factors may be signaled as
side information or derived from certain coded information.
[0246] For example, for each respective sample of a coding block of the CU
that corresponds to a PU of the CU, video encoder 20 may obtain a first
predictive sample for the respective sample and a second predictive
sample for the respective sample. For instance, the first predictive
sample for the respective sample may be generated using inter prediction
and the second predictive sample for the respective sample may be
generated using intra prediction. In this example, video encoder 20 may
determine a weighted predictive sample for the respective sample by
applying the linear weighting function to the first predictive sample for
the respective sample and the second predictive sample for the respective
sample. Additionally, in this example, video encoder 20 may determine a
residual sample for the respective sample equal to a difference between
an original value of the respective sample and the weighted predictive
sample for the respective sample.
[0247] Similarly, for each respective sample of a coding block that video
decoder 30 may obtain a residual sample for the respective sample. For
instance, video decoder 30 may obtain, from a bitstream, syntax elements
indicating transform coefficients, apply inverse quantization to the
transform coefficients, and apply an inverse transform to the transform
coefficients to obtain residual samples. Furthermore, in this example,
video decoder 30 may determine a first predictive sample for the
respective sample and a second predictive sample for the respective
sample. For instance, the first predictive sample for the respective
sample may be generated using inter prediction and the second predictive
sample for the respective sample may be generated using intra prediction.
In this example, video decoder 30 may determine a weighted predictive
sample for the respective sample by applying a linear weighting function
to the first predictive sample for the respective sample and the second
predictive sample for the respective sample. Additionally, in this
example, video decoder 30 may reconstruct the respective sample as a sum
of the residual sample for the respective sample and the weighted
predictive sample for the respective sample.
[0248] The following examples indicate the usage of comb_mode when the
comb_mode is enabled for one slice, picture, or sequence (i.e., coded
video sequence). In HEVC, a CU includes a 1bit pred_mode_flag syntax
element. The pred_mode_flag syntax element of a CU equal to 0 specifies
that the CU is coded in the inter prediction mode. The pred_mode_flag
syntax element of a CU equal to 1 specifies that the CU is coded in the
intra prediction mode. In accordance with one example of this disclosure,
the 1bit pred_mode_flag of a CU is replaced by a syntax element with
three possible values. In this example, the three values correspond to
the conventional intra mode, the conventional inter mode, and comb_mode,
respectively. In this example, the conventional intra mode refers to
instances where all PUs of the CU are coded using intra prediction mode.
Furthermore, in this example, the conventional inter prediction mode
refers to instances where all PUs of the CU are coded using inter
prediction mode. In some examples, when comb_mode is enabled for a CU,
for only one PU of the CU, video encoder 20 signals a 1bit value to
indicate whether the PU is coded in intra prediction mode or inter
prediction mode. Because the CU is coded in the comb_mode, the other PU
is a different prediction mode from the PU for which the 1bit value was
signaled. In another example, comb_mode is treated as the conventional
inter mode. In this example, for each PU of a CU, an additional flag is
added to indicate the usage of intra or inter prediction mode.
[0249] In a seventh technique of this disclosure, one PU can be predicted
from both intra prediction and inter prediction and the two predictive
blocks from intra prediction and inter prediction are used to derive the
final predictive block for the PU. Deriving a final predictive block in
this way may result in a more accurate predictive block for the PU, which
may increase compression performance.
[0250] Thus, in accordance with the seventh technique, video encoder 20
may perform intra prediction to obtain a first predictive block for a PU
of a CU. Additionally, in this example, video encoder 20 may perform
inter prediction to obtain a second predictive block for the same PU of
the same CU. In this example, video encoder 20 may derive, based on the
first predictive block and the second predictive block, a final
predictive block for the PU. Furthermore, video encoder 20 may obtain,
based on the final predictive block for the PU, residual data for the CU.
For instance, video encoder 20 may generate at least a portion of the
residual data for CU by calculating differences between samples of the
final predictive block for the PU and corresponding samples of a coding
block of the CU. In this example, video encoder 20 may include, in a
bitstream comprising an encoded representation of the video data, data
representing the residual data for the CU.
[0251] In a corresponding example, video decoder 30 may perform intra
prediction to obtain a first predictive block for a PU of a CU.
Additionally, in this example, video decoder 30 may perform inter
prediction to obtain a second predictive block for the same PU of the
same CU. In this example, video decoder 30 may derive, based on the first
predictive block and the second predictive block, a final predictive
block for the PU. Furthermore, in this example, video decoder 30 may
reconstruct, based on the final predictive block for the PU, a coding
block of the CU. For instance, video decoder 30 may add the final
predictive block for the PU to residual data for the CU to reconstruct at
least a portion of a coding block of the CU.
[0252] Furthermore, in some examples of the seventh technique, a video
coder applies a linear weighting function to the two prediction blocks,
e.g., the weighting factors of the pixels located in the same relative
positions of the two prediction blocks are fixed. In some examples, the
weighting factors for different positions may be variable. Furthermore,
in some examples, the weighting factors are dependent on the intra
prediction mode. In one example, for the topleft position within a
block, if the intra prediction mode is a DC mode, the weights of the
topleft samples in inter and intra predictive blocks are equal, i.e.,
(0.5, 0.5) while if the intra prediction is a vertical prediction mode,
the weight of the topleft sample in intra predictive block may be larger
than that of the topleft sample in the inter predictive block.
[0253] In some examples of the seventh technique, the final prediction
value of one pixel at one or more positions, but not all positions, may
be copied from either the intra predicted block or the inter predicted
block, i.e., one of the two weighting factors is 0 and the other one is
1.
[0254] In some examples, the seventh technique is applied to specific
partition sizes (e.g., 2N.times.2N), and/or specific prediction modes
(MERGE/SKIP mode). Furthermore, in some examples, when the seventh
technique is applied, the intra prediction modes are restricted to be a
subset of intra prediction modes used for conventional intra prediction.
In one example, the subset is defined to only include the MPMs (most
probable modes).
[0255] As discussed above, a video coder may use a Linear Model (LM)
prediction mode to predict chroma samples of a block based on
reconstructed luma samples of the same block. Furthermore, as described
above, the LM prediction mode has not been used with nonsquare PUs. In
accordance with an eighth technique of this disclosure, a video coder may
use the LM prediction mode with nonsquare PUs. More generally, the same
techniques for applying the LM prediction mode work with nonsquare luma
and chroma blocks. Hence, discussion in this disclosure regarding the
eighth technique with respect to nonsquare PUs may apply more generally
to nonsquare luma and chroma blocks, such as the luma prediction blocks
and chroma prediction blocks of PUs. Furthermore, examples of the eighth
technique may derive the parameters used in the LM prediction mode in
several ways.
[0256] For instance, in some examples of the eighth technique, a boundary
at the longer side of a nonsquare PU is downsampled or subsampled such
that the number of pixels in the downsampled or subsampled boundary is
equal to the number of pixels in the shorter boundary. The process can be
a decimation or an interpolated sampling. In examples where video decoder
30 performs the subsampling using decimation, video decoder 30 may
remove samples at regular intervals (e.g., every other sample) to reduce
the number of samples without changing the values of the remaining
samples. In another example, video decoder 30 may perform the
subsampling using interpolation. In examples where video decoder 30
performs the subsampling using interpolation, for respective pairs of
adjacent samples, video decoder 30 may interpolate a value between the
samples of a respective pair and may include the interpolated value in
the subsampled set of samples.
[0257] Thus, in an example of the eighth technique of this disclosure,
video encoder 20 may perform a linear model prediction operation to
predict a predictive chroma block for a nonsquare PU of a CU from
downsampled or subsampled reconstructed luma samples of the PU.
Furthermore, in this example, video encoder 20 may obtain, based on the
predictive chroma block, residual data for the CU. In this example, video
encoder 20 may include, in a bitstream comprising an encoded
representation of the video data, data representing the residual data for
the CU. In a corresponding example, video decoder 30 may perform a linear
model prediction operation to predict a predictive chroma block for a
nonsquare PU of a CU of a current picture of the video data from
downsampled reconstructed luma samples of the PU. In this example, video
decoder 30 may reconstruct, based in part on the predictive chroma block,
a coding block of the CU. In either of the examples of the paragraph,
video encoder 20 or video decoder 30 may downsample or subsample luma
samples of a longer side of the nonsquare PU such that the number of
downsampled or subsampled luma samples on the longer side of the
nonsquare PU is the same as the luma samples on the shorter side of the
nonsquare PU.
[0258] In one example, when using equation (4) and equation (5) to
calculate linear model parameters, for both luma and chroma components,
the pixels of the boundary at the longer side of a nonsquare PU are
subsampled such that the number of pixels in the downsampled or
subsampled boundary are equal to the number of pixels in the shorter
boundary (i.e., min((K,L))). The subsampling process can be a decimation
or an interpolated sampling.
[0259] Thus, in this example, as part of performing an LM prediction
operation, a video coder may obtain a predictive chroma sample such that
the predictive chroma sample is equal to a first parameter multiplied by
a collocated luma sample, plus a second parameter, wherein the first
parameter is equal to:
.alpha. = I x i y i  x i y i I x
i x i  x i x i ##EQU00007##
and the second parameter is equal to:
.beta.=(.SIGMA.y.sub.i.alpha..SIGMA.x.sub.i)/I,
where I is the number of reference samples in a left and top boundary of
the nonsquare PU, x.sub.i is a downsampled or subsampled reconstructed
luma reference sample, y.sub.i is a reconstructed chroma reference
sample.
[0260] Alternatively, in some examples, pixels located at both longer and
short sides of the PU may be subsampled and the subsampling ratios may
be different. A subsampling ratio is a ratio of samples prior to
subsampling to samples after subsampling. However, it may be required
that the total number of pixels at two sides after subsampling should be
equal to 2.sup.m (wherein m is an integer, m may be different for luma
and chroma components). The value of m can be dependent on the block size
K and L.
[0261] Thus, in this example, as part of performing an LM prediction
operation, a video coder may obtain a predictive chroma sample such that
the predictive chroma sample is equal to a first parameter multiplied by
a collocated luma sample, plus a second parameter, wherein the first
parameter is equal to:
.alpha. = I x i y i  x i y i I x
i x i  x i x i ##EQU00008##
and the second parameter is equal to:
.beta.=(.SIGMA.y.sub.i.alpha..SIGMA.x.sub.i)/I,
where I is the number of reference samples in a set of reference samples,
x.sub.i is a reconstructed luma reference sample, and y.sub.i is a
reconstructed chroma reference sample. In this example, the set of
reference samples is a subsampled set of left reference samples and
above reference samples, the left reference samples being immediately
left of a left boundary of the current PU and the above reference samples
being immediately above a top boundary of the current PU.
[0262] In another example of the eighth technique, the number of pixels I
in equations (4) and (5) are adjusted based on the actual number of
pixels in the boundary. For instance, for 2N.times.N PU, I=3N. When only
left or above causal samples are available, the total involved samples
number I is equal to the length of left or above boundary. Thus, a video
coder may calculate .alpha. as:
.alpha. = 3 N x i y i  x i y i 3 N
x i x i  x i x i ##EQU00009##
[0263] Additionally, the video coder may calculate .beta. as:
.beta.=(.SIGMA.y.sub.i.alpha..SIGMA.x.sub.i)/3N.
[0264] When LM is enabled for one nonsquare chroma PU (with size equal to
K.times.L, where K is unequal to L), the parameters (i.e., a and b) can
be derived in various ways. For example, when using equation (4) and
equation (5) to calculate linear model parameters, for both luma and
chroma components, the pixels of the boundary at the shorter side of the
nonsquare PU is upsampled such that the number of pixel in the
upsampled boundary is equal to the number of pixels in the longer
boundary (i.e., max(K,L)). The upsampling process can be a duplicator or
an interpolated sampling. A duplicator upsampling process is an
upsampling process in which existing samples are duplicated to generate
new samples. An interpolated upsampling process increases the number of
samples by interpolating a value of a new sample based on two or more
existing samples.
[0265] Thus, in this example, as part of performing an LM prediction
operation, a video coder may obtain a predictive chroma sample such that
the predictive chroma sample is equal to a first parameter multiplied by
a collocated luma sample, plus a second parameter, wherein the first
parameter and second parameter are defined in equations (4) and (5). In
this example, the set of reference samples is an upsampled set of left
reference samples and above reference samples, the left reference samples
being immediately left of a left boundary of the current PU and the above
reference samples being immediately above a top boundary of the current
PU. In this example, the video coder may determine the set of reference
samples by applying an upsampling method to the left references samples
and/or the above reference samples. For instance, the upsampling method
may upsample whichever of the left reference samples or the above
reference samples corresponds to the shorter of the left boundary of the
current PU and the top boundary of the current PU, but not whichever is
longer of the left reference samples of the current PU and the above
reference samples of the current PU.
[0266] In some examples, pixels located at both longer and shorter sides
of the PU may be upsampled and the upsampling ratios may be different.
However, it may be required that the total number of pixels at two sides
after upsampling should be equal to 2.sup.m (wherein m is an integer, m
may be different for luma and chroma components). The value of m may be
dependent on the block size K and L. In other words, m is dependent on a
height and/or width of the PU. For example, a PU may be 8.times.16 and a
video coder may upsample reference samples such that there are 32
reference samples along a left side of the PU and 32 reference samples
along a top side of the PU. In this example, m is equal to 6. In another
example, a PU may be 4.times.8 and a video coder may upsample reference
samples such that there are 16 reference samples along a left side of PU
and 16 reference samples along a top side of the PU. In this example, m
is equal to 4.
[0267] Furthermore, in some examples of the eighth technique, when using
equation (4) and equation (5) to calculate LM parameters, for both luma
and chroma components, the pixels of the boundary at the shorter side of
the nonsquare PU is upsampled and the pixels of the longer boundary
(i.e., max((K,L))) is subsampled such that the number of pixel in the
upsampled shorter boundary is equal to the number of pixels in the
subsampled longer boundary. The upsampling process can be a duplicator
or an interpolated sampling. The subsampling process can be a decimation
or an interpolated sampling.
[0268] Thus, in this example, as part of performing the LM prediction
operation, a video coder may obtain a predictive chroma sample such that
the predictive chroma sample is equal to a first parameter multiplied by
a collocated luma sample, plus a second parameter, wherein the first
parameter and the second parameter are defined as in equations (4) and
(5). In this example, the set of reference samples is a union of an
upsampled set of reference samples and a subsampled set of reference
samples, the upsampled set of reference samples being an upsampled
version of whichever contains fewer samples of left reference samples and
above reference samples. In this example, the subsampled set of
reference samples is a subsampled version of whichever contains more
samples of the left reference samples and the above reference samples. In
this example, the left reference samples are immediately left of a left
boundary of the current PU and the above reference samples are
immediately above a top boundary of the current PU.
[0269] In some examples, for the examples of the eighth technique
mentioned above, after the subsampling or upsampling process, a
downsampling process (e.g., as described elsewhere in this disclosure)
only for the luma component may be further applied to cover the case that
the color format is not 4:4:4. Thus, based on a color format of the
current picture being other than 4:4:4, a video coder may subsample or
downsample luma samples of the predictive block. In some examples, the
two downsampling processes of luma samples could be merged into one.
[0270] Furthermore, in some examples of the eighth technique, different
ways of subsampling/upsampling for boundary pixels may be applied. In
one example, the subsampling/upsampling method is dependent on the PU
size (i.e., on the values of K and L). In another example, the methods
for subsampling/upsampling may be signaled in a sequence parameter set,
a picture parameter set, a slice header, or in another syntax structure.
[0271] In some examples of the eighth technique, the
upsampling/downsampling (or subsampling) is implemented in an implicit
manner. In other words, the upsampling or subsampling technique is
determined implicitly. That is, the sum value, such as
.SIGMA.x.sub.iy.sub.i, .SIGMA.x.sub.i and .SIGMA.y.sub.i in equation (4)
and equation (5) of the left side boundary or/and upper side boundary, is
multiplied or divided by a factor S. The value of S can be dependent on
the ratio of the pixel number in the left side boundary or/and upper side
boundary.
[0272] Thus, in this example, as part of performing the LM prediction
operation to predict the predictive chroma block, a video coder may
obtain a predictive chroma sample such that the predictive chroma sample
is equal to a first parameter multiplied by a collocated luma sample,
plus a second parameter, wherein the first LM parameter is equal to:
.varies. = I S x i y i  S x i S y i
I S x i x i  S x i S x i ,
##EQU00010##
where S is dependent on a ratio of a pixel number in a left boundary
or/and an upper boundary of the nonsquare PU, I is the number of
reference samples in a subset of samples in a left and top boundary of
the current PU determined according to a subsampling method, x.sub.i is
a subsampled reconstructed luma reference sample, and y.sub.i is a
reconstructed chroma reference sample. In some examples,
S=max((K,L))/min((K,L)) for a K.times.L chroma block.
[0273] As described above, an enhancement multiple transform (EMT) scheme
has been proposed that uses DSTVII, DCTVIII, DSTI and DCTV.
Furthermore, as discussed above, whether EMT applies or not is controlled
at the CU level using a flag, namely an EMT flag, for all TUs within a
CU. For each TU within an EMTenabled CU, the horizontal or vertical
transform to be used is signaled by an index to a selected transform set,
namely an EMT index.
[0274] However, controlling the EMT scheme as previouslyproposed may not
be efficient if the residual characteristics of each PU in a CU are
different. For example, controlling the EMT scheme as previouslyproposed
may not be efficient for an intracoded PU and an intercoded PU within a
CU. Hence, in accordance with a ninth technique of this disclosure, when
EMT is enabled for one slice, picture, or sequence and one CU is split
into two PUs in vertical direction (e.g., N.times.2N partition), the
signaling of an EMT flag is modified in the following way: If the left PU
is an intracoded PU, the transform depth could be 0. In other words, the
transform tree of the CU may have a depth of 0 or more. In this case, an
EMT flag may be signaled at the CU level. If the transform depth is not
0, the EMT flag may be signaled at the PU level. Hence, EMT may or may
not be enabled for each PU.
[0275] Furthermore, in accordance with the ninth technique of this
disclosure, when EMT is enabled for one slice, picture, or sequence and
when one CU is split into two PUs in a horizontal direction (e.g.,
2N.times.N partition), the signaling of the EMT flag is modified in the
following way: If the above PU is an intracoded PU, the transform depth
may be 0. In other words, the transform tree of the CU may have a depth
of 0 or more. In this case, an EMT flag may be signaled at CU level. If
the transform depth is not 0, the EMT flag may be signaled at the PU
level. That is, each PU may have EMT enabled or not.
[0276] Thus, in accordance with the ninth technique, video encoder 20 may
include, in a bitstream that comprises an encoded representation of video
data, a first syntax element. The first syntax element indicates whether
EMT is enabled for a particular CU that is partitioned into exactly two
PUs along a boundary. In this example, whether the first syntax element
is in the particular CU or a particular PU of the two PUs is dependent on
a splitting direction of the PUs of the CU. Furthermore, in this example,
based on EMT being enabled for a particular CU, for each respective TU of
the particular CU, video encoder 20 may include, in the bitstream, a
respective syntax element indicating a respective selected transform set
for the respective TU. In this example, based on EMT being enabled for
the particular CU, video encoder 20 may apply one or more transforms of
the respective selected transform set to transform coefficients of the
respective TU to obtain a respective transform block for the respective
TU in the sample domain. In this example, video encoder 20 may include,
in a bitstream that comprises an encoded representation of the video
data, data representing one or more of the TUs of the CU. In this
example, the boundary may be a horizontal boundary or the boundary may be
a vertical boundary.
[0277] In a corresponding example, video decoder 30 may obtain a first
syntax element. The first syntax element indicates whether EMT is enabled
for a particular CU that is partitioned into exactly two PUs along a
boundary. In this example, whether the first syntax element is in the
particular CU or a particular PU of the two PUs is dependent on a
splitting direction of the PUs of the CU. In this example, in response to
determining that EMT is enabled for a particular CU, for each respective
TU of the particular CU, video decoder 30 may obtain a respective syntax
element indicating a respective selected transform set for the respective
TU. Additionally, in response to determining that EMT is enabled for a
particular CU, for each respective TU of the particular CU, video decoder
30 may apply an inverse of one or more transforms of the respective
selected transform set to transform coefficients of the respective TU to
obtain a respective transform block for the respective TU in the sample
domain. In this example, video decoder 30 may reconstruct, based at least
in part on the transform blocks for the TUs of the CU, a coding block of
the CU. In this example, the boundary may be a horizontal boundary or the
boundary may be a vertical boundary.
[0278] In accordance with a tenth technique of this disclosure, several
transform tree structures may be applied for coding one slice, picture,
or sequence. For example, in one example, transform tree structures are
predefined. In some examples, for each picture, slice, largest coding
unit, CU, or PU, video encoder 20 may signal the selected transform tree
structure. Alternatively, in some examples, video decoder 30 may derive
the selected transform tree from the coded information, such as
prediction modes/partition sizes.
[0279] Thus, in accordance with the tenth technique, video encoder 20 may
partition a CU of the video data into TUs of the CU based on a particular
tree structure from among a plurality of predefined tree structures. In
this example, a root node of the tree structure corresponds to a coding
block of the CU. Furthermore, in this example, each respective nonroot
node of the tree structure corresponds to a respective block that is a
partition of a block that corresponding to a parent node of the
respective nonroot node. In this example, leaf nodes of the tree
structure correspond to the TUs of the CUs. Additionally, in this
example, video encoder 20 includes, in a bitstream that comprises an
encoded representation of the video data, data representing one or more
of the TUs of the CU. In some examples, video encoder 20 may further
include, in the bitstream, one or more syntax elements identifying the
particular tree structure. In some examples, the one or more syntax
elements that indicate the particular tree structure applicable to CUs is
in one of: a picture, slice, LCU, CU, and PU. Furthermore, in some
examples, as part of determining the CU is partitioned into the TUs,
video encoder 20 determines the particular tree structure from coded
information without explicitly signaling of the particular tree
structure. In such examples, the coded information may comprise at least
one of: prediction modes and partition sizes.
[0280] In a corresponding example, video decoder 30 may determine a CU of
the video data is partitioned into TUs of the CU based on a particular
tree structure from among a plurality of predefined tree structures. In
this example, a root node of the tree structure corresponds to a coding
block of the CU. Each respective nonroot node of the tree structure
corresponds to a respective block that is a partition of a block that
corresponding to a parent node of the respective nonroot node. In this
example, leaf nodes of the tree structure correspond to the TUs of the
CUs. Additionally, in this example, video decoder 30 may reconstruct,
based on data for at least one of the TUs of the CU, the coding block of
the CU. In some examples, video decoder 30 may obtain, from a bitstream
that comprises encoded video data, one or more syntax elements
identifying the particular tree structure. The one or more syntax
elements that indicate the particular tree structure applicable to CUs in
one of: a picture, slice, LCU, CU, and prediction unit. Furthermore, in
some examples, as part of determining the CU is partitioned into the TUs,
video decoder 30 may determine the particular tree structure from coded
information without explicit signaling of the particular tree structure.
The coded information may comprise at least one of: prediction modes and
partition sizes.
[0281] In an eleventh example of this disclosure, transforms with size
equal to 1.times.N and N.times.1 may be also applied to inter coded
blocks. For instance, in one example, such TUs are only allowed for a
specific transform depth, e.g., the highest transform depth. In some
examples, such TUs are only allowed for specific coding blocks, such as
CU size equal to 8.times.8. Furthermore, in some examples, the eleventh
technique is only applicable for specific color component, such as luma.
[0282] Thus, in accordance with the eleventh technique, video encoder 20
may determine transformdomain data (e.g., transform coefficients) by
applying a 1.times.N or N.times.1 transform to residual data of an inter
coded block. In this example, video encoder 20 may include, in a
bitstream that comprises an encoded representation of the video data,
data representing the transformdomain data. In a corresponding example,
video decoder 30 may determine sampledomain data by applying a 1.times.N
or N.times.1 transform to transform coefficients of an inter coded block.
In this example, video decoder 30 may reconstruct, based in part on the
sampledomain data, a coding block of a CU of the video data. For
example, video decoder 30 may add samples of the sampledomain data to
corresponding samples of residual data to reconstruct the coding block of
the CU. In some instances, for the above examples of the eleventh
technique involving video encoder 20 and video decoder 30, 1.times.N and
N.times.1 transforms are only allowed for a specific transform depth.
Additionally, in some instances, for the above examples of the eleventh
technique involving video encoder 20 and video decoder 30, 1.times.N and
N.times.1 transforms are only allowed for a CUs of particular sizes.
[0283] In accordance with a twelfth technique of this disclosure, the
asymmetric motion partitioning defined in HEVC for inter coded CUs is
also applied to intra coded CUs. Partitioning intra predicted CUs into
PUs asymmetrically may enable video encoder 20 to more accurately divide
the CU into regions corresponding to different objects, which may
increase compression performance. Thus, in accordance with an example of
the twelfth technique, video encoder 20 may partition an intra predicted
CU of the video data into PUs asymmetrically. In this example, video
encoder 20 may determine a respective predictive block for each
respective PU of the intra predicted CU. Furthermore, in this example,
video encoder 20 may obtain residual data based on the predictive blocks
for the PUs of the intra predicted CU and a coding block of the intra
predicted CU. Additionally, in this example, video encoder 20 may
include, in a bitstream that comprises an encoded representation of the
video data, data representing the residual data.
[0284] In a corresponding example of the twelfth technique, video decoder
30 may determine an intra predicted CU of the video data is partitioned
into PUs asymmetrically. In this example, video decoder 30 may determine
a respective predictive block for each respective PU of the intra
predicted CU. Additionally, in this example, video decoder 30 may
reconstruct, based on the predictive blocks for the PUs of the intra
predicted CU, a coding block of the intra predicted CU.
[0285] In accordance with a thirteenth technique of this disclosure, when
one intra coded CU contains multiple PUs, each PU may have its own chroma
prediction mode. In other words, a PU may have a first intra prediction
mode (i.e., a luma prediction mode) and a second intra prediction mode
(i.e., a chroma prediction mode). A video coder may use the luma
prediction mode to determine the luma predictive block of the PU and may
use the chroma prediction mode to determine the chroma predictive blocks
of the PU. Thus, in accordance with the 13.sup.th technique, video
encoder 20 may determine an intra predicted CU of the video data has at
least a first PU and a second PU. In this example, the first PU and the
second PU have different chroma prediction modes. Furthermore, in this
example, video encoder 20 may include, in a bitstream that comprises an
encoded representation of the video data, data representing residual data
based at least on predictive blocks of the first PU and the second PU and
a coding block of the CU.
[0286] In a corresponding example of the 13.sup.th technique, video
decoder 30 may determine an intra predicted CU of the video data has at
least a first PU and a second PU. In this example, the first PU and the
second PU have different chroma prediction modes. Furthermore, in this
example, video decoder 30 may reconstruct, based at least on predictive
blocks of the first PU and the second PU, a coding block of the CU.
[0287] Furthermore, in accordance with an example of the 13.sup.th
technique, the chroma intra prediction modes of the previously coded PU
may be considered for coding the following PU. This, a video coder may
determine, based at least in part on a chroma prediction mode of a PU
prior to a current PU in coding order, a chroma prediction mode of the
current PU. For instance, a video coder may use the chroma intra
prediction modes of the previously coded PU in context modeling for
chroma intra prediction mode of a current PU. Context modeling may
comprise identification of a coding context for contextadaptive entropy
coding. A coding context may indicate probabilities of a value In another
example, a video coder may add the chroma intra prediction modes of the
previously coded PU as one new candidate for the chroma intra prediction
mode list.
[0288] In some examples of the 13.sup.th technique, one flag may be
firstly coded at a CU level to indicate whether all PUs share the same
chroma intra prediction modes. Thus, in this example, video encoder 20
may include, in the bitstream, a syntax element indicating whether all
PUs of the intra predicted CU share the same chroma intra prediction
modes. Similarly, video encoder 20 may obtain, from a bitstream
comprising an encoded representation of the video data, a syntax element
indicating whether all PUs of the intra predicted CU share the same
chroma intra prediction modes.
[0289] Furthermore, in some examples of the thirteenth technique, all the
chroma PUs within one CU are restricted to follow the same transform
tree. By restricting all of the chroma PUs within one CU to follow the
same transform tree, it may be unnecessary for video encoder 20 to
include data in the bitstream indicating the structures of the different
transform trees for different chroma PUs. Thus, video encoder 20 may
generate a bitstream that conforms to a video coding standard that
restricts a video encoder from generating bitstreams in which chroma PUs
of the CU have differently structured transform trees. Similarly, video
decoder 30 may obtain a bitstream comprising an encoded representation of
the video data. In this example, the bitstream conforms to a video coding
standard that restricts a video encoder from generating bitstreams in
which chroma PUs of the CU have differently structured transform trees.
[0290] In a fourteenth example of this disclosure, when one intra coded CU
contains multiple rectangular PUs, a video coder may apply a
modedependent scan. A modedependent scan is a scanning order used to
scan transform coefficients in a 2dimensional coefficient block for a TU
into a 1dimensional coefficient vector for entropy encoding. Video
encoder 20 may select, based on which intra prediction mode is used for a
PU corresponding to the TU, a modedependent scan to use for scanning
transform coefficients of the TU from among a plurality of available
scanning orders. The PU corresponding to the TU may be coextensive with
the TU or contain the area associated with the TU. Using a modedependent
scan may better arrange the transform coefficients for CABAC. In HEVC,
modedependent scans are only allowed for 8.times.8 and 4.times.4 TUs.
[0291] Thus, in accordance with an example of the fourteenth technique,
video encoder 20 may obtain residual data based on 2dimensional
transform coefficient blocks. In this example, video encoder 20 may
obtain predictive blocks for each of a plurality of rectangular PUs of an
intra predicted CU of the video data. Furthermore, in this example, video
encoder 20 may apply a modedependent scan to arrange the 2dimensional
blocks of transform coefficients into 1dimensional arrays of transform
coefficients corresponding to TUs of the CU. In this example, video
encoder 20 may include, in a bitstream that comprises an encoded
representation of the video data, data representing the 1dimensional
arrays of transform coefficients.
[0292] In a similar example, video decoder 30 may apply a modedependent
scan to arrange a 1dimensional array of transform coefficients into
2dimensional transform coefficient blocks corresponding to TUs of an
intra predicted CU of the video data. In this example, the intra
predicted CU has multiple rectangular PUs. Furthermore, in this example,
video decoder 30 may obtain residual data based on the transform
coefficient blocks. Additionally, video decoder 30 may obtain predictive
blocks for each of the PUs. In this example, video decoder 30 may
reconstruct, based on the residual data and the predictive blocks, a
coding block of the CU.
[0293] In one example of the fourteenth technique, application of the
modedependent scan is restricted to certain TU sizes, such as 8.times.4
or 4.times.8. In some examples, the modedependent scan is restricted to
certain CU sizes, such as only 8.times.8, or 8.times.8 and 16.times.16.
Furthermore, in some examples, the rule of the mapping between intra
prediction mode and scan pattern used for TU sizes equal to 8.times.8 and
4.times.4 in HEVC may be reused. In some examples, different mapping
functions may be applied which is dependent on the rectangular TU sizes.
[0294] As described elsewhere in this disclosure, VCEGAZ07 proposed using
a 4tap intra interpolation filter to improve accuracy of directional
intra prediction relative to the 2tap intra interpolation filter used in
HEVC. However, VCEGAZ07 does not indicate how a video coder selects a
4tap intra interpolation filter for a nonsquare intra coded PU. Rather,
VCEGAZ07 specifies that a video coder uses cubic interpolation filters
for 4.times.4 and 8.times.8 blocks, and uses Gaussian interpolation
filters for 16.times.16 and larger blocks. In a fifteenth technique of
this disclosure, for a nonsquare intra coded PU with size equal to
K.times.L, when determining a 4tap filter type or a scan pattern as
described elsewhere in this disclosure with respect to fourtap intra
interpolation filters, the nonsquare intra coded PU is treated as a
transform size equal to N.times.N, wherein log 2(N*N)=((log 2(K)+log
2(L))>>1)<<1), wherein log 2 is the binary logarithm, and
>> and << are the logic right and left shift, respectively.
[0295] Thus, in an example of the fifteenth technique, video encoder 20
may determine a 4tap interpolation filter for a nonsquare intra coded
PU of a CU of the video data. Furthermore, in this example, video encoder
20 may apply the determined 4tap interpolation filter as part of
obtaining a predictive block for the nonsquare intra coded PU. For
instance, video encoder 20 may apply the 4tap filter when determining a
value of a reference sample that lies between two integerposition
reference samples (i.e., reference samples at integer coordinates
relative to a topleft sample of a picture). Additionally, in this
example, video encoder 20 may include, in a bitstream that comprises an
encoded representation of the video data, data representing residual data
based at least in part on a predictive block for the nonsquare PU and a
coding block of the CU. In this example, as part of determining the 4tap
interpolation filter, video encoder 20 may determine the 4tap
interpolation filter based on a size of a square PU, wherein the size of
the square PU is based on the height and width of the nonsquare intra
coded PU.
[0296] In a corresponding example of the fifteenth technique, video
decoder 30 may determine a 4tap interpolation filter for a nonsquare
intra coded PU of a CU of the video data. Additionally, in this example,
video decoder 30 may apply the determined 4tap interpolation filter as
part of obtaining a predictive block for the nonsquare intra coded PU.
Furthermore, video decoder 30 may reconstruct, based at least in part on
a predictive block for the nonsquare PU, a coding block of the CU. In
this example, as part of determining the 4tap interpolation filter,
video decoder 30 may determine the 4tap interpolation filter based on a
size of a square PU, wherein the size of the square PU is based on the
height and width of the nonsquare intra coded PU.
[0297] In some examples of the fifteenth technique, a new 4tap filter may
be applied for nonsquare intra coded PUs. That is, even for nonsquare
intra coded PUs, a 4tap filter may be applied and this filter may be
different from what is defined for square PUs. Furthermore, in some
examples of the fifteenth technique, a different mapping table between
the intra prediction mode and scan pattern index may be applied for
nonsquare intra coded PUs.
[0298] A K.times.L transform block is treated as a transform size equal to
N.times.N wherein log 2(N*N)=(log 2(K)+log 2(L)+1). Thus, the selection
of a scan pattern for the K.times.L transform block may be the same as
the N.times.N block.
[0299] As discussed above, transform blocks in HEVC are of size N.times.N,
where N=2.sup.m and m is an integer. Furthermore, in HEVC, a video
encoder applies a 2dimensional N.times.N transform to a transform block
to generate transform coefficients. More specifically, the video encoder
applies the 2dimensional N.times.N transform by applying an Npoint
1dimensional transform to each row of the transform block and each
column of the transform block separately. Applying the transform in this
way results in an N.times.N block of transform coefficients.
[0300] In HEVC, the video encoder may apply an Npoint 1dimensional DCT
transform to an ith row or column of samples of the transform block
w.sub.i by calculating:
w.sub.i=.SIGMA..sub.j=0.sup.N1u.sub.jc.sub.ij (18)
In the equation above, i=0, N1. Elements c.sub.ij of the DCT transform
matrix C are defined as:
c ij = A cos [ .pi. N ( j + 1 2 ) i ] N
( 19 ) ##EQU00011##
In the equation above, i,j=0, N1 and where A is equal to 1 and 2.sup.1/2
for i=0 and i>0 respectively.
[0301] In equation (19), let
A cos [ .pi. N ( j + 1 2 ) i ] ##EQU00012##
be denoted by X.sub.ij. Therefore, equation (18) can be rewritten as
w i = j = 0 N  1 u j X ij N . ##EQU00013##
Because the video encoder applies the 1dimensional DCT transform in both
the horizontal and vertical directions, the transform coefficient w.sub.i
can ultimately be rewritten as:
w i = k = 0 N  1 ( ( j = 0 N  1 u j
X ij N ) X ik N ) ( 20 ) ##EQU00014##
This can further be rewritten as:
k = 0 N  1 ( ( j = 0 N  1 u j X ij
) X ik ) N N ( 21 ) ##EQU00015##
[0302] Thus, the transform can ultimately be considered as having a
"normalization factor" of {square root over (N)} {square root over (N)}.
Because N=2.sup.m, {square root over (N)} {square root over (N)} is
also a power of 2. Hence, the value of a transform coefficient can be
implemented by a rightshift operation instead of a division operation.
As discussed elsewhere in this disclosure, using rightshift operations
instead of division operations may reduce complexity and improve coding
speed.
[0303] However, problems may arise when reusing equation (19) with a
nonsquare transform block of a TU. For a 2D transform (including both
horizontal and vertical transforms), considering a K*L transform, the
normalization factor would be ( {square root over (K)}* {square root over
(L)}). If N is defined as the value satisfying the equation log
2(N*N)=((log 2(K)+log 2(L))>>1)<<1), the ratio of utilized
normalization factor ( {square root over (N)}* {square root over (N)})
and the real normalization factor ( {square root over (K)}* {square root
over (L)}) would be 1/ {square root over (2)}. In other words, when
reusing the same normalization factor derived from an N.times.N transform
block in the quantization process, the energy (i.e., the sum of squares
of quantized transform coefficients) is changed by {square root over
(2)}.
[0304] A sixteenth technique of this disclosure may address this issue.
For instance, in the sixteenth technique of this disclosure, for a
nonsquare transform block with size equal to K.times.L, when (log
2(K)+log 2(L)) is odd, the transform and quantization process in HEVC is
kept unchanged and the nonsquare transform block is treated as a
transform block with size equal to N.times.N, wherein log 2(N*N)=((log
2(K)+log 2(L))>>1)<<1). In other words, based on (log
2(K)+log 2(L)) being odd, video encoder 20 may determine a value N such
that log 2(N*N)=((log 2(K)+log 2(L))>>1)<<1). Video encoder
20 may then use elements of a DCT transform matrix C that are defined
according to equation (19) using the determined value of N in the
"normalization factor." Thus, video encoder 20 may continue to use a
rightshift operation for the division by the "normalization factor" in
equation (21).
[0305] Furthermore, in accordance with the sixteenth technique of this
disclosure, after the transform process and before the quantization
process, the transform coefficients are modified, multiplied by a factor
of 2. In other words, after applying the transform to the nonsquare
transform block to generate a coefficient block, video encoder 20
multiplies each transform coefficient of the coefficient block by a
factor of 2. This is because the ratio of the used normalization factor
(i.e., ( {square root over (N)}* {square root over (N)})) to the real
normalization factor (i.e., ( {square root over (K)}* {square root over
(L)})) is equal to
1 2 . ##EQU00016##
[0306] For example, let K=8 and L=4. In this example, log 2(4*4)=((log
2(8)+log 2(4))>>1)<<1), so N=4.
8 .times. 4 4 .times. 4 ##EQU00017##
is equal to
8 .times. 4 4 .times. 4 , ##EQU00018##
which is equal to
32 16 , ##EQU00019##
which is equal to 2. Note that for values of K and L where (log 2(K)+log
2(L)) is even, the ratio of the used normalization factor (i.e., (
{square root over (N)}* {square root over (N)})) to the real
normalization factor (i.e., ( {square root over (K)}* {square root over
(L)})) is equal to 1. Therefore, when (log 2(K)+log 2(L)) is even, there
may be no need for video encoder 20 to multiply the transform
coefficients by the factor of 2.
[0307] After the dequantization process, the dequantized coefficients
are further modified, divided by a factor of 2. Multiplying the
transform coefficients by 2 before quantization and dividing the
transform coefficients by 2 may preserve information that would
otherwise be lost in the quantization process. Preserving this
information may ensure more accurate reconstruction of the original
transform block.
[0308] In another example of the sixteenth technique, the transform and
quantization process in HEVC is kept unchanged and it is treated as a
transform size equal to N.times.N wherein log 2(N*N)=(log 2(K)+log
2(L)+1). After transform and before the quantization process, the
transform coefficients are modified, divided by a factor of 2. After the
dequantization process, the dequantized coefficients are further
modified, multiplied by a factor of 2.
[0309] In the examples of the sixteenth technique above, the factor of 2
may be represented by its approximation. For example, the process of (x*
2) can be approximated by (x*181)>>7, wherein >> represents a
right shift operation. The process of (x/ 2) can be approximated by (x*
2)/2, i.e., (x*181)>>8, wherein >> represents a right shift
operation.
[0310] Thus, in the example of the sixteenth technique presented above,
video encoder 20 may apply a transform to a transform block of a
nonsquare TU of a CU to generate a block of transform coefficients.
Additionally, video encoder 20 may modify the transform coefficients such
that each respective transform coefficient of a block of transform
coefficients is based on the respective dequantized transform coefficient
multiplied by an approximation of {square root over (2)}. In this
example, after modifying the transform coefficients, video encoder 20 may
apply a quantization process to the modified transform coefficients of
the nonsquare PU of the CU. Furthermore, in this example, video encoder
20 may include, in a bitstream comprising an encoded representation of
the video data, data based on the quantized transform coefficients. In
some examples, as part of applying the transform to the transform block
of the nonsquare TU, video encoder 20 may apply, to the dequantized
transform coefficients, a transform having size N.times.N, where log
2(N*N)=((log 2(K)+log 2(L))>>1)<<1).
[0311] In a corresponding example, video decoder 30 may apply a
dequantization process to transform coefficients of a nonsquare PU of a
CU of the video data. In this example, after applying the dequantization
process to the transform coefficients, video decoder 30 may modify the
dequantized transform coefficients such that each respective dequantized
transform coefficient of the dequantized transform coefficients based on
the respective dequantized transform coefficient divided by an
approximation of {square root over (2)}. In some examples, as part of
applying the inverse transform to the modified dequantized transform
coefficients comprises, video decoder 30 may apply, to the modified
dequantized transform coefficients, a transform having size N.times.N,
where log 2(N*N)=((log 2(K)+log 2(L))>>1)<<1).
[0312] In HEVC, video encoder 20 may calculate a quantized transform
coefficient (i.e., a level) using the following equation:
level = ( ( coeff .times. f QP %6 + offset Q )
>> QP 6 ) >> shift 2 ( 22 ) ##EQU00020##
where coeff is the transform coefficient, offset.sub.q is an offset
value, QP is a quantization parameter, shift 2=29MB, B is the bit
depth, M=log.sub.2(N), and
f=[f.sub.0, . . .
f.sub.5].sup.T=[26214,23302,20560,18396,16384,14564].sup.T (23)
[0313] Furthermore, in HEVC, video decoder 30 may inverse quantize a
quantized transform coefficient using the following equation:
coeff Q = ( ( level .times. ( g QP %6 << QP
6 ) ) + offset IQ ) >> shift 1 ( 24 )
##EQU00021##
[0314] In equation (24), coeff.sub.Q is the inverse quantized transform
coefficient, level is the quantized transform coefficient, offset.sub.IQ
is an offset value=1<<(M10+B), shift1=(M9+B), and g is defined as
shown in equation (25), below:
g=[f.sub.0, . . . f.sub.5].sup.T=[40,45,51,57,64,72].sup.T (25)
[0315] In accordance with a technique of this disclosure, video encoder 20
may use different quantization matrixes (i.e., versions off), depending
on whether (log.sub.2(W)+log.sub.2(H)) is odd or even. Similarly, video
decoder 30 may use different inverse quantization matrixes (i.e.,
versions of g), depending on whether (log 2(W)+log 2(H)) is odd or even.
An example of g is defined as follows:
[40,45,51,57,64,72], // when the sum is even
[7240,8145,9231,10317,11584,13032] // when the sum is odd Note that each
corresponding value of g for the case that it is even is multiplied by
181. In this example, there is no need to perform the multiplication or
division processes before or after quantization stages since the
compensation of {square root over (2)} has already been considered in g.
[0316] Furthermore, in equations (22) and (24), the value selected in
quantization matrixes f and g is selected based on the quantization
parameter QP. The selected values in quantization matrixes f and g may be
referred to herein as quantization matrix coefficients. In some examples
of this disclosure, video encoder 20 and video decoder 30 may select
quantization matrix coefficients based on the quantization parameter and
also based on whether (log 2(W)+log 2(H)) is odd or even.
[0317] As briefly described above and illustrated in FIG. 2A and FIG. 2B,
in HEVC, a video coder always processes TUs of a CU in a recursive zscan
order. Thus, as shown in FIG. 2A, data corresponding to transform
coefficients of TU "a" appear in the bitstream before data corresponding
to transform coefficients of TU "b", and so on. A seventeenth technique
of this disclosure proposes that for CUs coded with inter mode or
comb_mode or intra mode but with nonsquare partitions, the coding order
of transform coefficients is dependent on a PU coding order instead of
always using the recursive zscan. In some examples of the seventeenth
technique, all coefficients within one PU shall be coded together before
coding the coefficients in another PU. Thus, for video encoder 20, all
coefficients within one of the PUs of the CU are encoded together before
encoding coefficients of another one of the PUs of the CU. Similarly, in
this example, for video decoder 30, all coefficients within one PU of a
CU are decoded together before decoding coefficients of another one of
the PUs of the CU.
[0318] In this example, the transform coefficients of a PU are transform
coefficients of TUs whose transform blocks are within an area of a
prediction block of the PU. For example, let the coordinates of a
topleft corner of a prediction block of a PU of a 16.times.16 CU be
(0,0) relative to a topleft corner of a coding block of the CU and let
the coordinates of the bottomright corner of the prediction block of the
PU be (7,15). Furthermore, in this example, let the coordinates of a
topleft corner of a transform block of a TU of the CU be (4,0) and let
the coordinates of a bottomright corner of the transform block of the TU
be (7,15). In this example, transform coefficients of the TU are
transform coefficients of the PU. However, in this example, if the
topleft corner of the transform block of the TU is (8,0) and the
bottomright corner of the transform block of the TU is (15,7), the
transform coefficients of the TU are not transform coefficients of the
PU.
[0319] For example, with respect to FIG. 2A, suppose that CU 40 has two
PUs which partition CU 40 vertically through the center of CU 40. Thus,
the transform coefficients of a first PU of CU 40 include the transform
coefficients of TUs "a," "b," "c," "d," and "f". The transform
coefficients of a second PU of CU 40 include the transform coefficients
of TUs "e," "g," "h," "i," and "j". In this example, video encoder 20 may
include data representing transform coefficients of TU "e" following data
representing transform coefficients of TU "f". In contrast, in HEVC, the
data representing transform coefficients of TU "f" follows the data
representing transform coefficients of TU "e", regardless of the shape
and size of the PUs of CU 40. In other words, when one PU contains
multiple TUs, the recursive Zscan with depthfirst traversal is applied
for coding these coefficients within the PU. Taking FIG. 2 as an example,
if the partition size is equal to N.times.2N, the coding order may be a,
b, c, d, f, e, g, h, i, j.
[0320] Some examples of the seventeenth technique are only applicable when
the transform depth is unequal to 0, i.e., transform size no larger than
PU sizes. Note that the AMP mentioned above may include other asymmetric
partitions, not only the four cases defined in HEVC.
[0321] As mentioned briefly above, the IC design in HEVC only supports
square PUs. Prior to the present disclosure, how to derive the IC
parameters a and b for nonsquare PUs was unknown. An eighteenth
technique of this disclosure enables IC to be used with nonsquare PUs.
For instance, video encoder 20 may use IC to generate a nonsquare
predictive block of a current PU of a picture of the video data.
Additionally, video encoder 20 may generate residual data based on the
predictive block. For example, video encoder 20 may generate the residual
data such that each respective sample of the residual data is equal to a
difference between a respective sample of a coding block of the current
CU and a corresponding respective sample of the predictive block.
Furthermore, video encoder 20 may output a bitstream that includes data
based on the residual data. For example, video encoder 20 may apply a
transform to the residual data to generate a coefficient block, quantize
coefficients of the coefficient block, and include in the bitstream one
or more syntax elements representing each of the quantized coefficients.
In this example, video encoder 20 may entropy encode one or more of the
syntax elements for each quantized coefficient. In other examples, video
encoder 20 may skip application of the transform and/or quantization.
[0322] Furthermore, in accordance with one or more of the examples related
to IC provided above, video decoder 30 may use IC to generate a
nonsquare predictive block of a current PU of a current CU of a picture
of the video data. Additionally, video decoder 30 may reconstruct, based
on the predictive block, a block (e.g., a coding block) of the picture.
For example, video decoder 30 may reconstruct samples of the block by
adding samples of the predictive block to corresponding residual samples.
[0323] In the examples of the eighteenth technique, as part of using IC to
generate a nonsquare predictive block, a video coder (e.g., video
encoder 20 and/or video decoder 30) may determine a sample of the
predictive block as:
p(i,j)=a*r(i+dv.sub.x,j+dv.sub.y+b), where (i,j).epsilon.PU.sub.c,
where PU.sub.c is the current PU, (i,j) is the coordinate of pixels in
the predictive block, (dv.sub.x,dv.sub.y) is a vector (e.g., disparity
vector) of PU.sub.c. p(i,j) is a prediction of PU.sub.c, r is an
interview reference picture, a is a first IC parameter and b is a second
IC parameter. Furthermore, as part of using IC to generate the nonsquare
predictive block, the video coder may calculate the first IC parameter
as:
a = 2 N i = 0 2 N  1 Rec neig ( i )
Rec refneig ( i )  i = 0 2 N  1 Rec
neig ( i ) i = 0 2 N  1 Rec refneig ( i )
2 N i = 0 2 N  1 Rec refneig ( i )
Rec refneig ( i )  ( i = 0 2 N  1 Rec
refneig ( i ) ) 2 . ##EQU00022##
Additionally, the video coder may calculate the second IC parameter as:
b = i = 0 2 N  1 Rec neig ( i )  a i =
0 2 N  1 Rec refneig ( i ) 2 N , ##EQU00023##
In the equations above, Rec.sub.neig and Rec.sub.refneig denote a
neighboring pixel set of the current CU and a reference block
respectively, 2N denotes the pixel number in Rec.sub.neig and
Rec.sub.refneig, and the current CU has a size equal to N.times.N. Other
examples may use variations on the formulas indicated above.
[0324] Furthermore, in examples of the eighteenth technique, when IC is
enabled for one nonsquare PU with size equal to K.times.L (K is unequal
to L), the parameters could be derived in various ways. For example, when
using the equation (16) and equation (17) to calculate linear model
parameters, pixels located at both longer and shorter sides of the PU may
be subsampled with different ways, such as different subsampling
ratios. However, it may be required that the total number of pixels at
two sides together should be equal to 2.sup.m (wherein m is an integer,
and its value may be dependent on the block size). Thus, in this example,
Rec.sub.neigh is a subset of pixels immediately above the current CU and
immediately left of the current CU, Rec.sub.refneigh is a subset of
pixels immediately above the reference block and immediately left of the
reference block, and a total number of pixels in Rec.sub.neigh and
Rec.sub.refneigh is equal to 2.sup.m, where m is an integer. The
subsampling process can be a decimation or an interpolated sampling.
[0325] In another example of deriving the parameters for IC, when using
the equation (16) and equation (17) to calculate linear model parameters,
the pixels of the boundary at the shorter side of the nonsquare PU is
upsampled such that the number of pixels in the upsampled boundary is
equal to the number of pixels in the longer boundary (i.e., max((K,L))).
The upsampling process can be a duplicator or an interpolated sampling.
Thus, in this example, as part of using IC to generate a predictive
block, a video coder may generate Rec.sub.neigh such that Rec.sub.neigh
includes upsampled pixels in whichever is shorter of a left side and a
top side of the current CU. Additionally, in this example, the video
coder may generate Rec.sub.refneigh such that Rec.sub.refneigh includes
upsampled pixels in whichever is shorter of the left side and the top
side of the reference block
[0326] Alternatively, pixels located at both longer and short sides of the
PU may be upsampled and the upsampling ratios may be different. Thus,
in this example, as part of using IC to generate a predictive block, a
video coder may generate Rec.sub.neigh such that Rec.sub.neigh includes
upsampled pixels in whichever is longer of the left side and the top
side of the current CU. Additionally, the video coder may generate
Rec.sub.refneigh such that Rec.sub.refneigh includes upsampled pixels in
whichever is longer of the left side and the top side of the reference
block. However, it may be required that the total number of pixels at two
sides together should be equal to 2.sup.m (wherein m is an integer, m may
be different for luma and chroma components).
[0327] Furthermore, in some examples of deriving the parameters for IC,
different ways of subsampling/upsampling for boundary pixels may be
applied. In one example, the subsampling/upsampling method is dependent
on the PU size (i.e., on the values of K and L). Thus, a video coder may
determine, based on a size of the current PU, a subsampling method or
upsampling method to use to generate Rec.sub.neigh and Rec.sub.refneigh.
In another example, the methods for subsampling/upsampling may be
signaled in sequence parameter set, picture parameter set, and/or slice
header. Thus, in some examples, video encoder 20 may include, in a
bitstream, and video decoder 30 may obtain, from the bitstream, a syntax
element indicating a subsampling method to use to generate Rec.sub.neigh
and Rec.sub.refneigh. In some examples, video encoder 20 may include, in
a bitstream, and video decoder 30 may obtain, from the bitstream, a
syntax element indicating an upsampling method to use to generate the
upsampled pixels.
[0328] In some examples of deriving the parameters for IC, the
upsampling/downsampling (or subsampling) is implemented in an implicit
manner. For instance, the sum value in equation (16) and equation (17) of
the left side boundary or/and upper side boundary, may be multiplied or
divided by a factor S. The value of S can be dependent on the ratio of
the pixel number in the left side boundary or/and upper side boundary.
[0329] In some examples of the eighteenth technique, the same
subsampling/upsampling method shall also be applied to the boundary
pixels of the reference block (i.e., Rec.sub.refneigh). For example,
decimation may be used for subsampling both Rec.sub.neigh and
Rec.sub.refneigh.
[0330] Furthermore, in accordance with particular techniques of this
disclosure, when LM is enabled for a square PU, the luma and chroma
boundary pixels may be firstly subsampled to derive the parameters e.g.
using equations (16) and (17). The subsampling method may be predefined
or signaled in a sequence parameter set, a picture parameter set or a
slice header. The subsampling method may be dependent on prediction unit
size.
[0331] Thus, a video coder (e.g., video encoder 20 or video decoder 30)
may perform a linear model prediction operation to predict a predictive
chroma block for a current PU from subsampled reconstructed luma samples
of the PU. Additionally, the video coder may reconstruct, based in part
on the predictive chroma block, the block of the picture. As part of
performing the linear model prediction operation, the video coder may
obtain a predictive chroma sample such that the predictive chroma sample
is equal to a first parameter multiplied by a collocated luma sample,
plus a second parameter, wherein the first parameter is equal to:
.alpha. = I x i y i  x i y i I x
i x i  x i x i ##EQU00024##
and the second parameter is equal to:
.beta.=(.SIGMA.y.sub.i.alpha..SIGMA.x.sub.i)/I.
In the equations above, I is the number of reference samples in a subset
of samples in a left and top boundary of the current PU determined
according to a subsampling method, x.sub.i is a subsampled
reconstructed luma reference sample, y.sub.i is a reconstructed chroma
reference sample. In some instances of this example, video encoder 20 may
include, in a bitstream, and video decoder 30 may obtain, from the
bitstream, a syntax element indicating the subsampling method. In some
instances of this example, video encoder 20 and video decoder 30 may
determine, based on a size of the current PU, the subsampling method.
[0332] Various examples have been described. Particular examples of this
disclosure may be used separately or in combination with one another.
[0333] FIG. 24 is a block diagram illustrating an example video encoder 20
that may implement techniques of this disclosure. FIG. 24 is provided for
purposes of explanation and should not be considered limiting of the
techniques as broadly exemplified and described in this disclosure. For
purposes of explanation, this disclosure describes video encoder 20 in
the context of HEVC coding. However, the techniques of this disclosure
may be applicable to other coding standards or methods.
[0334] Video encoder 20 includes processing circuitry, and video encoder
20 is configured to perform one or more of the example techniques
described in this disclosure. Such processing circuitry may include fixed
function and/or programmable circuitry. For instance, video encoder 20
includes integrated circuitry, and the various units illustrated in FIG.
24 may be formed as hardware circuit blocks that are interconnected with
a circuit bus. These hardware circuit blocks may be separate circuit
blocks or two or more of the units may be combined into a common hardware
circuit block. The hardware circuit blocks may be formed as combinations
of electronic components that form operation blocks such as arithmetic
logic units (ALUs), elementary function units (EFUs), as well as logic
blocks such as AND, OR, NAND, NOR, XOR, XNOR, and other similar logic
blocks.
[0335] In the example of FIG. 24, video encoder 20 includes a prediction
processing unit 200, video data memory 201, a residual generation unit
202, a transform processing unit 204, a quantization unit 206, an inverse
quantization unit 208, an inverse transform processing unit 210, a
reconstruction unit 212, a filter unit 214, a decoded picture buffer 216,
and an entropy encoding unit 218. Prediction processing unit 200 includes
an interprediction processing unit 220 and an intraprediction
processing unit 226. Interprediction processing unit 220 may include a
motion estimation unit and a motion compensation unit (not shown). In
some examples, prediction processing unit 200 performs the illumination
compensation techniques of this disclosure. In some examples, prediction
processing unit 200 performs the LM techniques of this disclosure to
generate nonsquare chroma predictive blocks. Furthermore, in some
examples, prediction processing unit 200 performs the IC techniques of
this disclosure to generate nonsquare predictive blocks.
[0336] Video data memory 201 may be configured to store video data to be
encoded by the components of video encoder 20. The video data stored in
video data memory 201 may be obtained, for example, from video source 18
(FIG. 1). Decoded picture buffer 216 may be a reference picture memory
that stores reference video data for use in encoding video data by video
encoder 20, e.g., in intra or intercoding modes. Video data memory 201
and decoded picture buffer 216 may be formed by any of a variety of
memory devices, such as dynamic random access memory (DRAM), including
synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM
(RRAM), or other types of memory devices. Video data memory 201 and
decoded picture buffer 216 may be provided by the same memory device or
separate memory devices. In various examples, video data memory 201 may
be onchip with other components of video encoder 20, or offchip
relative to those components.
[0337] Video encoder 20 receives video data. Video encoder 20 may encode
each CTU in a slice of a picture of the video data. Each of the CTUs may
be associated with equallysized luma coding tree blocks (CTBs) and
corresponding CTBs of the picture. As part of encoding a CTU, prediction
processing unit 200 may perform quadtree partitioning to divide the CTBs
of the CTU into progressivelysmaller blocks. The smaller block may be
coding blocks of CUs. For example, prediction processing unit 200 may
partition a CTB associated with a CTU into four equallysized subblocks,
partition one or more of the subblocks into four equallysized
subsubblocks, and so on.
[0338] Video encoder 20 may encode CUs of a CTU to generate encoded
representations of the CUs (i.e., coded CUs). As part of encoding a CU,
prediction processing unit 200 may partition the coding blocks associated
with the CU among one or more PUs of the CU. Thus, each PU may be
associated with a luma prediction block and corresponding chroma
prediction blocks. Video encoder 20 and video decoder 30 may support PUs
having various sizes. As indicated above, the size of a CU may refer to
the size of the luma coding block of the CU and the size of a PU may
refer to the size of a luma prediction block of the PU. Assuming that the
size of a particular CU is 2N.times.2N, video encoder 20 and video
decoder 30 may support PU sizes of 2N.times.2N or N.times.N for intra
prediction, and symmetric PU sizes of 2N.times.2N, 2N.times.N,
N.times.2N, N.times.N, or similar for inter prediction. Video encoder 20
and video decoder 30 may also support asymmetric partitioning for PU
sizes of 2N.times.nU, 2N.times.nD, nL.times.2N, and nR.times.2N for inter
prediction.
[0339] Interprediction processing unit 220 may generate predictive data
for a PU by performing inter prediction on each PU of a CU. The
predictive data for the PU may include predictive blocks of the PU and
motion information for the PU. Interprediction processing unit 220 may
perform different operations for a PU of a CU depending on whether the PU
is in an I slice, a P slice, or a B slice. In an I slice, all PUs are
intra predicted. Hence, if the PU is in an I slice, interprediction
processing unit 220 does not perform inter prediction on the PU. Thus,
for blocks encoded in Imode, the predicted block is formed using spatial
prediction from previouslyencoded neighboring blocks within the same
frame. If a PU is in a P slice, interprediction processing unit 220 may
use unidirectional inter prediction to generate a predictive block of
the PU. If a PU is in a B slice, interprediction processing unit 220 may
use unidirectional or bidirectional inter prediction to generate a
predictive block of the PU.
[0340] Intraprediction processing unit 226 may generate predictive data
for a PU by performing intra prediction on the PU. The predictive data
for the PU may include predictive blocks of the PU and various syntax
elements. Intraprediction processing unit 226 may perform intra
prediction on PUs in I slices, P slices, and B slices.
[0341] To perform intra prediction on a PU, intraprediction processing
unit 226 may use multiple intra prediction modes to generate multiple
sets of predictive data for the PU. Intraprediction processing unit 226
may use samples from sample blocks of neighboring PUs to generate a
predictive block for a PU. The neighboring PUs may be above, above and to
the right, above and to the left, or to the left of the PU, assuming a
lefttoright, toptobottom encoding order for PUs, CUs, and CTUs.
Intraprediction processing unit 226 may use various numbers of intra
prediction modes, e.g., 33 directional intra prediction modes. In some
examples, the number of intra prediction modes may depend on the size of
the region associated with the PU.
[0342] Prediction processing unit 200 may select the predictive data for
PUs of a CU from among the predictive data generated by interprediction
processing unit 220 for the PUs or the predictive data generated by
intraprediction processing unit 226 for the PUs. In some examples,
prediction processing unit 200 selects the predictive data for the PUs of
the CU based on rate/distortion metrics of the sets of predictive data.
The predictive blocks of the selected predictive data may be referred to
herein as the selected predictive blocks.
[0343] Residual generation unit 202 may generate, based on the coding
blocks (e.g., luma, Cb and Cr coding blocks) for a CU and the selected
predictive blocks (e.g., predictive luma, Cb and Cr blocks) for the PUs
of the CU, residual blocks (e.g., luma, Cb and Cr residual blocks) for
the CU. For instance, residual generation unit 202 may generate the
residual blocks of the CU such that each sample in the residual blocks
has a value equal to a difference between a sample in a coding block of
the CU and a corresponding sample in a corresponding selected predictive
block of a PU of the CU.
[0344] Transform processing unit 204 may perform partitioning (e.g.,
quadtree partitioning) to partition the residual blocks associated with
a CU into transform blocks associated with TUs of the CU. Thus, a TU may
be associated with a luma transform block and two chroma transform
blocks. The sizes and positions of the luma and chroma transform blocks
of TUs of a CU may or may not be based on the sizes and positions of
prediction blocks of the PUs of the CU. A quadtree structure known as a
"residual quadtree" (RQT) may include nodes associated with each of the
regions. The TUs of a CU may correspond to leaf nodes of the RQT.
[0345] In some examples, transform processing unit 204 may perform the
techniques of this disclosure for determining a residual tree structure
that includes nodes having two (and/or 4) child nodes. For example, video
data memory 201 may receive video data and transform processing unit 204
may partition a CU of the video data into TUs of the CU based on a tree
structure. In this example, as part of partitioning the CU into TUs of
the CU based on the tree structure, transform processing unit 204 may
determine that a node in the tree structure has exactly two child nodes
in the tree structure. In some instances, transform processing unit 204
may further determine that a second node in the tree structure has
exactly four child nodes in the tree structure. For at least one of the
TUs of the CU, transform processing unit 204 may apply a transform to a
residual block for the TU to generate a block of transform coefficients
for the TU.
[0346] Transform processing unit 204 may generate transform coefficient
blocks for each TU of a CU by applying one or more transforms to the
transform blocks of the TU. Transform processing unit 204 may apply
various transforms to a transform block associated with a TU. For
example, transform processing unit 204 may apply a discrete cosine
transform (DCT), a directional transform, or a conceptually similar
transform to a transform block. In some examples, transform processing
unit 204 does not apply transforms to a transform block. In such
examples, the transform block may be treated as a transform coefficient
block. In some examples, transform processing unit 204 performs the EMT
techniques of this disclosure.
[0347] Quantization unit 206 may quantize the transform coefficients in a
coefficient block. The quantization process may reduce the bit depth
associated with some or all of the transform coefficients. For example,
an nbit transform coefficient may be rounded down to an mbit transform
coefficient during quantization, where n is greater than m. Quantization
unit 206 may quantize a coefficient block associated with a TU of a CU
based on a quantization parameter (QP) value associated with the CU.
Video encoder 20 may adjust the degree of quantization applied to the
coefficient blocks associated with a CU by adjusting the QP value
associated with the CU. Quantization may introduce loss of information;
thus quantized transform coefficients may have lower precision than the
original transform coefficients.
[0348] In some examples, quantization unit 206 modifies the transform
coefficients such that each respective transform coefficient of the block
of transform coefficients is based on the respective dequantized
transform coefficient multiplied by an approximation of {square root
over (2)}. In this example, after modifying the transform coefficients,
quantization unit 206 applies a quantization process to the modified
transform coefficients of the nonsquare PU of the CU.
[0349] Inverse quantization unit 208 and inverse transform processing unit
210 may apply inverse quantization and inverse transforms to a
coefficient block, respectively, to reconstruct a residual block from the
coefficient block. Reconstruction unit 212 may add the reconstructed
residual block to corresponding samples from one or more predictive
blocks generated by prediction processing unit 200 to produce a
reconstructed transform block associated with a TU. By reconstructing
transform blocks for each TU of a CU in this way, video encoder 20 may
reconstruct the coding blocks of the CU.
[0350] Filter unit 214 may perform one or more deblocking operations to
reduce blocking artifacts in the coding blocks associated with a CU.
Decoded picture buffer 216 may store the reconstructed coding blocks
after filter unit 214 performs the one or more deblocking operations on
the reconstructed coding blocks. Interprediction processing unit 220 may
use a reference picture that contains the reconstructed coding blocks to
perform inter prediction on PUs of other pictures. In addition,
intraprediction processing unit 226 may use reconstructed coding blocks
in decoded picture buffer 216 to perform intra prediction on other PUs in
the same picture as the CU.
[0351] Entropy encoding unit 218 may receive data from other functional
components of video encoder 20. For example, entropy encoding unit 218
may receive coefficient blocks from quantization unit 206 and may receive
syntax elements from prediction processing unit 200. Entropy encoding
unit 218 may perform one or more entropy encoding operations on the data
to generate entropyencoded data. For example, entropy encoding unit 218
may perform a CABAC operation, a contextadaptive variable length coding
(CAVLC) operation, a variabletovariable (V2V) length coding operation,
a syntaxbased contextadaptive binary arithmetic coding (SBAC)
operation, a Probability Interval Partitioning Entropy (PIPE) coding
operation, an ExponentialGolomb encoding operation, or another type of
entropy encoding operation on the data. Video encoder 20 may output a
bitstream that includes entropyencoded data generated by entropy
encoding unit 218. For instance, the bitstream may include data that
represents a RQT for a CU.
[0352] FIG. 25 is a block diagram illustrating an example video decoder 30
that is configured to implement techniques of this disclosure. FIG. 25 is
provided for purposes of explanation and is not limiting on the
techniques as broadly exemplified and described in this disclosure. For
purposes of explanation, this disclosure describes video decoder 30 in
the context of HEVC coding. However, the techniques of this disclosure
may be applicable to other coding standards or methods.
[0353] Video decoder 30 includes processing circuitry, and video decoder
30 is configured to perform one or more of the example techniques
described in this disclosure. For instance, video decoder 30 includes
integrated circuitry, and the various units illustrated in FIG. 25 may be
formed as hardware circuit blocks that are interconnected with a circuit
bus. These hardware circuit blocks may be separate circuit blocks or two
or more of the units may be combined into a common hardware circuit
block. The hardware circuit blocks may be formed as a combination of
electronic components that form operation blocks such as arithmetic logic
units (ALUs), elementary function units (EFUs), as well as logic blocks
such as AND, OR, NAND, NOR, XOR, XNOR, and other similar logic blocks.
[0354] In some examples, one or more of the units illustrated in FIG. 25
may be provided by software units executing on the processing circuitry.
In such examples, the object code for these software units is stored in
memory. An operating system may cause video decoder 30 to retrieve the
object code and execute the object code, which causes video decoder 30 to
perform operations to implement the example techniques. In some examples,
the software units may be firmware that video decoder 30 executes at
startup. Accordingly, video decoder 30 is a structural component having
hardware that performs the example techniques or has software/firmware
executing on the hardware to specialize the hardware to perform the
example techniques.
[0355] In the example of FIG. 25, video decoder 30 includes an entropy
decoding unit 250, video data memory 251, a prediction processing unit
252, an inverse quantization unit 254, an inverse transform processing
unit 256, a reconstruction unit 258, a filter unit 260, and a decoded
picture buffer 262. Prediction processing unit 252 includes a motion
compensation unit 264 and an intraprediction processing unit 266. In
other examples, video decoder 30 may include more, fewer, or different
functional components. In some examples, prediction processing unit 266
performs the illumination compensation techniques of this disclosure. In
some examples, prediction processing unit 266 performs the LM techniques
of this disclosure.
[0356] Video data memory 251 may store encoded video data, such as an
encoded video bitstream, to be decoded by the components of video decoder
30. The video data stored in video data memory 251 may be obtained, for
example, from computerreadable medium 16, e.g., from a local video
source, such as a camera, via wired or wireless network communication of
video data, or by accessing physical data storage media. Video data
memory 251 may form a coded picture buffer (CPB) that stores encoded
video data from an encoded video bitstream. Decoded picture buffer 262
may be a reference picture memory that stores reference video data for
use in decoding video data by video decoder 30, e.g., in intra or
intercoding modes, or for output. Video data memory 251 and decoded
picture buffer 262 may be formed by any of a variety of memory devices,
such as dynamic random access memory (DRAM), including synchronous DRAM
(SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other
types of memory devices. Video data memory 251 and decoded picture buffer
262 may be provided by the same memory device or separate memory devices.
In various examples, video data memory 251 may be onchip with other
components of video decoder 30, or offchip relative to those components.
[0357] Video data memory 251 receives and stores encoded video data (e.g.,
NAL units) of a bitstream. Entropy decoding unit 250 may receive encoded
video data (e.g., NAL units) from video data memory 251 and may parse the
NAL units to obtain syntax elements. Entropy decoding unit 250 may
entropy decode entropyencoded syntax elements in the NAL units.
Prediction processing unit 252, inverse quantization unit 254, inverse
transform processing unit 256, reconstruction unit 258, and filter unit
260 may generate decoded video data based on the syntax elements
extracted from the bitstream. Entropy decoding unit 250 may perform a
process generally reciprocal to that of entropy encoding unit 218.
[0358] In addition to obtaining syntax elements from the bitstream, video
decoder 30 may perform a reconstruction operation on a nonpartitioned
CU. To perform the reconstruction operation on a CU, video decoder 30 may
perform a reconstruction operation on each TU of the CU. By performing
the reconstruction operation for each TU of the CU, video decoder 30 may
reconstruct residual blocks of the CU.
[0359] As part of performing a reconstruction operation on a TU of a CU,
inverse quantization unit 254 may inverse quantize, i.e., dequantize,
coefficient blocks associated with the TU. After inverse quantization
unit 254 inverse quantizes a coefficient block, inverse transform
processing unit 256 may apply one or more inverse transforms to the
coefficient block in order to generate a residual block associated with
the TU. For example, inverse transform processing unit 256 may apply an
inverse DCT, an inverse integer transform, an inverse KarhunenLoeve
transform (KLT), an inverse rotational transform, an inverse directional
transform, or another inverse transform to the coefficient block. In some
examples, inverse transform processing unit 256 performs the EMT
techniques of this disclosure.
[0360] In accordance with some examples of this disclosure, inverse
quantization unit 254 may apply a dequantization process to transform
coefficients of a nonsquare TU of a CU of the video data. Furthermore,
after applying the dequantization process to the transform coefficients,
inverse quantization unit 254 may modify the dequantized transform
coefficients such that each respective dequantized transform coefficient
of the dequantized transform coefficients is based on the respective
dequantized transform coefficient divided by an approximation of {square
root over (2)}.
[0361] In some examples, inverse transform processing unit 256 may apply
the techniques of this disclosure for determining a residual tree
structure that includes nodes having two (and/or 4) child nodes. For
example, inverse transform processing unit 256 may determine a CU of the
video data is partitioned into TUs of the CU based on a tree structure.
In this example, as part of determining the CU is partitioned into the
TUs of the CU based on the tree structure, inverse transform processing
unit 256 may determine that a node in the tree structure has exactly two
child nodes in the tree structure. In some examples, inverse transform
processing unit 256 may determine that a second node in the tree
structure has exactly four child nodes in the tree structure.
Furthermore, in this example, for at least one of the TUs of the CU,
inverse transform processing unit 256 may apply a transform to a
coefficient block for the TU to generate a residual block for the TU.
[0362] If a PU is encoded using intra prediction, intraprediction
processing unit 266 may perform intra prediction to generate predictive
blocks of the PU. Intraprediction processing unit 266 may use an intra
prediction mode to generate the predictive blocks of the PU based on
samples spatiallyneighboring blocks. Intraprediction processing unit
266 may determine the intra prediction mode for the PU based on one or
more syntax elements obtained from the bitstream.
[0363] If a PU is encoded using inter prediction, entropy decoding unit
250 may determine motion information for the PU. Motion compensation unit
264 may determine, based on the motion information of the PU, one or more
reference blocks. Motion compensation unit 264 may generate, based on the
one or more reference blocks, predictive blocks (e.g., predictive luma,
Cb and Cr blocks) for the PU.
[0364] Reconstruction unit 258 may use transform blocks (e.g., luma, Cb
and Cr transform blocks) for TUs of a CU and the predictive blocks (e.g.,
luma, Cb and Cr blocks) of the PUs of the CU, i.e., either
intraprediction data or interprediction data, as applicable, to
reconstruct the coding blocks (e.g., luma, Cb and Cr coding blocks) for
the CU. For example, reconstruction unit 258 may add samples of the
transform blocks (e.g., luma, Cb and Cr transform blocks) to
corresponding samples of the predictive blocks (e.g., luma, Cb and Cr
predictive blocks) to reconstruct the coding blocks (e.g., luma, Cb and
Cr coding blocks) of the CU.
[0365] Filter unit 260 may apply one or more filters to coding blocks of
the CU. For example, filter unit 260 may perform a deblocking operation
to reduce blocking artifacts associated with the coding blocks of the CU.
Video decoder 30 may store the coding blocks of the CU in decoded picture
buffer 262. Thus, decoded picture buffer 262 may store decoded blocks of
the video data. Decoded picture buffer 262 may provide reference pictures
for subsequent motion compensation, intra prediction, and presentation on
a display device, such as display device 32 of FIG. 1. For instance,
video decoder 30 may perform, based on the blocks in decoded picture
buffer 262, intra prediction or inter prediction operations for PUs of
other CUs.
[0366] FIG. 26 is a block diagram illustrating an example video encoder 20
that supports LMbased encoding in accordance with a technique of this
disclosure. In the example of FIG. 26, component of video encoder 20
operates in the same manner as the corresponding components of video
encoder 20 of FIG. 24. However, video encoder 20 of FIG. 26 also includes
an LMbased encoding unit 222.
[0367] LMbased encoding unit 222 may perform the LM prediction encoding
according to the examples described elsewhere in this disclosure. For
example, inverse quantization unit 208, inverse transform processing unit
210, reconstruction unit 212, and filter unit 214 may reconstruct a set
of luma reference samples, a set of chroma reference samples, and may
also reconstruct luma samples of a nonsquare PU. LMbased encoding unit
222 may downsample or subsample the set of luma reference samples such
that a total number of luma reference samples in the set of luma
reference samples that neighbor a longer side of the nonsquare PU is the
same as a total number of luma reference samples of the set of luma
reference samples that neighbor a shorter side of the nonsquare PU.
Additionally, LMbased decoding unit 222 may determine a first parameter
such that the first parameter is equal to:
(.SIGMA.y.sub.i.alpha..SIGMA.x.sub.i)/I
where I is a total number of reference samples in the set of the luma
reference samples, x.sub.i is an ith luma reference sample in the set of
luma reference samples, and y.sub.i is an ith chroma reference sample in
the set of chroma reference samples. For each respective chroma sample of
a predictive chroma block of the nonsquare PU, LMbased encoding unit
222 may determine a value of the respective chroma sample such that the
value of the respective chroma sample is equal to a second parameter
multiplied by a respective reconstructed luma sample corresponding to the
respective chroma sample, plus the first parameter, the reconstructed
luma sample corresponding to the respective chroma sample being one of
the reconstructed luma samples of the nonsquare PU. LMbased encoding
unit 222 may determine the first parameter such that the second parameter
is equal to:
I x i y i  x i y i I x i x i
 x i x i . ##EQU00025##
[0368] LMbased encoding unit 222 may output the predictive block to
residual generation unit 202. Residual generation unit 202 generates a
residual block from the predictive block and the chroma block. The
resulting residual block is transformed by transform processing unit 103,
quantized by quantization unit 206, and entropy encoded by entropy
encoding unit 218. The result is then signaled via a bitstream and video
decoder 30 may use information in the bitstream to reconstruct the chroma
block.
[0369] FIG. 27 is a block diagram illustrating an example video decoder 30
that supports LMbased decoding in accordance with a technique of this
disclosure. In the example of FIG. 27, components of video decoder 30
operate in the same manner as the corresponding components of video
decoder 30 of FIG. 27. However, video decoder 30 of FIG. 27 also includes
an LMbased decoding unit 265.
[0370] In accordance with various examples of this disclosure, video
decoder 30 may be configured to perform LMbased coding in accordance
with examples provided elsewhere in this disclosure. For example, inverse
quantization unit 254, inverse transform processing unit 256,
reconstruction unit 258, and filter unit 260 may reconstruct a set of
luma reference samples, a set of chroma reference samples, and may also
reconstruct luma samples of a nonsquare PU. LMbased decoding unit 265
may downsample or subsample the set of luma reference samples such that
a total number of luma reference samples in the set of luma reference
samples that neighbor a longer side of the nonsquare PU is the same as a
total number of luma reference samples of the set of luma reference
samples that neighbor a shorter side of the nonsquare PU. Additionally,
LMbased decoding unit 265 may determine a first parameter such that the
first parameter is equal to:
(.SIGMA.y.sub.i.alpha..SIGMA.x.sub.i)/I
where I is a total number of reference samples in the set of the luma
reference samples, x.sub.i is an ith luma reference sample in the set of
luma reference samples, and y.sub.i is an ith chroma reference sample in
the set of chroma reference samples. For each respective chroma sample of
a predictive chroma block of the nonsquare PU, LMbased decoding unit
266 may determine a value of the respective chroma sample such that the
value of the respective chroma sample is equal to a second parameter
multiplied by a respective reconstructed luma sample corresponding to the
respective chroma sample, plus the first parameter, the reconstructed
luma sample corresponding to the respective chroma sample being one of
the reconstructed luma samples of the nonsquare PU. LMbased decoding
unit 266 may determine the first parameter such that the second parameter
is equal to:
I x i y i  x i y i I x i x i
 x i x i . ##EQU00026##
[0371] LMbased decoding unit 265 may output the predictive block to
reconstruction unit 258. Reconstruction unit 258 also receives a residual
block (e.g., after information in the bitstream for the residual block is
entropy decoded with entropy decoding unit 250, inverse quantized with
inverse quantization unit 254, inverse transformed with inverse transform
processing unit 256). Reconstruction unit 258 adds the residual block
with the predictive block to reconstruct the chroma block.
[0372] FIG. 28 is a flowchart illustrating an example operation of video
encoder 20 in accordance with a LMbased coding technique of this
disclosure. The flowcharts of this disclosure are provided as examples.
Other examples within the scope of this disclosure may include more,
fewer, or different actions. Other examples within the scope of this
disclosure may include actions in different orders or performed in
parallel.
[0373] In the example of FIG. 28, video encoder 20 may receive video data
(300). For example, video encoder 20 may receive the video data from
video source 18 (FIG. 1), or another source.
[0374] Additionally, in the example of FIG. 28, video encoder 20 may
reconstruct a set of luma reference samples and a set of chroma reference
samples (302). The set of luma reference samples may comprise above luma
samples neighboring a top side of a nonsquare luma block of a current
picture of the video data and left luma samples neighboring a left side
of the nonsquare luma block. In some examples, the nonsquare luma block
is a luma prediction block of a nonsquare PU. The set of chroma
reference samples may comprise chroma samples neighboring the top side of
a nonsquare chroma block and chroma samples neighboring the left side of
the nonsquare chroma block. In some examples, the nonsquare luma block
is a luma prediction block of a nonsquare PU.
[0375] Furthermore, video encoder 20 may reconstruct luma samples of the
nonsquare luma block (304). For example, video encoder 20 may generate
luma residual samples for the CU as described elsewhere in this
disclosure. In this example, video encoder 20 may add samples of a luma
predictive block of the nonsquare luma block to corresponding samples of
the luma residual samples to reconstruct the luma samples of the
nonsquare luma block.
[0376] In some examples, video encoder 20 may downsample or subsample
the luma samples of the nonsquare luma block. By downsampling or
subsampling the luma samples of the nonsquare luma block, video encoder
20 may obtain a downsampled or subsampled set of luma samples having
one luma sample for each chroma sample of a chroma predictive block
(e.g., a chroma predictive block of the same PU as the luma block). Video
encoder 20 may downsample or subsample the luma samples of the
nonsquare luma block in response to determining that a color format of
the current picture is not 4:4:4.
[0377] Additionally, video encoder 20 may downsample or subsample the
set of luma reference samples such that a total number of luma reference
samples in the set of luma reference samples that neighbor a longer side
of the nonsquare luma block is the same as a total number of luma
reference samples of the set of luma reference samples that neighbor a
shorter side of the nonsquare luma block (306). Video encoder 20 may
downsample or subsample the set of luma reference samples in accordance
with the techniques described elsewhere in this disclosure. For instance,
video encoder 20 may decimate the set of luma reference samples such that
the set of luma reference samples that neighbor the longer side of the
nonsquare luma block is the same as the total number of luma reference
samples of the set of luma reference samples that neighbor the shorter
side of the nonsquare luma block. In some examples, video encoder 20 may
downsample or subsample whichever of the left reference samples or the
above reference samples corresponds to the longer of the left boundary of
the luma and the top boundary of the luma, but not whichever is shorter
of the left boundary of the luma block and the top boundary of the luma
block. In some examples, a total number of reference samples in the set
of luma reference samples is equal to 2.sup.m, where m is an integer
dependent on a height and/or width of the nonsquare luma block.
[0378] In some examples, video encoder 20 may also downsample or
subsample the set of chroma reference samples such that a total number
of chroma reference samples in the set of chroma reference samples that
neighbor a longer side of the nonsquare chroma block is the same as a
total number of chroma reference samples of the set of chroma reference
samples that neighbor a shorter side of the nonsquare chroma block.
[0379] In action (308) of FIG. 28, video encoder 20 may determine a first
parameter (.beta.) such that the first parameter is based on:
(.SIGMA.y.sub.i.alpha..SIGMA.x.sub.i)/I
In the equation above, I is a total number of reference samples in the
set of the luma reference samples, x.sub.i is an ith luma reference
sample in the set of luma reference samples, and y.sub.i is an ith
chroma reference sample in the set of chroma reference samples. Video
encoder 20 may determine the first parameter based on the formula above
in the sense that video encoder 20 uses the formula above directly or a
variation on the formula above, such as one that includes additional
constants or coefficients.
[0380] In some examples, video encoder 20 may also determine a second
parameter (a) such that the second parameter is based on:
I x i y i  x i y i I x i x i
 x i x i ##EQU00027##
Video encoder 20 may determine the second parameter based on the formula
above in the sense that video encoder 20 uses the formula above directly
or a variation on the formula above, such as one that includes additional
constants or coefficients.
[0381] Additionally, in example of FIG. 28, for each respective chroma
sample of a predictive chroma block, video encoder 20 may determine a
value of the respective chroma sample such that the value of the
respective chroma sample is equal to a second parameter multiplied by a
respective reconstructed luma sample corresponding to the respective
chroma sample, plus the first parameter (310). The reconstructed luma
sample corresponding to the respective chroma sample is one of the
reconstructed luma samples of the nonsquare luma block.
[0382] Furthermore, video encoder 20 may obtain, based on the predictive
chroma block, residual data (312). For example, video encoder 20 may
determine values of chroma samples of the residual data equal to
differences between samples of the chroma block of the nonsquare
prediction block and samples of a chroma coding block of a CU.
[0383] Additionally, video encoder 20 may include, in a bitstream
comprising an encoded representation of the video data, data representing
the residual data (314). For example, video encoder 20 may apply one or
more transforms to the residual data to generate one or more coefficient
blocks; quantize the coefficient blocks; generate syntax elements
indicating whether a transform coefficient is nonzero, whether the
transform coefficient is greater than 1, whether the transform
coefficient is greater than 2, a sign of the transform coefficient, and a
remainder for the transform coefficient. In this example, video encoder
20 may apply CABAC coding to one or more of these syntax elements and
include the resulting values in the bitstream.
[0384] FIG. 29 is a flowchart illustrating an example operation of video
decoder 30 in accordance with a LMbased coding technique of this
disclosure. In the example of FIG. 29, video decoder 30 receives a
bitstream that comprises an encoded representation of the video data
(350).
[0385] Furthermore, in the example of FIG. 29, video decoder 30
reconstructs a set of luma reference samples and a set of chroma
reference samples (352). The set of luma reference samples comprises
above luma samples neighboring a top side of a nonsquare luma block of a
current picture of the video data and left luma samples neighboring a
left side of the nonsquare luma block. In some examples, the nonsquare
luma block is a luma prediction block of a nonsquare PU. The set of
chroma reference samples comprises chroma samples neighboring a top side
of a nonsquare chroma block and chroma samples neighboring a left side
of the nonsquare chroma block. In some examples, the nonsquare luma
block is a luma prediction block of a nonsquare PU.
[0386] Video decoder 30 may reconstruct luma samples of the nonsquare
luma block (354). For example, as part of reconstructing luma samples of
the nonsquare luma block, video decoder 30 may use intra prediction or
inter prediction to generate a luma predictive block for the nonsquare
luma block. Additionally, in this example, video decoder 30 may add
samples of the luma predictive block for the nonsquare luma block to
corresponding residual samples to reconstruct luma samples.
[0387] In some examples, video decoder 30 may downsample or subsample
the luma samples of the nonsquare luma block. By downsampling or
subsampling the luma samples of the nonsquare luma block, video decoder
30 may obtain a downsampled or subsampled set of luma samples having
one luma sample for each chroma sample of a chroma predictive block
(e.g., a chroma predictive block of the same PU as the luma block). Video
decoder 30 may downsample or subsample the luma samples of the
nonsquare luma block in response to determining that a color format of
the current picture is not 4:4:4.
[0388] Furthermore, in the example of FIG. 29, video decoder 30 may
downsample or subsample the set of luma reference samples such that a
total number of luma reference samples in the set of luma reference
samples that neighbor a longer side of the nonsquare luma block is the
same as a total number of luma reference samples of the set of luma
reference samples that neighbor a shorter side of the nonsquare luma
block (356). Video decoder 30 may downsample or subsample the set of
luma reference samples in accordance with the techniques described
elsewhere in this disclosure. For instance, video decoder 30 may decimate
the set of luma reference samples such that the set of luma reference
samples that neighbors the longer side of the nonsquare luma block is
the same as the total number of luma reference samples of the set of luma
reference samples that neighbor the shorter side of the nonsquare luma
block. In some examples, video decoder 30 may downsample or subsample
whichever of the left reference samples or the above reference samples
corresponds to the longer of the left boundary of the luma block and the
top boundary of the luma block, but not whichever is shorter of the left
boundary of the luma block and the top boundary of the luma block. In
some examples, a total number of reference samples in the set of luma
reference samples is equal to 2.sup.m, where m is an integer dependent on
a height and/or width of the nonsquare luma block.
[0389] In some examples, video decoder 30 may also downsample or
subsample the set of chroma reference samples such that a total number
of chroma reference samples in the set of chroma reference samples that
neighbor a longer side of the nonsquare chroma block is the same as a
total number of chroma reference samples of the set of chroma reference
samples that neighbor a shorter side of the nonsquare chroma block.
[0390] Additionally, in action (358) of FIG. 29, video decoder 30 may
determine a first parameter (.beta.) such that the first parameter is
based on:
(.SIGMA.y.sub.i.alpha..SIGMA.x.sub.i)/I
In the equation above, I is a total number of reference samples in the
set of the luma reference samples, x.sub.i is an ith luma reference
sample in the set of luma reference samples, and y.sub.i is an ith
chroma reference sample in the set of chroma reference samples. In this
disclosure, video encoder 20 and/or video decoder 30 may determine a
value based on a formula in the sense that video encoder 20 and/or video
decoder 30 may uses the formula directly or a variation on the formula,
such as one that includes additional constants or coefficients.
[0391] In some examples, video decoder 30 may also determine a second
parameter (a) such that the second parameter is based on:
I x i y i  x i y i I x i x i
 x i x i . ##EQU00028##
[0392] In the example of FIG. 29, for each respective chroma sample of a
predictive chroma block, video decoder 30 may determine a value of the
respective chroma sample such that the value of the respective chroma
sample is equal to a second parameter multiplied by a respective
reconstructed luma sample corresponding to the respective chroma sample,
plus the first parameter (360). In this example, the reconstructed luma
sample corresponding to the respective chroma sample is one of the
reconstructed luma samples of the nonsquare luma block.
[0393] Furthermore, video decoder 30 may reconstruct, based in part on the
predictive chroma block, a coding block (362). For example, video decoder
30 may add samples of the predictive chroma block to corresponding
residual chroma samples of a CU to determine samples of a coding block of
the CU. FIG. 30 is a flowchart illustrating an example operation of video
encoder 20, in accordance with a quantization technique of this
disclosure. In the example of FIG. 30, video encoder 20 receives the
video data (400). Furthermore, video encoder 20 may generate a residual
block for a nonsquare TU of a CU such that each residual sample of the
residual block indicates a difference between corresponding samples of a
coding block of the CU and a predictive block of a PU of the CU (402).
[0394] Video encoder 20 may apply a transform to the residual block to
generate a block of transform coefficients (404). For example, video
encoder 20 may apply a DCT transform to the residual block. In addition,
video encoder 20 may modify the transform coefficients such that each
respective transform coefficient of the block of transform coefficients
is based on the respective dequantized transform coefficient multiplied
by an approximation of {square root over (2)} (406). For example, video
encoder 20 may modify the transform coefficients such that each
respective transform coefficient is equal to an original value of the
respective transform coefficient multiplied by the approximation of
{square root over (2)}. In this disclosure, an approximation of {square
root over (2)} may be a representation of {square root over (2)} (e.g.,
a floating point representation of {square root over (2)}). In some
examples, modifying the transform coefficients such that each respective
transform coefficient is equal to an original value of the respective
transform coefficient multiplied by the approximation of {square root
over (2)} may comprise performing one or more mathematical operations to
determine values approximating transform coefficient multiplied by
{square root over (2)}.
[0395] In some examples where the nonsquare TU has the size K.times.L, as
part of applying the transform to the residual block, video encoder 20
may apply, to the residual block, a transform having size N.times.N,
where log.sub.2(N*N)=((log.sub.2(K)+log.sub.2(L))>>1)<<1) and
((log.sub.2(K)+log.sub.2(L)) is odd. For instance, video encoder 20 may
apply Npoint 1dimensional DCT transforms to rows and columns of the
residual block as shown in equation (18), above.
[0396] Furthermore, in the example of FIG. 30, after modifying the
transform coefficients, video encoder 20 may apply a quantization process
to the modified transform coefficients of the nonsquare prediction block
of the CU (408). For instance, video encoder 20 may quantize the modified
transform coefficients as described in equation (22), above.
[0397] Video encoder 20 may include, in a bitstream comprising an encoded
representation of the video data, data based on the quantized transform
coefficients (410). For example, video encoder 20 may generate syntax
elements indicating whether a quantized transform coefficient is
nonzero, whether the quantized transform coefficient is greater than 1,
whether the quantized transform coefficient is greater than 2, a sign of
the quantized transform coefficient, and a remainder for the quantized
transform coefficient. In this example, video encoder 20 may apply CABAC
coding to one or more of these syntax elements and include the resulting
values in the bitstream.
[0398] In some examples where the nonsquare TU has the size K.times.L,
video encoder 20 may modify, based on the ((log.sub.2(K)+log.sub.2(L))
being odd, the dequantized transform coefficients such that each
respective dequantized transform coefficient of the dequantized transform
coefficients is based on the respective dequantized transform coefficient
multiplied by the approximation of {square root over (2)}. In such
examples, when ((log.sub.2(K)+log.sub.2(L)) is even, video encoder 20
does not modify the dequantized transform coefficients such that each
respective dequantized transform coefficient of the dequantized transform
coefficients is based on the respective dequantized transform coefficient
multiplied by the approximation of {square root over (2)}.
[0399] FIG. 31 is a flowchart illustrating an example operation of video
decoder 30, in accordance with a quantization technique of this
disclosure. In the example of FIG. 31, video decoder 30 may receive a
bitstream that comprises an encoded representation of the video data
(450). Furthermore, video decoder 30 may apply a dequantization process
to transform coefficients of a nonsquare TU of a CU of the video data
(452). For instance, video decoder 30 may dequantize (i.e., inverse
quantize) the transform coefficients by applying equation (24), above.
[0400] After applying the dequantization process to the transform
coefficients, video decoder 30 may modify the dequantized transform
coefficients such that each respective dequantized transform coefficient
of the dequantized transform coefficients is based on the respective
dequantized transform coefficient divided by an approximation of {square
root over (2)} (454). For instance, video decoder 30 may determine each
respective modified transform coefficient is equal to the transform
coefficient divided an approximation of {square root over (2)}. In this
disclosure, an approximation of {square root over (2)} may be a
representation of {square root over (2)} (e.g., a floating point
representation of {square root over (2)}). In some examples, modifying
the transform coefficients such that each respective transform
coefficient is equal to an original value of the respective transform
coefficient multiplied by the approximation of {square root over (2)}
may comprise performing one or more mathematical operations to determine
values approximating transform coefficient divided by {square root over
(2)}.
[0401] Furthermore, video decoder 30 may apply an inverse transform to the
modified dequantized transform coefficients to reconstruct a residual
block (456). For example, video decoder 30 may apply equation (18) with a
transpose of the transform matrix C (or its approximation represented in
integer precision) to apply the inverse transform to the modified
dequantized transform coefficients. In some examples where the nonsquare
TU has the size K.times.L, as part of applying the transform to the
residual block, video decoder 30 may apply, to the residual block, a
transform having size N.times.N, where
log.sub.2(N*N)=((log.sub.2(K)+log.sub.2(L))>>1)<<1) and
((log.sub.2(K)+log.sub.2(L)) is odd. Video decoder 30 may reconstruct
samples of a coding block by adding samples of a predictive block to
corresponding samples of the residual block for the TU of the CU (458).
[0402] In some examples where the nonsquare TU has the size K.times.L,
video decoder 30 may modify, based on the ((log.sub.2(K)+log.sub.2(L))
being odd, the dequantized transform coefficients such that each
respective dequantized transform coefficient of the dequantized transform
coefficients is based on the respective dequantized transform coefficient
divided by the approximation of {square root over (2)}. In such
examples, when ((log.sub.2(K)+log.sub.2(L)) is even, video decoder 30
does not modify the dequantized transform coefficients such that each
respective dequantized transform coefficient of the dequantized transform
coefficients is based on the respective dequantized transform coefficient
divided by the approximation of {square root over (2)}.
[0403] FIG. 32 is a flowchart illustrating an example operation of video
encoder 20, in accordance with a technique of this disclosure that uses
IC. In the example of FIG. 32, video encoder 20 receives the video data
(500). For example, video encoder 20 may receive the video data from
video source 18 (FIG. 1), or elsewhere. Furthermore, video encoder 20 may
use IC to generate a nonsquare predictive block of a current PU of a
current CU of a current picture of the video data (502)
[0404] As part of using IC to generate the nonsquare predictive block,
video encoder 20 may determine, based on a vector of the current PU, a
reference block in an reference picture (504). In some examples, the
vector is a disparity vector and the reference picture is an interview
reference picture. In some examples, the vector is a motion vector and
the reference picture is a temporal motion vector. The reference block
and the nonsquare predictive block may be the same size and shape. In
some examples, to determine the reference block based on the vector of
the current PU, video encoder 20 may determine a position in the
reference picture of a topleft corner of the reference block by adding a
horizontal component of the vector to an x coordinate of a topleft
corner of the nonsquare predictive block and adding a vertical component
of the vector to a y coordinate of the topleft corner of the nonsquare
predictive block. In this example, if the indicated position of the
topleft corner of the reference block does not indicate a position in
the reference picture of an integer pixel, video encoder 20 may
interpolate samples of the reference block to determine the reference
block.
[0405] Furthermore, in the example of FIG. 32, as part of using IC to
generate the nonsquare predictive block, video encoder 20 may subsample
a first set of reference samples to generate a first subsampled set of
reference samples with a first subsampling ratio (506). In this example,
a total number of reference samples in the first set of reference samples
is not equal to 2.sup.m and a total number of reference samples in the
first subsampled set of reference samples is equal to 2.sup.m.
Furthermore, in this example, the first set of reference samples
comprises samples outside the nonsquare predictive block along a left
side and a top side of the nonsquare predictive block, and m is an
integer.
[0406] Additionally, as part of using IC to generate the nonsquare
predictive block, video encoder 20 may subsample a second set of
reference samples to generate a second subsampled set of reference
samples with a second subsampling ratio (508). The first subsampling
ratio may be the same or different from the second subsampling ratio. In
this example, a total number of reference samples in the second set of
reference samples is not equal to 2.sup.m and a total number of reference
samples in the second subsampled set of reference samples is equal to
2.sup.m. Furthermore, in this example, the second set of reference
samples comprises samples outside the reference block along a left side
and a top side of the reference block.
[0407] In actions (506) and (508), video encoder 20 may perform the
subsampling in various ways. For example, video encoder 20 may perform
the subsampling using decimation. In examples where video encoder 20
performs the subsampling using decimation, video encoder 20 may remove
samples at regular intervals (e.g., every other sample) to reduce the
number of samples without changing the values of the remaining samples.
Thus, in this example, video encoder 20 may perform at least one of:
decimating the first set of reference samples to generate the first
subsampled set of reference samples; and decimating the second set of
reference samples to generate the second subsampled set of reference
samples.
[0408] In another example, video encoder 20 may perform the subsampling
using interpolation. In examples where video encoder 20 performs the
subsampling using interpolation, for respective pairs of adjacent
samples, video encoder 20 may interpolate a value between the samples of
a respective pair and may include the interpolated value in the
subsampled set of samples. Thus, in this example, video encoder 20 may
perform at least one of: performing interpolated sampling of the first
set of reference samples to generate the first subsampled set of
reference samples; and performing interpolated sampling of the second set
of reference samples to generate the second subsampled set of reference
samples.
[0409] In another example, video encoder 20 may perform the subsampling
using a subsampling method indicated by a syntax element in the
bitstream. Thus, in this example, video encoder 20 may include, in the
bitstream, a syntax element indicating a subsampling method. In this
example, video encoder 20 may perform at least one of: using the
indicated subsampling method to subsample the first set of reference
samples to generate the first subsampled set of reference samples; and
using the indicated subsampling method to subsample the second set of
reference samples to generate the second subsampled set of reference
samples.
[0410] In another example, video encoder 20 may determine, based on a size
of the current PU, a subsampling method. In this example, video encoder
20 may perform at least one of: using the determined subsampling method
to subsample the first set of reference samples to generate the first
subsampled set of reference samples; and using the determined
subsampling method to subsample the second set of reference samples to
generate the second subsampled set of reference samples.
[0411] As part of using IC to generate the nonsquare predictive block, in
action (510) of FIG. 32, video encoder 20 may determine a first IC
parameter based on the total number of reference samples in the first
subsampled set of reference samples and the total number of reference
samples in the second subsampled set of reference samples, the first
subsampled set of reference samples, and the second subsampled set of
reference samples. For instance, video encoder 20 may determine the first
IC parameter such that the first IC parameter is based on:
i = 0 2 N  1 Rec neigh ( i )  a i = 0
2 N  1 Rec refneigh ( i ) 2 N ##EQU00029##
In the equation above, 2N denotes the total number of reference samples
in the first subsampled set of reference samples and the total number of
reference samples in the second subsampled set of reference samples,
Rec.sub.neigh(i) denotes an ith reference sample in the first
subsampled set of reference samples, and Rec.sub.refneigh(i) denotes an
ith reference sample in the second subsampled set of reference samples.
[0412] In some examples, video encoder 20 may determine a second IC
parameter such that the second IC parameter is based on:
2 N i = 0 2 N  1 Rec neigh ( i )
Rec refneigh ( i )  i = 0 2 N  1 Rec
neigh ( i ) i = 0 2 N  1 Rec refneigh ( i )
2 N i = 0 2 N  1 Rec refneigh ( i )
Rec neigh ( i )  ( i = 0 2 N  1
Rec refneigh ( i ) ) 2 . ##EQU00030##
[0413] Furthermore, as part of using IC to generate the nonsquare
predictive block, in action (512) of FIG. 32, video encoder 20 may
determine a sample of the nonsquare predictive block based on the first
IC parameter. For instance, the sample may be at coordinates (i,j)
relative to a topleft corner of the current picture and video encoder 20
may determine the sample such that the sample is based on:
a*r(i+dv.sub.x,j+dv.sub.y+b)
In the equation above, b is the first IC parameter, a is a second IC
parameter, r is the reference picture, dv.sub.x is a horizontal component
of the vector (e.g., disparity vector, motion vector) of the current PU,
and dv.sub.y is a vertical component of the vector of the current PU.
[0414] In the example of FIG. 32, video encoder 20 may generate residual
data based on the nonsquare predictive block (514). For example, video
encoder 20 may generate the residual data such that samples of the
residual data are equal to differences between samples of the nonsquare
predictive block and samples of a coding block of the current CU.
Additionally, video encoder 20 may output a bitstream that includes data
based on the residual data (516). For example, video encoder 20 may
output a bitstream that includes entropy encoded syntax elements (e.g.,
syntax elements indicating greater than 1, greater than 2, remainder,
etc.) that indicate the residual data.
[0415] FIG. 33 is a flowchart illustrating an example operation of video
decoder 30 for encoding video data, in accordance with a technique of
this disclosure that uses IC. In the example of FIG. 32, video decoder 30
receives a bitstream that comprises an encoded representation of the
video data (550). Furthermore, video decoder 30 uses IC to generate a
nonsquare predictive block of a current PU of a current CU of a current
picture of the video data (552).
[0416] As part of using IC to generate the nonsquare predictive block,
video decoder 30 may determine, based on a vector of the current PU, a
reference block in a reference picture (554). In some examples, the
vector is a disparity vector and the reference picture is an interview
reference picture. In some examples, the vector is a motion vector and
the reference picture is a temporal motion vector. The reference block
and the nonsquare predictive block being the same size and shape. To
determine the reference block based on the disparity vector of the
current PU, video decoder 30 may determine a position in the reference
picture of a topleft corner of the reference block by adding a
horizontal component of the vector to an x coordinate of a topleft
corner of the nonsquare predictive block and adding a vertical component
of the vector to a y coordinate of the topleft corner of the nonsquare
predictive block. In this example, if indicated position of the topleft
corner of the reference block does not indicate a position in the
reference picture of an integer pixel, video decoder 30 may interpolate
samples of the reference block to determine the reference block.
[0417] Furthermore, as part of using IC to generate the nonsquare
predictive block, video decoder 30 may subsample a first set of
reference samples to generate a first subsampled set of reference
samples with a first subsampling ratio (556). In this example, a total
number of reference samples in the first set of reference samples is not
equal to 2.sup.m and a total number of reference samples in the first
subsampled set of reference samples is equal to 2.sup.m. In this
example, the first set of reference samples may comprise samples outside
the nonsquare predictive block along a left side and a top side of the
nonsquare predictive block, and m is an integer.
[0418] Additionally, as part of using IC to generate the nonsquare
predictive block, video decoder 30 may subsample a second set of
reference samples to generate a second subsampled set of reference
samples with a second subsampling ratio (558). The first subsampling
ratio may be the same or different from the second subsampling ratio. In
this example, a total number of reference samples in the second set of is
not equal to 2.sup.m and a total number of reference samples in the
second subsampled set of reference samples is equal to 2.sup.m.
Furthermore, in this example, the second set of reference samples may
comprise samples outside the reference block along a left side and a top
side of the reference block.
[0419] In actions (556) and (558), video decoder 30 may perform the
subsampling in various ways. For example, video decoder 30 may perform
the subsampling using decimation. In examples where video decoder 30
performs the subsampling using decimation, video decoder 30 may remove
samples at regular intervals (e.g., every other sample) to reduce the
number of samples without changing the values of the remaining samples.
Thus, in this example, video decoder 30 may perform at least one of:
decimating the first set of reference samples to generate the first
subsampled set of reference samples; and decimating the second set of
reference samples to generate the second subsampled set of reference
samples.
[0420] In another example, video decoder 30 may perform the subsampling
using interpolation. In examples where video decoder 30 performs the
subsampling using interpolation, for respective pairs of adjacent
samples, video decoder 30 may interpolate a value between the samples of
a respective pair and may include the interpolated value in the
subsampled set of samples. Thus, in this example, video decoder 30 may
perform at least one of: performing interpolated sampling of the first
set of reference samples to generate the first subsampled set of
reference samples; and performing interpolated sampling of the second set
of reference samples to generate the second subsampled set of reference
samples.
[0421] In another example, video decoder 30 may perform the subsampling
using a subsampling method indicated by a syntax element in the
bitstream. Thus, in this example, video decoder 30 may obtain, from the
bitstream, a syntax element indicating a subsampling method. In this
example, video decoder 30 may perform at least one of: using the
indicated subsampling method to subsample the first set of reference
samples to generate the first subsampled set of reference samples; and
using the indicated subsampling method to subsample the second set of
reference samples to generate the second subsampled set of reference
samples.
[0422] In another example, video decoder 30 may determine, based on a size
of the current PU, a subsampling method. In this example, video decoder
30 may perform at least one of: using the determined subsampling method
to subsample the first set of reference samples to generate the first
subsampled set of reference samples; and using the determined
subsampling method to subsample the second set of reference samples to
generate the second subsampled set of reference samples.
[0423] Furthermore, in action (560) of FIG. 33, video decoder 30 may
determine a first IC parameter based on the total number of reference
samples in the first subsampled set of reference samples and the total
number of reference samples in the second subsampled set of reference
samples, the first subsampled set of reference samples, and the second
subsampled set of reference samples. For instance, video decoder 30 may
determine the first IC parameter such that the first IC parameter is
based on:
i = 0 2 N  1 Rec neigh ( i )  a i = 0
2 N  1 Rec refneigh ( i ) 2 N ##EQU00031##
In the equation above, 2N denotes the total number of reference samples
in the first subsampled set of reference samples and the total number of
reference samples in the second subsampled set of reference samples,
Rec.sub.neigh(i) denotes an ith reference sample in the first
subsampled set of reference samples, and Rec.sub.refneigh(i) denotes an
ith reference sample in the second subsampled set of reference samples.
[0424] In some examples, video decoder 30 may determine a second IC
parameter such that the second IC parameter is based on:
2 N i = 0 2 N  1 Rec neigh ( i )
Rec refneigh ( i )  i = 0 2 N  1 Rec
neigh ( i ) i = 0 2 N  1 Rec refneigh ( i )
2 N i = 0 2 N  1 Rec refneigh ( i )
Rec neigh ( i )  ( i = 0 2 N  1
Rec refneigh ( i ) ) 2 . ##EQU00032##
[0425] Additionally, in action (562) of FIG. 33, video decoder 30 may
determine a sample of the nonsquare predictive block based on the first
IC parameter. For instance, the sample may be at coordinates (i,j)
relative to a topleft corner of the current picture and video decoder 30
may determine the sample such that the sample is based on:
a*r(i+dv.sub.x,j+dv.sub.y+b)
In the equation above, b is the first IC parameter, a is a second IC
parameter, r is the reference picture, dv.sub.x is a horizontal component
of a vector of the current PU, and dv.sub.y is a vertical component of
the vector of the current PU.
[0426] Video decoder 30 may reconstruct, based on the nonsquare
predictive block, a coding block of the current CU (564). For example,
video decoder 30 may reconstruct samples of the coding block by adding
samples of the nonsquare predictive block to corresponding samples of a
residual block for a TU of the current CU.
[0427] FIG. 34 is a flowchart illustrating an example operation of video
encoder 20 for encoding video data, in accordance with a technique of
this disclosure that uses a flexible residual tree. In the example of
FIG. 34, video encoder 20 may receive video data (600). Furthermore,
video encoder 20 may partition a CU of the video data into TUs of the CU
based on a tree structure (602). In some examples, video encoder 20 may
determine, for each respective node of the tree structure, a value of a
splitting indicator for the respective node. The splitting indicator for
a respective node may indicate how many child nodes the respective node
has. In some instances, video encoder 20 may signal, in the bitstream, a
syntax element explicitly indicating the value of the splitting indicator
of the respective node. In other instances, video decoder 30 may infer
the value of the splitting indicator for the respective node (e.g., based
on a depth of the node in the tree structure, values of splitting nodes
of parent nodes, sizes and/or shapes of predictive blocks corresponding
to the respective node, and so on).
[0428] As part of partitioning the CU into TUs of the CU based on the tree
structure, video encoder 20 may determine that a node in the tree
structure has exactly two child nodes in the tree structure (604). In
this example, a root node of the tree structure corresponds to a coding
block of the CU. Each respective nonroot node of the tree structure
corresponds to a respective block that is a partition of a block that
corresponds to a parent node of the respective nonroot node. Leaf nodes
of the tree structure correspond to the TUs of the CUs.
[0429] For example, video encoder 20 may determine, based on a total
number of PUs of the CU, whether the tree structure is a binary tree or a
quarter tree. In this example, based on the CU having two PUs, the node
has exactly two child nodes in the tree structure. In other words, video
encoder 20 may determine, based on the CU having exactly two PUs, that
the node has exactly two child nodes in the tree structure.
[0430] In some examples, video encoder 20 may determine, based on the CU
having exactly two PUs, that the node has exactly two child nodes in the
tree structure.
[0431] Furthermore, in the example of FIG. 34, for at least one of the TUs
of the CU, video encoder 20 applies a transform to a residual block for
the TU to generate a block of transform coefficients for the TU (606).
For example, video encoder 20 may apply a discrete cosine transform
(DCT), discrete sine transform (DST), or another type of transform to the
residual block for the TU to generate the block of transform
coefficients. Additionally, video encoder 20 may entropy encode syntax
elements indicating the transform coefficients for the TU (608). For
example, video encoder 20 may generate syntax elements indicating whether
a transform coefficient is nonzero, whether the transform coefficient is
greater than 1, whether the transform coefficient is greater than 2, a
sign of the transform coefficient, and a remainder for the transform
coefficient. In this example, video encoder 20 may apply CABAC coding to
one or more of these syntax elements.
[0432] FIG. 35 is a flowchart illustrating an example operation of video
decoder 30 for decoding video data, in accordance with a technique of
this disclosure that uses a flexible residual tree. In the example of
FIG. 35, video decoder 30 may receive a bitstream that comprises an
encoded representation of the video data (650). Additionally, video
decoder 30 may determine a CU of the video data is partitioned into TUs
of the CU based on a tree structure (652). In some examples, video
decoder 30 may determine, for each respective node of the tree structure,
a value of a splitting indicator for the respective node. The splitting
indicator for a respective node may indicate how many child nodes the
respective node has. In some instances, video decoder 30 may obtain from
the bitstream a syntax element explicitly indicating the value of the
splitting indicator of the respective node. In other instances, video
decoder 30 may infer the value of the splitting indicator for the
respective node (e.g., based on a depth of the node in the tree
structure, values of splitting nodes of parent nodes, sizes and/or shapes
of predictive blocks corresponding to the respective node, and so on).
[0433] As part of determining the CU is partitioned into the TUs of the CU
based on the tree structure, video decoder 30 may determine that a node
in the tree structure has exactly two child nodes in the tree structure
(654). In this example, a root node of the tree structure corresponds to
a coding block of the CU. Each respective nonroot node of the tree
structure corresponds to a respective block that is a partition of a
block that corresponds to a parent node of the respective nonroot node.
Leaf nodes of the tree structure correspond to the TUs of the CU. As
described elsewhere in this disclosure, video decoder 30 may determine
that a node in the tree structure has exactly two child nodes based on a
number of PUs in the CU, based on a depth of the node in the tree
structure, based on a signaled syntax element, or based on other data.
[0434] For example, video decoder 30 may determine, based on a total
number of PUs of the CU, whether the tree structure is a binary tree or a
quarter tree. In this example, based on the CU having two PUs, the node
has exactly two child nodes in the tree structure. In other words, video
decoder 30 may determine, based on the CU having exactly two PUs, that
the node has exactly two child nodes in the tree structure.
[0435] For at least one of the TUs of the CU, video decoder 30 may apply a
transform to a coefficient block for the TU to generate a residual block
for the TU (656). For example, video decoder 30 may apply an inverse DCT,
an inverse DST, or another type of transform to the coefficient block for
the TU to generate the residual block for the TU. Additionally, video
decoder 30 may reconstruct samples of a coding block by adding samples of
a predictive block to corresponding samples of the residual block for the
TU of the CU (658).
[0436] Certain aspects of this disclosure have been described with respect
to extensions of the HEVC standard for purposes of illustration. However,
the techniques described in this disclosure may be useful for other video
coding processes, including other standard or proprietary video coding
processes not yet developed.
[0437] A video coder, as described in this disclosure, may refer to a
video encoder or a video decoder. Similarly, a video coding unit may
refer to a video encoder or a video decoder. Likewise, video coding may
refer to video encoding or video decoding, as applicable.
[0438] It is to be recognized that depending on the example, certain acts
or events of any of the techniques described herein can be performed in a
different sequence, may be added, merged, or left out altogether (e.g.,
not all described acts or events are necessary for the practice of the
techniques). Moreover, in certain examples, acts or events may be
performed concurrently, e.g., through multithreaded processing,
interrupt processing, or multiple processors, rather than sequentially.
[0439] In one or more examples, the functions described may be implemented
in hardware, software, firmware, or any combination thereof. If
implemented in software, the functions may be stored on or transmitted
over as one or more instructions or code on a computerreadable medium
and executed by a hardwarebased processing unit. Computerreadable media
may include computerreadable storage media, which corresponds to a
tangible medium such as data storage media, or communication media
including any medium that facilitates transfer of a computer program from
one place to another, e.g., according to a communication protocol. In
this manner, computerreadable media generally may correspond to (1)
tangible computerreadable storage media which is nontransitory or (2) a
communication medium such as a signal or carrier wave. Data storage media
may be any available media that can be accessed by one or more computers
or one or more processors to retrieve instructions, code and/or data
structures for implementation of the techniques described in this
disclosure. A computer program product may include a computerreadable
medium.
[0440] By way of example, and not limitation, such computerreadable
storage media can comprise RAM, ROM, EEPROM, CDROM or other optical disk
storage, magnetic disk storage, or other magnetic storage devices, flash
memory, or any other medium that can be used to store desired program
code in the form of instructions or data structures and that can be
accessed by a computer. Also, any connection is properly termed a
computerreadable medium. For example, if instructions are transmitted
from a website, server, or other remote source using a coaxial cable,
fiber optic cable, twisted pair, digital subscriber line (DSL), or
wireless technologies such as infrared, radio, and microwave, then the
coaxial cable, fiber optic cable, twisted pair, DSL, or wireless
technologies such as infrared, radio, and microwave are included in the
definition of medium. It should be understood, however, that
computerreadable storage media and data storage media do not include
connections, carrier waves, signals, or other transitory media, but are
instead directed to nontransitory, tangible storage media. Disk and
disc, as used herein, includes compact disc (CD), laser disc, optical
disc, digital versatile disc (DVD), floppy disk and Bluray disc, where
disks usually reproduce data magnetically, while discs reproduce data
optically with lasers. Combinations of the above should also be included
within the scope of computerreadable media.
[0441] Instructions may be executed by one or more processors, such as one
or more digital signal processors (DSPs), general purpose
microprocessors, application specific integrated circuits (ASICs), field
programmable logic arrays (FPGAs), or other equivalent integrated or
discrete logic circuitry. Accordingly, the term "processor," as used
herein may refer to any of the foregoing structure or any other structure
suitable for implementation of the techniques described herein. In
addition, in some aspects, the functionality described herein may be
provided within dedicated hardware and/or software modules configured for
encoding and decoding, or incorporated in a combined codec. Also, the
techniques could be fully implemented in one or more circuits or logic
elements.
[0442] The techniques of this disclosure may be implemented in a wide
variety of devices or apparatuses, including a wireless handset, an
integrated circuit (IC) or a set of ICs (e.g., a chip set). Various
components, modules, or units are described in this disclosure to
emphasize functional aspects of devices configured to perform the
disclosed techniques, but do not necessarily require realization by
different hardware units. Rather, as described above, various units may
be combined in a codec hardware unit or provided by a collection of
interoperative hardware units, including one or more processors as
described above, in conjunction with suitable software and/or firmware.
[0443] Various examples have been described. These and other examples are
within the scope of the following claims.
* * * * *