<P><B>Edge-adaptive transforms for efficient depth map coding</B></P>

<p>
In this work a new set of edge-adaptive transforms (EATs) is presented as an alternative to the standard DCTs used in image and video coding applications. These transforms avoid filtering across edges in each image block and thus avoid creating large high-frequency coefficients. The EATs are combined with the DCT in H.264/AVC, and a transform mode selection algorithm chooses between DCT and EAT in an RD-optimized manner. Applied to the coding of depth maps used for view synthesis in a multi-view video coding system, these transforms provide up to 29% bit rate reduction for a fixed quality in the synthesized views.
</p>
 
<P><B>Edge adaptive transform (EAT) design</B></P>
<p>
The EAT design process consists of three steps: (i) edge detection is applied on the residual block to find edge locations, (ii) a graph is constructed based on the edge map, and (iii) an EAT is constructed and EAT coefficients are computed. The EAT coefficients are then quantized using a uniform scalar quantizer, and the same run-length coding used for DCT coefficients is applied. The 2 × 2 sample block in Fig. 1 is used to illustrate the main ideas. We describe the encoder operation when applied to blocks of prediction residuals, though the same ideas can be easily applied to original pixel values. A small illustrative sketch of step (i) is given below Figure 1.
</p>
[http://biron.usc.edu/~kumarsun/PCS10_fig1.pdf Figure 1]
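The text does not fix a particular edge detector, so the following is only a minimal sketch (Python/numpy; the edge_map helper name and the threshold value are assumptions for illustration): it declares an edge between two 4-connected neighbors whenever their residual values differ by more than a threshold, yielding the binary edge map used in the next step.

<pre>
import numpy as np

def edge_map(residual, thresh=10.0):
    """Binary edge maps for a square residual block (illustrative only).

    An 'edge' is declared between two 4-connected neighbors whose values
    differ by more than `thresh`. Returns two boolean arrays: edges
    between horizontal neighbors and between vertical neighbors.
    """
    r = np.asarray(residual, dtype=float)
    h_edges = np.abs(r[:, 1:] - r[:, :-1]) > thresh  # (i,j) vs (i,j+1)
    v_edges = np.abs(r[1:, :] - r[:-1, :]) > thresh  # (i,j) vs (i+1,j)
    return h_edges, v_edges

# Example: a 4x4 residual with a sharp vertical edge down the middle.
block = np.array([[0., 0., 80., 80.]] * 4)
h, v = edge_map(block)
print(h[0])        # [False  True False]: edge between columns 1 and 2
print(v.any())     # False: no edges between vertical neighbors
</pre>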
  
 
<li>'''EAT construction'''
<ul>
First, edge detection is applied on the residual block to find edge locations, and a binary edge map is generated that indicates the locations of edges. If no edges are found in a residual block, then a DCT is used and no further EAT processing is performed. Otherwise, the encoder computes an EAT. A graph is generated from this edge map, where each pixel in the residual block is connected to each of its immediate neighbors (e.g., 4-connected neighbors) only if there is no edge between them. From this graph, we derive the [http://en.wikipedia.org/wiki/Adjacency_matrix adjacency matrix], the [http://en.wikipedia.org/wiki/Degree_matrix diagonal degree matrix], and the [http://en.wikipedia.org/wiki/Laplacian_matrix Laplacian matrix]. Projecting the signal on a graph G onto the eigenvectors of the Laplacian matrix yields a spectral decomposition of the signal, i.e., it provides a “frequency domain” interpretation of the signal on the graph. EAT coefficients are computed as the projection of the one-dimensionalized image signal onto the eigenvectors of the Laplacian matrix. The coefficients are quantized with a uniform scalar quantizer, and the vector of coefficients is then reformed into an N × N block by placing the coefficients in the standard zig-zag fashion used for the DCT. A sketch of this construction is given below.
</ul>
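As a concrete, non-normative illustration of steps (ii) and (iii), the numpy sketch below builds the pixel graph for a 4 × 4 example, forms the Laplacian L = D − A, projects the flattened block onto its eigenvectors, and then quantizes and reforms the coefficients along the zig-zag scan. The row-major flattening, the quantizer step size of 16, and the hard-coded 4 × 4 scan order are assumptions of the sketch, not values taken from the paper.

<pre>
import numpy as np

# 4x4 piecewise-constant block with a vertical edge between columns 1 and 2.
block = np.array([[0., 0., 80., 80.]] * 4)
h_edges = np.abs(block[:, 1:] - block[:, :-1]) > 10   # horizontal-neighbor edges
v_edges = np.abs(block[1:, :] - block[:-1, :]) > 10   # vertical-neighbor edges

n = block.shape[0]
N = n * n
idx = lambda i, j: i * n + j          # row-major "one-dimensionalization"

# Adjacency matrix: connect 4-neighbors only when no edge separates them.
A = np.zeros((N, N))
for i in range(n):
    for j in range(n):
        if j + 1 < n and not h_edges[i, j]:
            A[idx(i, j), idx(i, j + 1)] = A[idx(i, j + 1), idx(i, j)] = 1.0
        if i + 1 < n and not v_edges[i, j]:
            A[idx(i, j), idx(i + 1, j)] = A[idx(i + 1, j), idx(i, j)] = 1.0

D = np.diag(A.sum(axis=1))            # diagonal degree matrix
L = D - A                             # graph Laplacian
evals, U = np.linalg.eigh(L)          # columns of U form the EAT basis

coeffs = U.T @ block.flatten()        # EAT coefficients (spectral projection)
# The edge splits the graph into two components, giving two zero eigenvalues;
# the block is constant on each component, so all of its energy falls on the
# zero-eigenvalue "DC" eigenvectors: at most 2 of the 16 coefficients survive.
print(np.count_nonzero(np.abs(coeffs) > 1e-9))

# Uniform scalar quantization, then reform into an n x n block along the
# zig-zag scan used for the DCT (4x4 order hard-coded for this sketch).
q = np.round(coeffs / 16.0)           # step size 16 is illustrative only
zigzag = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2),
          (2, 1), (3, 0), (3, 1), (2, 2), (1, 3), (2, 3), (3, 2), (3, 3)]
reformed = np.zeros((n, n))
for k, (i, j) in enumerate(zigzag):
    reformed[i, j] = q[k]
</pre>

Because the decoder receives the edge map, it can rebuild the same graph and hence the same eigenvector basis, so only the coefficients and the edge information need to be transmitted.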
<br>
<li>'''RD optimization'''
<ul>
The EATs minimize the number of non-zero coefficients that must be encoded for a piece-wise constant (PWC) image; thus, they provide a highly efficient representation for depth maps (since depth maps are nearly PWC). Since residual depth maps (resulting from intra or inter prediction) are also PWC, EATs also provide an efficient representation for residual depth maps. However, since edge information must be encoded and sent to the decoder, these EATs are not necessarily RD-optimal: if the edge map bit rate is too high, the RD cost for an EAT may actually be greater than that for the DCT. An EAT should therefore only be used in place of the DCT when it yields a lower RD cost; otherwise, the DCT should be used. This leads to the RD-optimized transform mode selection algorithm shown in Fig. 2 and sketched below. The edge maps and transform mode information (i.e., EAT or DCT) are encoded using [http://en.wikipedia.org/wiki/Context-adaptive_binary_arithmetic_coding context-adaptive binary arithmetic coding (CABAC)].
</ul>
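The selection rule amounts to a Lagrangian cost comparison. The sketch below (hypothetical rd_cost and select_transform helpers with toy numbers; the real distortions, rates, and lambda would come from the encoder) shows how the edge-map rate can tip the decision back to the DCT even when the EAT spends fewer bits on coefficients.

<pre>
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian RD cost J = D + lambda * R."""
    return distortion + lam * rate_bits

def select_transform(d_dct, r_dct, d_eat, r_eat_coeffs, r_edge_map, lam):
    """Pick DCT or EAT for a block by comparing RD costs (sketch only).

    The EAT's rate must include the side information (edge map and
    transform-mode flag) sent to the decoder; this is what can make
    the EAT lose to the DCT despite needing fewer coefficient bits.
    """
    j_dct = rd_cost(d_dct, r_dct, lam)
    j_eat = rd_cost(d_eat, r_eat_coeffs + r_edge_map, lam)
    return ('EAT', j_eat) if j_eat < j_dct else ('DCT', j_dct)

# Toy numbers: the EAT needs fewer coefficient bits but pays for the edge map,
# so the DCT wins narrowly here -> ('DCT', 174.4).
print(select_transform(d_dct=120.0, r_dct=64,
                       d_eat=115.0, r_eat_coeffs=40, r_edge_map=30,
                       lam=0.85))
</pre>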
[http://biron.usc.edu/~kumarsun/PCS10_fig2.pdf Figure 2]
<br>
<li>'''Publications'''
<ul>
<li> G. Shen, W.-S. Kim, S. K. Narang, A. Ortega, J. Lee, and H. Wey, "Edge-adaptive transforms for efficient depth-map coding," in Proc. 28th Picture Coding Symposium (PCS '10), Nagoya, Japan, Dec. 2010. [http://biron.usc.edu/~wooshikk/PCS10.pdf <small><b>PDF format</b></small>]
<li> W.-S. Kim, S. K. Narang, and A. Ortega, "Graph based transforms for depth video coding," in Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Proc. (ICASSP 2012), Kyoto, Japan, Mar. 2012.
<li> G. Cheung, W.-S. Kim, A. Ortega, J. Ishida, and A. Kubota, "Depth map coding using graph based transform and transform domain sparsification," in Proc. IEEE Intl. Workshop on Multimedia Signal Proc., Hangzhou, China, Oct. 2011. (Top 10% Paper Award)
</ul>
<br>
<li>'''People'''
<ul>
<li> [http://sipi.usc.edu/~ortega/ Antonio Ortega]
<li> [http://research.nii.ac.jp/~cheung/ Gene Cheung]
<li> [http://biron.usc.edu/wiki/index.php/Wooshikkim Wooshik Kim]
<li> [http://biron.usc.edu/wiki/index.php/Sunil_Kumar Sunil K Narang]
</ul>
<br>
<li>'''Software'''
<ul>
<li> [http://sipi.usc.edu/~ortega/Software/GraphTransform_Matlab.zip EAT transform demo]
</ul>
<br>