Deep Learning Post-Filtering Using Multi-Head Attention and Multiresolution Feature Fusion for Image and Intra-Video Quality Enhancement

by: Ionut Schiopu, Adrian Munteanu

Format:	Article
Published:	MDPI AG 2022-02-01

Description

The paper proposes a novel post-filtering method based on convolutional neural networks (CNNs) for quality enhancement of RGB/grayscale images and video sequences. The lossy images are encoded using common image codecs, such as JPEG and JPEG2000. The video sequences are encoded using two generations of video coding standards: High Efficiency Video Coding (HEVC) and its successor, Versatile Video Coding (VVC). A novel deep neural network architecture is proposed to estimate fine refinement details at full-, half-, and quarter-patch resolutions. The architecture is built from a set of efficient processing blocks designed around the following concepts: (i) the multi-head attention mechanism for refining the feature maps, (ii) weight sharing for reducing the network complexity, and (iii) novel block designs of layer structures for multiresolution feature fusion. The proposed method provides substantial performance improvements compared with both common image codecs and video coding standards.
Experimental results on high-resolution images and standard video sequences show that the proposed post-filtering method provides average BD-rate savings of 31.44% over JPEG and 54.61% over HEVC (x265) for RGB images, Y-BD-rate savings of 26.21% over JPEG and 15.28% over VVC (VTM) for grayscale images, and 15.47% over HEVC and 14.66% over VVC for video sequences.
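
This record does not state how the BD-rate figures above were computed. For readers unfamiliar with the metric, the following is a minimal sketch of the commonly used Bjøntegaard delta-rate calculation: fit a cubic polynomial to log-bitrate as a function of PSNR for each codec, integrate both fits over the shared PSNR range, and convert the average gap into a percentage. The function name `bd_rate` and the sample rate/PSNR points are illustrative assumptions, not data or code from the paper.

```python
import numpy as np


def bd_rate(anchor_rates, anchor_psnr, test_rates, test_psnr):
    """Average bitrate change (%) of the test codec relative to the anchor
    at equal PSNR. Negative values mean the test codec saves bitrate."""
    log_a = np.log10(np.asarray(anchor_rates, dtype=float))
    log_t = np.log10(np.asarray(test_rates, dtype=float))
    psnr_a = np.asarray(anchor_psnr, dtype=float)
    psnr_t = np.asarray(test_psnr, dtype=float)

    # Cubic fit of log10(bitrate) as a function of PSNR for each curve.
    poly_a = np.polyfit(psnr_a, log_a, 3)
    poly_t = np.polyfit(psnr_t, log_t, 3)

    # Compare only over the PSNR interval covered by both curves.
    lo = max(psnr_a.min(), psnr_t.min())
    hi = min(psnr_a.max(), psnr_t.max())
    if hi <= lo:
        raise ValueError("rate-distortion curves do not overlap in PSNR")

    # Integrate each fitted polynomial over the shared interval.
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    area_a = np.polyval(int_a, hi) - np.polyval(int_a, lo)
    area_t = np.polyval(int_t, hi) - np.polyval(int_t, lo)

    # Mean log-rate difference, converted back to a percentage.
    avg_log_diff = (area_t - area_a) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical rate points (kbps) and PSNR values (dB), for illustration only.
    psnr = [30.0, 33.0, 36.0, 39.0]
    anchor = [100.0, 200.0, 400.0, 800.0]
    test = [50.0, 100.0, 200.0, 400.0]  # half the bitrate at every quality level
    print(f"BD-rate: {bd_rate(anchor, psnr, test, psnr):.2f}%")
```

In the illustrative data, the test curve needs half the anchor's bitrate at every PSNR point, so the function reports a BD-rate of about -50%, that is, a 50% saving. A Y-BD-rate, as quoted for the grayscale and video results, applies the same calculation to PSNR measured on the luma channel only.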
