Machine Learning for Media: Increasing the Value of Existing Content
With more than 60% of TVs sold in major markets now capable of Ultra-High Definition (UHD) display, according to IHS Markit, UHD has established itself in the TV mainstream. The ubiquity of UHD TV is only set to increase in the coming years, with the same research predicting that 574 million households globally will have a 4K TV set by 2023.
However, UHD content availability has lagged behind the development of consumer equipment. With production standards yet to catch up with this trend, much of the existing content will inevitably be available only in HD, in either 720p or 1080i format. So how can broadcasters and service providers fill full-time UHD channels with the consistent, high-quality viewing experience that consumers are demanding?
Even though TVs have reasonable upconversion capabilities, a centralized approach allows more resources to be applied to obtain a better result, and it avoids the inconsistencies that currently exist between different TV delivery methods. Machine learning has matured significantly in recent years, raising the question of whether these techniques can improve the results of the upconversion process.
The Challenges with Traditional Upconversion
Broadcasters have used upconversion techniques for some time to offer media experiences that come close to UHD without the major costs of producing content natively in this format. Upconversion can be performed either in the TV or set-top box (STB) or at the broadcast headend prior to transmission. It first came to the fore during the introduction of HD channels in the mid-2000s. With the burgeoning popularity of UHD TV, upconversion can be expected to fill the void here as well. However, traditional techniques struggle to deliver results beyond HD-like experiences. Upconversion in TVs, although better than a few years ago, is limited by the available processing power, and the results can differ between TVs.
Machine Learning-Based UHD Upconversion
In the form of machine learning considered here, a neural network learns features from a set of training data. For image processing, it is necessary to work in at least two dimensions, and the processing needs to consider suitably sized patches of the image(s). Machine learning technology can be used to perform advanced upconversion of high-value library content, using neural networks to create upconverted images that appear visually more similar to native UHD images. This enables the delivery of UHD channel viewing experiences that are visibly superior to HD services, providing an incentive to view a UHD channel.
Images tend to be large, and it becomes infeasible to create neural networks that process this data as a complete set. As a result, a different structure is used for image processing, known as Convolutional Neural Networks (CNNs). CNNs are structured to extract features from the images by successively processing subsets from the source image and then processing the features rather than the raw pixels.
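To make the idea of processing small subsets of the image concrete, here is a minimal sketch of the sliding-window operation at the heart of a CNN layer, written in plain Python with no dependencies. The function name, the tiny patch and the hand-picked edge kernel are illustrative assumptions, not part of any production system; in a real CNN the kernel values are learned during training rather than chosen by hand.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both lists of rows) and return the feature map.

    This is the cross-correlation that CNN frameworks call "convolution",
    computed without padding, so the output is smaller than the input.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            # Each output value depends only on one small kh x kw subset of pixels.
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    acc += image[y + j][x + i] * kernel[j][i]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x5 luma patch with a vertical edge between columns 2 and 3.
patch = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

# Hand-picked horizontal-gradient kernel (an assumption for illustration).
edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

features = conv2d_valid(patch, edge_kernel)
print(features)
```

The feature map is zero over the flat region and non-zero where the window overlaps the edge, which is the sense in which later layers can work on features rather than raw pixels.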
By leveraging CNNs, broadcasters can create plausible new content that was not present in the original image, but which doesn't modify the nature of the image too much. The CNN used to create the UHD data from the HD source is known as the Generator CNN.
Neural Network Training Process
In order for the Generator CNN to do its job, it must first be trained: a set of known data is input into the neural network, and its output is compared with the correct image patch. To do this, we need to know what "correct" means. The starting point for training is therefore a set of representative high-resolution UHD images, which are down-sampled to produce representative HD images. The Generator CNN upconverts these HD images, and the results are compared with the originals.
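The preparation of training data can be sketched as follows. This is an illustration under stated assumptions: the article does not say which down-sampling filter is used, so a simple 2x2 box average stands in for it (UHD at 3840x2160 is exactly twice the width and height of 1080-line HD), and the function names are hypothetical.

```python
def downsample_2x(image):
    """Average each 2x2 block of pixels, halving width and height.

    A box filter is an assumption made for brevity; a production pipeline
    would likely use a higher-quality filter. A trailing odd row or column
    is dropped.
    """
    h, w = len(image), len(image[0])
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) / 4.0
            for x in range(0, w - 1, 2)
        ]
        for y in range(0, h - 1, 2)
    ]


def make_training_pair(uhd_patch):
    """Return (network input, training target) for one UHD patch."""
    hd_patch = downsample_2x(uhd_patch)
    return hd_patch, uhd_patch


# Hypothetical 4x4 "UHD" patch; real training would use many larger patches.
uhd_patch = [
    [0, 2, 4, 4],
    [2, 4, 4, 4],
    [8, 8, 1, 1],
    [8, 8, 1, 3],
]

hd_patch, target = make_training_pair(uhd_patch)
print(hd_patch)
```

Because the HD input is derived from the UHD original, every training example comes with a known "correct" answer at no extra production cost.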
The difference between the original UHD image and the synthesized UHD image is calculated by a 'Compare' function and fed back as an error signal to the Generator CNN. Over repeated training iterations, the Generator CNN learns to create images that are increasingly similar to the original UHD images.
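One plausible form for the 'Compare' function is the mean squared error between the two patches. The article does not specify the metric, so treat the choice of mean squared error, and the names used below, as assumptions for illustration only.

```python
def compare(original, synthesized):
    """Mean squared error between two equally sized patches (lists of rows)."""
    if len(original) != len(synthesized) or len(original[0]) != len(synthesized[0]):
        raise ValueError("patches must have the same dimensions")
    total = 0.0
    count = 0
    for row_o, row_s in zip(original, synthesized):
        for o, s in zip(row_o, row_s):
            total += (o - s) ** 2
            count += 1
    return total / count


# Hypothetical 2x2 patches with pixel values normalized to the range 0..1.
original = [[0.0, 1.0], [1.0, 0.0]]
close_guess = [[0.0, 0.5], [1.0, 0.0]]   # one pixel is off by 0.5
poor_guess = [[1.0, 0.0], [0.0, 1.0]]    # every pixel is inverted

error_close = compare(original, close_guess)
error_poor = compare(original, poor_guess)
print(error_close, error_poor)
```

A smaller value means the synthesized patch is closer to the original, so training adjusts the generator in the direction that reduces this number.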
Generative Adversarial Neural Networks
Generative Adversarial Neural Networks (GANs) are a relatively recent concept in which a second neural network, known as the Discriminator CNN, is trained alongside the Generator CNN. The principle is that the discriminator learns to detect the difference between features that are characteristic of original UHD images and those of synthesized UHD images. During the training process, the discriminator sees either an original UHD image or a synthesized UHD image, with the detection correctness fed back to the discriminator and, if the image was a synthesized one, also fed back to the generator.
Using GANs to synthesize detail in the upconverted image gives the viewer a more compelling, higher-quality experience. With this approach, broadcasters can interpolate in a non-linear manner and achieve a significantly enhanced result.
As the training proceeds, each CNN is attempting to beat the other: the generator learns how to better create images that have characteristics that appear like original full-resolution images, while the discriminator becomes progressively better at detecting what the generator produces. The result is the synthesis of details that have features characteristic of original UHD images.
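The two competing feedback signals can be sketched with binary cross-entropy, one common formulation of GAN training. The article does not state which loss is used, so this formulation and all names below are assumptions. Here the discriminator's output is taken to be a probability that the image it saw was an original UHD image.

```python
import math


def bce(prob_original, is_original):
    """Binary cross-entropy for a single discriminator output in (0, 1)."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(prob_original, eps), 1.0 - eps)
    return -math.log(p) if is_original else -math.log(1.0 - p)


def discriminator_loss(prob_on_original, prob_on_synthesized):
    """Low when originals score near 1 and synthesized images score near 0."""
    return bce(prob_on_original, True) + bce(prob_on_synthesized, False)


def generator_adversarial_loss(prob_on_synthesized):
    """Low when the discriminator mistakes a synthesized image for an original."""
    return bce(prob_on_synthesized, True)


# Hypothetical discriminator outputs at two points in training.
sharp_discriminator = discriminator_loss(0.9, 0.1)    # tells the two apart
unsure_discriminator = discriminator_loss(0.5, 0.5)   # cannot tell them apart

generator_fooling = generator_adversarial_loss(0.9)   # synthesized image passes
generator_caught = generator_adversarial_loss(0.1)    # synthesized image is spotted
```

The two losses pull in opposite directions on the same discriminator output for synthesized images, which is what makes the training adversarial.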
Adopting a Hybrid GAN Approach
With a GAN approach, there is no real constraint on the ability of the generator to create new details everywhere. The problem with this is that the generator can create images that, while containing plausible features, can diverge from the original image in more general ways. A better approach is to use a combination of the mathematical difference between the synthesized and original images and the discriminator's correctness as the feedback to the Generator CNN. This retains the detail regeneration, but also prevents excessive divergence. This construct produces results that are subjectively better than conventional upconversion techniques.
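The hybrid feedback can be sketched as a weighted sum of the two terms. The use of mean squared error for the mathematical difference, cross-entropy for the discriminator term, and the `adversarial_weight` value of 0.01 are all assumptions for illustration; the article does not give the actual terms or their weighting.

```python
import math


def mean_squared_error(original, synthesized):
    """Pixel-wise fidelity term between two equally sized patches."""
    diffs = [
        (o - s) ** 2
        for row_o, row_s in zip(original, synthesized)
        for o, s in zip(row_o, row_s)
    ]
    return sum(diffs) / len(diffs)


def hybrid_generator_loss(original, synthesized, prob_judged_original,
                          adversarial_weight=0.01):
    """Fidelity term plus a weighted adversarial term.

    `prob_judged_original` is the discriminator's probability that the
    synthesized patch is an original. `adversarial_weight` is a hypothetical
    tuning parameter balancing detail synthesis against divergence.
    """
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(prob_judged_original, eps), 1.0 - eps)
    fidelity = mean_squared_error(original, synthesized)
    adversarial = -math.log(p)
    return fidelity + adversarial_weight * adversarial


# Hypothetical 2x2 patches with pixel values normalized to the range 0..1.
original = [[0.0, 1.0], [1.0, 0.0]]
synthesized = [[0.0, 0.5], [1.0, 0.0]]

loss_when_fooled = hybrid_generator_loss(original, synthesized, 0.9)
loss_when_caught = hybrid_generator_loss(original, synthesized, 0.1)
```

Raising the weight favors synthesized detail, while lowering it pulls the output back toward the source image, so the weight is effectively the control for how much divergence is tolerated.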
This advance in upconversion technique gives machine learning a practical role in opening up new strategic and monetization opportunities. Broadcasters, operators and TV service providers can fill the gap in the availability of UHD content through the non-linear capabilities provided by CNNs using a hybrid GAN architecture.
This approach has been shown to be effective across a range of different content, offering a realistic means by which content that has more of the appearance of UHD can be created from both progressive and interlaced HD sources. This in turn can enable an improved experience for the viewer at home when watching a UHD channel, even when some of that content does not exist natively as UHD.
[Editor's note: This is a vendor-written article from MediaKind. Streaming Media accepts contributed bylines based solely on their value to our readers.]