Conference Research Tests Adaptive Video and Quality Benchmarks
One of the key issues tackled was the frequency of stream switching, which we in part control as producers via the number of streams created for each source. Create many variations close together, and you’ll have lots of stream switching; create a lower number over a wider range, and you’ll have fewer stream switches.
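To make the relationship between ladder density and switch frequency concrete, here is a minimal Python sketch. It assumes a deliberately naive player that always picks the highest stream fitting the current bandwidth; real players use more elaborate heuristics. The ladders and the bandwidth trace are hypothetical values chosen only for illustration, not figures from the research.

```python
def pick_stream(ladder_kbps, bandwidth_kbps):
    """Return the highest bitrate that fits the bandwidth, or the lowest rung."""
    fitting = [b for b in ladder_kbps if b <= bandwidth_kbps]
    return max(fitting) if fitting else min(ladder_kbps)


def count_switches(ladder_kbps, bandwidth_trace_kbps):
    """Count how often the selected stream changes across the trace."""
    choices = [pick_stream(ladder_kbps, bw) for bw in bandwidth_trace_kbps]
    return sum(1 for prev, cur in zip(choices, choices[1:]) if prev != cur)


# Hypothetical ladders: many closely spaced streams vs. a few widely spaced ones.
dense_ladder = [400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]
sparse_ladder = [400, 1000, 2000]

# Hypothetical bandwidth measurements (kbps), one per chunk.
trace = [1900, 1500, 1250, 1050, 900, 1100, 1300, 1700, 2100]

dense_switches = count_switches(dense_ladder, trace)
sparse_switches = count_switches(sparse_ladder, trace)
print(dense_switches, sparse_switches)
```

With this trace, the dense ladder changes streams on nearly every chunk, while the sparse ladder changes far less often, though each of its switches is a bigger jump in quality.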
Regarding the QoE impact of stream switching, conclusions are mixed. Most studies found that frequent stream switching reduced QoE, which seemed to argue for fewer streams. Other studies concluded that viewers preferred multiple, gradual variations over a single abrupt variation, which seemed to argue for more streams.
The type of content produced should definitely play a role in the switching strategy. Stream switching was less noticeable in content with frequent camera angle or scene changes, like movies or sports videos, than in steadier content, like a single-camera lecture or conference. This makes perfect sense, since your attention is drawn away from video quality while catching up with the new scenes unfolding in front of you. The takeaway: Movies, sports, and similar content can benefit from more streams in the adaptive group because the changing cameras and camera angles mask switching from the viewer. Conference or training videos, particularly those with a single camera angle, benefit from fewer streams, as most viewers prefer even a lower quality stream over frequent switching.
One key decision facing video producers is which stream to deliver first. Some prefer a low quality stream that gets the video playing fast; others want a high quality stream that may buffer for a bit but will deliver a good first impression. Studies referenced in this paper showed “that a low startup bitrate followed by slow increase ("ramp-up") of quality clearly degrades the QoE.”
With HTTP Live Streaming, the first item specified in the M3U8 playlist is the first stream seen by the viewer. These studies clearly show that, as Apple recommends in TN2224, producers should deploy different playlists for mobile, desktop, and perhaps even OTT with different initial streams. Specifically, TN2224 states, “You should create multiple playlists that have the same set of streams, but each with a different first entry that is appropriate for the target network. This ensures the user has a good experience when the stream is first played.”
While the TechNote’s recommendations of a 150 kbps stream for mobile and 440 kbps for Wi-Fi seem conservative, the point is clear: What the viewer sees first sets the tone. The best strategy is to set the first stream at the highest possible quality the viewer can sustain.
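As a rough illustration of the multiple-playlist approach, the Python sketch below builds master playlists that contain the same set of streams but lead with a different first entry. The stream URIs and the 1,200 kbps rung are hypothetical, and the sketch writes only the BANDWIDTH attribute (in bits per second); production playlists typically carry additional attributes.

```python
# Hypothetical stream set; URIs and the top rung are illustrative only.
STREAMS = [
    {"bandwidth": 150000, "uri": "low/prog_index.m3u8"},
    {"bandwidth": 440000, "uri": "mid/prog_index.m3u8"},
    {"bandwidth": 1200000, "uri": "high/prog_index.m3u8"},
]


def master_playlist(streams, first_bandwidth):
    """Build master playlist text with the chosen stream listed first."""
    first = [s for s in streams if s["bandwidth"] == first_bandwidth]
    if not first:
        raise ValueError("no stream with that bandwidth")
    rest = [s for s in streams if s["bandwidth"] != first_bandwidth]
    lines = ["#EXTM3U"]
    for stream in first + rest:
        lines.append(f'#EXT-X-STREAM-INF:BANDWIDTH={stream["bandwidth"]}')
        lines.append(stream["uri"])
    return "\n".join(lines) + "\n"


# Same streams, different first entry for each target network.
mobile_playlist = master_playlist(STREAMS, 150000)
wifi_playlist = master_playlist(STREAMS, 440000)
print(wifi_playlist)
```

The server or player page would then hand out the playlist that matches the viewer's likely connection, so the first stream played fits the target network.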
Back on Point
Getting back to the current paper, the authors largely focused their attention on the best methodology to test adaptive streaming strategies. For their subjective tests, the authors used seven 6-minute clips encoded to four quality levels and segmented into 2- and 10-second chunks. These were assembled into test sequences of increasing and decreasing quality, with gradual and rapid quality changes. The entire clips at constant quality were also subjectively reviewed. In all, the authors created 132 different test sequences that were used for multiple tests.
To be clear, each test sequence was a “canned” adaptive streaming experience, complete with programmed stream switching. That way, the authors could systematically compare the perceived viewer quality of a stream with many or few stream changes, or with abrupt or gradual stream changes.
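One way to picture a canned sequence is as a per-chunk schedule of quality levels. The Python sketch below is an assumption about how such a schedule could be laid out, not the authors' actual procedure: it takes a six-minute clip, four quality levels, and a chunk size, then produces an increasing-quality schedule that is either gradual (one level at a time) or rapid (a single jump from lowest to highest at the midpoint).

```python
CLIP_SECONDS = 6 * 60   # six-minute test clip
QUALITY_LEVELS = 4      # level 0 is lowest, level 3 is highest


def chunk_count(chunk_seconds):
    """Number of chunks needed to cover the whole clip."""
    return CLIP_SECONDS // chunk_seconds


def increasing_schedule(chunk_seconds, gradual):
    """Return one quality level per chunk for an increasing-quality sequence."""
    n = chunk_count(chunk_seconds)
    if gradual:
        # Spend equal time at each level, stepping up one level at a time.
        return [min(i * QUALITY_LEVELS // n, QUALITY_LEVELS - 1) for i in range(n)]
    # Rapid: lowest level for the first half, then jump straight to the highest.
    return [0 if i < n // 2 else QUALITY_LEVELS - 1 for i in range(n)]


def switch_sizes(schedule):
    """List the size of each quality change, in levels."""
    return [cur - prev for prev, cur in zip(schedule, schedule[1:]) if cur != prev]


gradual_10s = increasing_schedule(10, gradual=True)
rapid_10s = increasing_schedule(10, gradual=False)
print(switch_sizes(gradual_10s), switch_sizes(rapid_10s))
```

Reversing either schedule gives the matching decreasing-quality sequence, and holding a single level for every chunk gives the constant-quality case.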
The authors had used these particular test sequences previously, but in a different way. Specifically, in previous tests, they broke each six-minute test video into multiple shorter sequences containing individual switching events, with a quality assessment after each shorter sequence. These tests were administered with and without audio.
In the tests documented in this article, the researchers tested subjective quality only after playing the entire six-minute test sequence. The specific question they wanted to answer was whether or not the results in the shorter tests accurately predicted the results from the longer test. This is important because shorter tests are much easier on the test subjects.
While both sets of shorter tests proved relatively accurate at predicting the results of the longer test, the shorter tests without audio were almost an exact match. The authors postulated that the lack of audio in the initial tests allowed subjects to focus on video quality, producing more accurate results.
The authors also tested whether objective, non-referential tests that focused on the conditions that might have produced the subjective scores (such as blockiness, blur, brightness, and noise) were an accurate predictor of subjective ratings. Briefly, non-referential tests examine only the encoded video itself and don’t compare the encoded video to the source. In contrast, full-reference benchmarks like Peak Signal-to-Noise Ratio (PSNR) and the aforementioned SSIM metric compute their scores by comparing the encoded video back to the source, which is more time consuming and challenging because it requires access to the source file. If the researchers could prove that the non-referential benchmarks tested had a high correlation with the results of the subjective tests, it would have been a huge benefit for researchers, since non-referential tests are fast, easy, and inexpensive to apply.
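To show why a full-reference metric needs the source, here is a minimal Python sketch of PSNR using the standard definition, 10 · log10(MAX² / MSE), where MSE is the mean squared error between source and encoded pixel values. It operates on flat lists of 8-bit sample values for simplicity; real tools work frame by frame on decoded video, and the sample values below are made up for illustration.

```python
import math


def psnr(reference, encoded, max_value=255):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(encoded) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error: this is the step that requires the source pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(max_value ** 2 / mse)


# Made-up sample values: each encoded pixel is off by one level.
source_pixels = [100, 100, 100, 100]
encoded_pixels = [101, 99, 101, 99]
score_db = psnr(source_pixels, encoded_pixels)
print(round(score_db, 2))
```

A non-referential test has no `reference` argument at all; it must estimate artifacts such as blockiness or blur from the encoded pixels alone.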
No joy here, however, as the researchers found that these tests were a poor predictor of overall subjective quality, though performance could be improved by clustering videos based on spatial and temporal characteristics. The bottom line is that producers can’t generally use these tests to predict the quality of an adaptive streaming experience.
Applicability of Existing Objective Metrics of Perceptual Quality for Adaptive Video Streaming
In this related paper and presentation, written by Jacob Søgaard (Technical University of Denmark), Lukáš Krasula (Czech Technical University in Prague and LUNAM University), Muhammad Shahid (Blekinge Institute of Technology), Dogancan Temel (Georgia Institute of Technology), Kjell Brunnström (Acreo Swedish ICT AB and Mid Sweden University), and Manzoor Razaak (Kingston University), the authors extended the limited set of non-referential objective metrics tested in the previous paper to a range of full reference and non-reference objective metrics such as PSNR, SSIM, and VQM. Again, the specific issue was whether or not these metrics could predict the subjective scores of the complete six-minute adaptive streaming experience, with stream switches and all, not the subjective video quality of a single compressed stream.
Again, the answer was no joy, though two tests, VQM-VFD and PEVQ-S, showed some promise. As an example, while VQM-VFD is a sophisticated metric that deploys a neural network to interpret the inputs, the scatter graph shown in Figure 5 shows a generally low correlation between the Mean Opinion Scores (MOS) reported by the subjective tests and the VQM-VFD test results. Overall, the authors concluded, “[u]pon experimenting with existing objective methods for their applicability on such videos, we observed that these methods, which are known to perform well otherwise, severally fall short in accuracy in this case.” The bottom line is that just because a tool performs well for one video distribution system (e.g., IPTV over UDP), producers can’t assume that it will perform equally well for assessing the quality of an adaptive streaming experience.
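The correlation in question is typically summarized with a coefficient such as Pearson's r, computed between each sequence's MOS and its objective score. The Python sketch below shows that calculation in general terms; it is not the authors' analysis code, and the numbers are placeholder values used only to exercise the function, not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values only: a metric that tracked MOS perfectly would give r = 1.
placeholder_mos = [1.0, 2.0, 3.0, 4.0]
placeholder_metric = [2.0, 4.0, 6.0, 8.0]
r = pearson(placeholder_mos, placeholder_metric)
print(round(r, 3))
```

A value near 1 or -1 would mean the metric tracks viewer opinion closely; a widely scattered graph like the one described corresponds to a value much closer to 0.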