Video: Objective Quality Metrics--What You Need to Know
Watch the complete video of this presentation from Streaming Media West, T101: HOW-TO: Fine-Tuning Your Adaptive Encoding Groups with Objective Quality Metrics, in the Streaming Media Conference Video Portal.
Read the complete transcript of this clip:
Jan Ozer: So, what are objective quality metrics? Essentially, they're mathematical formulas that are designed to predict how a human being will judge video quality. We don't care about the number; we care about what that number means regarding how a human will look at a video and say, “Oh, that's good, that's bad, that's excellent, that's low-quality.”
So, what we're attempting to do is predict that, and there are a bunch of examples out there. There's mean opinion score, which is typically associated with subjective quality ratings by actual human beings. There's peak signal-to-noise ratio, structural similarity index, SSIMplus, and VMAF, which has kind of been made famous by the fact that it was invented by Netflix and is used by Netflix's per-title encoding engine.
I use VMAF a lot because, number one, I believe in Netflix; we all see it every day on our TV screens, and the quality is very, very good. And number two, it's open-source.
You can get VMAF scores from the tool I'm going to show you, which is the Moscow University Video Quality Measurement Tool, but you can also get them from FFmpeg and from some other tools that are out there.
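For readers who want to try the FFmpeg route mentioned here, the following is a minimal sketch that assembles an FFmpeg command line using the libvmaf filter. It assumes an FFmpeg build compiled with libvmaf support. The helper name build_vmaf_command and the file names are hypothetical, and the input order (encoded file first, source second) should be checked against the documentation for your FFmpeg version.

```python
import shlex


def build_vmaf_command(distorted, reference, log_path="vmaf.json"):
    """Return an FFmpeg argument list that scores `distorted` against `reference`.

    Assumes an FFmpeg build with the libvmaf filter enabled. The first input
    is treated as the encoded (distorted) file, the second as the source.
    """
    # Ask libvmaf to write per-frame and pooled scores to a JSON log.
    vmaf_filter = f"libvmaf=log_fmt=json:log_path={log_path}"
    return [
        "ffmpeg",
        "-i", distorted,
        "-i", reference,
        "-lavfi", vmaf_filter,
        "-f", "null", "-",  # discard the decoded output; only the log matters
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_vmaf_command("encoded_720p.mp4", "source.mp4")
    print(shlex.join(cmd))
```

The sketch only builds and prints the command; running it is left to the reader. Both inputs generally need the same resolution and frame rate, so a lower-resolution rung usually has to be scaled to the source resolution before scoring.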
I like open-source tools because everybody has access to them and everybody can make them work. This is kind of how I look at objective quality metrics, and here we see mean opinion score, PSNR, all the way over to VMAF. Scoring for MOS is one to five. Next is the artifact threshold.
What you want with an objective quality metric is a number that tells you the video is going to be okay, and it's probably easiest to explain in the context of VMAF.
Reza Rassool, who's the CTO at RealNetworks, did a study showing that if you get a VMAF score of 93, you're either not going to have visible artifacts or those artifacts won't be disturbing. Meaning, if you encode a video and it comes out with a score of 93, that means the video's okay, you can ship it. Nobody's going to say “That looks awful.”
So, that's what I mean by a no-artifact threshold. And then, you also want a threshold where artifacts are generally assumed to be present, and the best example here is in PSNR, where Netflix came out and said that if you see a score under 35 dB, there's a good likelihood that there are going to be artifacts in that video.
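To make the two thresholds concrete, here is a minimal sketch that computes PSNR with the standard formula, 10 * log10(MAX^2 / MSE), and applies the two cutoffs quoted in the talk. The function names are hypothetical, the pixel data is a flat list of 8-bit values rather than real video frames, and treating "a score of 93" as "93 or higher" is an assumption.

```python
import math

VMAF_NO_ARTIFACT = 93.0  # figure from the RealNetworks study cited in the talk
PSNR_ARTIFACT_DB = 35.0  # figure attributed to Netflix in the talk


def psnr(reference, distorted, max_value=255):
    """Peak signal-to-noise ratio in dB for two equal-length pixel sequences."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between corresponding pixels.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10 * math.log10(max_value ** 2 / mse)


def vmaf_likely_clean(score):
    """True if the VMAF score is at or above the no-artifact threshold (assumed >=)."""
    return score >= VMAF_NO_ARTIFACT


def psnr_likely_artifacts(score_db):
    """True if the PSNR score falls under the artifact threshold."""
    return score_db < PSNR_ARTIFACT_DB


if __name__ == "__main__":
    # Toy 8-bit pixel values, invented for illustration.
    print(round(psnr([100, 100], [101, 99]), 2))
    print(vmaf_likely_clean(93.0), psnr_likely_artifacts(34.0))
```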
And then there's interpreting scores. This is really one of the key aspects as well because, again, you don't really care about a score of 42 or 58, you care, “How does that score correlate with how human beings will subjectively rate that video?”
For mean opinion scores, it's basically one through five on this ranking, pretty simple. Just-noticeable difference is the concept that answers this question: if you see a difference in dB for PSNR between, say, 38 and 37, what does that mean? Is that catastrophic, or is anybody even going to notice?
One of the nice things about VMAF down here is Netflix has come out and said that if two VMAF scores are six points apart, that constitutes a just-noticeable difference, and a just-noticeable difference is, by definition, something that 75% of people will see if they're evaluating the videos one by one.
So, if I see a VMAF differential between two techniques or between two codecs and it's one, I say, “Okay, nobody cares, nobody will see that.” On the other hand, if it's 15, then I know everybody's going to see it. When I talk about just-noticeable difference, that's what I mean there.
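The six-point rule of thumb can be sketched as a small helper that expresses the gap between two VMAF scores in just-noticeable differences. The function names are hypothetical, and the six-points-per-JND constant is the figure attributed to Netflix in the talk, not something derived here.

```python
VMAF_POINTS_PER_JND = 6.0  # figure attributed to Netflix in the talk


def vmaf_jnds(score_a, score_b):
    """Express the gap between two VMAF scores in just-noticeable differences."""
    return abs(score_a - score_b) / VMAF_POINTS_PER_JND


def is_noticeable(score_a, score_b):
    """True if the gap is at least one JND, so most viewers would likely see it."""
    return vmaf_jnds(score_a, score_b) >= 1.0


if __name__ == "__main__":
    # A one-point gap versus a fifteen-point gap, as in the talk.
    print(is_noticeable(90, 89), vmaf_jnds(95, 80))
```

A one-point gap is well under one JND, while a fifteen-point gap is two and a half JNDs, which matches the "nobody cares" and "everybody's going to see it" cases described above.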
Device ratings. Everybody knows that a video that might look great on an iPhone, on your little 5” screen, could look terrible on your 62” 4K TV.
So, do implementations of the metric give you device ratings? For MOS, PSNR, and SSIM, I'm not aware of any that do that. SSIMplus has an excellent capability for this. They’ve got 50 or 60 devices that come with it, and if you need a different device rated, they can create that for you.
SSIMplus has an excellent ability to tell you how a video is going to look on a particular device. And VMAF, when it came out, the standard version of VMAF was tuned for a 1080p display at a viewing distance of three times the screen height, something like that, a living room.
Since then, they've come out with both a phone model and a 4K model. They don't have as much detail as you get with SSIMplus, but they're moving in that direction. And it's pretty interesting because, according to VMAF, a 720p video that will get a rating of 72 on a living room TV is going to get a rating of 99 on a phone.
So, what does that tell you? You don't need to go beyond 720p if you're creating an encoding ladder that you're distributing to phones.
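The ladder decision described here can be sketched as picking the lowest resolution whose device-specific score clears a quality target. In this sketch only the two 720p scores (99 on a phone, 72 on a living-room TV) come from the talk; every other score is a made-up placeholder, the function name top_rung_needed is hypothetical, and using 93 as the target is an assumption borrowed from the no-artifact threshold mentioned earlier. It also assumes scores rise with resolution.

```python
def top_rung_needed(scores_by_height, target=93.0):
    """Return the smallest frame height whose score meets `target`.

    Falls back to the largest height if no rung reaches the target.
    Assumes scores increase with resolution.
    """
    if not scores_by_height:
        raise ValueError("need at least one rung")
    for height in sorted(scores_by_height):
        if scores_by_height[height] >= target:
            return height
    return max(scores_by_height)


# Only the 720p values (99 on a phone, 72 on a living-room TV) come from the
# talk; all other scores are invented placeholders for illustration.
phone_scores = {360: 80.0, 540: 92.0, 720: 99.0, 1080: 100.0}
tv_scores = {360: 40.0, 540: 58.0, 720: 72.0, 1080: 94.0}

if __name__ == "__main__":
    print(top_rung_needed(phone_scores))  # highest rung worth sending to phones
    print(top_rung_needed(tv_scores))     # highest rung worth sending to TVs
```

With these placeholder numbers the phone ladder tops out at 720p while the TV ladder still needs 1080p, which is the kind of conclusion device ratings are meant to support.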
That's the kind of information you get from device ratings. And then ownership, again, comes down to whether it's open-source, whether you can access it from multiple tools. I can access PSNR from FFmpeg, I can access it from the SSIMplus tool, and I can access it from the Moscow University tool, because it's an open-source metric.
Again, I don't want to write a book on a proprietary metric because only people who have the tool can benefit from the book. Same thing with articles that I write, same thing with projects that I do from a consulting perspective. I like using open-source tools.