The Video Insiders

Video coding retrospective with codec expert Pankaj Topiwala.

Episode Summary

The Video Insiders talk wavelet compression and the progressive expansion of AVC, HEVC, and VVC compression efficiency with one of the industry's foremost codec experts, CEO of FastVDO, Pankaj Topiwala. In this information-rich interview, Pankaj explains the difference between patent-free and royalty-free codecs and shares his opinion on whether consortia developed codecs such as AV1, developed by the AOM group of companies, will be a part of the codec landscape moving forward.

Episode Notes

Click to watch SPIE Future Video Codec Panel Discussion

Related episode with Gary Sullivan at Microsoft: VVC, HEVC & other MPEG codec standards

Interview with MPEG Chairman Leonardo Charliogne: MPEG Through the Eyes of it's Chairman

Learn about FastDVO here

Pankaj Topiwala LinkedIn profile

--------------------------------------

The Video Insiders LinkedIn Group is where thousands of your peers are discussing the latest video technology news and sharing best practices. Click here to join

Would you like to be a guest on the show? Email: thevideoinsiders@beamr.com

Learn more about Beamr

--------------------------------------

TRANSCRIPT:

Pankaj Topiwala: 00:00 With H.264 H.265 HEVC in 2013, we were now able to do up to 300 to one to up to 500 to one compression on a, let's say a 4K video. And with VVC we have truly entered a new realm where we can do up to 1000 to one compression, which is three full orders of magnitude reduction of the original size. If the original size is say 10 gigabits, we can bring that down to 10 megabits. And that's unbelievable. And so video compression truly is a remarkable technology and you know, it's a, it's a marval to look at

 

Announcer: 00:39 The Video Insiders is the show that makes sense of all that is happening in the world of online video as seen through the eyes of a second generation codec nerd and a marketing guy who knows what I-frames and macro blocks are. And here are your hosts, Mark Donnigan and Dror Gill.

 

Speaker 3: 00:39

 

Dror Gill: 01:11 Today we're going to talk with one of the key figures in the development of a video codecs and a true video insider Pankaj Topiwala. Hello Pankaj and welcome to The Video Insiders podcast.

 

Pankaj Topiwala: 01:24 Gentlemen. hello, and thank you very much for this invite. It looks like it's going to be a lot of fun.

 

Mark Donnigan: 01:31 It is. Thank you for joining Pankaj.

 

Dror Gill: 01:33 Yeah, it sure will be a lot of fun. So can you start by telling us a little bit about your experience in codec development?

 

Pankaj Topiwala: 01:41 Sure, so, I should say that unlike a number of the other people that you have interviewed or may interview my background is fair bit different. I really came into this field really by a back door and almost by chance my degree PhD degree is actually in mathematical physics from 1985. And I actually have no engineering, computer science or even management experience. So naturally I run a small research company working in video compression and analytics, and that makes sense, but that's just the way things go in the modern world. But that the effect for me was a, and the entry point was that even though I was working in very, very abstract mathematics I decided to leave. I worked in academia for a few years and then I decided to join industry. And at that point they were putting me into applied mathematical research.

 

Pankaj Topiwala: 02:44 And the topic at that time that was really hot in applied mathematics was a topic of wavelets. And I ended up writing and edited a book called wavelet image and video compression in 1998. Which was a lot of fun along with quite a few other co authors on that book. But, wavelets had its biggest contribution in the compression of image and video. And so that led me finally to enter into, and I noticed that video compression was a far larger field than image compression. I mean, by many orders, by orders of magnitude. It is probably a hundred times bigger in terms of market size than, than image compression. And as a result I said, okay, if the sexiest application of this new fangled mathematics could be in video compression I entered that field roughly with the the book that I mentioned in 1998.

 

Mark Donnigan: 03:47 So one thing that I noticed Pankaj cause it's really interesting is your, your initial writing and you know, research was around wavelet compression and yet you have been very active in ISO MPEG, all block-based codecs. So, so tell us about that?

 

Pankaj Topiwala: 04:08 Okay. Well obviously you know when you make the transition from working on the wavelets and our initial starting point was in doing wavelet based video compression. When I started first founded my company fastVDO in 1998, 1999 period we were working on wavelet based video compression and we, we pushed that about as much as we could. And at that, at one point we had what we felt was the world's best a video compression using wavelets in fact, but best overall. And it had the feature that you know, one thing that we should, we should tell your view or reader listeners is that the, the value of wavelets in particular in image coding is that not only can you do state of the art image coding, but you can make the bitstream what is called embedded, meaning you can chop it off at anywhere you like, and it's still a decodable stream.

 

Pankaj Topiwala: 05:11 And in fact it is the best quality you can get for that bit rate. And that is a powerful, powerful thing you can do in image coding. Now in video, there is actually no way to do that. Video is just so much more complicated, but we did the best we could to make it not embedded, but at least scalable. And we, we built a scalable wavelet based video codec, which at that time was beating at the current implementations of MPEG4. So we were very excited that we could launch a company based on a proprietary codec that was based on this new fangled mathematics called wavelets. And lead us to a state of the art codec. The facts of the ground though is that just within the first couple of years of running our company, we found that in fact the block-based transformed codecs that everybody else was using, including the implementers of MPEG4.

 

Pankaj Topiwala: 06:17 And then later AVC, those quickly surpassed anything we could build with with wavelets in terms of both quality and stability. The wavelet based codecs were not as powerful or as stable. And I can say quite a bit more about why that's true. If you want?

 

Dror Gill: 06:38 So when you talk about stability, what exactly are you referring to in, in a video codec?

 

Pankaj Topiwala: 06:42 Right. So let's let's take our listeners back a bit to compare image coding and video coding. Image coding is basically, you're given a set of pixels in a rectangular array and we normally divide that into blocks of sub blocks of that image. And then do transforms and then quantization and than entropy coding, that's how we typically do image coding. With the wavelet transform, we have a global transform. It's a, it's ideally done on the entire image.

 

Pankaj Topiwala: 07:17 And then you could do it multiple times, what are called multiple scales of the wavelet transform. So you could take various sub sub blocks that you create by doing the wavelet transfer and the low pass high pass. Ancs do that again to the low low pass for multiple scales, typically about four or five scales that are used in popular image codecs that use wavelets. But now in video, the novelty is that you don't have one frame. You have many, many frames, hundreds or thousands or more. And you have motion. Now, motion is something where you have pieces of the image that float around from one frame to another and they float randomly. That is, it's not as if all of the motion is in one direction. Some things move one way, some things move other ways, some things actually change orientations.

 

Pankaj Topiwala: 08:12 And they really move, of course, in three dimensional space, not in our two dimensional space that we capture. That complicates video compression enormously over image compression. And it particularly complicates all the wavelet methods to do video compression. So, wavelet methods that try to deal with motion were not very successful. The best we tried to do was using motion compensated video you know, transformed. So doing wavelet transforms in the time domain as well as the spatial domain along the paths of motion vectors. But that was not very successful. And what I mean by stability is that as soon as you increase the motion, the codec breaks, whereas in video coding using block-based transforms and block-based motion estimation and compensation it doesn't break. It just degrades much more gracefully. Wavelet based codecs do not degrade gracefully in that regard.

 

Pankaj Topiwala: 09:16 And so we of course, as a company we decided, well, if those are the facts on the ground. We're going to go with whichever way video coding is going and drop our initial entry point, namely wavelets, and go with the DCT. Now one important thing we found was that even in the DCT- ideas we learned in wavelets can be applied right to the DCT. And I don't know if you're familiar with this part of the story, but a wavelet transform can be decomposed using bits shifts and ads only using something called the lifting transform, at least a important wavelet transforms can. Now, it turns out that the DCT can also be decomposed using lifting transforms using only bit shifts and ads. And that is something that my company developed way back back in 1998 actually.

 

Pankaj Topiwala: 10:18 And we showed that not only for DCT, but a large class of transforms called lab transforms, which included the block transforms, but in particular included more powerful transforms the importance of that in the story of video coding. Is that up until H.264, all the video codec. So H.261, MPEG-1, MPEG-2, all these video codecs used a floating point implementation of the discrete cosign transform and without requiring anybody to implement you know a full floating point transform to a very large number of decimal places. What they required then was a minimum accuracy to the DCT and that became something that all codecs had to do. Instead. If you had an implementation of the DCT, it had to be accurate to the true floating point DCT up to a certain decimal point in, in the transform accuracy.

 

Pankaj Topiwala: 11:27 With the advent of H.264, with H.264, we decided right away that we were not going to do a flooding point transform. We were going to do an integer transform. That decision was made even before I joined, my company joined, the development base, H.264, AVC, But they were using 32 point transforms. We found that we could introduce 16 point transforms, half the complexity. And half the complexity only in the linear dimension when you, when you think of it as a spatial dimension. So two spatial dimensions, it's a, it's actually grows more. And so the reduction in complexity is not a factor of two, but at least a factor of four and much more than that. In fact, it's a little closer to exponential. The reality is that we were able to bring the H.264 codec.

 

Pankaj Topiwala: 12:20 So in fact, the transform was the most complicated part of the entire codec. So if you had a 32 point transform, the entire codec was at 32 point technology and it needed 32 points, 32 bits at every sample to process in hardware or software. By changing the transform to 16 bits, we were able to bring the entire codec to a 16 bit implementation, which dramatically improved the hardware implementability of this transfer of this entire codec without at all effecting the quality. So that was an important development that happened with AVC. And since then, we've been working with only integer transforms.

 

Mark Donnigan: 13:03 This technical history is a really amazing to hear. I, I didn't actually know that Dror or you, you probably knew that, but I didn't.

 

Dror Gill: 13:13 Yeah, I mean, I knew about the transform and shifting from fixed point, from a floating point to integer transform. But you know, I didn't know that's an incredible contribution Pankaj.

 

Pankaj Topiwala: 13:27 We like to say that we've saved the world billions of dollars in hardware implementations. And we've taken a small a small you know, a donation as a result of that to survive as a small company.

 

Dror Gill: 13:40 Yeah, that's great. And then from AVC you moved on and you continued your involvement in, in the other standards, right? That's followed.

 

Pankaj Topiwala: 13:47 in fact, we've been involved in standardization efforts now for almost 20 years. My first meeting was a, I recall in may of 2000, I went to a an MPEG meeting in Geneva. And then shortly after that in July I went to an ITU VCEG meeting. VCEG is the video coding experts group of the ITU. And MPEG is the moving picture experts group of ISO. These two organizations were separately pursuing their own codecs at that time.

 

Pankaj Topiwala: 14:21 ISO MPEG was working on MPEG-4 and ITU VCEG was working on H.263, and 263 plus and 263 plus plus. And then finally they started a project called 263 L for longterm. And eventually it became clear to these two organizations that look, it's silly to work on, on separate codecs. They had worked once before in MPEG-2 develop a joint standard and they decided to, to form a joint team at that time called the joint video team, JVT to develop the H.264 AVC video codec, which was finally done in 2003. We participate participated you know fully in that making many contributions of course in the transform but also in motion estimation and other aspects. So, for example, it might not be known that we also contributed the fast motion estimation that's now widely used in probably nearly all implementations of 264, but in 265 HEVC as well.

 

Pankaj Topiwala: 15:38 And we participated in VVC. But one of the important things that we can discuss is these technologies, although they all have the same overall structure, they have become much more complicated in terms of the processing that they do. And we can discuss that to some extent if you want?

 

Dror Gill: 15:59 The compression factors, just keep increasing from generation to generation and you know, we're wondering what's the limit of that?

 

Pankaj Topiwala: 16:07 That's of course a very good question and let me try to answer some of that. And in fact that discussion I don't think came up in the discussion you had with Gary Sullivan, which certainly could have but I don't recall it in that conversation. So let me try to give for your listeners who did not catch that or are not familiar with it. A little bit of the story.

 

Pankaj Topiwala: 16:28 The first international standard was the ITU. H.261 standard dating roughly to 1988 and it was designed to do only about 15 to one to 20 to one compression. And it was used mainly for video conferencing. And at that time you'd be surprised from our point of view today, the size of the video being used was actually incredibly tiny about QCIP or 176 by 144 pixels. Video of that quality that was the best we could conceive. And we thought we were doing great. And doing 20 to one compression, wow! Recall by the way, that if you try to do a lossless compression of any natural signal, whether it's speech or audio or images or video you can't do better than about two to one or at most about two and a half to one.

 

Pankaj Topiwala: 17:25 You cannot do, typically you cannot even do three to one and you definitely cannot do 10 to one. So a video codec that could do 20 to one compression was 10 times better than what you could do lossless, I'm sorry. So this is definitely lossy, but lossy with still a good quality so that you can use it. And so we thought we were really good. When MPEG-1 came along in, in roughly 1992 we were aiming for 25 to one compression and the application was the video compact disc, the VCD. With H.262 or MPEG-2 roughly 1994, we were looking to do about 35 to one compression, 30 to 35. And the main application was then DVD or also broadcast television. At that point, broadcast television was ready to use at least in some, some segments.

 

Pankaj Topiwala: 18:21 Try digital broadcasting. In the United States, that took a while. But in any case it could be used for broadcast television. And then from that point H.264 AVC In 2003, we jumped right away to more than 100 to one compression. This technology at least on large format video can be used to shrink the original size of a video by more than two orders of magnitude, which was absolutely stunning. You know no other natural signal, not speech, not broadband, audio, not images could be compressed that much and still give you high quality subjective quality. But video can because it's it is so redundant. And because we don't understand fully yet how to appreciate video. Subjectively. We've been trying things you know, ad hoc. And so the entire development of video coding has been really by ad hoc methods to see what quality we can get.

 

Pankaj Topiwala: 19:27 And by quality we been using two two metrics. One is simply a mean square error based metric called peak signal to noise ratio or PSNR. And that has been the industry standard for the last 35 years. But the other method is simply to have people look at the video, what we call subjective rating of the video. Now it's hard to get a subjective rating. That's reliable. You have to do a lot of standardization get a lot of different people and take mean opinion scores and things like that. That's expensive. Whereas PSNR is something you can calculate on a computer. And so people have mostly in the development of video coding for 35 years relied on one objective quality metric called PSNR. And it is good but not great. And it's been known right from the beginning that it was not perfect, not perfectly correlated to video quality, and yet we didn't have anything better anyway.

 

Pankaj Topiwala: 20:32 To finish the story of the video codecs with H.265 HEVC in 2013, we were now able to do up to 300 to one to up to 500 to one compression on let's say a 4K. And with VVC we have truly entered a new realm where we can do up to 1000 to one compression, which is three full orders of magnitude reduction of the original size. If the original size is say, 10 gigabits, we can bring that down to 10 megabits. And that's unbelievable. And so video compression truly is a remarkable technology. And you know, it's a, it's a marvel to look at. Of course it does not, it's not magic. It comes with an awful lot of processing and an awful lot of smarts have gone into it. That's right.

 

Mark Donnigan: 21:24 You know Pankaj, that, is an amazing overview and to hear that that VVC is going to be a thousand to one. You know, compression benefit. Wow. That's incredible!

 

Pankaj Topiwala: 21:37 I think we should of course we should of course temper that with you know, what people will use in applications. Correct. They may not use the full power of a VVC and may not crank it to that level. Sure, sure. I can certainly tell you that that we and many other companies have created bitstreams with 1000 to one or more compression and seeing video quality that we thought was usable.

 

Mark Donnigan: 22:07 One of the topics that has come to light recently and been talked about quite a bit. And it was initially raised by Dave Ronca who used to lead encoding at Netflix for like 10 years. In fact you know, I think he really built that department, the encoding team there and is now at Facebook. And he wrote a LinkedIn article post that was really fascinating. And what he was pointing out in this post was, was that with compression efficiency and as each generation of codec is getting more efficient as you just explained and gave us an overview. There's a, there's a problem that's coming with that in that each generation of codec is also getting even more complex and you know, in some settings and, and I suppose you know, Netflix is maybe an example where you know, it's probably not accurate to say they have unlimited compute, but their application is obviously very different in terms of how they can operate their, their encoding function compared to someone who's doing live, live streaming for example, or live broadcast. Maybe you can share with us as well. You know, through the generation generational growth of these codecs, how has the, how has the compute requirements also grown and has it grown in sort of a linear way along with the compression efficiency? Or are you seeing, you know, some issues with you know, yes, we can get a thousand to one, but our compute efficiency is getting to the, where we could be hitting a wall.

 

Pankaj Topiwala: 23:46 You asked a good question. Has the complexity only scaled linearly with the compression ratio? And the answer is no. Not at all. Complexity has outpaced the compression ratio. Even though the compression ratio is, is a tremendous, the complexity is much, much higher. And has always been at every step. First of all there's a big difference in doing the research, the research phase in development of the, of a technology like VVC where we were using a standardized reference model that the committee develops along the way, which is not at all optimized. But that's what we all use because we share a common code base. And make any new proposals based on modifying that code base. Now that code base is always along the entire development chain has always been very, very slow.

 

Pankaj Topiwala: 24:42 And true implementations are anywhere from 100 to 500 times more efficient in complexity than the reference software. So right away you can have the reference software for say VVC and somebody developing a, an implementation that's a real product. It can be at least 100 times more efficient than what the reference software, maybe even more. So there's a big difference. You know, when we're developing a technology, it is very hard to predict what implementers will actually come up with later. Of course, the only way they can do that is that companies actually invest the time and energy right away as they're developing the standard to build prototype both software and hardware and have a good idea that when they finish this, you know, what is it going to really cost? So just to give you a, an idea, between, H.264 and

 

Pankaj Topiwala: 25:38 H.265, H.264, only had two transforms of size, four by four and eight by eight. And these were integer transforms, which are only bit shifts and adds, took no multiplies and no divides. The division in fact got incorporated into the quantizer and as a result, it was very, very fast. Moreover, if you had to do, make decisions such as inter versus intra mode, the intra modes there were only about eight or 10 intra modes in H.264. By contrast in H.265. We have not two transforms eight, four by four and eight by, but in fact sizes of four, eight, 16 and 32. So we have much larger sized transforms and instead of a eight or 10 intra modes, we jumped up to 35 intra modes.

 

Pankaj Topiwala: 26:36 And then with a VVC we jumped up to 67 intro modes and we just, it just became so much more complex. The compression ratio between HEVC and VVC is not quite two to one, but let's say, you know, 40% better. But the the complexity is not 40% more. On the ground and nobody has yet, to my knowledge, built a a, a, a fully compliant and powerful either software or hardware video codec for VVC yet because it's not even finished yet. It's going to be finished in July 2020. When it, when, the dust finally settles maybe four or five years from now, it will be, it will prove to be at least three or four times more complex than HEVC encoder the decoder, not that much. The decoder, luckily we're able to build decoders that are much more linear than the encoder.

 

Pankaj Topiwala: 27:37 So I guess I should qualify as discussion saying the complexity growth is all mostly been in the encoder. The decoder has been a much more reasonable. Remember, we are always relying on this principle of ever-increasing compute capability. You know, a factor of two every 18 months. We've long heard about all of this, you know, and it is true, Moore's law. If we did not have that, none of this could have happened. None of this high complexity codecs, whatever had been developed because nobody would ever be able to implement them. But because of Moore's law we can confidently say that even if we put out this very highly complex VVC standard, someday and in the not too distant future, people will be able to implement this in hardware. Now you also asked a very good question earlier, is there a limit to how much we can compress?

 

Pankaj Topiwala: 28:34 And also one can ask relatively in this issue, is there a limit to a Moore's law? And we've heard a lot about that. That may be finally after decades of the success of Moore's law and actually being realized, maybe we are now finally coming to quantum mechanical limits to you know how much we can miniaturize in electronics before we actually have to go to quantum computing, which is a totally different you know approach to doing computing because trying to go smaller die size. Well, we'll make it a unstable quantum mechanically. Now the, it appears that we may be hitting a wall eventually we haven't hit it yet, but we may be close to a, a physical limit in die size. And in the observations that I've been making at least it seems possible to me that we are also reaching a limit to how much we can compress video even without a complexity limit, how much we can compress video and still obtain reasonable or rather high quality.

 

Pankaj Topiwala: 29:46 But we don't know the answer to that. And in fact there are many many aspects of this that we simply don't know. For example, the only real arbiter of video quality is subjective testing. Nobody has come up with an objective video quality metric that we can rely on. PSNR is not it. When, when push comes to shove, nobody in this industry actually relies on PSNR. They actually do subjective testing well. So in that scenario, we don't know what the limits of visual quality because we don't understand human vision, you know, we try, but human vision is so complicated. Nobody can understand the impact of that on video quality to any very significant extent. Now in fact, the first baby steps to try to understand, not explicitly but implicitly capture subjective human video quality assessment into a neural model. Those steps are just now being taken in the last couple of years. In fact, we've been involved, my company has been involved in, in getting into that because I think that's a very exciting area.

 

Dror Gill: 30:57 I tend to agree that modeling human perception with a neural network seems more natural than, you know, just regular formulas and algorithms which are which are linear. Now I, I wanted to ask you about this process of, of creating the codecs. It's, it's very important to have standards. So you encode a video once and then you can play it anywhere and anytime and on any device. And for this, the encoder and decoder need to agree on exactly the format of the video. And traditionally you know, as you pointed out with all the history of, of development. Video codecs have been developed by standardization bodies, MPEG and ITU first separately. And then they joined forces to develop the newest video standards. But recently we're seeing another approach to develop codecs, which is by open sourcing them.

 

Dror Gill: 31:58 Google started with an open source code, they called VP9 which they first developed internally. Then they open sourced it and and they use it widely across their services, especially in, YouTube. And then they joined forces with the, I think the largest companies in the world, not just in video but in general. You know those large internet giants such as Amazon and Facebook and and Netflix and even Microsoft, Apple, Intel have joined together with the Alliance of Open Media to jointly create another open codec called AV1. And this is a completely parallel process to the MPEG codec development process. And the question is, do you think that this was kind of a one time effort to, to to try and find a, or develop a royalty free codec, or is this something that will continue? And how do you think the adoption of the open source codecs versus the committee defined codecs, how would that adoption play out in the market?

 

Pankaj Topiwala: 33:17 That's of course a large topic on its own. And I should mention that there have been a number of discussions about that topic. In particular at the SPIE conference last summer in San Diego, we had a panel discussion of experts in video compression to discuss exactly that. And one of the things we should provide to your listeners is a link to that captured video of the panel discussion where that topic is discussed to some significant extent. And it's on YouTube so we can provide a link to that. My answer. And of course none of us knows the future. Right. But we're going to take our best guesses. I believe that this trend will continue and is a new factor in the landscape of video compression development.

 

Pankaj Topiwala: 34:10 But we should also point out that the domain of preponderance use preponderant use of these codecs is going to be different than in our traditional codecs. Our traditional codecs such as H.264 265, were initially developed for primarily for the broadcast market or for DVD and Blu-ray. Whereas these new codecs from AOM are primarily being developed for the streaming media industry. So the likes of Netflix and Amazon and for YouTube where they put up billions of user generated videos. So, for the streaming application, the decoder is almost always a software decoder. That means they can update that decoder anytime they do a software update. So they're not limited by a hardware development cycle. Of course, hardware companies are also building AV1.

 

Pankaj Topiwala: 35:13 And the point of that would be to try to put it into handheld devices like laptops, tablets, and especially smartphones. But to try to get AV1 not only as a decoder but also as an encoder in a smartphone is going to be quite complicated. And the first few codecs that come out in hardware will be of much lower quality, for example, comparable to AVC and not even the quality of HEVC when they first start out. So that's... the hardware implementations of AV1 that work in real time are not going to be, it's going to take a while for them to catch up to the quality that AV1 can offer. But for streaming we, we can decode these streams reasonably well in software or in firmware. And the net result is that, or in GPU for example, and the net result is that these companies can already start streaming.

 

Pankaj Topiwala: 36:14 So in fact Google is already streaming some test streams maybe one now. And it's cloud-based YouTube application and companies like Cisco are testing it already, even for for their WebEx video communication platform. Although the quality will not be then anything like the full capability of AV1, it'll be at a much reduced level, but it'll be this open source and notionally, you know, royalty free video codec.

 

Dror Gill: 36:50 Notionally. Yeah. Because they always tried to do this, this dance and every algorithm that they try to put into the standard is being scrutinized and, and, and they check if there are any patents around it so they can try and keep this notion of of royalty-free around the codec because definitely the codec is open source and royalty free.

 

Dror Gill: 37:14 I think that is, is, is a big question. So much IP has gone into the development of the different MPEG standards and we know it has caused issues. Went pretty smoothly with AVC, with MPEG-LA that had kind of a single point of contact for licensing all the essential patents and with HEVC, that hasn't gone very well in the beginning. But still there is a lot of IP there. So the question is, is it even possible to have a truly royalty free codec that can be competitive in, in compression efficiency and performance with the codec developed by the standards committee?

 

Pankaj Topiwala: 37:50 I'll give you a two part answer. One because of the landscape of patents in the field of video compression which I would describe as being, you know very, very spaghetti like and patents date back to other patents.

 

Pankaj Topiwala: 38:09 And they cover most of the, the topics and the most of the, the tools used in video compression. And by the way we've looked at the AV1 and AV1 is not that different from all the other standards that we have. H.265 or VVC. There are some things that are different. By and large, it resembles the existing standards. So can it be that this animal is totally patent free? No, it cannot be that it is patent free. But patent free is not the same as royalty free. There's no question that AV1 has many, many patents, probably hundreds of patents that reach into it. The question is whether the people developing and practicing AV1 own all of those patents. That is of course, a much larger question.

 

Pankaj Topiwala: 39:07 And in fact, there has been a recent challenge to that, a group has even stood up to proclaim that they have a central IP in AV1. The net reaction from the AOM has been to develop a legal defense fund so that they're not going to budge in terms of their royalty free model. If they do. It would kill the whole project because their main thesis is that this is a world do free thing, use it and go ahead. Now, the legal defense fund then protects the members of that Alliance, jointly. Now, it's not as if the Alliance is going to indemnify you against any possible attack on IP. They can't do that because nobody can predict, you know, where somebody's IP is. The world is so large, so many patents in that we're talking not, not even hundreds and thousands, but tens of thousands of patents at least.

 

Pankaj Topiwala: 40:08 So nobody in the world has ever reviewed all of those patent. It's not possible. And the net result is that nobody can know for sure what technology might have been patented by third parties. But the point is that because such a large number of powerful companies that are also the main users of this technology, you know, people, companies like Google and Apple and Microsoft and, and Netflix and Amazon and Facebook and whatnot. These companies are so powerful. And Samsung by the way, has joined the Alliance. These companies are so powerful that you know, it would be hard to challenge them. And so in practice, the point is they can project a royalty-free technology because it would be hard for anybody to challenge it. And so that's the reality on the ground.

 

Pankaj Topiwala: 41:03 So at the moment it is succeeding as a royalty free project. I should also point out that if you want to use this, not join the Alliance, but just want to be a user. Even just to use it, you already have to offer any IP you have in this technology it to the Alliance. So all users around the world, so if tens of thousands and eventually millions of you know, users around the world, including tens of thousands of companies around the world start to use this technology, they will all have automatically yielded any IP they have in AV1, to the Alliance.

 

Dror Gill: 41:44 Wow. That's really fascinating. I mean, first the distinction you made between royalty free and patent free. So the AOM can keep this technology royalty free, even if it's not patent free because they don't charge royalties and they can help with the legal defense fund against patent claim and still keep it royalty free. And, and second is the fact that when you use this technology, you are giving up any IP claims against the creators of the technology, which means that if any, any party who wants to have any IP claims against the AV1 encoder cannot use it in any form or shape.

 

Pankaj Topiwala: 42:25 That's at least my understanding. And I've tried to look at of course I'm not a lawyer. And you have to take that as just the opinion of a video coding expert rather than a lawyer dissecting the legalities of this. But be that as it may, my understanding is that any user would have to yield any IP they have in the standard to the Alliance. And the net result will be if this technology truly does get widely used more IP than just from the Alliance members will have been folded into into it so that eventually it would be hard for anybody to challenge this.

 

Mark Donnigan: 43:09 Pankaj, what does this mean for the development of so much of the technology has been in has been enabled by the financial incentive of small groups of people, you know, or medium sized groups of people forming together. You know, building a company, usually. Hiring other experts and being able to derive some economic benefit from the research and the work and the, you know, the effort that's put in. If all of this sort of consolidates to a handful or a couple of handfuls of, you know, very, very large companies, you know, does that, I guess I'm, I'm asking from your view, will, will video and coding technology development and advancements proliferate? Will it sort of stay static? Because basically all these companies will hire or acquire, you know, all the experts and you know, it's just now everybody works for Google and Facebook and Netflix and you know... Or, or do you think it will ultimately decline? Because that's something that that comes to mind here is, you know, if the economic incentives sort of go away, well, you know, people aren't going to work for free!

 

Pankaj Topiwala: 44:29 So that's of course a, another question and a one relevant. In fact to many of us working in video compression right now, including my company. And I faced this directly back in the days of MPEG-2. There was a two and a half dollar ($2.50) per unit license fee for using MPEG-2. That created billions of dollars in licensing in fact, the patent pool, MPEG-LA itself made billions of dollars, even though they took only 10% of the proceeds, they already made billions of dollars, you know, huge amounts of money. With the advent of H.264 AVC, the patent license went not to from two and a half dollars to 25 cents a unit. And now with HEVC, it's a little bit less than that per unit. Of course the number of units has grown exponentially, but then the big companies don't continue to pay per unit anymore.

 

Pankaj Topiwala: 45:29 They just pay a yearly cap. For example, 5 million or 10 million, which to these big companies is is peanuts. So there's a yearly cap for the big companies that have, you know, hundreds of millions of units. You know imagine the number of Microsoft windows that are out there or the number of you know, Google Chrome browsers. And if you have a, a codec embedded in the browser there are hundreds of millions of them, if not billions of them. And so they just pay a cap and they're done with it. But even then, there was up till now an incentive for smart engineers to develop exciting new ideas in a future video coding. But, and that has been up the story up till now. But when, if it happens that this AOM model with AV1 and then AV2, really becomes a dominant codec and takes over the market, then there will be no incentive for researchers to devote any time and energy.

 

Pankaj Topiwala: 46:32 Certainly my company for example, can't afford to you know, just twiddle thumbs, create technologies for which there is absolutely no possibility of a royalty stream. So we, we cannot be in the business of developing video coding when video coding doesn't pay. So the only thing that makes money, is Applications, for example, a streaming application or some other such thing. And so Netflix and, and Google and Amazon will be streaming video and they'll charge you per stream but not on the codec. So that that's an interesting thing and it certainly affects the future development of video. It's clear to me it's a negative impact on the research that we got going in. I can't expect that Google and Amazon and Microsoft are going to continue to devote the same energy to develop future compression technologies in their royalty free environment that companies have in the open standards development technology environment.

 

Pankaj Topiwala: 47:34 It's hard for me to believe that they will devote that much energy. They'll devote energy, but it will not be the the same level. For example, in developing a video standards such as HEVC, it took up to 10 years of development by on the order of 500 to 600 experts, well, let's say four to 500 experts from around the world meeting four times a year for 10 years.

 

Mark Donnigan: 48:03 That is so critical. I want you to repeat that again.

 

Pankaj Topiwala: 48:07 Well, I mean so very clearly we've been putting out a video codec roughly on the schedule of once every 10 years. MPEG-2 was 1994. AVC was 2003 and also 2004. And then HEVC in 2013. Those were roughly 10 years apart. But VVC we've accelerated the schedule to put one out in seven years instead of 10 years. But even then you should realize that we had been working right since HEVC was done.

 

Pankaj Topiwala: 48:39 We've been working all this time to develop VVC and so on the order of 500 experts from around the world have met four times a year at all international locations, spending on the order of $100 million per meeting. You know so billions of dollars have been spent by industry to create these standards, many billions and it can't happen, you know without that. It's hard for me to believe that companies like Microsoft, Google, and whatnot, are going to devote billions to develop their next incremental, you know, AV1and AV2 AV3's. But maybe they will it just, that there's no royalty stream coming from the codec itself, only the application. Then the incentive, suppose they start dominating to create even better technology will not be there. So there really is a, a financial issue in this and that's at play right now.

 

Dror Gill: 49:36 Yeah, I, I find it really fascinating. And of course, Mark and I are not lawyers, but all this you know, royalty free versus committee developed open source versus a standard those large companies who some people fear, you know, their dominance and not only in video codec development, but in many other areas. You know, versus you know, dozens of companies and hundreds of engineers working for seven or 10 years in a codec. So you know, it's really different approaches different methods of development eventually to approach the exact same problem of video compression. And, and how this turns out. I mean we, we cannot forecast for sure, but it will be very interesting, especially next year in 2020 when VVC is ratified. And at around the same time, EVC is ratified another codec from the MPEG committee.

 

Dror Gill: 50:43 And then AV1, and once you know, AV1 starts hitting the market. We'll hear all the discussions of AV2. So it's gonna be really interesting and fascinating to follow. And we, we promise to to bring you all the updates here on The Video Insiders. So Pankaj I really want to thank you. This has been a fascinating discussion with very interesting insights into the world of codec development and compression and, and wavelets and DCT and and all of those topics and, and the history and the future. So thank you very much for joining us today on the video insiders.

 

Pankaj Topiwala: 51:25 It's been my pleasure, Mark and Dror. And I look forward to interacting in the future. Hope this is a useful for your audience. If I can give you a one parting thought, let me give this...

 

Pankaj Topiwala: 51:40 H.264 AVC was developed in 2003 and also 2004. That is you know, some 17 years or 16 years ago, it is close to being now nearly royalty-free itself. And if you look at the market share of video codecs currently being used in the market, for example, even in streaming AVC dominates that market completely. Even though VP8 and VP9 and VP10 were introduced and now AV1, none of those have any sizeable market share. AVC currently dominates from 70 to 80% of that marketplace right now. And it fully dominates broadcast where those other codecs are not even in play. And so they're 17, 16, 17 years later, it is now still the dominant codec even much over HEVC, which by the way is also taking an uptick in the last several years. So the standardized codecs developed by ITU and MPEG are not dead. They may just take a little longer to emerge as dominant forces.

 

Mark Donnigan: 52:51 That's a great parting thought. Thanks for sharing that. What an engaging episode Dror. Yeah. Yeah. Really interesting. I learned so much. I got a DCT primer. I mean, that in and of itself was a amazing,

 

Dror Gill: 53:08 Yeah. Yeah. Thank you.

 

Mark Donnigan: 53:11 Yeah, amazing Pankaj. Okay, well good. Well thanks again for listening to the video insiders, and as always, if you would like to come on this show, we would love to have you just send us an email. The email address is thevideoinsiders@beamr.com, and Dror or myself will follow up with you and we'd love to hear what you're doing. We're always interested in talking to video experts who are involved in really every area of video distribution. So it's not only encoding and not only codecs, whatever you're doing, tell us about it. And until next time what do we say Dror? Happy encoding! Thanks everyone.

 

Episode Transcription

Pankaj Topiwala:    00:00          With H.264 H.265 HEVC in 2013, we were now able to do up to 300 to one to up to 500 to one compression on a, let's say a 4K video. And with VVC we have truly entered a new realm where we can do up to 1000 to one compression, which is three full orders of magnitude reduction of the original size. If the original size is say 10 gigabits, we can bring that down to 10 megabits. And that's unbelievable. And so video compression truly is a remarkable technology and you know, it's a, it's a marval to look at

 

Announcer:          00:39          The Video Insiders is the show that makes sense of all that is happening in the world of online video as seen through the eyes of a second generation codec nerd and a marketing guy who knows what I-frames and macro blocks are. And here are your hosts, Mark Donnigan and Dror Gill.

 

Speaker 3:          00:39          

 

Dror Gill:          01:11          Today we're going to talk with one of the key figures in the development of a video codecs and a true video insider Pankaj Topiwala. Hello Pankaj and welcome to The Video Insiders podcast.

 

Pankaj Topiwala:    01:24          Gentlemen. hello, and thank you very much for this invite. It looks like it's going to be a lot of fun.

 

Mark Donnigan:      01:31          It is. Thank you for joining Pankaj.

 

Dror Gill:          01:33          Yeah, it sure will be a lot of fun. So can you start by telling us a little bit about your experience in codec development?

 

Pankaj Topiwala:    01:41          Sure, so, I should say that unlike a number of the other people that you have interviewed or may interview my background is fair bit different. I really came into this field really by a back door and almost by chance my degree PhD degree is actually in mathematical physics from 1985. And I actually have no engineering, computer science or even management experience. So naturally I run a small research company working in video compression and analytics, and that makes sense, but that's just the way things go in the modern world. But that the effect for me was a, and the entry point was that even though I was working in very, very abstract mathematics I decided to leave. I worked in academia for a few years and then I decided to join industry. And at that point they were putting me into applied mathematical research.

 

Pankaj Topiwala:    02:44          And the topic at that time that was really hot in applied mathematics was a topic of wavelets. And I ended up writing and edited a book called wavelet image and video compression in 1998. Which was a lot of fun along with quite a few other co authors on that book. But, wavelets had its biggest contribution in the compression of image and video. And so that led me finally to enter into, and I noticed that video compression was a far larger field than image compression. I mean, by many orders, by orders of magnitude. It is probably a hundred times bigger in terms of market size than, than image compression. And as a result I said, okay, if the sexiest application of this new fangled mathematics could be in video compression I entered that field roughly with the the book that I mentioned in 1998.

 

Mark Donnigan:      03:47          So one thing that I noticed Pankaj cause it's really interesting is your, your initial writing and you know, research was around wavelet compression and yet you have been very active in ISO MPEG, all block-based codecs. So, so tell us about that?

 

Pankaj Topiwala:    04:08          Okay. Well obviously you know when you make the transition from working on the wavelets and our initial starting point was in doing wavelet based video compression. When I started first founded my company fastVDO in 1998, 1999 period we were working on wavelet based video compression and we, we pushed that about as much as we could. And at that, at one point we had what we felt was the world's best a video compression using wavelets in fact, but best overall. And it had the feature that you know, one thing that we should, we should tell your view or reader listeners is that the, the value of wavelets in particular in image coding is that not only can you do state of the art image coding, but you can make the bitstream what is called embedded, meaning you can chop it off at anywhere you like, and it's still a decodable stream.

 

Pankaj Topiwala:    05:11          And in fact it is the best quality you can get for that bit rate. And that is a powerful, powerful thing you can do in image coding. Now in video, there is actually no way to do that. Video is just so much more complicated, but we did the best we could to make it not embedded, but at least scalable. And we, we built a scalable wavelet based video codec, which at that time was beating at the current implementations of MPEG4. So we were very excited that we could launch a company based on a proprietary codec that was based on this new fangled mathematics called wavelets. And lead us to a state of the art codec. The facts of the ground though is that just within the first couple of years of running our company, we found that in fact the block-based transformed codecs that everybody else was using, including the implementers of MPEG4.

 

Pankaj Topiwala:    06:17          And then later AVC, those quickly surpassed anything we could build with with wavelets in terms of both quality and stability. The wavelet based codecs were not as powerful or as stable. And I can say quite a bit more about why that's true. If you want?

 

Dror Gill:          06:38          So when you talk about stability, what exactly are you referring to in, in a video codec?

 

Pankaj Topiwala:    06:42          Right. So let's let's take our listeners back a bit to compare image coding and video coding. Image coding is basically, you're given a set of pixels in a rectangular array and we normally divide that into blocks of sub blocks of that image. And then do transforms and then quantization and than entropy coding, that's how we typically do image coding. With the wavelet transform, we have a global transform. It's a, it's ideally done on the entire image.

 

Pankaj Topiwala:    07:17          And then you could do it multiple times, what are called multiple scales of the wavelet transform. So you could take various sub sub blocks that you create by doing the wavelet transfer and the low pass high pass. Ancs do that again to the low low pass for multiple scales, typically about four or five scales that are used in popular image codecs that use wavelets. But now in video, the novelty is that you don't have one frame. You have many, many frames, hundreds or thousands or more. And you have motion. Now, motion is something where you have pieces of the image that float around from one frame to another and they float randomly. That is, it's not as if all of the motion is in one direction. Some things move one way, some things move other ways, some things actually change orientations.

 

Pankaj Topiwala:    08:12          And they really move, of course, in three dimensional space, not in our two dimensional space that we capture. That complicates video compression enormously over image compression. And it particularly complicates all the wavelet methods to do video compression. So, wavelet methods that try to deal with motion were not very successful. The best we tried to do was using motion compensated video you know, transformed. So doing wavelet transforms in the time domain as well as the spatial domain along the paths of motion vectors. But that was not very successful. And what I mean by stability is that as soon as you increase the motion, the codec breaks, whereas in video coding using block-based transforms and block-based motion estimation and compensation it doesn't break. It just degrades much more gracefully. Wavelet based codecs do not degrade gracefully in that regard.

 

Pankaj Topiwala:    09:16          And so we of course, as a company we decided, well, if those are the facts on the ground. We're going to go with whichever way video coding is going and drop our initial entry point, namely wavelets, and go with the DCT. Now one important thing we found was that even in the DCT- ideas we learned in wavelets can be applied right to the DCT. And I don't know if you're familiar with this part of the story, but a wavelet transform can be decomposed using bits shifts and ads only using something called the lifting transform, at least a important wavelet transforms can. Now, it turns out that the DCT can also be decomposed using lifting transforms using only bit shifts and ads. And that is something that my company developed way back back in 1998 actually.

 

Pankaj Topiwala:    10:18          And we showed that not only for DCT, but a large class of transforms called lab transforms, which included the block transforms, but in particular included more powerful transforms the importance of that in the story of video coding. Is that up until H.264, all the video codec. So H.261, MPEG-1, MPEG-2, all these video codecs used a floating point implementation of the discrete cosign transform and without requiring anybody to implement you know a full floating point transform to a very large number of decimal places. What they required then was a minimum accuracy to the DCT and that became something that all codecs had to do. Instead. If you had an implementation of the DCT, it had to be accurate to the true floating point DCT up to a certain decimal point in, in the transform accuracy.

 

Pankaj Topiwala:    11:27          With the advent of H.264, with H.264, we decided right away that we were not going to do a flooding point transform. We were going to do an integer transform. That decision was made even before I joined, my company joined, the development base, H.264, AVC, But they were using 32 point transforms. We found that we could introduce 16 point transforms, half the complexity. And half the complexity only in the linear dimension when you, when you think of it as a spatial dimension. So two spatial dimensions, it's a, it's actually grows more. And so the reduction in complexity is not a factor of two, but at least a factor of four and much more than that. In fact, it's a little closer to exponential. The reality is that we were able to bring the H.264 codec.

 

Pankaj Topiwala:    12:20          So in fact, the transform was the most complicated part of the entire codec. So if you had a 32 point transform, the entire codec was at 32 point technology and it needed 32 points, 32 bits at every sample to process in hardware or software. By changing the transform to 16 bits, we were able to bring the entire codec to a 16 bit implementation, which dramatically improved the hardware implementability of this transfer of this entire codec without at all effecting the quality. So that was an important development that happened with AVC. And since then, we've been working with only integer transforms.

 

Mark Donnigan:      13:03          This technical history is a really amazing to hear. I, I didn't actually know that Dror or you, you probably knew that, but I didn't.

 

Dror Gill:          13:13          Yeah, I mean, I knew about the transform and shifting from fixed point, from a floating point to integer transform. But you know, I didn't know that's an incredible contribution Pankaj.

 

Pankaj Topiwala:    13:27          We like to say that we've saved the world billions of dollars in hardware implementations. And we've taken a small a small you know, a donation as a result of that to survive as a small company.

 

Dror Gill:          13:40          Yeah, that's great. And then from AVC you moved on and you continued your involvement in, in the other standards, right? That's followed.

 

Pankaj Topiwala:    13:47          in fact, we've been involved in standardization efforts now for almost 20 years. My first meeting was a, I recall in may of 2000, I went to a an MPEG meeting in Geneva. And then shortly after that in July I went to an ITU VCEG meeting. VCEG is the video coding experts group of the ITU. And MPEG is the moving picture experts group of ISO. These two organizations were separately pursuing their own codecs at that time.

 

Pankaj Topiwala:    14:21          ISO MPEG was working on MPEG-4 and ITU VCEG was working on H.263, and 263 plus and 263 plus plus. And then finally they started a project called 263 L for longterm. And eventually it became clear to these two organizations that look, it's silly to work on, on separate codecs. They had worked once before in MPEG-2 develop a joint standard and they decided to, to form a joint team at that time called the joint video team, JVT to develop the H.264 AVC video codec, which was finally done in 2003. We participate participated you know fully in that making many contributions of course in the transform but also in motion estimation and other aspects. So, for example, it might not be known that we also contributed the fast motion estimation that's now widely used in probably nearly all implementations of 264, but in 265 HEVC as well.

 

Pankaj Topiwala:    15:38          And we participated in VVC. But one of the important things that we can discuss is these technologies, although they all have the same overall structure, they have become much more complicated in terms of the processing that they do. And we can discuss that to some extent if you want?

 

Dror Gill:          15:59          The compression factors, just keep increasing from generation to generation and you know, we're wondering what's the limit of that?

 

Pankaj Topiwala:    16:07          That's of course a very good question and let me try to answer some of that. And in fact that discussion I don't think came up in the discussion you had with Gary Sullivan, which certainly could have but I don't recall it in that conversation. So let me try to give for your listeners who did not catch that or are not familiar with it. A little bit of the story.

 

Pankaj Topiwala:    16:28          The first international standard was the ITU. H.261 standard dating roughly to 1988 and it was designed to do only about 15 to one to 20 to one compression. And it was used mainly for video conferencing. And at that time you'd be surprised from our point of view today, the size of the video being used was actually incredibly tiny about QCIP or 176 by 144 pixels. Video of that quality that was the best we could conceive. And we thought we were doing great. And doing 20 to one compression, wow! Recall by the way, that if you try to do a lossless compression of any natural signal, whether it's speech or audio or images or video you can't do better than about two to one or at most about two and a half to one.

 

Pankaj Topiwala:    17:25          You cannot do, typically you cannot even do three to one and you definitely cannot do 10 to one. So a video codec that could do 20 to one compression was 10 times better than what you could do lossless, I'm sorry. So this is definitely lossy, but lossy with still a good quality so that you can use it. And so we thought we were really good. When MPEG-1 came along in, in roughly 1992 we were aiming for 25 to one compression and the application was the video compact disc, the VCD. With H.262 or MPEG-2 roughly 1994, we were looking to do about 35 to one compression, 30 to 35. And the main application was then DVD or also broadcast television. At that point, broadcast television was ready to use at least in some, some segments.

 

Pankaj Topiwala:    18:21          Try digital broadcasting. In the United States, that took a while. But in any case it could be used for broadcast television. And then from that point H.264 AVC In 2003, we jumped right away to more than 100 to one compression. This technology at least on large format video can be used to shrink the original size of a video by more than two orders of magnitude, which was absolutely stunning. You know no other natural signal, not speech, not broadband, audio, not images could be compressed that much and still give you high quality subjective quality. But video can because it's it is so redundant. And because we don't understand fully yet how to appreciate video. Subjectively. We've been trying things you know, ad hoc. And so the entire development of video coding has been really by ad hoc methods to see what quality we can get.

 

Pankaj Topiwala:    19:27          And by quality we been using two two metrics. One is simply a mean square error based metric called peak signal to noise ratio or PSNR. And that has been the industry standard for the last 35 years. But the other method is simply to have people look at the video, what we call subjective rating of the video. Now it's hard to get a subjective rating. That's reliable. You have to do a lot of standardization get a lot of different people and take mean opinion scores and things like that. That's expensive. Whereas PSNR is something you can calculate on a computer. And so people have mostly in the development of video coding for 35 years relied on one objective quality metric called PSNR. And it is good but not great. And it's been known right from the beginning that it was not perfect, not perfectly correlated to video quality, and yet we didn't have anything better anyway.

 

Pankaj Topiwala:    20:32          To finish the story of the video codecs with H.265 HEVC in 2013, we were now able to do up to 300 to one to up to 500 to one compression on let's say a 4K. And with VVC we have truly entered a new realm where we can do up to 1000 to one compression, which is three full orders of magnitude reduction of the original size. If the original size is say, 10 gigabits, we can bring that down to 10 megabits. And that's unbelievable. And so video compression truly is a remarkable technology. And you know, it's a, it's a marvel to look at. Of course it does not, it's not magic. It comes with an awful lot of processing and an awful lot of smarts have gone into it. That's right.

 

Mark Donnigan:      21:24          You know Pankaj, that, is an amazing overview and to hear that that VVC is going to be a thousand to one. You know, compression benefit. Wow. That's incredible!

 

Pankaj Topiwala:    21:37          I think we should of course we should of course temper that with you know, what people will use in applications. Correct. They may not use the full power of a VVC and may not crank it to that level. Sure, sure. I can certainly tell you that that we and many other companies have created bitstreams with 1000 to one or more compression and seeing video quality that we thought was usable.

 

Mark Donnigan:      22:07          One of the topics that has come to light recently and been talked about quite a bit. And it was initially raised by Dave Ronca who used to lead encoding at Netflix for like 10 years. In fact you know, I think he really built that department, the encoding team there and is now at Facebook. And he wrote a LinkedIn article post that was really fascinating. And what he was pointing out in this post was, was that with compression efficiency and as each generation of codec is getting more efficient as you just explained and gave us an overview. There's a, there's a problem that's coming with that in that each generation of codec is also getting even more complex and you know, in some settings and, and I suppose you know, Netflix is maybe an example where you know, it's probably not accurate to say they have unlimited compute, but their application is obviously very different in terms of how they can operate their, their encoding function compared to someone who's doing live, live streaming for example, or live broadcast. Maybe you can share with us as well. You know, through the generation generational growth of these codecs, how has the, how has the compute requirements also grown and has it grown in sort of a linear way along with the compression efficiency? Or are you seeing, you know, some issues with you know, yes, we can get a thousand to one, but our compute efficiency is getting to the, where we could be hitting a wall.

 

Pankaj Topiwala:    23:46          You asked a good question. Has the complexity only scaled linearly with the compression ratio? And the answer is no. Not at all. Complexity has outpaced the compression ratio. Even though the compression ratio is, is a tremendous, the complexity is much, much higher. And has always been at every step. First of all there's a big difference in doing the research, the research phase in development of the, of a technology like VVC where we were using a standardized reference model that the committee develops along the way, which is not at all optimized. But that's what we all use because we share a common code base. And make any new proposals based on modifying that code base. Now that code base is always along the entire development chain has always been very, very slow.

 

Pankaj Topiwala:    24:42          And true implementations are anywhere from 100 to 500 times more efficient in complexity than the reference software. So right away you can have the reference software for say VVC and somebody developing a, an implementation that's a real product. It can be at least 100 times more efficient than what the reference software, maybe even more. So there's a big difference. You know, when we're developing a technology, it is very hard to predict what implementers will actually come up with later. Of course, the only way they can do that is that companies actually invest the time and energy right away as they're developing the standard to build prototype both software and hardware and have a good idea that when they finish this, you know, what is it going to really cost? So just to give you a, an idea, between, H.264 and

 

Pankaj Topiwala:    25:38          H.265, H.264, only had two transforms of size, four by four and eight by eight. And these were integer transforms, which are only bit shifts and adds, took no multiplies and no divides. The division in fact got incorporated into the quantizer and as a result, it was very, very fast. Moreover, if you had to do, make decisions such as inter versus intra mode, the intra modes there were only about eight or 10 intra modes in H.264. By contrast in H.265. We have not two transforms eight, four by four and eight by, but in fact sizes of four, eight, 16 and 32. So we have much larger sized transforms and instead of a eight or 10 intra modes, we jumped up to 35 intra modes.

 

Pankaj Topiwala:    26:36          And then with a VVC we jumped up to 67 intro modes and we just, it just became so much more complex. The compression ratio between HEVC and VVC is not quite two to one, but let's say, you know, 40% better. But the the complexity is not 40% more. On the ground and nobody has yet, to my knowledge, built a a, a, a fully compliant and powerful either software or hardware video codec for VVC yet because it's not even finished yet. It's going to be finished in July 2020. When it, when, the dust finally settles maybe four or five years from now, it will be, it will prove to be at least three or four times more complex than HEVC encoder the decoder, not that much. The decoder, luckily we're able to build decoders that are much more linear than the encoder.

 

Pankaj Topiwala:    27:37          So I guess I should qualify as discussion saying the complexity growth is all mostly been in the encoder. The decoder has been a much more reasonable. Remember, we are always relying on this principle of ever-increasing compute capability. You know, a factor of two every 18 months. We've long heard about all of this, you know, and it is true, Moore's law. If we did not have that, none of this could have happened. None of this high complexity codecs, whatever had been developed because nobody would ever be able to implement them. But because of Moore's law we can confidently say that even if we put out this very highly complex VVC standard, someday and in the not too distant future, people will be able to implement this in hardware. Now you also asked a very good question earlier, is there a limit to how much we can compress?

 

Pankaj Topiwala:    28:34          And also one can ask relatively in this issue, is there a limit to a Moore's law? And we've heard a lot about that. That may be finally after decades of the success of Moore's law and actually being realized, maybe we are now finally coming to quantum mechanical limits to you know how much we can miniaturize in electronics before we actually have to go to quantum computing, which is a totally different you know approach to doing computing because trying to go smaller die size. Well, we'll make it a unstable quantum mechanically. Now the, it appears that we may be hitting a wall eventually we haven't hit it yet, but we may be close to a, a physical limit in die size. And in the observations that I've been making at least it seems possible to me that we are also reaching a limit to how much we can compress video even without a complexity limit, how much we can compress video and still obtain reasonable or rather high quality.

 

Pankaj Topiwala:    29:46          But we don't know the answer to that. And in fact there are many many aspects of this that we simply don't know. For example, the only real arbiter of video quality is subjective testing. Nobody has come up with an objective video quality metric that we can rely on. PSNR is not it. When, when push comes to shove, nobody in this industry actually relies on PSNR. They actually do subjective testing well. So in that scenario, we don't know what the limits of visual quality because we don't understand human vision, you know, we try, but human vision is so complicated. Nobody can understand the impact of that on video quality to any very significant extent. Now in fact, the first baby steps to try to understand, not explicitly but implicitly capture subjective human video quality assessment into a neural model. Those steps are just now being taken in the last couple of years. In fact, we've been involved, my company has been involved in, in getting into that because I think that's a very exciting area.

 

Dror Gill:          30:57          I tend to agree that modeling human perception with a neural network seems more natural than, you know, just regular formulas and algorithms which are which are linear. Now I, I wanted to ask you about this process of, of creating the codecs. It's, it's very important to have standards. So you encode a video once and then you can play it anywhere and anytime and on any device. And for this, the encoder and decoder need to agree on exactly the format of the video. And traditionally you know, as you pointed out with all the history of, of development. Video codecs have been developed by standardization bodies, MPEG and ITU first separately. And then they joined forces to develop the newest video standards. But recently we're seeing another approach to develop codecs, which is by open sourcing them.

 

Dror Gill:          31:58          Google started with an open source code, they called VP9 which they first developed internally. Then they open sourced it and and they use it widely across their services, especially in, YouTube. And then they joined forces with the, I think the largest companies in the world, not just in video but in general. You know those large internet giants such as Amazon and Facebook and and Netflix and even Microsoft, Apple, Intel have joined together with the Alliance of Open Media to jointly create another open codec called AV1. And this is a completely parallel process to the MPEG codec development process. And the question is, do you think that this was kind of a one time effort to, to to try and find a, or develop a royalty free codec, or is this something that will continue? And how do you think the adoption of the open source codecs versus the committee defined codecs, how would that adoption play out in the market?

 

Pankaj Topiwala:    33:17          That's of course a large topic on its own. And I should mention that there have been a number of discussions about that topic. In particular at the SPIE conference last summer in San Diego, we had a panel discussion of experts in video compression to discuss exactly that. And one of the things we should provide to your listeners is a link to that captured video of the panel discussion where that topic is discussed to some significant extent. And it's on YouTube so we can provide a link to that. My answer. And of course none of us knows the future. Right. But we're going to take our best guesses. I believe that this trend will continue and is a new factor in the landscape of video compression development.

 

Pankaj Topiwala:    34:10          But we should also point out that the domain of preponderance use preponderant use of these codecs is going to be different than in our traditional codecs. Our traditional codecs such as H.264 265, were initially developed for primarily for the broadcast market or for DVD and Blu-ray. Whereas these new codecs from AOM are primarily being developed for the streaming media industry. So the likes of Netflix and Amazon and for YouTube where they put up billions of user generated videos. So, for the streaming application, the decoder is almost always a software decoder. That means they can update that decoder anytime they do a software update. So they're not limited by a hardware development cycle. Of course, hardware companies are also building AV1.

 

Pankaj Topiwala:    35:13          And the point of that would be to try to put it into handheld devices like laptops, tablets, and especially smartphones. But to try to get AV1 not only as a decoder but also as an encoder in a smartphone is going to be quite complicated. And the first few codecs that come out in hardware will be of much lower quality, for example, comparable to AVC and not even the quality of HEVC when they first start out. So that's... the hardware implementations of AV1 that work in real time are not going to be, it's going to take a while for them to catch up to the quality that AV1 can offer. But for streaming we, we can decode these streams reasonably well in software or in firmware. And the net result is that, or in GPU for example, and the net result is that these companies can already start streaming.

 

Pankaj Topiwala:    36:14          So in fact Google is already streaming some test streams maybe one now. And it's cloud-based YouTube application and companies like Cisco are testing it already, even for for their WebEx video communication platform. Although the quality will not be then anything like the full capability of AV1, it'll be at a much reduced level, but it'll be this open source and notionally, you know, royalty free video codec.

 

Dror Gill:          36:50          Notionally. Yeah. Because they always tried to do this, this dance and every algorithm that they try to put into the standard is being scrutinized and, and, and they check if there are any patents around it so they can try and keep this notion of of royalty-free around the codec because definitely the codec is open source and royalty free.

 

Dror Gill:          37:14          I think that is, is, is a big question. So much IP has gone into the development of the different MPEG standards and we know it has caused issues. Went pretty smoothly with AVC, with MPEG-LA that had kind of a single point of contact for licensing all the essential patents and with HEVC, that hasn't gone very well in the beginning. But still there is a lot of IP there. So the question is, is it even possible to have a truly royalty free codec that can be competitive in, in compression efficiency and performance with the codec developed by the standards committee?

 

Pankaj Topiwala:    37:50          I'll give you a two part answer. One because of the landscape of patents in the field of video compression which I would describe as being, you know very, very spaghetti like and patents date back to other patents.

 

Pankaj Topiwala:    38:09          And they cover most of the, the topics and the most of the, the tools used in video compression. And by the way we've looked at the AV1 and AV1 is not that different from all the other standards that we have. H.265 or VVC. There are some things that are different. By and large, it resembles the existing standards. So can it be that this animal is totally patent free? No, it cannot be that it is patent free. But patent free is not the same as royalty free. There's no question that AV1 has many, many patents, probably hundreds of patents that reach into it. The question is whether the people developing and practicing AV1 own all of those patents. That is of course, a much larger question.

 

Pankaj Topiwala:    39:07          And in fact, there has been a recent challenge to that, a group has even stood up to proclaim that they have a central IP in AV1. The net reaction from the AOM has been to develop a legal defense fund so that they're not going to budge in terms of their royalty free model. If they do. It would kill the whole project because their main thesis is that this is a world do free thing, use it and go ahead. Now, the legal defense fund then protects the members of that Alliance, jointly. Now, it's not as if the Alliance is going to indemnify you against any possible attack on IP. They can't do that because nobody can predict, you know, where somebody's IP is. The world is so large, so many patents in that we're talking not, not even hundreds and thousands, but tens of thousands of patents at least.

 

Pankaj Topiwala:    40:08          So nobody in the world has ever reviewed all of those patent. It's not possible. And the net result is that nobody can know for sure what technology might have been patented by third parties. But the point is that because such a large number of powerful companies that are also the main users of this technology, you know, people, companies like Google and Apple and Microsoft and, and Netflix and Amazon and Facebook and whatnot. These companies are so powerful. And Samsung by the way, has joined the Alliance. These companies are so powerful that you know, it would be hard to challenge them. And so in practice, the point is they can project a royalty-free technology because it would be hard for anybody to challenge it. And so that's the reality on the ground.

 

Pankaj Topiwala:    41:03          So at the moment it is succeeding as a royalty free project. I should also point out that if you want to use this, not join the Alliance, but just want to be a user. Even just to use it, you already have to offer any IP you have in this technology it to the Alliance. So all users around the world, so if tens of thousands and eventually millions of you know, users around the world, including tens of thousands of companies around the world start to use this technology, they will all have automatically yielded any IP they have in AV1, to the Alliance.

 

Dror Gill:          41:44          Wow. That's really fascinating. I mean, first the distinction you made between royalty free and patent free. So the AOM can keep this technology royalty free, even if it's not patent free because they don't charge royalties and they can help with the legal defense fund against patent claim and still keep it royalty free. And, and second is the fact that when you use this technology, you are giving up any IP claims against the creators of the technology, which means that if any, any party who wants to have any IP claims against the AV1 encoder cannot use it in any form or shape.

 

Pankaj Topiwala:    42:25          That's at least my understanding. And I've tried to look at of course I'm not a lawyer. And you have to take that as just the opinion of a video coding expert rather than a lawyer dissecting the legalities of this. But be that as it may, my understanding is that any user would have to yield any IP they have in the standard to the Alliance. And the net result will be if this technology truly does get widely used more IP than just from the Alliance members will have been folded into into it so that eventually it would be hard for anybody to challenge this.

 

Mark Donnigan:      43:09          Pankaj, what does this mean for the development of so much of the technology has been in has been enabled by the financial incentive of small groups of people, you know, or medium sized groups of people forming together. You know, building a company, usually. Hiring other experts and being able to derive some economic benefit from the research and the work and the, you know, the effort that's put in. If all of this sort of consolidates to a handful or a couple of handfuls of, you know, very, very large companies, you know, does that, I guess I'm, I'm asking from your view, will, will video and coding technology development and advancements proliferate? Will it sort of stay static? Because basically all these companies will hire or acquire, you know, all the experts and you know, it's just now everybody works for Google and Facebook and Netflix and you know... Or, or do you think it will ultimately decline? Because that's something that that comes to mind here is, you know, if the economic incentives sort of go away, well, you know, people aren't going to work for free!

 

Pankaj Topiwala:    44:29          So that's of course a, another question and a one relevant. In fact to many of us working in video compression right now, including my company. And I faced this directly back in the days of MPEG-2. There was a two and a half dollar ($2.50) per unit license fee for using MPEG-2. That created billions of dollars in licensing in fact, the patent pool, MPEG-LA itself made billions of dollars, even though they took only 10% of the proceeds, they already made billions of dollars, you know, huge amounts of money. With the advent of H.264 AVC, the patent license went not to from two and a half dollars to 25 cents a unit. And now with HEVC, it's a little bit less than that per unit. Of course the number of units has grown exponentially, but then the big companies don't continue to pay per unit anymore.

 

Pankaj Topiwala:    45:29          They just pay a yearly cap. For example, 5 million or 10 million, which to these big companies is is peanuts. So there's a yearly cap for the big companies that have, you know, hundreds of millions of units. You know imagine the number of Microsoft windows that are out there or the number of you know, Google Chrome browsers. And if you have a, a codec embedded in the browser there are hundreds of millions of them, if not billions of them. And so they just pay a cap and they're done with it. But even then, there was up till now an incentive for smart engineers to develop exciting new ideas in a future video coding. But, and that has been up the story up till now. But when, if it happens that this AOM model with AV1 and then AV2, really becomes a dominant codec and takes over the market, then there will be no incentive for researchers to devote any time and energy.

 

Pankaj Topiwala:    46:32          Certainly my company for example, can't afford to you know, just twiddle thumbs, create technologies for which there is absolutely no possibility of a royalty stream. So we, we cannot be in the business of developing video coding when video coding doesn't pay. So the only thing that makes money, is Applications, for example, a streaming application or some other such thing. And so Netflix and, and Google and Amazon will be streaming video and they'll charge you per stream but not on the codec. So that that's an interesting thing and it certainly affects the future development of video. It's clear to me it's a negative impact on the research that we got going in. I can't expect that Google and Amazon and Microsoft are going to continue to devote the same energy to develop future compression technologies in their royalty free environment that companies have in the open standards development technology environment.

 

Pankaj Topiwala:    47:34          It's hard for me to believe that they will devote that much energy. They'll devote energy, but it will not be the the same level. For example, in developing a video standards such as HEVC, it took up to 10 years of development by on the order of 500 to 600 experts, well, let's say four to 500 experts from around the world meeting four times a year for 10 years.

 

Mark Donnigan:      48:03          That is so critical. I want you to repeat that again.

 

Pankaj Topiwala:    48:07          Well, I mean so very clearly we've been putting out a video codec roughly on the schedule of once every 10 years. MPEG-2 was 1994. AVC was 2003 and also 2004. And then HEVC in 2013. Those were roughly 10 years apart. But VVC we've accelerated the schedule to put one out in seven years instead of 10 years. But even then you should realize that we had been working right since HEVC was done.

 

Pankaj Topiwala:    48:39          We've been working all this time to develop VVC and so on the order of 500 experts from around the world have met four times a year at all international locations, spending on the order of $100 million per meeting. You know so billions of dollars have been spent by industry to create these standards, many billions and it can't happen, you know without that. It's hard for me to believe that companies like Microsoft, Google, and whatnot, are going to devote billions to develop their next incremental, you know, AV1and AV2 AV3's. But maybe they will it just, that there's no royalty stream coming from the codec itself, only the application. Then the incentive, suppose they start dominating to create even better technology will not be there. So there really is a, a financial issue in this and that's at play right now.

 

Dror Gill:          49:36          Yeah, I, I find it really fascinating. And of course, Mark and I are not lawyers, but all this you know, royalty free versus committee developed open source versus a standard those large companies who some people fear, you know, their dominance and not only in video codec development, but in many other areas. You know, versus you know, dozens of companies and hundreds of engineers working for seven or 10 years in a codec. So you know, it's really different approaches different methods of development eventually to approach the exact same problem of video compression. And, and how this turns out. I mean we, we cannot forecast for sure, but it will be very interesting, especially next year in 2020 when VVC is ratified. And at around the same time, EVC is ratified another codec from the MPEG committee.

 

Dror Gill:          50:43          And then AV1, and once you know, AV1 starts hitting the market. We'll hear all the discussions of AV2. So it's gonna be really interesting and fascinating to follow. And we, we promise to to bring you all the updates here on The Video Insiders. So Pankaj I really want to thank you. This has been a fascinating discussion with very interesting insights into the world of codec development and compression and, and wavelets and DCT and and all of those topics and, and the history and the future. So thank you very much for joining us today on the video insiders.

 

Pankaj Topiwala:    51:25          It's been my pleasure, Mark and Dror. And I look forward to interacting in the future. Hope this is a useful for your audience. If I can give you a one parting thought, let me give this...

 

Pankaj Topiwala:    51:40          H.264 AVC was developed in 2003 and also 2004. That is you know, some 17 years or 16 years ago, it is close to being now nearly royalty-free itself. And if you look at the market share of video codecs currently being used in the market, for example, even in streaming AVC dominates that market completely. Even though VP8 and VP9 and VP10 were introduced and now AV1, none of those have any sizeable market share. AVC currently dominates from 70 to 80% of that marketplace right now. And it fully dominates broadcast where those other codecs are not even in play. And so they're 17, 16, 17 years later, it is now still the dominant codec even much over HEVC, which by the way is also taking an uptick in the last several years. So the standardized codecs developed by ITU and MPEG are not dead. They may just take a little longer to emerge as dominant forces.

 

Mark Donnigan:      52:51          That's a great parting thought. Thanks for sharing that. What an engaging episode Dror. Yeah. Yeah. Really interesting. I learned so much. I got a DCT primer. I mean, that in and of itself was a amazing,

 

Dror Gill:          53:08          Yeah. Yeah. Thank you.

 

Mark Donnigan:      53:11          Yeah, amazing Pankaj. Okay, well good. Well thanks again for listening to the video insiders, and as always, if you would like to come on this show, we would love to have you just send us an email. The email address is thevideoinsiders@beamr.com, and Dror or myself will follow up with you and we'd love to hear what you're doing. We're always interested in talking to video experts who are involved in really every area of video distribution. So it's not only encoding and not only codecs, whatever you're doing, tell us about it. And until next time what do we say Dror? Happy encoding! Thanks everyone.