AOMedia Industry Workshop Sponsored by Google and Meta

In this joint workshop, we will first present recent progress in enabling AV1 deployment, developing the next-generation Alliance for Open Media (AOM) video coding standard, the software implementation of AOM encoders, and other related activities. The second part of the workshop will focus on topics drawn from real production use cases at Google and Meta, including image compression, enabling AV1 for real-time communication (RTC), quality metrics, and machine learning for video compression.

Organizers:

  • Ioannis Katsavounidis, Research Scientist, Video Infrastructure, Meta
  • Ryan Lei, Video Codec Specialist, Video Infrastructure, Meta
  • Debargha Mukherjee, Principal Engineer, Google
  • In Suk Chong, Video Codec Lead, Google
  • Balu Adsumilli, Media Algorithms, Google
  • Shan Liu, Distinguished Scientist, Tencent

Speakers

  • Ioannis Katsavounidis (Meta)
  • Hassene Tmar (Meta)
  • Ryan Lei (Meta)
  • In Suk Chong (YouTube/Google)
  • Yilin Wang (YouTube/Google)
  • Balu Adsumilli (YouTube/Google)
  • Debargha Mukherjee (Google)
  • Onur Guleryuz (Google)
  • Yu-Chen (Eric) Sun (Meta)
  • Zuzanna Mroczek (Meta)
  • Erik Andre (Meta)
  • Van Luong Pham (Apple)
  • Xin Zhao (Apple)
  • Yeqing Wu (Apple)
  • Liang Zhao (Tencent)
  • Madhu P Krishnan (Tencent)
  • Shan Liu (Tencent)

Dr. Ioannis Katsavounidis is part of the Video Infrastructure team, leading technical efforts to improve video quality and quality of experience across all video products at Meta. Before joining Meta, he spent 3.5 years at Netflix, contributing to the development and popularization of VMAF, Netflix’s open-source video quality metric, and inventing the Dynamic Optimizer, a shot-based perceptual video quality optimization framework that brought significant bitrate savings across the whole video streaming spectrum. He was a professor for 8 years in the Electrical and Computer Engineering Department of the University of Thessaly in Greece, teaching video compression, signal processing, and information theory. He was one of the co-founders of Cidana, a mobile multimedia software company in Shanghai, China. In the early 2000s, he was the director of software for advanced video codecs at InterVideo, the makers of the popular software DVD player WinDVD, and he also worked for 4 years in experimental high-energy physics in Italy. He is one of the co-chairs of the Statistical Analysis Methods (SAM) and No-Reference Metrics (NORM) groups at the Video Quality Experts Group (VQEG). He is actively involved in the Alliance for Open Media (AOMedia) as co-chair of the Software Implementation Working Group (SIWG). He has over 150 publications, including 50 patents. His research interests lie in video coding, quality of experience, adaptive streaming, and energy-efficient HW/SW multimedia processing.

Dr. Debargha Mukherjee received his M.S./Ph.D. degrees in ECE from the University of California, Santa Barbara, in 1999. Since 2010 he has been with Google LLC, where he is currently a Principal Engineer/Director leading next-generation video codec research and development efforts. Prior to that, he was with Hewlett Packard Laboratories, conducting research on video/image coding and processing. Debargha has made extensive research contributions in image and video compression throughout his career and was elevated to IEEE Fellow for leadership in standards development for the video-streaming industry. He has (co-)authored more than 120 papers on various signal processing topics and holds more than 200 US patents, with many more pending. He currently serves as a Senior Area Editor of the IEEE Transactions on Image Processing and as a member of the IEEE Visual Signal Processing and Communications Technical Committee (VSPC-TC).

Dr. Balu Adsumilli is currently the Head of the Media Algorithms group at YouTube/Google, leading transcoding infrastructure, audio/video quality, and media innovation at YouTube. Prior to this, he led the Advanced Technology group and the Camera Architecture group at GoPro. He received his master's degree from the University of Wisconsin-Madison and his PhD from the University of California, Santa Barbara. He has co-authored more than 120 papers and holds more than 100 granted patents, with many more pending. He serves on the boards of the Television Academy, the NATAS Technical Committee, and the Visual Effects Society, as well as on the IEEE MMSP Technical Committee and the ACM MHV Steering Committee. He is on TPCs and organizing committees for various conferences and workshops, and currently serves as an Associate Editor for IEEE Transactions on Multimedia (T-MM). His fields of research include image/video processing, audio and video quality, video compression and transcoding, video ML/AI models, generative AI, and related areas.

Dr. Shan Liu is a Distinguished Scientist and General Manager at Tencent, where she leads global R&D teams developing technologies and products that serve billions of users worldwide. She was formerly Director of the Media Technology Division at MediaTek USA, and she has also worked at MERL and Sony, among others. She has been a long-time contributor to international standards, including VVC, HEVC, OMAF, DASH, MMT, and PCC, and served as a Project Editor of H.266/VVC. She is currently Chair of the AOMedia VVM Working Group, Vice Chair of IEEE DCSC, and Associate Editor-in-Chief of IEEE TCSVT. She has served on the IEEE SPS Industrial Relationship Committee, the IEEE CS Standards Activities Board, IEEE MMSP-TC, IEEE VSPC-TC, and several other boards and committees. Dr. Liu is a Fellow of IEEE and IET. She has (co-)authored more than 100 papers and one book, and holds more than 500 US patents. She received the B.Eng. degree in electronic engineering from Tsinghua University and the M.S. and Ph.D. degrees in electrical engineering from the University of Southern California.

Dr. Ryan Lei is currently a video codec specialist and technical lead in the Video Infrastructure group at Meta. His focus is on algorithms and architecture for large-scale, cloud-based video processing, transcoding, and delivery across Meta products. Ryan is also co-chair of the Alliance for Open Media (AOM) Testing Subgroup and actively contributes to the standardization of AV1 and AV2. Before joining Meta, Ryan worked at Intel as a principal engineer and codec architect, working on algorithm implementation and architecture definition for multiple generations of hardware-based video codecs, such as AVC, VP9, HEVC, and AV1. Before joining Intel, Ryan worked in ATI's handheld department, where he implemented embedded software for hardware encoders/decoders in mobile SoCs. Ryan received his Ph.D. in Computer Science from the University of Ottawa. His research interests include image/video processing, compression, adaptive streaming, and parallel computing. He has (co-)authored over 50 publications, including 17 patents.

Dr. In Suk Chong is Video Codec Lead at Google, and he also serves as co-chair of the Hardware Subgroup at AOMedia. Before joining Google, he worked at Qualcomm from 2008 to 2017 as Video Codec Lead, spearheading advancements in video compression technology. In Suk holds a B.S. in Electrical Engineering from Seoul National University (1998) and earned his M.S. and Ph.D. in Electrical Engineering from the University of Southern California (USC) in 2004 and 2008, respectively.

Dr. Yu-Chen (Eric) Sun is a software engineer and tech lead at Meta, where he works on improving video calling quality across the Meta Family of Apps. For over 10 years, he has led projects developing large-scale infrastructure for video products and has actively participated in video compression standardization. He holds a PhD in video compression, has published over 16 journal and conference articles, has contributed to more than 50 video standard technical proposals, and has authored over 40 patents.

Hassene Tmar is a Technical Program Manager at Meta and also serves as Software Coordinator within the Software Implementation Working Group at AOMedia. Before joining Meta, Hassene was an Engineering Manager at Intel, where he led the successful open-sourcing of the SVT-HEVC, SVT-VP9, and SVT-AV1 encoder implementations. Hassene holds a Bachelor of Applied Sciences degree in Computer Engineering from the University of British Columbia. Over the past year at Meta, Hassene has supported the Media Algorithms Team within the Video Infrastructure Group in deploying AV1 and launching Meta’s Scalable Video Processor. He has also been at the forefront of fostering academic collaborations between the Video Infrastructure team and various universities, with the aim of addressing the challenges of delivering video at scale.

Yilin Wang is a staff software engineer in the Media Algorithms team at YouTube/Google. He has spent the last ten years improving YouTube's video processing and transcoding infrastructure and building video quality metrics. Besides his video engineering work, he is an active researcher in video quality and has published papers in venues including CVPR, ICCV, TIP, and ICIP. He received his PhD from the University of North Carolina at Chapel Hill in 2014, working on topics in computer vision and image processing.

Zuzanna Mroczek is a software engineer and tech lead in the Images Infrastructure team at Meta in London. Her focus is on full-stack architecture for measuring and improving perceptual image quality and the overall quality of the image experience at Meta’s scale. This includes work on understanding quality across common image distortions, evaluating modern codecs, and leading the rollout of the first HDR photo experiences in Meta apps (enabled this year). Before joining Meta as a full-time employee in 2017 and starting work on image quality, she received her Master’s degree in Computer Science from the Faculty of Mathematics, Informatics, and Mechanics of the University of Warsaw. During her studies, she also gained experience interning at Facebook (now Meta), Microsoft, and Google across SWE and SRE teams.

Erik Andre is a software engineer and tech lead at Meta. He graduated with a Master of Science in Computer Engineering from Lund University, Sweden. After initially working on embedded systems and virtual machines at Sony Mobile, Erik moved to Meta, where he has worked on image processing for mobile and backend systems in the Images Infrastructure team in London. This work has involved the evaluation and productionization of new codecs and image formats such as AVIF and JPEG XL.

Onur G. Guleryuz is a Software Engineer at Google working on machine learning and computer vision problems. Before Google, he worked at LG Electronics, Futurewei, NTT DoCoMo, and Seiko-Epson, all in Silicon Valley. Before coming to Silicon Valley in 2000, he served as an Assistant Professor at the NYU Tandon School of Engineering in New York.

He received B.S. degrees in electrical engineering and physics from Bogazici University in 1991, the M.S. degree in engineering and applied science from Yale University in 1992, and the Ph.D. degree in electrical engineering from the University of Illinois at Urbana-Champaign (UIUC) in 1997. He received the National Science Foundation Career Award, the IEEE Signal Processing Society Best Paper Award, the IEEE International Conference on Image Processing Best Paper Award, the Seiko-Epson Corporation President’s Award for Research and Development, and the DoCoMo Communications Laboratories President’s Award for Research. He is a Fellow of the IEEE.

He has served on numerous panels, conference committees, SPS technical committees, and media-related industry standardization bodies. He has authored an extensive number of refereed papers and granted US patents, and has made leading-edge contributions to products ranging from mobile phones to displays and printers.

Dr. Van Luong Pham is a video codec algorithms engineer at Apple. Before joining Apple, he worked at Qualcomm from 2018 to 2022 as a senior video codec engineer, contributing to the development of video coding standards. Van Luong Pham received a B.S. in Electrical Engineering from Hanoi University of Science and Technology in 2009 and an M.S. degree in Electrical Engineering from Sungkyunkwan University (Korea) in 2011. He received his PhD from Ghent University (Belgium) in 2017, working on topics in video compression and transcoding.

Dr. Xin Zhao is a Video Codec Research Engineer at Apple, Inc. Prior to joining Apple, he was a Principal Researcher and Manager of Multimedia Standards at Tencent America LLC and he also worked as a Staff Engineer at Qualcomm Inc. Dr. Zhao holds a B.S. in Electronic Engineering from Tsinghua University and a Ph.D. in Computer Applications from the Institute of Computing Technology, Chinese Academy of Sciences. Dr. Zhao has over 17 years of experience in video codec standards development. He has contributed to several international video coding standards, including HEVC, 3D extensions to H.264/AVC and HEVC, and VVC. He has published around 60 papers in conferences and journals. Dr. Zhao is a senior member of IEEE and is currently involved in developing next-generation open video codecs with AOM.

Dr. Yeqing Wu is a machine learning and video processing engineer at Apple, where he focuses on video compression technology. Prior to joining Apple, he worked at Tencent and Dolby, specializing in video codec and HDR product development. He received his Ph.D., MS, and BS degrees from the University of Alabama, Shanghai Jiao Tong University, and Shanghai Tongji University, respectively.

Dr. Liang (Leo) Zhao is a video coding engineer and tech lead at Tencent Media Lab. Before joining Tencent, he worked as a researcher at Hulu from 2016 to 2018. He holds a PhD degree in video compression and has published over 20 journal and conference papers. He was the primary author of more than 50 technical proposals to video standards and holds more than 100 patents. He serves as Software Coordinator within the Codec Working Group at AOMedia.

Madhu P Krishnan received the M.S. degree in Electrical Engineering from the University of Texas at Arlington. He joined Tencent (Palo Alto, CA, USA) in 2019 as a Senior Research Engineer in Tencent Media Lab, where he has been actively contributing to the development of next-generation video coding standards. Before that, he was a Research Engineer with FastVDO LLC, Melbourne, FL, USA. He has contributed technical proposals to international video coding standards such as HEVC and to next-generation video coding beyond AV1. His main research interests are in the areas of image and video coding, computer vision, and pattern recognition.

Presentations:
Wednesday, 30 October 2024

8:15 – 10:00: AOM/AVM Update

Talk-01
Title: Introduction to the joint workshop
Speaker: Dr. Ioannis Katsavounidis (Meta)
Abstract: Introduction and overview of the session topics.

Talk-02
Title: AVM Coding Tool Update
Speaker: Dr. Debargha Mukherjee (Google), Dr. Xin Zhao (Apple), Dr. Van Luong Pham (Apple), Dr. Yeqing Wu (Apple), Dr. Leo Zhao (Tencent), Madhu P Krishnan (Tencent)
Abstract: In this talk, we will present a high-level overview of the video coding tools under consideration for inclusion in the next-generation AOM coding standard. The tools discussed are either already included in AVM (the AOM Video Model) or under discussion in the Codec Working Group.

Talk-03
Title: AVM Test Result Update
Speaker: Dr. Ryan Lei (Meta)
Abstract: After AV1 was released in 2018, AOM member companies began research and exploration toward the next-generation coding standard. The reference encoder code base, called AVM, has been updated and released regularly. In this talk, we will present a high-level summary of the latest compression efficiency results compared against AV1.

Talk-04
Title: AVM HW subgroup activity update
Speaker: Dr. In Suk Chong (Google)
Abstract: Google co-chairs the HW Subgroup, which focuses on analyzing issues in hardware implementations of the AVM decoder across various architectures and on addressing those problems. In this talk, we will present current efforts to make AVM hardware-friendly by analyzing all the major tools within AVM and proposing ways to address any issues found.

Talk-05
Title: SIWG Working Group Update
Speaker: Hassene Tmar (Meta)
Abstract: The SIWG has been working diligently on a product-level AV1 implementation (SVT-AV1) since the release of v1.0 in 2022. This presentation will showcase the progress made in optimizing the tradeoff between compression and computational efficiency across various use cases, explain the improvements made to the fast-decode encoding mode and their impact on software decoding speed on ARM-based processors, and share details about the group's future plans.

Talk-06
Title: VVM and Audio Working Group Update
Speaker: Dr. Shan Liu (Tencent)
Abstract: In recent years, AOM has initiated codec standardization activities in two new areas beyond conventional 2D video: volumetric visual media and audio. Two new workgroups, the Volumetric Visual Media (VVM) Workgroup and the Audio Codec Workgroup, have been established to explore next-generation standards in these respective areas. This talk will provide an update on VVM and the compression performance achieved for 3D polygonal meshes, as well as on the Audio workgroup’s activities, including v1.1 and v2.0 of IAMF.


10:30 – 12:00: Industry Talks

Talk-07
Title: Adopting Modern Image Codecs at Scale
Speaker: Ms. Zuzanna Mroczek (Meta), Mr. Erik Andre (Meta)
Abstract: Optimizing the image experience across Meta’s apps comes with challenges that are hard to find anywhere else. Every day, we handle billions of image uploads and trillions of image download requests on our CDN. This makes image compression more important to us than ever, in a world where visual quality matters but every millisecond of latency also counts. In this talk, we will share findings from years of work adopting modern image codecs (AVIF, JPEG XL, and JPEGLI) in our Image Infrastructure, pinpoint the aspects of image compression performance that matter in different parts of our image pipelines, and present the challenges we encounter in understanding image quality across formats.

Talk-08
Title: AV1 for RTC Across Meta Family of Apps
Speaker: Dr. Yu-Chen (Eric) Sun (Meta)
Abstract: In 2023, Meta announced that it had shipped the AV1 codec for real-time communication (RTC) across the Meta Family of Apps. Since then, we have been improving AV1 call quality and increasing AV1 call coverage. In this talk, Yu-Chen will share what he has learned from these efforts.

Talk-09
Title: ML efforts at Google around practical codecs
Speaker: Dr. Debargha Mukherjee (Google), Dr. In Suk Chong (YouTube/Google), Dr. Balu Adsumilli (YouTube/Google)
Abstract: Within AVM, there are several ML-based proposals to improve both the encoder and the decoder. In this talk, we will describe those efforts, including CNN-based loop restoration filters and ML-based encoder speedups. We will also present efforts toward ML-based pre- and post-processing.

Talk-10
Title: Video Quality Dataset and Model at YouTube
Speaker: Dr. Yilin Wang (YouTube/Google), Dr. Balu Adsumilli (YouTube/Google)
Abstract: Video quality research at YouTube spans two major areas: datasets and quality models. In 2019, we released our first large-scale YouTube UGC dataset, which quickly became one of the most important quality datasets for academia and industry. With the rising popularity of Short Form Video (SFV) and High Dynamic Range (HDR) content, we recently released a new large-scale quality dataset, called the YouTube SFV+HDR dataset. The first part of this talk will highlight the key features of this new dataset and insights from subjective data analysis. In the second part, we will share recent updates about our quality model, Universal Video Quality (UVQ).