UChicago Computer Scientists Bring in Generative Neural Networks to Stop Real-Time Video From Lagging
In a world where remote work and virtual communication are moving to the forefront of our lives, few things are more frustrating than losing our connection. Video quality disintegrates, audio becomes unintelligible, and sometimes we are left with no choice but to drop out of a meeting and rejoin. The problem has evaded a good solution for decades, but a team from the Department of Computer Science at the University of Chicago thinks it has found a promising one.
How Real-Time Video Works
The breakdown occurs, especially in long-distance video, because part of the data takes too long to move through the Internet. When lost data has to be retransmitted, it blocks data that has already arrived from being decoded, creating the visual stuttering we have all grown to dislike.
To work correctly, real-time video applications such as online meetings and cloud gaming assemble packets of data into a sequence of frames that functions like a flip book. Frames need to arrive in consecutive order, with none of the picture missing, so that the receiver can make sense of what the sender is trying to convey.
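To see why one late packet matters so much, consider a stripped-down Python sketch of an in-order decoder. The frame numbers and the dependency rule are purely illustrative, not any particular codec's logic: each frame can only be decoded after the one before it, so a single missing frame leaves everything behind it waiting.

```python
from collections import deque

def decode_stream(received_frames, total_frames):
    """Toy model of an in-order decoder: each frame depends on the one
    before it, so one missing frame stalls everything after it."""
    decoded = []
    waiting = deque(range(total_frames))  # frames still needed, in order
    while waiting:
        next_frame = waiting[0]
        if next_frame not in received_frames:
            # The decoder cannot skip ahead; it must wait for retransmission.
            print(f"stalled: frame {next_frame} is missing, "
                  f"{len(waiting) - 1} later frames sit undecoded")
            break
        decoded.append(waiting.popleft())
    return decoded

# Frames 0-9 were sent, but frame 3 was lost in transit.
print(decode_stream(received_frames={0, 1, 2, 4, 5, 6, 7, 8, 9}, total_frames=10))
```

A buffered service like Netflix can quietly wait out the retransmission; a live call has no such slack.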
The Current Problem
Unfortunately, when network quality becomes poor, any frame can be affected by data loss. Real-time video clients have had to gamble on which frames to protect with redundant coding so that users don't experience the all-too-common pixelation of content. As with any gamble, results may vary.
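A toy Python sketch makes the gamble concrete. The redundancy budget, loss rate, and frame counts here are invented for illustration and do not reflect how any specific video client allocates protection: the sender can only afford redundant data for a few frames, and whether the losses land on the protected ones is out of its hands.

```python
import random

def protect_and_transmit(num_frames=10, redundancy_budget=3, loss_rate=0.2, seed=0):
    """Toy version of the gamble: only a few frames get redundant data,
    and losses that land on unprotected frames still corrupt the picture."""
    random.seed(seed)
    protected = set(random.sample(range(num_frames), redundancy_budget))  # the guess
    lost = {f for f in range(num_frames) if random.random() < loss_rate}  # what the network does
    corrupted = lost - protected  # redundancy only saves the protected frames
    return protected, lost, corrupted

protected, lost, corrupted = protect_and_transmit()
print(f"protected: {sorted(protected)}, lost: {sorted(lost)}, "
      f"still corrupted: {sorted(corrupted)}")
```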
“The key challenge in video streaming is how to tolerate data packet loss on a network,” said Assistant Professor of Computer Science Junchen Jiang. “If you are doing real time video streaming, the content cannot be pre-encoded like it is for Netflix or a YouTube video. There is no buffer on your side to conceal any jitter or data retransmission in the network. So there is a fundamental trade-off here: you can either have good quality, or you can tolerate delay or loss in the network.”
A Viable Solution
The idea arose two years ago, when second-year Ph.D. student Yihua Cheng and then-master's student Anton Arapin were reading the computer vision literature and came across a tool called an autoencoder, which uses neural networks to encode and recreate pictures. Because it is generative, it can still recreate realistic-looking pictures even when the encoded data is corrupted. That sparked the idea that the tool could be repurposed to enable real-time streaming that is not derailed by network loss.
Over the past two years, Cheng and the team of UChicago scientists have drawn on this tool to develop Grace++, a patent-pending, real-time video communication system that encodes and decodes frames with specially trained neural networks and delivers them with a custom frame delivery protocol. When a subset of the data goes missing in transmission, Grace++ fills in the missing piece by essentially masking the hole with generic data, so the entire stream doesn't get backed up and fall apart.
“Let’s say when I send my video to you in real time and some data is lost in the transmission, it’s going to chop some data,” explained Jiang, who oversaw the research. “That missing data could then be generated by the neural network on your side instead.”
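The team's papers describe Grace++'s actual architecture; the sketch below is only a minimal, hypothetical PyTorch autoencoder with invented layer sizes, meant to show the general idea. The encoder compresses a frame into a latent representation that travels over the network, and the decoder still produces a full-size frame even when part of that latent never arrives, here simulated by zeroing out a region with a mask.

```python
import torch
import torch.nn as nn

class TinyFrameAutoencoder(nn.Module):
    """Minimal convolutional autoencoder: the encoder compresses a frame into
    a latent tensor sent over the network; the decoder reconstructs the frame
    from whatever latent data actually arrives."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, kernel_size=4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, frame, loss_mask=None):
        latent = self.encoder(frame)
        if loss_mask is not None:
            # Packets carrying part of the latent were lost: zero out that
            # region and let the (trained) decoder fill in plausible content.
            latent = latent * loss_mask
        return self.decoder(latent)

model = TinyFrameAutoencoder()
frame = torch.rand(1, 3, 64, 64)   # a stand-in video frame
mask = torch.ones(1, 32, 16, 16)
mask[:, :, 8:, :] = 0              # pretend half of the latent never arrived
reconstruction = model(frame, loss_mask=mask)
print(reconstruction.shape)        # torch.Size([1, 3, 64, 64])
```

In a trained system of this kind, the decoder learns to replace the masked region with plausible content rather than a visible hole, which is what lets the video keep flowing through loss instead of stalling for a retransmission.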
Limitations
One of the biggest limitations of this type of system is that it requires an expensive graphics processing unit (GPU) specially designed to run a neural network. Although they are becoming more common, GPUs are not usually available on cellphones or other resource-constrained devices. The team is trying to mitigate this limitation by shrinking the neural network so it can run on a wider range of devices. The system would also need an additional player installed on the computer before it could work, which requires more than a couple of lines of code. The team is trying to integrate it into a browser, which would essentially allow someone to install it as a plugin.
Grace++ is in the process of being patented through the University of Chicago’s Polsky Center for Entrepreneurship and Innovation. Two archived versions of the research are available, and a peer-reviewed version is currently under submission. The team is also partnering with NVIDIA, a world leader in artificial intelligence computing, on a follow-up version of the system.