StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2

Ivan Skorokhodov, Sergey Tulyakov, Mohamed Elhoseiny

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

100 Scopus citations

Abstract

Videos show continuous events, yet most - if not all - video synthesis frameworks treat them discretely in time. In this work, we think of videos as what they should be, time-continuous signals, and extend the paradigm of neural representations to build a continuous-time video generator. For this, we first design continuous motion representations through the lens of positional embeddings. Then, we explore the question of training on very sparse videos and demonstrate that a good generator can be learned by using as few as 2 frames per clip. After that, we rethink the traditional image + video discriminators pair and design a holistic discriminator that aggregates temporal information by simply concatenating frames' features. This decreases the training cost and provides a richer learning signal to the generator, making it possible to train directly on 1024² videos for the first time. We build our model on top of StyleGAN2 and it is just ≈5% more expensive to train at the same resolution while achieving almost the same image quality. Moreover, our latent space features similar properties, enabling spatial manipulations that our method can propagate in time. We can generate arbitrarily long videos at an arbitrarily high frame rate, while prior work struggles to generate even 64 frames at a fixed rate. Our model is tested on four modern 256²-resolution and one 1024²-resolution video synthesis benchmarks. In terms of sheer metrics, it performs on average ≈30% better than the closest runner-up. Project website: https://universome.github.io/stylegan-v.
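
As a rough illustration of two ideas named in the abstract, the following minimal Python sketch embeds a real-valued timestamp with sinusoids (so a frame can be requested at any time, not only at integer indices) and aggregates a sparse clip by concatenating per-frame vectors. It is not the paper's architecture: the function names, the fixed frequencies, `num_freqs`, and `base_period` are all assumptions made for this example, and the per-frame "features" are simply the time embeddings themselves.

```python
import math
from typing import List, Sequence


def time_embedding(t: float, num_freqs: int = 4, base_period: float = 16.0) -> List[float]:
    """Sinusoidal embedding of a real-valued timestamp t (hypothetical helper).

    Frequencies double at each step, starting from one cycle per `base_period`.
    """
    emb: List[float] = []
    for k in range(num_freqs):
        omega = 2.0 * math.pi * (2.0 ** k) / base_period
        emb.append(math.sin(omega * t))
        emb.append(math.cos(omega * t))
    return emb


def concat_frame_features(features: Sequence[Sequence[float]]) -> List[float]:
    """Aggregate per-frame feature vectors by plain concatenation (hypothetical helper)."""
    out: List[float] = []
    for f in features:
        out.extend(f)
    return out


# A sparse "clip" of 2 frames sampled at arbitrary, non-integer timestamps.
timestamps = [0.0, 3.5]
per_frame = [time_embedding(t) for t in timestamps]
clip_vector = concat_frame_features(per_frame)
```

Because `time_embedding` accepts any float, the same function covers arbitrary frame rates. Purely periodic sinusoids like these repeat after `base_period`, so a real generator would need something beyond this sketch to produce long, non-repeating motion.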

Original language: English (US)
Title of host publication: Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Publisher: IEEE Computer Society
Pages: 3616-3626
Number of pages: 11
ISBN (Electronic): 9781665469463
State: Published - 2022
Event: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 - New Orleans, United States
Duration: Jun 19 2022 to Jun 24 2022

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume: 2022-June
ISSN (Print): 1063-6919

Conference

Conference: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Country/Territory: United States
City: New Orleans
Period: 06/19/22 to 06/24/22

Bibliographical note

Publisher Copyright:
© 2022 IEEE.

Keywords

  • Image and video synthesis and generation

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
