Abstract
Video content creation keeps growing at an incredible pace; yet, creating engaging stories remains challenging and requires non-trivial video editing expertise. Many video editing components are astonishingly hard to automate, primarily due to the lack of raw video materials. This paper focuses on a new task for computational video editing, namely the task of ranking cut plausibility. Our key idea is to leverage content that has already been edited to learn fine-grained audiovisual patterns that trigger cuts. To do this, we first collected a data source of more than 10K videos, from which we extract more than 255K cuts. We devise a model that learns to discriminate between real and artificial cuts via contrastive learning. We set up a new task and a set of baselines to benchmark video cut generation. We observe that our proposed model outperforms the baselines by large margins. To demonstrate our model in real-world applications, we conduct human studies on a collection of unedited videos. The results show that our model does a better job at cutting than random and alternative baselines.
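The contrastive idea in the abstract, scoring a real cut above artificial ones, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: clips are represented as plain embedding vectors, similarity is a dot product, the loss is a generic InfoNCE-style objective, and the names `cut_contrastive_loss`, `rank_cuts`, and `temperature` are hypothetical.

```python
import math


def dot(u, v):
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def cut_contrastive_loss(left, right_real, right_fake, temperature=0.1):
    """InfoNCE-style loss (assumed form, not the paper's exact objective).

    left       -- embedding of the clip just before the cut
    right_real -- embedding of the clip that follows in the edited video
    right_fake -- embeddings of artificially paired clips (negatives)
    """
    logits = [dot(left, right_real) / temperature]
    logits += [dot(left, r) / temperature for r in right_fake]
    # Numerically stable log-sum-exp over the real and artificial pairs.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability assigned to the real cut.
    return log_sum - logits[0]


def rank_cuts(left, candidates):
    """Return candidate indices ordered from most to least plausible."""
    scores = [dot(left, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    left = [1.0, 0.0]
    candidates = [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]
    print(rank_cuts(left, candidates))
```

Under these assumptions, the loss shrinks as the real pair becomes more similar than the artificial pairs, and the same similarity score can be reused at inference time to rank candidate cuts.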
Original language | English (US) |
---|---|
Title of host publication | Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 6838-6848 |
Number of pages | 11 |
ISBN (Electronic) | 9781665428125 |
DOIs | |
State | Published - 2021 |
Event | 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021 - Virtual, Online, Canada. Duration: Oct 11 2021 → Oct 17 2021 |
Publication series
Name | Proceedings of the IEEE International Conference on Computer Vision |
---|---|
ISSN (Print) | 1550-5499 |
Conference
Conference | 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021 |
---|---|
Country/Territory | Canada |
City | Virtual, Online |
Period | 10/11/21 → 10/17/21 |
Bibliographical note
Funding Information: We introduced the task of cut plausibility ranking for computational video editing. We proposed a proxy task that aligns with the actual video editing process by leveraging knowledge from already edited scenes. Additionally, we collected more than 260K edited video clips. Using this edited footage, we created the first method capable of ranking cuts automatically, which learns in a data-driven fashion. We benchmarked our method with a set of proposed metrics that reflect the model’s level of precision at retrieval and expertise at providing tighter cuts. Finally, we used our method in a real-case scenario, where our model ranked cuts from non-edited videos. We conducted a user study in which editors picked our model’s cuts more often than those made by the baselines. Yet, there is still a long way to go to match editors’ expertise in selecting the smoothest cuts. This work aims at opening the door for data-driven computational video editing to the research community. Future directions include the use of fine-grained features to learn more subtle patterns that better approximate the fine-grained process of cutting video. Additionally, other modalities such as speech and language could bring benefits for ranking video cuts. Acknowledgments: This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research through the Visual Computing Center (VCC) funding.
Publisher Copyright:
© 2021 IEEE
ASJC Scopus subject areas
- Software
- Computer Vision and Pattern Recognition