TVQA Dataset


A Localized, Compositional Video Question Answering Dataset

Updates

  • 2020-01 We have released two new datasets, TVR and TVC, for large-scale video-subtitle moment retrieval and captioning.
  • 2019-12 TVQA+ code and the CodaLab evaluation server are online.
  • 2019-04 TVQA+ dataset v1.0 is released, a spatio-temporally localized video question answering dataset. Read our paper.
  • 2018-11 TVQA evaluation portals are open on CodaLab, without timestamps and with timestamps.
  • 2018-11 We released high-quality frames for the TVQA dataset.
  • 2018-11 PyTorch code for the TVQA dataset paper is now available!
  • 2018-09 TVQA website goes online, TVQA dataset v1.0 is released.

What are TVQA and TVQA+?


TVQA is a large-scale video QA dataset based on 6 popular TV shows (Friends, The Big Bang Theory, How I Met Your Mother, House M.D., Grey's Anatomy, Castle). It consists of 152.5K QA pairs from 21.8K video clips, spanning over 460 hours of video. The questions are designed to be compositional, requiring systems to jointly localize relevant moments within a clip, comprehend subtitle-based dialogue, and recognize relevant visual concepts.
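
To give a feel for the scale, the short sketch below derives two rough averages from the rounded figures quoted above. The constant and function names are illustrative only and are not part of any TVQA release; because the total is stated as "over 460 hours", the clip length it computes is a lower-bound estimate.

```python
# Rounded figures from the dataset description above.
QA_PAIRS = 152_500   # 152.5K QA pairs
CLIPS = 21_800       # 21.8K video clips
HOURS = 460          # "over 460 hours", so treat this as a lower bound


def dataset_averages(qa_pairs, clips, hours):
    """Return (average QA pairs per clip, average clip length in seconds)."""
    return qa_pairs / clips, hours * 3600 / clips


qa_per_clip, seconds_per_clip = dataset_averages(QA_PAIRS, CLIPS, HOURS)
print(f"about {qa_per_clip:.1f} QA pairs per clip")
print(f"about {seconds_per_clip:.0f} seconds per clip")
```

With these rounded inputs, the averages come out to roughly 7 QA pairs per clip and clips a little over a minute long.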

TVQA+ is a subset of the TVQA dataset, additionally augmented with 310.8K bounding boxes that link depicted objects to visual concepts in questions and answers.

Large-Scale

Consisting of 152.5K QA pairs from 21.8K clips, TVQA is one of the largest datasets of its kind.

Compositional

Questions are designed to be compositional, requiring both visual and textual cues.

Localized

QA pairs are spatio-temporally localized with bounding-box and timestamp annotations.

It's Fun!

TVQA videos come from popular TV shows, the ones you love!

Papers


TVQA+: Spatio-Temporal Grounding for Video Question Answering

Jie Lei, Licheng Yu, Tamara L. Berg, Mohit Bansal.

Tech Report, arXiv, 2019 [PDF] [Code] [BIB]

TVQA: Localized, Compositional Video Question Answering

Jie Lei, Licheng Yu, Mohit Bansal, Tamara L. Berg.

Empirical Methods in Natural Language Processing (EMNLP) 2018 [PDF] [Code] [BIB]

Team


Jie Lei

University of North Carolina at Chapel Hill

Licheng Yu

University of North Carolina at Chapel Hill

Mohit Bansal

University of North Carolina at Chapel Hill

Tamara L. Berg

University of North Carolina at Chapel Hill