Make-up Dense Video Captioning (MDVC) Challenge

Written By Zach Johnson

AI and tech enthusiast with a background in machine learning.

Given an untrimmed make-up video, the Make-up Dense Video Captioning (MDVC) task aims to localize and describe the sequence of make-up steps it contains. Models must both detect fine-grained make-up events and generate a description for each one.

Inputs: An untrimmed make-up video ranging from 15s to 1h.
Outputs: The temporal boundary and the generated description of detected make-up events in the video.

Prizes

  • 1st prize: ¥ 10,000
  • 2nd prize: ¥ 3,000
  • 3rd prize: ¥ 2,000

Schedule (Beijing, UTC+8)

  • April 25th, 2022 – Training / validation set released
  • June 10th, 2022 – Testing set released and submission opened
  • June 25th, 2022 – Submission deadline
  • June 26th-30th, 2022 – Objective evaluation
  • July 1st, 2022 – Evaluation results announced
  • July 6th, 2022 – Paper submission deadline

Submission Details

Results should be stored in results.json, following this format:

{
  "video_id": [
    {
      "sentence": sent,
      "timestamp": [st_time, ed_time]
    }, ...
  ]
}

Each team can submit results.json once per day. Evaluation may take some time, but a failed submission does not use up a submission chance.
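
As a minimal sketch of producing a file in this format, the Python snippet below builds the mapping from video IDs to predicted steps and serializes it with the standard json module. The helper name build_results, the video ID, the captions, and the timestamps are all invented for illustration, and timestamps are assumed to be in seconds, which the format description does not state.

```python
import json


def build_results(predictions):
    """Convert {video_id: [(sentence, start, end), ...]} into the submission layout.

    `predictions` is a hypothetical intermediate structure, not part of the
    challenge specification.
    """
    results = {}
    for video_id, steps in predictions.items():
        results[video_id] = [
            {"sentence": sentence, "timestamp": [float(start), float(end)]}
            for sentence, start, end in steps
        ]
    return results


# Hypothetical predictions for one video (ID, captions, and times are made up).
predictions = {
    "video_0001": [
        ("apply foundation on the face with a brush", 12.0, 35.5),
        ("draw eyeliner on the upper eyelid", 40.2, 58.0),
    ]
}

results = build_results(predictions)

# json.dump emits double-quoted keys and strings, as valid JSON requires.
with open("results.json", "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=2)
```

Reading the file back with json.load before submitting is a cheap way to confirm that it parses.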

Evaluation Metrics

The challenge assesses both the localization and captioning abilities of models. For localization, we compute the average precision (AP) across tIoU thresholds of {0.3, 0.5, 0.7, 0.9}. For dense captioning, we measure BLEU4, METEOR, and CIDEr for matched pairs between generated captions and the ground truth across the same tIoU thresholds of {0.3, 0.5, 0.7, 0.9}.
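
To make the tIoU thresholds concrete, here is a small Python sketch of temporal intersection-over-union (tIoU) between a predicted and a ground-truth segment, followed by a check of which thresholds the pair clears. It illustrates the quantity only: the official matching and scoring procedure is not described here, and the function names and example segments are hypothetical.

```python
TIOU_THRESHOLDS = (0.3, 0.5, 0.7, 0.9)


def tiou(pred, gt):
    """Temporal IoU of two [start, end] segments given in the same time unit."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def passed_thresholds(pred, gt, thresholds=TIOU_THRESHOLDS):
    """Return the thresholds at which this pair would count as a match."""
    score = tiou(pred, gt)
    return [t for t in thresholds if score >= t]


# Made-up segments: 15 s of overlap over a 25 s union.
pred = [10.0, 30.0]
gt = [15.0, 35.0]
print(tiou(pred, gt))               # 0.6
print(passed_thresholds(pred, gt))  # [0.3, 0.5]
```

In this example the prediction counts as a match at thresholds 0.3 and 0.5 but not at 0.7 or 0.9, so a caption attached to it would only be scored at the two looser thresholds under this simplified reading.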

About the Dataset

Makeup instructional videos are inherently finer-grained than open-domain videos. Many steps share similar backgrounds yet differ in subtle but essential ways: the specific action performed, the tool used, and the facial area it is applied to, each of which can produce a distinct facial effect.

The YouMakeup dataset, sourced from YouTube, encompasses 2,800 makeup instructional videos, amounting to over 420 hours. Every video is annotated with a series of steps, with details like temporal boundaries, highlighted facial areas, and step-by-step descriptions. In total, there are 30,626 steps, with each video having an average of 10.9 steps. Videos typically range from 15s to 1h, averaging 9 minutes in length.

YouMakeup Dataset Overview

Dataset     Total   Train   Val   Test   Video length
YouMakeup   2,800   1,680   280   840    15s-1h
