EEG2Video: Towards Decoding Dynamic Visual Perception from EEG Signals

NeurIPS 2024

Xuan-Hao Liu1* Yan-Kai Liu1* Yansen Wang2# Kan Ren3# Hanwen Shi1 Zilong Wang2
Dongsheng Li2 Bao-Liang Lu1 Wei-Long Zheng1#

1Shanghai Jiao Tong University 2Microsoft Research Asia 3ShanghaiTech University

Video Stimuli vs. Decoding Results

[Demo: original video stimuli and the corresponding videos decoded from EEG]

Introduction

In this paper, our contributions are:
  1. We build a new dataset, SEED-DV, recording 20 subjects' EEG data while they view 1400 video clips of 40 concepts.
  2. We annotate each video clip with meta information, forming the EEG-VP benchmark and a video reconstruction benchmark.
  3. We propose EEG2Video, a framework for decoding videos from EEG signals using Seq2Seq and DANA modules with an inflated Stable Diffusion model.

 

Motivation & Challenge

Previous research has reconstructed videos from fMRI data. However, fMRI has low temporal resolution (0.5 Hz), motivating us to turn to neuroimaging techniques with high temporal resolution. EEG is usually sampled at 1000 Hz, but it brings its own challenges, such as low spatial resolution and a low signal-to-noise ratio.
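
To make the temporal-resolution gap concrete, here is a minimal back-of-the-envelope calculation in Python; the 2-second clip length is an illustrative assumption, not a number taken from this page.

    # Rough comparison of samples per video clip for each modality.
    CLIP_SECONDS = 2.0   # illustrative clip duration
    FMRI_HZ = 0.5        # roughly one fMRI volume every two seconds
    EEG_HZ = 1000.0      # typical EEG sampling rate

    fmri_samples = CLIP_SECONDS * FMRI_HZ   # about 1 sample per clip
    eeg_samples = CLIP_SECONDS * EEG_HZ     # 2000 samples per clip
    print(f"fMRI: {fmri_samples:.0f} sample(s), EEG: {eeg_samples:.0f} samples")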

 

Video Stimuli Selection

We carefully select 40 concepts across 9 coarse classes to build our dataset: Land Animal, Water Animal, Plant, Exercise, Human, Natural Scene, Food, Musical Instrument, and Transportation.

 

Experiment Protocol

We recorded 20 subjects' EEG data while they were viewing video stimuli.
We collected 35 video clips for each concept.
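
As a quick sanity check on the dataset size implied by the protocol above (a trivial sketch; the variable names are ours):

    n_subjects = 20
    n_concepts = 40
    clips_per_concept = 35

    clips_per_subject = n_concepts * clips_per_concept   # 1400 video clips per subject
    total_eeg_trials = n_subjects * clips_per_subject    # 28000 EEG trials overall
    print(clips_per_subject, total_eeg_trials)           # 1400 28000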

 

Meta Information Annotation: EEG-VP Benchmark

We manually annotated meta information for each video clip to fully investigate the decoding capability of EEG.
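
Purely for illustration, a per-clip annotation record could be organized as below; the field names are our assumptions, not the actual SEED-DV annotation schema.

    from dataclasses import dataclass

    @dataclass
    class ClipAnnotation:
        """Hypothetical meta-information record for one video clip."""
        clip_id: str        # unique clip identifier (illustrative)
        concept: str        # one of the 40 fine-grained concepts
        coarse_class: str   # one of the 9 coarse classes
        # Further low-level attributes (e.g. dominant color, motion speed)
        # would go here, depending on the released annotation files.

    example = ClipAnnotation("clip_0001", "panda", "Land Animal")  # illustrative values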

 

EEG-VP Results

We evaluate a variety of EEG models on the EEG-VP benchmark and draw several findings from the results.
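
For readers who want to reproduce this kind of evaluation, below is a minimal sketch of a classification baseline on an EEG-VP-style task; the feature dimensionality, the 9-class coarse task, and the logistic-regression model are our assumptions, not the benchmark's official protocol.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Placeholder data: one feature vector per EEG trial (e.g. flattened
    # band-power features) and one label per trial. Replace with real
    # SEED-DV features and EEG-VP labels.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1400, 310))   # 1400 trials, 310-dim features (illustrative)
    y = rng.integers(0, 9, size=1400)      # e.g. the 9 coarse classes

    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")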

 

EEG2Video Framework

In this paper, we propose EEG2Video, a pipeline for reconstructing videos from EEG signals. To handle brain signals with high temporal resolution but low spatial resolution, we design several modules, informed by the results on the EEG-VP benchmark, to better decode videos.
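
As a rough, non-authoritative sketch of the kind of pipeline described above, an EEG window sequence can be encoded into a sequence of embeddings that condition a video diffusion model. The transformer-based encoder, channel count, and dimensions below are our assumptions; the actual EEG2Video modules (Seq2Seq, DANA, and the inflated Stable Diffusion backbone) are defined in the paper and code release.

    import torch
    import torch.nn as nn

    class EEGSeq2SeqEncoder(nn.Module):
        """Illustrative Seq2Seq-style encoder: EEG windows -> conditioning embeddings."""

        def __init__(self, n_channels=62, d_model=256, n_layers=4, out_dim=768):
            super().__init__()
            self.proj = nn.Linear(n_channels, d_model)   # per-time-step channel mixing
            layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
            self.head = nn.Linear(d_model, out_dim)      # map to the conditioning space

        def forward(self, eeg):                          # eeg: (batch, time, channels)
            h = self.encoder(self.proj(eeg))
            return self.head(h)                          # (batch, time, out_dim)

    # Toy usage with illustrative shapes: a 2-second window at 200 Hz after preprocessing.
    eeg = torch.randn(1, 400, 62)
    cond = EEGSeq2SeqEncoder()(eeg)
    # `cond` would then replace text embeddings as the condition for an inflated
    # text-to-video Stable Diffusion model, which is not reproduced here.
    print(cond.shape)  # torch.Size([1, 400, 768])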

 

More Samples

[Gallery: additional samples, each shown as a pair of original stimulus and decoded video]




Failure Cases

[Gallery: failure cases, each shown as a pair of original stimulus and decoded video]


We present some failure cases. These failures are typically caused by the model's inability to correctly infer either the semantic information or the low-level visual information, resulting in irrelevant generated videos.



Acknowledgments

This website is crafted by Xuan-Hao Liu and Zheng Wang. Zheng Wang put tremendous effort into beautifying this website. Thanks to all members of the BCMI Lab for their support and help. Sincere thanks to all subjects who participated in our experiment!

Huge thanks to the Stable Diffusion Team for open-sourcing their high-quality AIGC models. Gratitude to the Tune-A-Video Team for their elegant text-to-video model. And kudos to the Mind-Video Team for their pioneering and excellent fMRI-to-video work.