Research · May 2026

Deep Fusion of Temporal EEG and Spatial fMRI for Cross-Subject Sleep Stage Classification

Jahnavi Umesh presents new work on graph-based multimodal fusion at the Institute for Cognitive and Brain Health, Northeastern University, demonstrating that EEG and fMRI carry complementary information for sleep stage decoding.

Published May 1, 2026 · Neural Dynamics Group
Jahnavi Umesh · Research Associate · Machine Learning for Signal Processing

Today at the Institute for Cognitive and Brain Health at Northeastern University, Jahnavi Umesh presents a poster on a multimodal deep learning framework that fuses simultaneous EEG and fMRI for cross-subject sleep stage classification. The work, joint with Ayan Paul, formalizes a long-standing intuition in neuroimaging: that the temporal precision of EEG and the spatial precision of fMRI can compensate for one another’s gaps when integrated through the right model class.

The Question

Sleep stage classification at the standard 30-second epoch resolution is a stress test for a single modality. EEG carries the sub-second oscillatory events that define each stage but generalizes poorly across subjects because of skull geometry, electrode placement, and individual rhythms. fMRI carries spatially resolved hemodynamic structure but is too slow to capture the events on which the staging criteria are built. The question is whether a joint model can extract complementary information from the two modalities and outperform either one alone on the open problem of cross-subject decoding.

Approach

The framework couples two graph convolutional encoders, one per modality, through a gated fusion layer. The fMRI encoder treats Schaefer-100 regions of interest as graph nodes with a fold-specific functional-connectivity adjacency, a 42-TR sliding context, and per-ROI 1D convolutional features. The EEG encoder uses per-channel 1D convolutions over Euclidean-aligned raw channels, a hybrid adjacency that combines fixed scalp geometry with a learnable matrix, and a bidirectional LSTM over three-epoch context to recover the temporal continuity assumed by the AASM scoring rules. Fusion is performed by a per-element sigmoid gate over concatenated encoder embeddings, trained with modality dropout (p = 0.5 on EEG) so that the fMRI branch is forced to maintain a standalone discriminative signal when the gate down-weights EEG.
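To make the fusion mechanism concrete, the PyTorch module below sketches a per-element sigmoid gate with EEG modality dropout. It is a minimal illustration under stated assumptions rather than the project's implementation: the embedding dimensions, the per-sample form of the dropout, and the five-way AASM stage output are all placeholders.

```python
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Per-element sigmoid gate over concatenated EEG and fMRI embeddings.

    Minimal sketch: the real encoders are graph convolutional networks;
    here they are stand-ins represented by their output embeddings.
    Dimensions and the 5-class output (AASM stages) are assumptions.
    """

    def __init__(self, eeg_dim: int = 128, fmri_dim: int = 128, n_classes: int = 5):
        super().__init__()
        fused_dim = eeg_dim + fmri_dim
        # One gate value per fused feature, computed from both embeddings.
        self.gate = nn.Sequential(nn.Linear(fused_dim, fused_dim), nn.Sigmoid())
        self.classifier = nn.Linear(fused_dim, n_classes)

    def forward(self, z_eeg: torch.Tensor, z_fmri: torch.Tensor,
                p_eeg_drop: float = 0.5) -> torch.Tensor:
        # Modality dropout (p = 0.5 on EEG): during training, zero the EEG
        # embedding for a random subset of the batch so the fMRI branch
        # must stay discriminative on its own when the gate down-weights EEG.
        if self.training and p_eeg_drop > 0:
            keep = (torch.rand(z_eeg.size(0), 1, device=z_eeg.device)
                    > p_eeg_drop).to(z_eeg.dtype)
            z_eeg = z_eeg * keep
        z = torch.cat([z_eeg, z_fmri], dim=-1)  # (batch, eeg_dim + fmri_dim)
        g = self.gate(z)                        # per-element weights in (0, 1)
        return self.classifier(g * z)           # logits over sleep stages
```

The dropout is what gives the gate its meaning: without it, the model could learn to pass EEG through and treat fMRI as a passenger, whereas randomly silencing the EEG embedding forces the fMRI branch to carry a standalone discriminative signal.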

Findings

In 23-fold leave-one-subject-out cross-validation on the publicly available simultaneous EEG-fMRI sleep corpus ds003768 (Gu et al., 2023), each modality alone substantially exceeds classical machine-learning baselines trained on engineered features: the fMRI graph convolutional network reaches 0.4851 balanced accuracy and the EEG graph convolutional network reaches 0.5883, gains of 11.7 and 12.8 points respectively over the strongest classical baseline. Gated fusion reaches 0.5830 balanced accuracy cross-subject and 0.7200 within-subject, the latter exceeding the within-subject EEG-only ceiling of 0.7120. The cross-modal interaction is asymmetric and informative: fusion lifts fMRI from 0.485 to 0.589, while EEG, which already dominates a temporally defined task, gains less. Centered kernel alignment between the two encoders' embeddings sits near 0.05, indicating that they learn nearly orthogonal representations and that the gains from fusion arise from genuine complementarity rather than redundancy.
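For readers who want to reproduce the complementarity check, linear centered kernel alignment between two embedding matrices takes only a few lines of NumPy. The sketch below is illustrative rather than the study's evaluation code, and it assumes two row-aligned matrices of epoch embeddings taken from the EEG and fMRI branches.

```python
import numpy as np


def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear centered kernel alignment between two representation matrices.

    X: (n_samples, d1) embeddings from one encoder (e.g. the EEG branch)
    Y: (n_samples, d2) embeddings from the other (e.g. the fMRI branch)
    Rows must correspond to the same epochs. Returns a value in [0, 1].
    """
    # Center each feature column.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F).
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return float(cross / (norm_x * norm_y))
```

Values near 1 would indicate linearly redundant representations; the reported value near 0.05 means the two encoders share almost no linear structure.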

Why It Matters

The result establishes a first deep EEG-fMRI fusion benchmark on a publicly available simultaneous-acquisition corpus and quantifies the regime in which multimodality helps. For sleep, which is temporally defined, EEG remains the dominant signal and fMRI is recovered through fusion. More broadly, the work motivates a research program for graph-based multimodal models of neural data that extends well beyond sleep.

Future Directions

Three lines of follow-up work are planned. First, the graph-based fusion architecture is a natural starting point for neural foundation models that exploit the complementary spatiotemporal structure of EEG and fMRI at scale, with the goal of learning shared representations transferable across cognitive tasks and recording protocols. Second, an open methodological question is whether cross-modal compensation requires simultaneous acquisition, or whether task-matched separate EEG and fMRI recordings produce equivalent fusion benefits at substantially lower acquisition cost. Third, sleep staging is a temporally defined task, which favors EEG; the symmetric prediction is that on a spatially defined task such as visual decoding or spatial attention, fMRI should dominate and lift EEG through the same gated-fusion mechanism. Confirming this asymmetry would establish task structure, rather than modality identity, as the primary determinant of which signal carries the discriminative load.

Acknowledgments

The work uses ds003768 from OpenNeuro (Gu, Sainburg, Han, & Liu, 2023). We thank the Institute for Cognitive and Brain Health and the Institute for Experiential AI at Northeastern University for support, and Ayan Paul, project advisor and PI of the Neural Dynamics Group.