Research | Neural Dynamics Lab

Genomics & RNA Biology

Developing machine learning approaches for mRNA biology, long-read RNA sequencing, and single-cell omics to decode gene regulation in complex biological systems.

Topics

mRNA Splicing & ASO Targets
The lab develops dilated convolutional and attention-based deep learning models, trained on tissue-specific RNA-seq data coupled with genome-wide genotyping, to identify splice sites and quantify their usage from pre-mRNA sequences, with a focus on human airway epithelial cells and lung tissues. These models integrate naturally occurring genetic variants to achieve improved tissue-specific and cross-tissue splicing prediction relative to existing methods such as SpliceAI, Pangolin, and SpliceTransformer. Interpretability methods, including DeepLIFT/SHAP attribution analysis, are applied to map changes in splicing quantification to cis-regulatory elements and RNA-binding protein motifs, facilitating the identification of mRNA targets for antisense oligonucleotide (ASO) therapeutics in diseases such as Chronic Obstructive Pulmonary Disease (COPD).
Long-read RNA Sequencing Analysis
The lab has built a long-read RNA sequencing analysis pipeline in collaboration with researchers at Harvard and Brigham and Women's Hospital, enabling full-length transcript isoform characterization for alternative splicing studies across diverse tissues and cell lines. This pipeline complements the group's short-read RNA-seq analyses to capture transcript-level complexity that standard sequencing approaches miss.
Single-cell Omics
The group conducts single-cell transcriptomic analyses (data processing, dimensionality reduction, and trajectory inference) in collaboration with researchers at Harvard, Brigham and Women's Hospital, and Northeastern's Network Science Institute. This single-cell work connects to the lab's immunotherapy research, where team members apply single-cell function-to-omics methods analyzing RNA-seq data from antibody-activated T cells using tools such as DESeq2, Salmon, and STAR aligner.
RNA Regulation & Editing
The lab studies the regulation of alternative splicing by RNA-binding proteins (RBPs) using eCLIP binding data from ENCODE (K562 and HepG2 cell lines) and machine learning models trained on RBP binding graphs constructed from exon triplets, employing Shapley value-based interpretability to quantify individual and cooperative RBP contributions and to identify binding-region-specific splicing enhancer and suppressor functions. In parallel, the lab has developed MERLIN, a transformer-based foundation model with on the order of 10 million parameters, whose embeddings are applied to downstream prediction of ADAR editing sites, N6-methyladenosine (m6A) modification, and binding sites for over 245 RBPs, matching or exceeding the performance of substantially larger foundation models. This work contributes to a broader effort to build a vocabulary of functionally relevant RNA elements, including canonical and non-canonical splicing motifs, RNA structural features, and RBP binding sites, to characterize the cis- and trans-regulatory landscape governing tissue-specific post-transcriptional regulation.
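As a minimal illustration of how Shapley values separate individual from cooperative contributions, the sketch below computes exact Shapley values for a three-player toy game. The RBP names, the effect sizes, and the `toy_effect` function are hypothetical placeholders and are not taken from the lab's models or data.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition of the other players."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi

# Hypothetical toy game: each RBP has a solo effect on exon inclusion,
# and RBP_A and RBP_B add a cooperative bonus when both are bound.
SOLO = {"RBP_A": 1.0, "RBP_B": 2.0, "RBP_C": 0.5}

def toy_effect(coalition):
    total = sum(SOLO[r] for r in coalition)
    if "RBP_A" in coalition and "RBP_B" in coalition:
        total += 1.0  # cooperative term, split equally by the Shapley average
    return total

phi = shapley_values(list(SOLO), toy_effect)
print(phi)
```

In this toy game the cooperative bonus is shared equally between the two interacting players, and the values sum to the effect of the full coalition. Exact enumeration scales exponentially with the number of players, so larger feature sets need approximation methods.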

View related publications →

Protein Dynamics, Function & Evolution

Applying generative models and computational methods to understand protein function, evolution, and dynamics, and to design novel proteins with targeted properties.

Topics

Generative Models for Protein Design
The lab develops GenZymes, a diffusion-based generative framework for de novo enzyme design that generates protein backbone structures conditioned on target catalytic functions, with applications spanning enzyme active site scaffolding, targeted protein binder design, symmetric oligomer construction, and de novo interface engineering. In parallel, the group is building an inverse folding model based on S4 state space architectures as an alternative to ProteinMPNN for sequence prediction from backbone geometry. Structural accuracy of generated designs is validated against AlphaFold2 predictions using backbone RMSD thresholds, with ongoing efforts directed toward flow matching and classifier-guided techniques for catalytic enzyme generation.
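A backbone RMSD check of the kind described above can be sketched as follows. This is a simplified illustration that assumes the designed and predicted coordinates are already superimposed (a real pipeline would first align them, for example with the Kabsch algorithm). The 2.0 Å cutoff and the coordinates are illustrative assumptions, not the lab's values.

```python
import math

def backbone_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) backbone coordinates.

    Assumes the two structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))

def passes_threshold(coords_a, coords_b, max_rmsd=2.0):
    """True if the design agrees with the prediction within max_rmsd (in angstroms)."""
    return backbone_rmsd(coords_a, coords_b) <= max_rmsd

# Made-up C-alpha traces: the prediction is the design shifted by 1 angstrom in z.
designed = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(backbone_rmsd(designed, predicted), passes_threshold(designed, predicted))
```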
Protein Function Prediction & Optimization
The group collaborates closely with researchers at Northeastern's Khoury College, using protein language models, Gene Ontology annotation, and hidden Markov models trained on evolutionary data for enzyme function prediction and optimization. In collaboration with industry partners, the lab develops AI models that integrate proprietary experimental data with public sequence and structural databases to predict enzymes with enhanced catalytic activity, addressing the challenge of sparse functional annotations for novel enzymes not well represented in existing training sets. These models are iteratively refined through a wet-lab-in-the-loop protocol in which computational predictions of enzyme function are experimentally validated and fed back to retune the AI models, enabling high-throughput, scalable enzyme function annotation.
Protein Dynamics
The lab has developed HyperFlow, a generative model for protein conformational ensembles that combines a hierarchical hypergraph variational autoencoder with flow matching, encoding molecular structure across atom, residue, and domain levels to capture multi-resolution dependencies that standard graph-based approaches miss. By operating on hyperedges rather than pairwise edges, the model represents many-body interactions such as coordinated sidechain rotamer states and collective secondary structure motions that are crucial for physically realistic conformational sampling. The framework learns a structured latent space from MD simulations and generates physically and chemically valid conformations via learned velocity fields, enabling exploration of protein energy and functional landscapes without the cost of full simulations.
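Generating a sample from a velocity field amounts to integrating an ordinary differential equation from t = 0 to t = 1. The sketch below shows this with a fixed-step Euler integrator. The closed-form `toy_velocity` is a hypothetical stand-in for a trained network, and the three-dimensional `TARGET` point is made up. It does not reproduce HyperFlow's architecture or latent space.

```python
def euler_sample(x0, velocity, n_steps=50):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Hypothetical target point standing in for a latent code of a valid conformation.
TARGET = [1.0, -2.0, 0.5]

def toy_velocity(x, t):
    """Velocity field of the straight-line path toward TARGET.

    In a trained model this function would be a neural network; the closed form
    here is only valid for a single target point and t < 1.
    """
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(TARGET, x)]

sample = euler_sample([0.0, 0.0, 0.0], toy_velocity)
print(sample)
```

For this straight-line field the Euler steps land on the target up to floating-point error. Learned fields are generally curved, so the step count or solver order affects sample quality.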

View related publications →

The protein-related work mentioned in this section is co-developed with Pedja's Lab at Northeastern University.

Applied AI for Biomedicine

Leveraging ML for personalized immunotherapy, cancer subtyping, drug synergy prediction, healthcare outcomes, EEG/VEP foundation models, and epidemiological modeling.

Topics

Personalized Immunotherapy
The lab collaborates with immunology researchers at Northeastern and Feromics Inc. to develop AI platforms for precision cancer immunotherapy, addressing the variability in treatment outcomes that arises from the challenge of genotype-to-phenotype mapping in immune cell function. Using single-cell microfluidic assays that isolate individual immune cells with tumor cells, the group builds a function-to-omics pipeline that maps immune cell killing capability to transcriptomic and genomic profiles through AI models of gene expression and mRNA splicing fine-tuned on patient-specific RNA-seq data. The AI platform integrates multi-omics models with natural language processing to combine individual patient datasets with historical immunotherapy research, enabling high-throughput analysis for personalized immunotherapy strategies.
Cancer Subtyping & Biomarker Discovery
The group investigates dysregulated alternative splicing as a molecular mechanism underlying small cell lung cancer (SCLC) pathogenesis and its link to COPD, with preliminary analyses identifying genes such as AKAP9, PABPC4, and SRSF5 whose aberrant splicing patterns are shared between the two smoking-induced pathologies. The lab applies AI/ML models and single-cell omics to characterize genomic and transcriptomic variation and tumor heterogeneity across SCLC subtypes, and to study phenotypic plasticity driven by neuroendocrine to non-neuroendocrine transitions. Computational frameworks integrating foundation models, gene regulatory network inference, and trajectory analysis are used to map subtype transitions associated with chemoresistance, with the goal of identifying splicing-based biomarkers for early detection and potential therapeutic targets including splice-switching antisense oligonucleotides.
Drug Synergy Prediction
The lab applies the MuSyC (Multi-dimensional Synergy of Combinations) framework with statistical analysis and probabilistic modeling to predict drug synergy across diverse cancer cell types, decoupling synergistic potency, efficacy, and cooperativity parameters through mass-action kinetics to characterize combination dose-response behavior. In collaboration with Duet Biosystems, the group is extending this foundation to a machine learning framework that predicts organ-specific combination toxicity by integrating in-vitro dose-response data with single-drug perturbation profiles using deep learning models for dose-response surface prediction. This work incorporates distributional predictions and conformal prediction for uncertainty quantification, with the goal of enabling scalable computational screening of drug combination safety profiles to accelerate the prioritization of tolerable and efficacious combination regimens for oncology clinical trials.
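Split conformal prediction, one common form of the uncertainty quantification mentioned above, can be sketched in a few lines. The calibration values below are made up and the function names are illustrative. This shows the generic recipe rather than the lab's implementation.

```python
import math

def conformal_quantile(residuals, alpha):
    """Finite-sample quantile of calibration residuals for 1 - alpha coverage."""
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points for this coverage level
    return sorted(residuals)[rank - 1]

def prediction_interval(y_hat, q):
    """Symmetric interval around a point prediction."""
    return (y_hat - q, y_hat + q)

# Made-up held-out calibration set: measured responses and model predictions.
cal_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
cal_pred = [1.5, 2.25, 2.0, 4.0, 5.75, 5.5, 7.25, 9.5, 9.0]
residuals = [abs(t - p) for t, p in zip(cal_true, cal_pred)]

q = conformal_quantile(residuals, alpha=0.25)  # nominal 75% coverage
print(q, prediction_interval(3.0, q))
```

The interval width comes entirely from held-out residuals, so the coverage guarantee holds for any underlying model as long as calibration and test points are exchangeable.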
EEG Foundation Models
The lab is developing foundation models for electroencephalography (EEG) signals, with an emphasis on learning representations of neural oscillatory patterns across frequency bands, including mu (8-13 Hz) and beta (13-30 Hz) rhythms associated with motor imagery event-related desynchronization (ERD) and event-related synchronization (ERS) at sensorimotor cortical regions. Deep learning approaches are applied to multi-channel EEG data to bypass traditional preprocessing requirements such as manual artifact removal via Independent Component Analysis (ICA) and hand-crafted feature engineering, instead learning robust representations directly from raw or minimally filtered signals that are invariant to common artifacts including ocular, muscular, and cardiac interference. These models aim to capture the spectral, temporal, and spatial structure of cortical activity for downstream tasks in brain-computer interface decoding and clinical neurophysiology.
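For reference, the classical band-power quantity that such models are meant to capture can be computed directly. The sketch below estimates mu-band power with a plain discrete Fourier transform and expresses the change between a rest window and an imagery window as a percentage, where negative values indicate desynchronization. The synthetic 10 Hz signals, the 128 Hz sampling rate, and the function names are illustrative assumptions.

```python
import cmath
import math

def band_power(signal, fs, f_lo, f_hi):
    """Sum of squared DFT magnitudes for frequencies in [f_lo, f_hi) Hz."""
    n = len(signal)
    power = 0.0
    for k in range(1, n // 2):
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(signal))
            power += abs(coeff) ** 2 / n ** 2
    return power

def erd_percent(reference_power, event_power):
    """Relative band-power change; negative means ERD, positive means ERS."""
    return 100.0 * (event_power - reference_power) / reference_power

fs = 128   # sampling rate in Hz (assumed)
n = 256    # two-second window
# Synthetic single-channel data: a 10 Hz rhythm whose amplitude halves during imagery.
rest = [2.0 * math.sin(2 * math.pi * 10 * i / fs) for i in range(n)]
imagery = [1.0 * math.sin(2 * math.pi * 10 * i / fs) for i in range(n)]

mu_rest = band_power(rest, fs, 8, 13)
mu_imagery = band_power(imagery, fs, 8, 13)
erd = erd_percent(mu_rest, mu_imagery)
print(round(erd, 3))  # halved amplitude -> quartered power -> about -75
```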
Epidemiological Modeling
The lab has published extensively on epidemic modeling: mathematical analysis of automated contact tracing showing 60-80% population participation thresholds (J. R. Soc. Interface, 2021), interpretable ML with Boosted Decision Trees and Shapley values revealing socioeconomic disparities in COVID-19 spread using US Census data (J. Phys. Complex., 2021), and causal inference using causal Shapley values combining cooperative game theory with do-calculus (arXiv:2201.07026). Current work with Northeastern's Network Science Institute builds reinforcement learning models for epidemic outbreaks and agent-based models simulating entire populations (e.g., 330 million agents for the US), incorporating comorbidity and mental health data.

View related publications →

Core Machine Learning & Computer Vision

Advancing foundational ML through flow matching models, sensor data fusion, and image super-resolution and denoising techniques.

Topics

Flow Matching & Generative Models
The lab's DeepWeightFlow (arXiv:2601.05052, ICLR 2026) introduces a flow matching model operating directly in neural network weight space, using Git Re-Basin and TransFusion for canonicalization to resolve permutation symmetries, generating complete high-accuracy weights for MLP, ResNet, ViT, and BERT architectures without fine-tuning. Ensembles of hundreds of networks can be generated in minutes, substantially outperforming diffusion-based approaches in speed while excelling at transfer learning. The group also explores equivariant generative models for neural networks in collaboration with Northeastern faculty.
Sensor Data Fusion
The lab published "Exploring the design space of diffusion and flow models for data fusion" (arXiv:2510.21791, October 2025), systematically benchmarking UNet, conditional diffusion models (DDPM/EDM), and flow matching architectures for fusing DMSP-OLS and VIIRS nighttime lights satellite data over Texas. The study identifies UNet-based diffusion models as best for preserving fine-grained spatial detail, provides guidance on noise scheduler selection (iterative solvers for speed vs. discrete schedulers for quality), and explores FP16 quantization for memory-efficient deployment, evaluated using SSIM, PSNR, MAE, and a novel power spectral density analysis.
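Two of the evaluation metrics named above, MAE and PSNR, have simple definitions that can be sketched without any imaging library. The 2x2 pixel grids are made-up values scaled to [0, 1]. SSIM and the power spectral density analysis are not covered here.

```python
import math

def _flatten(image):
    """Turn a list of rows into a flat list of pixel values."""
    return [pixel for row in image for pixel in row]

def mae(reference, estimate):
    """Mean absolute error between two images of identical shape."""
    ref, est = _flatten(reference), _flatten(estimate)
    return sum(abs(r - e) for r, e in zip(ref, est)) / len(ref)

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    ref, est = _flatten(reference), _flatten(estimate)
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_value ** 2 / mse)

# Made-up radiance tiles: one pixel of the fused output is off by 0.5.
reference = [[0.0, 0.5], [1.0, 0.5]]
fused = [[0.0, 0.5], [0.5, 0.5]]
print(mae(reference, fused), psnr(reference, fused))
```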

View related publications →

Knowledge Aggregation & Hypothesis Generation

Building LLM-powered retrieval-augmented generation systems to synthesize scientific literature, aggregate domain knowledge, and accelerate hypothesis generation across research disciplines.

Topics

Retrieval-Augmented Generation (RAG)
The lab has developed PRISM (Precision Research and Information Systems for bioMedicine), a hybrid retrieval-augmented generation platform that combines GraphRAG and VectorRAG with swarms of AI agents for autonomous knowledge aggregation and retrieval from biomedical literature. PRISM is integrated with INDRA (Integrated Network and Dynamical Reasoning Assembler) through biomedical ontologies, providing access to over 37 million publications, 455,000 clinical trials, 45,000 patents, and 12 million unique biological mechanisms, with graph-of-thought reasoning. The knowledge aggregation infrastructure employs distributed AI agent swarms for entity extraction, enrichment, and graph construction, processing over one million documents with 200-fold faster document processing throughput relative to standard graph database implementations.
Scientific Literature Synthesis
The lab uses PRISM to extract structured knowledge from scientific literature, with INDRA providing ontology-based grounding of extracted entities. This literature-derived knowledge base is augmented with experimental data through DURGA (Distributed Unified Reasoning over Graph-structured biomedical Agents), an agentic system that orchestrates bioinformatics pipelines via a natural language interface with AI-powered pipeline recommendation, parameter optimization, and automated quality control. The integration of PRISM's literature knowledge extraction with INDRA-grounded reasoning and DURGA's data processing enables cross-study insights through pathway enrichment, Gene Ontology annotation, clinical relevance scoring, and biomarker discovery grounded in both published evidence and proprietary assay results.
Cross-Domain Knowledge Aggregation
The lab uses eGoT (enhanced Graph-of-Thoughts), a novel retrieval algorithm that extracts knowledge and performs iterative multi-hop reasoning across traditionally siloed domains by combining adaptive graph traversal with LLM-driven thought generation, evaluation, and evidence synthesis. eGoT has been applied to cross-domain hypothesis generation integrating lupus pathophysiology with climate science literature, where it traced mechanistic chains spanning up to 6 reasoning hops (UV exposure to keratinocyte apoptosis to autoantigen exposure to immune complex formation to systemic inflammation) across a knowledge graph constructed from immunology, photobiology, and atmospheric science publications. The system incorporates knowledge boundary recognition that correctly avoids hallucinated connections when queried with out-of-domain content.
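The graph-traversal part of multi-hop reasoning can be illustrated with a hop-limited breadth-first search. This sketch covers only plain path finding over a hand-written toy graph built from the chain quoted above. It omits the LLM-driven thought generation and evaluation and is not the eGoT algorithm itself. The function name and the extra `DNA damage` edge are hypothetical.

```python
from collections import deque

def find_chain(graph, start, goal, max_hops=6):
    """Shortest directed path from start to goal using at most max_hops edges."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        if len(path) - 1 >= max_hops:
            continue  # hop budget exhausted on this branch
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no supported chain within the hop budget

# Toy knowledge graph; edges point from cause to effect.
graph = {
    "UV exposure": ["keratinocyte apoptosis", "DNA damage"],
    "keratinocyte apoptosis": ["autoantigen exposure"],
    "autoantigen exposure": ["immune complex formation"],
    "immune complex formation": ["systemic inflammation"],
}

chain = find_chain(graph, "UV exposure", "systemic inflammation")
print(" -> ".join(chain))
```

Returning `None` when no path exists within the budget mirrors, in a very reduced form, the idea of declining to assert a connection that the graph does not support.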
Automated Hypothesis Generation
The lab is developing BRAHMA (Biomedical Reasoning and Automated Hypothesis Generation using Multi-Agent GraphRAG), a multi-agent system with dual validation for computational hypothesis discovery that employs a hypothesis agent to generate candidate hypotheses from local knowledge graphs and evaluates novelty and plausibility against both local domain knowledge and global biomedical knowledge via INDRA. Within the dual validation layer, local validation performs co-occurrence analysis, claim verification, pattern matching, and graph proximity checks against the local Neo4j knowledge graph, while global validation conducts entity grounding and literature cross-referencing.

View related publications →

INDRA (Integrated Network and Dynamical Reasoning Assembler) is developed by the Gyori Lab, supervised by Benjamin Gyori at Northeastern University. The knowledge aggregation systems mentioned in this section are co-developed with the Gyori Lab.


Want to collaborate?