Termeh Taheri

Research

My work centers on audio and multimodal machine learning, including audio reasoning, representation learning, generative music systems, and evaluation frameworks for next-generation LLMs. I am broadly interested in how AI can connect sound, music, and language to build interpretable, reliable, and creatively useful systems.

Current Research Directions

Symbolic audio reasoning with LLMs — structured, human-readable representations that reduce hallucinations and improve interpretability in audio QA.
Generative music & text-to-music evaluation — reproducible frameworks for model quality, preference studies, and large-scale benchmarking.
Audio–language–vision benchmarks — multimodal datasets designed to stress-test perception, reasoning, and temporal understanding.
Practical ML systems — scalable experimentation pipelines and deployment-ready research tooling (PyTorch, FastAPI, Docker).

📄 Selected Publications

SAR-LM: Symbolic Audio Reasoning with Large Language Models

Taheri, T., Ma, Y., & Benetos, E. (2025). Accepted at LLM4MA Workshop @ ISMIR 2025; under review at ICASSP 2026.

Paper • Code

Summary:
SAR-LM introduces a modular pipeline that converts audio into symbolic, interpretable features (speech transcripts, sound events, notes, chords) before reasoning with LLMs. It enables transparent error analysis, stronger reasoning stability, and competitive performance on MMAU, MMAR, and OmniBench.
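As a rough illustration of the symbolic-features-then-LLM idea described above, the sketch below renders extracted features as labeled text that could be passed to a language model alongside a question. It is a minimal sketch, not the SAR-LM implementation: the `SymbolicAudio` container, the `build_prompt` function, and the prompt layout are hypothetical names and choices made for this example, and the feature extractors themselves are assumed to exist upstream.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SymbolicAudio:
    """Hypothetical container for symbolic features extracted from one clip."""
    transcript: str = ""
    sound_events: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)
    chords: List[str] = field(default_factory=list)


def build_prompt(features: SymbolicAudio, question: str) -> str:
    """Render the symbolic features as human-readable text for an LLM."""
    sections = [
        ("Speech transcript", features.transcript),
        ("Sound events", ", ".join(features.sound_events)),
        ("Notes", " ".join(features.notes)),
        ("Chords", " ".join(features.chords)),
    ]
    # Skip empty feature types so the prompt only lists what was detected.
    lines = [f"{name}: {value}" for name, value in sections if value]
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

Because every feature appears in the prompt as plain text, a wrong answer can be traced back to either a faulty extraction or a faulty reasoning step, which is the kind of error analysis the summary refers to.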

OmniVideoBench: Towards Audio–Visual Understanding Evaluation for Omni MLLMs

Li, C., Chen, Y., Ji, Y., Xu, J., Cui, Z., … Taheri, T., et al. (2025). Under review at ICLR 2026.

Paper

Summary:
OmniVideoBench is a large-scale benchmark evaluating synergistic audio–visual reasoning. It contains 1,000 manually verified QA pairs with reasoning traces across 628 videos, covering 13 question types including temporal reasoning, spatial localization, causal inference, counting, and summarization.

Breast cancer prediction by ensemble meta-feature space generator based on deep neural network

Taheri, M., & Omranpour, H. (2024). Biomedical Signal Processing and Control.

Paper

Summary:
EMFSG-Net is an ensemble meta-feature space generator for breast ultrasound classification that learns an enhanced feature space without relying on data augmentation. It reduces overfitting, handles imbalanced datasets, and outperforms prior deep learning approaches on the BUSI benchmark (97.96% accuracy, 96.2% F1).

Presentation of encryption method for RGB images based on an evolutionary algorithm using chaos functions

Omranpour, H., Mohammadi Ledari, Z., & Taheri, M. (2023). Multimedia Tools and Applications.

Paper

Summary:
A chaos-enhanced encryption scheme for RGB images that combines evolutionary optimization with a hash-initialized logistic map. It delivers high entropy, fast computation, and strong robustness against differential and plaintext attacks.
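To make the phrase "hash-initialized logistic map" concrete, here is a toy sketch that seeds the logistic map x ← r·x·(1 − x) from a SHA-256 digest of the key and XORs the resulting keystream with the pixel bytes. This is an assumption-laden illustration, not the published scheme: the function names, the choice of r = 3.99, and the byte quantization are all hypothetical, the evolutionary optimization step is omitted entirely, and the code should not be used for real encryption.

```python
import hashlib


def logistic_keystream(key: bytes, n: int, r: float = 3.99) -> bytes:
    """Generate n keystream bytes from a logistic map seeded by SHA-256(key)."""
    digest = hashlib.sha256(key).digest()
    # Map the first 48 bits of the digest to an initial state strictly in (0, 1).
    x = (int.from_bytes(digest[:6], "big") + 0.5) / 2**48
    out = bytearray()
    for _ in range(n):
        x = r * x * (1.0 - x)      # one logistic-map iteration
        out.append(int(x * 256))   # quantize the state to one byte
    return bytes(out)


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the input."""
    stream = logistic_keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))
```

For an RGB image, `data` would be the flattened pixel bytes; calling `xor_cipher` again with the same key recovers them.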