PhD student · Speech technology · Conversational AI

Shree Harsha

I research speech interaction dynamics, SpeechLLM evaluation, and responsible conversational AI.

My current work covers three topics: enchronic speech dynamics studied with self-supervised learning, voice-based bias evaluation, and failure modes in speech foundation models. I am supervised by Éva Székely and Gustav Eje Henter, supported by the WASP program, and based at KTH Royal Institute of Technology.

About

Research Focus

Enchronic speech dynamics are the moment-by-moment patterns that make conversation work: when people take turns, how quickly they respond, how meaning is built together, and how local social context shapes what comes next.

My research uses self-supervised learning to study these dynamics and to ask how SpeechLLMs and conversational AI systems can better reflect the timing, variation, and responsibility that real interaction requires.

Research

Current Directions

Enchronic Speech Dynamics

Problem. Conversation unfolds in real time, but many speech systems treat utterances as isolated clips.

Why it matters. Timing, turn-taking, and shared meaning are central to natural interaction and to conversational AI that behaves responsibly.

Methods. Speech timing analysis, interaction-focused modeling, and self-supervised representations of spoken audio.

In-Situ Bias Evaluation

Problem. Multiple-choice bias benchmarks can miss how SpeechLLMs behave in open-ended, real-use settings.

Why it matters. Context-aware measurements are needed when voice, accent, gender, and interaction style shape model behavior.

Methods. Controlled acoustic variation, voice conversion, long-form evaluation, LLM-as-a-judge methods, and human preference validation.

SpeechLLM Robustness and Failure Modes

Problem. Systems can appear strong on proxy tasks while failing under transformations, restoration, conversion, or out-of-distribution speech.

Why it matters. Robust speech AI needs evaluations that reveal what models encode, what users perceive, and what downstream classifiers actually measure.

Methods. Speech continuation probes, voice-conversion studies, deepfake-detection stress tests, and representation analysis.

Publications

Publications and Preprints

  1. 2026

    The Voice Behind the Words: Quantifying Intersectional Bias in SpeechLLMs

    SH Bokkahalli Satish, C Minixhofer, M Teleki, J Caverlee, O Klejch, P Bell, GE Henter, É Székely.

    Interspeech 2026 · Accepted · arXiv:2603.16941

  2. 2026

    From Seeing it to Experiencing it: Interactive Evaluation of Intersectional Voice Bias in Human-AI Speech Interaction

    SH Bokkahalli Satish, M Teleki, C Minixhofer, O Klejch, P Bell, É Székely.

    CHI Extended Abstracts 2026

  3. 2026

    “Walk a Mile in My Voice”: Voice Conversion Shapes Trust, Attribution, and Empathy in Human-AI Speech Interactions

    SH Bokkahalli Satish, M Teleki, C Minixhofer, O Klejch, P Bell, É Székely.

    IUI Companion 2026

  4. 2026

    What Counts as Real? Speech Restoration and Voice Quality Conversion Pose New Challenges to Deepfake Detection

    SH Bokkahalli Satish, H Lameris, J Gustafson, É Székely.

    Interspeech 2026 · Submitted · arXiv:2603.14033

  5. 2026

    Do Bias Benchmarks Generalise? Evidence from Voice-based Evaluation of Gender Bias in SpeechLLMs

    SH Bokkahalli Satish, GE Henter, É Székely.

    IEEE ICASSP 2026 · Accepted

  6. 2026

    Speak Your Mind: The Speech Continuation Task as a Probe of Voice-Based Model Bias

    SH Bokkahalli Satish, H Lameris, O Perrotin, GE Henter, É Székely.

    Identity-Aware AI LREC Workshop 2026 · Accepted · arXiv:2509.22061

  7. 2025

    Lost in Phonation: Voice Quality Variation as an Evaluation Dimension for Speech Foundation Models

    H Lameris, SH Bokkahalli Satish, J Gustafson, É Székely.

    LREC 2026 · Submitted · arXiv:2510.25577

  8. 2025

    When Voice Matters: Evidence of Gender Disparity in Positional Bias of SpeechLLMs

    SH Bokkahalli Satish, GE Henter, É Székely.

    SPECOM 2025 · Published

  9. 2025

    Hear Me Out: Interactive Evaluation and Bias Discovery Platform for Speech-to-Speech Conversational AI

    SH Bokkahalli Satish, GE Henter, É Székely.

    Interspeech 2025 Show and Tell/Demo · Published

  10. 2021

    Predicting Lexical Skills from Oral Reading with Acoustic Measures

    SH Bokkahalli Satish, C Vitthal, K Sabu, P Rao.

    arXiv:2112.00635

Background

Education and Experience

  1. PhD student

    Researching enchronic speech dynamics using self-supervised learning, and bias discovery and mitigation in conversational AI. Supervised by Éva Székely and Gustav Eje Henter at KTH Royal Institute of Technology, with support from WASP.

  2. Speech and AI systems

    Worked on speech assessment systems, voice activity detection, and LLM/RAG solutions for aerospace repair-scheme creation.

  3. MSc in Data Science & AI

    Built a broader foundation in machine learning, AI, and applied data-driven systems.

  4. Master's in Electrical Engineering, Signal Processing

    Built ASR models for literacy assessments in India, later used in the field by an NGO.

Interests

Research Interests

  • Signal Processing
  • Speech Technology
  • Conversational AI
  • Machine Learning
  • Self-Supervised Learning
  • Responsible AI
  • SpeechLLMs
  • Voice Conversion
  • Bias Evaluation
  • Bias and Fairness
  • Deepfake Robustness

Contact

I am happy to hear from people interested in speech technology, conversational AI, representation learning, or responsible evaluation.