Hear Me Out
Interactive evaluation and bias discovery platform for speech-to-speech conversational AI
Shree Harsha Bokkahalli Satish, Gustav Eje Henter, Éva Székely

KTH Royal Institute of Technology, Stockholm, Sweden
Hear Me Out is an interactive evaluation and bias discovery platform for speech-to-speech conversational AI. Speech-to-speech models process spoken language directly from audio, without first converting it to text. They promise more natural, expressive, and emotionally aware interactions by retaining prosody, intonation, and other vocal cues throughout the conversation.
💻 Developing with Moshi using Modal for GPU hosting
1. Clone the Repository
First, you’ll need to get a copy of this project on your local machine. Open a terminal and run:
```bash
git clone https://github.com/shreeharsha-bs/Hear-Me-Out.git
cd Hear-Me-Out
```
2. Set Up Your Development Environment
Requirements

- `modal` installed in your current Python virtual environment (`pip install modal`)
- A Modal account (`modal setup`)
- A Modal token set up in your environment (`modal token new`)
Setting up Voice Conversion (seed-VC)
The voice conversion functionality uses the seed-VC library. To set this up:
1. Install the required dependencies for the local voice conversion server:

   ```bash
   pip install -r local_server_requirements.txt
   ```

2. Start the local voice conversion server in one terminal:

   ```bash
   python local_vc_server.py
   ```

3. In another terminal, start the Modal development server:

   ```bash
   modal serve -m src.app
   ```
This workflow allows the application to use local voice conversion capabilities (which run on your machine) while serving the main application through Modal.
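To make the split concrete, here is a minimal sketch of how an application component might hand audio to the locally running voice conversion server. The endpoint path, port, and payload field names below are assumptions for illustration, not the actual API of `local_vc_server.py`:

```python
# Hypothetical sketch: sending audio to the local VC server over HTTP.
# The URL, route, and payload fields are ASSUMED, not the project's real API.
import json
import urllib.request

LOCAL_VC_URL = "http://localhost:8000/convert"  # assumed address and route


def build_vc_request(audio_b64: str, target_voice: str) -> dict:
    """Assemble a JSON-serializable request for the (assumed) VC endpoint."""
    return {"audio": audio_b64, "target_voice": target_voice}


def send_vc_request(payload: dict) -> bytes:
    """POST the payload to the local server; requires the server to be running."""
    req = urllib.request.Request(
        LOCAL_VC_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    # Build (but do not send) a request, since the server may not be up.
    payload = build_vc_request("UklGR...", "target_speaker_01")
    print(payload["target_voice"])
```

Because voice conversion runs on your machine while the main app runs on Modal's GPUs, only the converted audio crosses the boundary between the two.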
While the `modal serve` process is running, changes to any of the project files are applied automatically; press Ctrl+C to stop the app.
Note that for frontend changes you may need to clear the browser cache, or better yet, use an incognito window for each run.
If you want to deploy the app, see Modal's deployment instructions; Modal currently offers $30 of free credits. You can also deploy entirely locally, but that requires some changes to the code.
Features
Hear Me Out enables users to experience interactions with conversational models in ways that aren’t typically accessible with regular benchmarking systems. Key features include:
- 🎤 Speech-to-Speech Models: Users can choose from a variety of models that retain vocal cues like prosody and intonation.
- 🔄 Real-Time Voice Conversion: Step into someone else’s voice – literally – and investigate how conversational AI systems interpret and respond to various speaker identities and expressions.
- ⚖️ Side-by-Side Comparisons: Ask a question with your own voice, then re-ask using a transformed voice. Compare the AI’s responses to observe differences in tone, phrasing, or behavior.
- 📊 Insights Through Data: Visualize metrics like speech rate, sentiment analysis, and more.
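As a toy illustration of the kind of metric visualized, the snippet below computes speech rate (words per second) for the same question asked in two voices. The `speech_rate` helper is hypothetical, not Hear Me Out's actual implementation:

```python
# Toy metric sketch: speech rate as words per second.
# speech_rate is a HYPOTHETICAL helper for illustration only.

def speech_rate(transcript: str, duration_s: float) -> float:
    """Words per second over an utterance of known duration."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return len(transcript.split()) / duration_s


# Comparing the same question asked with the original and a converted voice:
original = speech_rate("could I get access to the server room", 3.2)
converted = speech_rate("could I get access to the server room", 2.8)
print(f"original: {original:.2f} w/s, converted: {converted:.2f} w/s")
```

In the platform itself, such per-response metrics are shown side by side so that differences between voices are easy to spot.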
Through this immersive experience, we hope users will gain insight into identity, voice, and AI behavior. Ultimately, we aim for Hear Me Out to surface meaningful questions and inspire future research that promotes fairness and inclusivity.
Demo Video
In the demo video, we explore the Moshi speech-to-speech model and its responses:
Example 1: Emotional Awareness
Notice how the model disambiguates between inputs with levity and frustration, correctly reflecting the speaker’s emotional state in its responses. This distinction adds a more human-like quality to the interaction.
Example 2: Voice Conversion - Gender Bias When Requesting Unauthorized Access
By applying voice transformations, we simulate how the model might respond to different speaker characteristics. While the differences in these responses are more subtle and inconsistent under repetition, hearing oneself in another voice opens up new perspectives.
Example 3: Voice Conversion - Gender Bias at Work
📄 License
This project is licensed under the terms specified in the LICENSE file.
🤝 Collaborations
We welcome contributions and collaboration. If you're in HCI, please reach out.
Explore Empathy and Conversational AI with Hear Me Out