With all the hype surrounding AI and speaker recognition, how can we combine the two to solve pressing social problems? No doubt, gender equality is an ongoing issue, and one way to tackle it is to measure how the opportunity to contribute in the workplace might be different for men and women. Through our explorations, we developed an iOS app that measures the relative contributions of men and women in a meeting by analyzing the time they spend speaking.
When the meeting starts, the app uses the mic to record what’s being said and continuously shows how evenly speaking time is split between men and women. When the meeting ends and the recording stops, you get a full report of the meeting.
How does it work?
At a high level, our app records and analyzes speech during a meeting, and then provides a close-to-real-time visualization of speaker contribution by gender as the meeting progresses. While recording, the app sends audio chunks to a backend for classification. Once the server receives a chunk, it segments the audio by speaker and identifies who is speaking. It then returns a JSON array of the classified segments. Based on the response data, the app can easily create visualizations to highlight speaker contribution by gender.
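As a rough illustration of that last step, the sketch below totals speaking time per gender from a single response and converts it into shares. The field names (`start`, `end`, `gender`) and the sample values are assumptions made for this example; the actual response schema may differ.

```python
import json
from collections import defaultdict

# Hypothetical response body: field names and values are assumed for illustration.
SAMPLE_RESPONSE = """
[
  {"start": 0.0,  "end": 12.5, "gender": "F"},
  {"start": 12.5, "end": 20.0, "gender": "M"},
  {"start": 20.0, "end": 30.0, "gender": "F"}
]
"""


def speaking_time_by_gender(response_body):
    """Sum segment durations (in seconds) per gender label."""
    totals = defaultdict(float)
    for segment in json.loads(response_body):
        totals[segment["gender"]] += segment["end"] - segment["start"]
    return dict(totals)


def shares(totals):
    """Convert absolute totals into fractions of all speaking time."""
    overall = sum(totals.values())
    if overall == 0:
        return {label: 0.0 for label in totals}
    return {label: seconds / overall for label, seconds in totals.items()}


totals = speaking_time_by_gender(SAMPLE_RESPONSE)
print(totals)
print(shares(totals))
```

The shares, rather than the raw totals, are what a chart of relative contribution would plot.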
We created a Node.js wrapper to serve as an HTTP API the app can use to send audio and receive the analyzed JSON results. We host the backend using Google Compute Engine.
The audio analysis involves two steps. First, we use an algorithm to detect each time the speaker changes. At these transitions, we create a new segment. Second, we take all the generated segments and classify each as belonging to a speaker. We use VoiceID, a system based on LIUM Speaker Diarization, for both diarization and classification.
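To make the first step more concrete, here is a deliberately simplified toy version of speaker change detection. It is not the algorithm VoiceID or LIUM uses: it assumes one made-up numeric feature per audio frame, compares the average of the frames just before and just after each position, and cuts a new segment wherever that difference peaks above a threshold. The function names, window size, threshold, and frame length are all hypothetical.

```python
def detect_changes(features, window=3, threshold=1.0):
    """Return frame indices where the feature mean shifts sharply.

    Toy criterion (an assumption for this sketch): the absolute difference
    between the mean of the `window` frames before and after index i.
    """
    distances = {}
    for i in range(window, len(features) - window + 1):
        left = features[i - window:i]
        right = features[i:i + window]
        distances[i] = abs(sum(right) / window - sum(left) / window)

    changes = []
    for i, d in distances.items():
        if d <= threshold:
            continue
        # Keep only local peaks so one transition yields one change point.
        if d >= distances.get(i - 1, 0.0) and d > distances.get(i + 1, 0.0):
            changes.append(i)
    return changes


def to_segments(changes, n_frames, frame_seconds=0.5):
    """Turn change points into (start, end) times in seconds."""
    bounds = [0] + changes + [n_frames]
    return [(a * frame_seconds, b * frame_seconds)
            for a, b in zip(bounds, bounds[1:])]


# Made-up feature track: a low-valued voice, a high-valued voice, then low again.
track = [0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 0, 0, 0, 0]
changes = detect_changes(track)
print(changes)
print(to_segments(changes, len(track)))
```

Each resulting segment would then be passed to the second step, classification.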
Speaker Recognition and Gender Classification
We use a Gaussian Mixture Model (GMM) to recognize the speaker in each segment. We don’t need a training phase to label speakers, since our application isn’t concerned with the specific individual speaking, but rather with their gender.
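The general idea of GMM-based classification can be sketched as follows: score a segment's frames under one mixture model per class and pick the class whose model gives the higher likelihood. This is a minimal illustration, not VoiceID's implementation. It assumes a single pitch-like feature per frame, and the model parameters below are invented for the example rather than trained on real data.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def gmm_log_likelihood(frames, components):
    """Total log-likelihood of frames under a GMM.

    `components` is a list of (weight, mean, variance) tuples.
    """
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gaussian(x, m, v) for w, m, v in components]
        # Log-sum-exp over components for numerical stability.
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


# Hypothetical, hand-picked parameters (not trained): (weight, mean, variance).
MODELS = {
    "M": [(0.6, 110.0, 400.0), (0.4, 140.0, 400.0)],
    "F": [(0.5, 200.0, 625.0), (0.5, 230.0, 625.0)],
}


def classify_segment(frames, models=MODELS):
    """Return the label whose GMM assigns the frames the highest likelihood."""
    return max(models, key=lambda label: gmm_log_likelihood(frames, models[label]))


print(classify_segment([118.0, 125.0, 109.0]))
print(classify_segment([205.0, 221.0, 198.0]))
```

A real system would typically use multi-dimensional acoustic features and models fitted to labeled recordings, but the decision rule has the same shape.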
Recording and Visualizing in iOS
We use AVAudioRecorder in a custom class to record 30-second segments and post them to the backend. The server response is parsed on the client using SwiftyJSON, and the total time spent talking is then calculated for each gender. Every time a new calculation is done, we call a delegate function that our ViewController implements, which then updates the visualization.
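The accumulate-then-notify flow can be sketched outside of iOS as well. The example below is a language-neutral illustration in Python, not the app's Swift code: a hypothetical `ContributionTracker` keeps running totals across successive chunk responses and calls a callback after each update, playing the role of the delegate function. The segment fields (`start`, `end`, `gender`) are assumed, with times taken relative to the start of each chunk.

```python
class ContributionTracker:
    """Keeps running speaking-time totals and notifies a listener on change."""

    def __init__(self, on_update):
        self.totals = {}
        self.on_update = on_update  # stands in for the delegate callback

    def add_chunk(self, segments):
        """Add one chunk's classified segments, then notify the listener."""
        for segment in segments:
            duration = segment["end"] - segment["start"]
            label = segment["gender"]
            self.totals[label] = self.totals.get(label, 0.0) + duration
        # Pass a copy so the listener cannot mutate the tracker's state.
        self.on_update(dict(self.totals))


# Collect updates in a list; a UI would redraw its chart here instead.
updates = []
tracker = ContributionTracker(on_update=updates.append)

# Two made-up 30-second chunks.
tracker.add_chunk([
    {"start": 0.0, "end": 18.0, "gender": "M"},
    {"start": 18.0, "end": 30.0, "gender": "F"},
])
tracker.add_chunk([
    {"start": 0.0, "end": 30.0, "gender": "F"},
])

print(updates[-1])
```

Keeping the totals in one place and pushing each new snapshot to the view keeps the visualization code free of any parsing or arithmetic.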
Gender classification from speech is very tricky to get right and is still a topic of active research. For applications like this one, however, the classification doesn’t need to be 100% correct to provide value.
Though this app is certainly not going to solve gender equality in the meeting room, merely leaving it running during meetings can heighten awareness of the issue. Our aim with this project is to demonstrate how solutions to wider problems can be approached using innovative technology. Given this is the first step of a larger experiment, we would love to see what others can build on top of this work.