AI can use your voice to detect depression

“Prevention is the key.” Wise words often spoken by medical practitioners. There is little argument that the most beneficial treatment outcomes usually come from catching a disease in its early stages. For example, if we experience constant knee pain when running, it is best to see a doctor early, because continuing to run can cause further damage that may eventually require surgery.

Mental health issues are no different, but they can be harder to “catch”. Depression can creep up on us; we may start to feel tired, less motivated, or irritable. Often we try to push through it, blaming other factors such as stress, the weather, or other medical issues, until the effects are severe enough that we need professional assistance. By that point, depression can be harder to treat. We may have been struggling for weeks, sometimes even years.

The human brain excels at behavioral patterns and consistency. But we can also develop maladaptive patterns, and breaking those patterns after years of reinforcement poses quite a challenge. What if there were another way to detect early signs and symptoms of depression, using only the human voice?

Current methods of screening for depression are often subjective, consisting of questionnaires, self-reports, or behavioral observations. Even some empirically validated psychological batteries carry a subjective bias. This leaves room for misleading “yes” or “no” answers (e.g., individuals may exaggerate or minimize their symptoms). Individuals may also consciously ignore the severity of their symptoms. When asked, “How is your appetite?”, a client may report eating three meals a day, which is considered “normal”, but may not report, or may be unaware, that the amount they are eating is significantly less than before. Skilled clinicians are trained not only to ask appropriate follow-up questions but also to assess behavioral cues, including body positioning, eye contact, mannerisms, and voice.

Speech biomarkers

A client’s speech is an important part of the “mental state examination” performed during a psychological evaluation. Clinicians observe qualities of speech and voice such as pitch, volume, cadence, flow, and rhythm. These markers are important descriptors when assessing levels of depression. But because a clinician has to filter a significant amount of information in a short time, subtle cues can be missed. Companies like Kintsugi have therefore developed AI voice biomarkers that they claim can detect depression with 80% accuracy, compared with around 50% accuracy for a human clinician. More impressive still, they claim all of this can be done with a voice clip of just a few seconds.

Credit: Irina Vodneva/iStock

Use of artificial intelligence

The process is simple. A customer submits a voice clip of a few seconds. The emphasis is not on the words spoken but on how they are pronounced.

According to David Liu, CEO of Sonde Health, “By processing this audio, we can break down a few seconds of voice recording into a signal with thousands of unique characteristics”, a technique called audio signal processing. This data then allows scientists to map vocal characteristics, sounds, and structures (“biomarkers”) that correlate with certain diseases or conditions. The Sonde Health team uses six biomarkers that capture tiny changes in voice pitch, inflection, and dynamics, and scores on these changes correlate with the severity of depression. Clinicians can then use this data to begin formulating treatment plans earlier, or to refer clients to other services.
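The companies do not publish their feature sets, so as a purely illustrative sketch, here are two of the simplest acoustic measurements such a pipeline might start from: a rough pitch estimate (via zero crossings) and loudness (root-mean-square energy). All function names are ours, the audio is a synthetic test tone, and real systems extract thousands of far more sophisticated features.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (assumed for this sketch)

def synth_tone(freq_hz, seconds=1.0):
    """Generate a synthetic sine wave standing in for a 'voice clip'."""
    n = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

def zero_crossing_pitch(samples):
    """Crude pitch estimate: count sign changes; each full cycle has two."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    seconds = len(samples) / SAMPLE_RATE
    return crossings / (2 * seconds)

def rms_energy(samples):
    """Root-mean-square energy, a simple proxy for loudness/dynamics."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

clip = synth_tone(220.0)            # a 220 Hz test tone
pitch = zero_crossing_pitch(clip)   # roughly 220 Hz
energy = rms_energy(clip)           # roughly 0.71 (sine RMS = 1/sqrt(2))
```

A production system would track how such features vary over time (inflection, dynamics) rather than single summary numbers, and would feed them into a trained model rather than reading them directly.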

AI and postpartum depression

One interesting area of this AI pursuit is the possible detection of postpartum depression. It is currently estimated that about 50% of women struggle with the “baby blues”, and an additional 20-30% develop a more severe form of depression (Illinois Department of Public Health) that may require medication. For some, it may even mean pursuing higher levels of care, such as hospitalization, if symptoms affect functioning.

Spora Health uses AI to facilitate health equity-focused screenings. In their fully virtual program, when a patient calls and begins speaking with a clinician, Kintsugi’s AI begins listening and analyzing the voice. After about 20 seconds of listening, the AI software can generate a patient’s PHQ-9 and GAD-7 scores, the screening assessments clinicians use to gauge levels of depression and anxiety. This information is used to create the most appropriate treatment plans, provide referral services when needed, discuss medication when appropriate, or sometimes just keep a “closer eye” on a patient.
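Whatever model produces the estimates, PHQ-9 and GAD-7 totals map onto published severity bands (0-27 and 0-21 respectively). A minimal sketch of that mapping, using the standard published cut-offs; the function names are ours:

```python
def phq9_severity(score: int) -> str:
    """Map a PHQ-9 total (0-27) to its standard severity band."""
    if not 0 <= score <= 27:
        raise ValueError("PHQ-9 totals range from 0 to 27")
    if score <= 4:
        return "minimal"
    if score <= 9:
        return "mild"
    if score <= 14:
        return "moderate"
    if score <= 19:
        return "moderately severe"
    return "severe"

def gad7_severity(score: int) -> str:
    """Map a GAD-7 total (0-21) to its standard severity band."""
    if not 0 <= score <= 21:
        raise ValueError("GAD-7 totals range from 0 to 21")
    if score <= 4:
        return "minimal"
    if score <= 9:
        return "mild"
    if score <= 14:
        return "moderate"
    return "severe"
```

Clinically, these bands inform next steps (monitoring, referral, medication discussion), which is exactly how the AI-generated scores are described as being used above.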


As interesting and advanced as this technology is, some are concerned about its accuracy and invasion of privacy. Although Kintsugi claims its AI predicts with 80% accuracy, how would this translate across different cultures, languages, or personalities? Moreover, how would it handle differential diagnoses? And does holding voice clips of patients cross the line into invasion of privacy? Kintsugi promises complete patient privacy and HIPAA compliance, and its ongoing research is noteworthy. As AI continues to advance, Kintsugi’s software is something to watch, not only in mental health but for other medical conditions as well.
