Fitbit Data Streams Enabled Accurate Prediction of Mood Shifts in Patients with Bipolar Disorder

Posted: June 12, 2025

Story highlights

In a preliminary real-world test with patients diagnosed with bipolar disorder, data from wearable Fitbit devices (worn like a watch) predicted impending shifts in mood with 80%-89% accuracy, a level high enough to meaningfully inform treatment.

Advances in personal digital devices and technologies continue to be translated into potential aids to diagnosis and treatment of psychiatric disorders. A recent example has been reported by researchers who tested the idea of using information collected by wearable Fitbit devices to predict mood shifts in individuals diagnosed with bipolar disorder (BD).

BD involves fluctuations in mood and energy that can result in severe cognitive and functional impairment. High-energy moods (mania or less intense hypomania) can be treated with lithium or anticonvulsants. Depressive moods are treated with a variety of medications. Patterns in moods can vary greatly from patient to patient, and can change in individuals over time. A majority of patients experience changes in symptom severity or polarity several times a year; higher rates of mood episode recurrence have been associated with poorer cognitive and functional trajectories.

Last year, 2022 BBRF Young Investigator Sarah Sperry, Ph.D., and colleagues published evidence calling into question the commonplace assumption in clinical medicine that periods between low and high mood in bipolar disorder are ones of “normal” mood. Their finding of considerable “mood instability” in many patients between episodes of depression and mania/hypomania has the potential to improve care and quality of life by directing attention to these periods “between” major mood episodes.

Recently, a team led by 2019 BBRF Young Investigator Jessica M. Lipschitz, Ph.D., of Brigham and Women’s Hospital, set out to test whether data from wearable Fitbit devices (worn like a watch) could generate predictions about shifts in mood that are accurate enough to “meaningfully inform treatment.” Detecting these shifts could trigger a doctor’s appointment when one is not otherwise scheduled, a way of potentially lessening the magnitude or consequences of an important shift in mood. Katherine E. Burdick, Ph.D., winner of the BBRF Colvin Prize in 2021 and a two-time BBRF grantee, was the senior member of the team, which reported results in the journal Acta Psychiatrica Scandinavica.

Called “digital phenotyping,” the moment-by-moment measurement of data such as sleep patterns, daily activity, and heart rate, collected by wearable digital devices, offers a potential avenue for early detection of mood episodes in BD patients, the researchers postulated. A number of prior studies making use of such data streams have utilized algorithms that require participants to complete frequent self-report questionnaires, which increases the burden on patients “and therefore reduces the feasibility of applying these methods in real-world clinical settings,” the team wrote. Other potential impediments to patient compliance in past research studies include the collection of invasive data streams such as call/text message logs, voice or keyboard interaction, and geolocation data. Such features raise privacy concerns. Finally, some previously tested wearable devices tend to be cumbersome or uncomfortable (e.g., large and bulky devices, or ones requiring chest straps) or are more expensive than many patients can afford.

Taking all of these factors into consideration, Dr. Lipschitz and team selected the Fitbit device for their study: commercially available and inexpensive (although those used in the study were provided by the investigators) and most importantly, entirely passive: users do not input data of any kind. The devices have also proven over the years to be quite accurate and consistent in registering user data.

Participants in the Fitbit study had all been previously enrolled in a larger, multiyear study, which is still ongoing, of cognitive and psychosocial functioning in BD patients. All had been diagnosed with Bipolar I (depression and at least one manic episode) or Bipolar II (depression and at least one hypomanic episode). Those included in the statistical analysis had to have completed at least 24 weeks of a 9-month Fitbit data monitoring period. Of the original 65 participants, eleven were not included in the analysis (2 dropped out; others had issues relating to their data), leaving 54 adults in the final cohort.

Although the Fitbit data is acquired passively, those in the final cohort also completed a self-report questionnaire given every 2 weeks during the 9 months, indicating any depression and/or manic/hypomanic symptoms. Data from these questionnaires were used to determine the accuracy of machine learning algorithms used to predict mood symptoms, not as input for the algorithms. That is, these self-report questionnaires would not be included in a “real-world” deployment of this or similar monitoring protocols. The team identified one algorithm among the several tested that generated superior predictive results.
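The distinction between model inputs and evaluation labels can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the function name `accuracy` and the toy predictions and self-report labels are invented for illustration and are not taken from the study, whose actual scoring procedure is not described here.

```python
def accuracy(predicted, reported):
    """Fraction of two-week periods in which the prediction matches the self-report."""
    if len(predicted) != len(reported):
        raise ValueError("need exactly one prediction per self-report")
    if not reported:
        raise ValueError("no periods to score")
    matches = sum(p == r for p, r in zip(predicted, reported))
    return matches / len(reported)


# Hypothetical toy data: True = clinically significant symptoms in that period.
# The predictions stand in for output of a model that saw only passive Fitbit data.
predicted = [False, False, True, True, False, True, False, False, False, True]

# Self-report answers appear only here, to score the predictions after the fact.
reported = [False, False, True, False, False, True, False, True, False, True]

print(f"accuracy: {accuracy(predicted, reported):.0%}")
```

Because the self-report answers enter only at the scoring step, dropping the questionnaire would leave the prediction step unchanged, which is the point the team makes about real-world use.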

Data collected on each user by Fitbit devices covered such facets as number of daily steps taken; number of minutes spent in non-active or “sedentary” mode; heart rate (daily average and resting average); total sleep time; amounts of time and number of nightly intervals spent in deep sleep and REM sleep; number and duration of nightly awakenings; and bedtime.

After data from 11 of the original 65 participants (17%), who either dropped out or were not sufficiently compliant with the protocol, were filtered out, a machine learning algorithm called “BiMM forest” was able to use the Fitbit data to predict clinically significant depression with 80.1% accuracy, and clinically significant manic/hypomanic symptoms with 89.1% accuracy. One other past test of Fitbit for this purpose yielded similar accuracy, but only after filtering out 46% of the data, meaning that it served its predictive purpose mainly in participants who were highly compliant with the protocol, and may not work as well in a general sample of BD patients seeking treatment. The accuracy numbers for this trial were calculated after the fact, looking back on all the data.
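The cohort figures quoted in this article are consistent with one another, as a quick arithmetic check shows. The variable names below are illustrative only.

```python
# Figures as reported in the article.
enrolled = 65   # original participants
excluded = 11   # dropped out or had data issues

analyzed = enrolled - excluded          # size of the final cohort
excluded_share = excluded / enrolled    # fraction of participants filtered out

print(f"final cohort: {analyzed}")
print(f"share excluded: {excluded_share:.1%}")
```

The share rounds to the 17% cited by the team, well under the 46% of data filtered out in the earlier Fitbit test mentioned above.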

“Our findings are particularly noteworthy,” the team wrote, “because all input was passively collected; none of the metrics utilized were invasive in terms of privacy; we used mainstream consumer devices; and our methods did not demand high levels of Fitbit compliance.” For these reasons, the team suggests its results and methods “are an important next step toward a digital phenotyping approach that could be feasibly and broadly implemented across BD patients in routine care.”

Follow-ups will need to further evaluate replicability as well as acceptability and feasibility in actual routine clinical care settings, the team said. A more diverse patient sample also needs to be tested, as most of the final cohort for this study had a Bipolar I diagnosis.

The team noted which variables recorded by the Fitbit device proved most important in predictive accuracy. For both depression and (hypo)mania symptoms, heart rate data and median bedtime were important, in relative terms. Interestingly, average daily heart rate was more important for predicting (hypo)mania, while resting heart rate was more important in predicting depression. Deep-sleep variables proved important in predicting depression, while REM sleep variables were especially important for (hypo)mania symptom prediction. Such information from the trial, said the team, begins “to open the black box of machine learning,” as applied to biometric prediction.