---------------------------------------------
# Lessons Learned from a Secondary Analysis Using Natural Language Processing and Machine Learning from a Lifestyle Intervention

Preferred citation (DataCite format):

  Freylersythe, Sarah; Sharp, Rebecca; Culnan, John; Romero Diaz, Damian Yukio; Zhao, Yiyun; Franks, Hagan; et al. (2022).
  Lessons Learned from a Secondary Analysis Using Natural Language Processing and Machine Learning from a Lifestyle Intervention.
  University of Arizona Research Data Repository.
  Poster. https://doi.org/10.25422/azu.data.19576069


Corresponding Author:
  Steven Bethard, School of Information, bethard@email.arizona.edu


License:
  CC BY 4.0


DOI:
  https://doi.org/10.25422/azu.data.19576069


---------------------------------------------
## Summary

This poster was presented at the 43rd Annual Meeting & Scientific Sessions of
the Society of Behavioral Medicine, held April 6–9, 2022, in Baltimore, MD,
USA (https://www.sbm.org/meetings/2022).

We provide the poster in several formats, including SVG, PPTX, PNG, PDF, and
JPG. We also provide two standalone images: the "iceberg figure"
(iceberg_figure.png), which illustrates how much of the data from the original
LIvES study remains untapped, and the QR code (QR - Handout.png), which links
to additional information such as references.

Submitted Abstract:

Background: This analysis used approximately 24,500 recorded telephone
coaching sessions, in English and Spanish, from 1,205 women participating in
the Lifestyle Intervention for oVarian cancer Enhanced Survival (LIvES), GOG
0225, study. The LIvES Study tested whether a lifestyle intervention of
increased physical activity and a healthy diet would increase progression-free
survival compared with an attention control. The intervention was delivered by
trained health coaches using Motivational Interviewing (MI), a directive,
patient-centered counseling approach. A total of 323 LIvES Study coaching
session recordings were scored for adherence to MI techniques. Here we
describe lessons learned from a secondary analysis of LIvES data that uses
machine learning and natural language processing to automate fidelity
assessment and to predict lifestyle behavioral outcomes.

Methods: Several steps were needed to prepare the call recordings for natural
language processing. Data were aligned through a combination of participant
phone numbers, coach and participant names, entry dates, and recording dates.
Transcription was performed automatically with wav2vec. An annotation
interface was built in Label Studio, and an annotation guideline was adapted
from the existing Motivational Interviewing Treatment Integrity (MITI) 3.0
coding system. Finally, a pilot annotation of the call recordings was
completed, and initial inter-rater reliability was measured.
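Aligning recordings with study records without a shared identifier amounts to a fuzzy join. The following is a minimal sketch of one way such a join could work, not the study's actual alignment code. It assumes each recording carries a phone number and a recording date and each session record carries a phone number and an entry date; the `Recording`/`SessionEntry` classes, their field names, and the three-day window are hypothetical, coach and participant names are not used, and the data are invented. With a single shared identifier, as the Results recommend, this step would reduce to an exact join.

```python
"""Hypothetical sketch: link call recordings to session records when no
shared identifier exists, using a normalized phone number plus the
closest date within a tolerance window. Toy data only."""
import re
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Recording:  # hypothetical record for one audio file
    file_name: str
    phone: str
    recorded_on: date


@dataclass
class SessionEntry:  # hypothetical record for one logged coaching session
    participant_id: str
    phone: str
    entered_on: date


def normalize_phone(raw: str) -> str:
    """Keep digits only and drop a leading US country code '1'."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def align(recordings: List[Recording], entries: List[SessionEntry],
          max_days: int = 3) -> Dict[str, Optional[str]]:
    """Map each recording to the participant_id of the entry with the same
    phone number whose date is closest to the recording date, if that gap
    is at most max_days; otherwise map it to None."""
    by_phone: Dict[str, List[SessionEntry]] = {}
    for entry in entries:
        by_phone.setdefault(normalize_phone(entry.phone), []).append(entry)

    links: Dict[str, Optional[str]] = {}
    for rec in recordings:
        best: Optional[SessionEntry] = None
        best_gap = max_days + 1  # an accepted match must have gap <= max_days
        for entry in by_phone.get(normalize_phone(rec.phone), []):
            gap = abs((entry.entered_on - rec.recorded_on).days)
            if gap < best_gap:
                best, best_gap = entry, gap
        links[rec.file_name] = best.participant_id if best is not None else None
    return links


if __name__ == "__main__":
    recordings = [
        Recording("call_001.wav", "(520) 555-0101", date(2012, 3, 5)),
        Recording("call_002.wav", "+1 520 555 0199", date(2012, 3, 9)),
    ]
    entries = [
        SessionEntry("P001", "520-555-0101", date(2012, 3, 6)),
        SessionEntry("P002", "5205550199", date(2012, 4, 20)),
    ]
    print(align(recordings, entries))
    # {'call_001.wav': 'P001', 'call_002.wav': None}
```

Here call_002.wav stays unmatched because the only record with that phone number is 42 days away. In that kind of case, additional keys such as the coach and participant names mentioned above would be needed to decide the link.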

Results: Preparing this secondary analysis yielded several lessons. First,
because the original LIvES study ran for many years, its data infrastructure
evolved in ways that broke data continuity. Data alignment would have been
simpler had a single identifier linking calls, outcomes, and MITI scores been
established and maintained over the course of the project. Second, evaluating
the quality of automated transcription systems is difficult; it could have
been streamlined by manually transcribing a small number of study calls to
serve as an evaluation set. Finally, training a machine learning model to
assess interviewer turns would have been simpler if MITI coding had followed a
protocol that used an annotation tool, producing turn-level annotations
alongside the holistic scoring typical of MITI.
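A small set of manually transcribed calls makes it possible to score an automatic transcription system such as wav2vec. The abstract does not name the metric used, so the following is a minimal sketch assuming word error rate (WER): word-level edit distance (substitutions, deletions, insertions) between a manual reference and the automatic hypothesis, divided by the number of reference words. The function names are hypothetical, the normalization (lowercase, split on whitespace) is deliberately minimal, and the example sentences are invented rather than taken from study calls.

```python
"""Hypothetical sketch: corpus-level word error rate (WER) of automatic
transcripts against manual reference transcripts."""
from typing import Iterable, List, Tuple


def normalize(text: str) -> List[str]:
    """Lowercase and split on whitespace (a deliberately minimal normalization)."""
    return text.lower().split()


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of word substitutions, deletions, and insertions
    needed to turn ref into hyp (Levenshtein distance over words)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match if cost == 0
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total word edits divided by total reference words over
    (reference, hypothesis) pairs."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref, hyp = normalize(reference), normalize(hypothesis)
        errors += word_edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return errors / ref_words


if __name__ == "__main__":
    pairs = [
        ("how many days did you walk this week",
         "how many day did you walk his week"),
        ("that sounds great", "that sounds great"),
    ]
    print(f"WER = {corpus_wer(pairs):.3f}")  # 2 errors / 11 words -> WER = 0.182
```

Pooling errors across calls, rather than averaging per-call rates, keeps short calls from dominating the score. A real evaluation would also need consistent rules for punctuation, numbers, disfluencies, and the Spanish-language calls.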

Conclusion: Behavioral intervention studies should engage a computational
scientist during study design planning to take advantage of the largely
underutilized data collected in these trials.


---------------------------------------------
## Files and Folders

- SBM 2022_NLP.jpg: Full poster in JPG image format
- SBM 2022_NLP.png: Full poster in PNG image format
- SBM 2022_NLP.pdf: Full poster in PDF file format
- SBM 2022_NLP.pptx: Full poster in Microsoft PowerPoint format (editable)
- iceberg_figure.png: Iceberg figure used in the poster in PNG image format
- QR - Handout.png: QR code in PNG image format. This image links to our poster handout website and appears in the bottom-left corner of the poster.


---------------------------------------------
## Additional Notes

Links:
  - https://github.com/clulab/SBM_2022_LIvES