The Public Health Problem and Local Context. Our intervention used state-of-the-art machine learning to tackle a core challenge in COVID-19 response: language barriers in contact tracing. Throughout the COVID-19 pandemic, many jurisdictions have rapidly scaled up contact-tracing efforts to contain the spread of the disease. Given the disproportionate impact COVID-19 has had on immigrant and minority communities, it has been vital to ensure that contact tracing works effectively and equitably across different segments of the population. Language and cultural barriers may contribute to disease disparities if the same quality of care and contact tracing cannot be delivered to vulnerable communities.
COVID-19 has highlighted the structural, language, and cultural barriers Latinx communities face when trying to comply with policies such as shelter-in-place orders. The disparity in impact is stark: 38.9% of California’s population is Latinx, yet Latinx individuals make up 55.6% of COVID-19 cases and 46.6% of COVID-19 deaths. A variety of factors likely contribute to this acute disparity, including increased employment in essential industries, lower likelihood of having health insurance, higher-density living conditions, mistrust of public health authorities, and insufficient information and resources in their preferred language. The outsized risk these communities face makes it all the more critical that contact tracers are able to reach them, interrupt new chains of infection, and provide supportive resources during isolation and quarantine.
Trust and rapport are critical for effective contact tracing, as interviews must cover sensitive and private topics to identify contacts. These calls are also important for informing policymaking and case investigation, as the initial case reports from laboratories are sparse in information. However, patients in minority communities may be especially fearful of phone calls from the government, unwilling to share personal information about themselves or their networks, or worried that disclosing their information could affect their employment or immigration status. Contact tracers with local language fluency and cultural competency can build greater trust among minority groups and immigrant communities, engage with patients in their preferred language, and dispel myths and misinformation. These language barriers can be addressed by conducting contact-tracing interviews in a patient’s preferred language, ideally with a tracer who has specific knowledge of the local experiences of that patient’s community.
Santa Clara County, CA, home to the city of San Jose, has 1.9 million residents. It was the first US jurisdiction to issue a shelter-in-place order, in coordination with five other Bay Area counties. The county’s Public Health Department (PHD) invested heavily in contact tracing, employing nearly 1,000 contact tracers at its peak. In August 2020, only about 60 of these were bilingual Spanish-speaking contact tracers, and cases with language needs far outstripped the bilingual capacity of the Case Investigation and Contact Tracing (CICT) team. Because of the enormous time pressure to reach cases and the sparsity of language information in laboratory reports, the baseline assignment of cases to contact tracers did not account for patients’ language needs, dramatically underutilizing the bilingual skill set. All contact tracers had the option of using a state-provided telephonic interpretation service that connects callers with “qualified” interpreters for simultaneous interpretation. Numerous challenges surfaced in the process, including technical issues (e.g., dropped calls, poor audio quality, delays), longer holds and wait times for Spanish interpreters in particular, and questions about the mismatch between general interpreters and specific COVID-19 needs. These problems disrupted the effectiveness of contact tracing with language minorities, and with the Spanish-speaking population in particular.
Innovativeness of Practice. In this project, PHD partnered with Stanford University to pilot, implement, and demonstrate how machine learning can leverage administrative data to accurately predict whether an individual primarily speaks Spanish and to match these incoming cases in real time with contact tracers who speak Spanish. The team rapidly developed the model, embedded it into real-time operations, and evaluated its effectiveness in a randomized controlled trial.
This practice was highly innovative. First, while language barriers have been documented across public health, health care, and public services, we are not aware of any prior attempts to leverage machine learning to empower individuals with bilingual skill sets to improve health equity in contact tracing. The inputs from laboratory reports were sparse, including only age, name, and address. The Stanford team merged this information with census data on Spanish speaking at the census block group (CBG) level, commercial data that contain language information with demographic correlates, and name-based race and ethnicity information from census and mortgage data. The team then developed an interpretable, but powerful, model to predict Spanish-speaking status. The scores were then used to identify the patients most in need of a bilingual contact tracer, allowing for more efficient allocation of the relatively low supply of bilingual staff.
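The scoring-and-allocation idea described above can be illustrated with a minimal Python sketch. It is not the team's model: the lookup tables, the logistic form, the coefficients, and the names `Case`, `spanish_score`, and `allocate` are all hypothetical, chosen only to show how sparse laboratory fields might be joined to area-level and name-based signals and then ranked against limited bilingual capacity.

```python
import math
from dataclasses import dataclass


@dataclass
class Case:
    case_id: str
    surname: str
    cbg: str  # census block group identifier derived from the address


# Hypothetical lookup tables standing in for census and name-based data.
# The values are made up for illustration.
CBG_SPANISH_SHARE = {"cbg_a": 0.62, "cbg_b": 0.08}
SURNAME_HISPANIC_PROB = {"GARCIA": 0.92, "SMITH": 0.02}

# Illustrative coefficients only; these are not fitted values.
INTERCEPT, W_CBG, W_NAME = -3.0, 2.5, 4.0


def spanish_score(case: Case) -> float:
    """Return a score in (0, 1); higher means more likely Spanish-speaking."""
    cbg_share = CBG_SPANISH_SHARE.get(case.cbg, 0.0)
    name_prob = SURNAME_HISPANIC_PROB.get(case.surname.upper(), 0.0)
    z = INTERCEPT + W_CBG * cbg_share + W_NAME * name_prob
    return 1.0 / (1.0 + math.exp(-z))


def allocate(cases, bilingual_capacity):
    """Send the highest-scoring cases to bilingual tracers, up to capacity."""
    ranked = sorted(cases, key=spanish_score, reverse=True)
    return ranked[:bilingual_capacity], ranked[bilingual_capacity:]


cases = [
    Case("1", "Garcia", "cbg_a"),
    Case("2", "Smith", "cbg_b"),
    Case("3", "Smith", "cbg_a"),
]
to_bilingual, to_general = allocate(cases, bilingual_capacity=1)
```

Ranking by score and cutting at capacity is one simple way to spend a scarce resource where it is most likely to matter; a deployed system would also need to handle missing addresses, unmatched names, and changing daily capacity.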
Second, PHD evaluated the effects of utilizing this model in a randomized controlled trial of language matching within Santa Clara County’s actual contact-tracing process over a 2-month period. During this period, CICT integrated our risk-scoring algorithm into its contact-tracing system and assigned patients whose scores indicated a higher likelihood of being Spanish speakers to a language specialty team (LST) composed primarily of Spanish-speaking contact tracers. The team tracked outcomes in real time using data recorded in the contact-tracing system and surveyed the majority of CICT members on their experiences with language discordance. The intervention demonstrated substantial improvement from bilingual contact tracing compared to simultaneous telephonic interpretation. Language matching resulted in (1) significant time savings, shortening the time from opening of cases to completion of the initial interview by nearly 14 hours and increasing same-day completion by 12%, and (2) improved community engagement, reducing the refusal to interview by 4%.
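To make the trial logic concrete, the following Python sketch shows one way randomized assignment and a simple outcome comparison could be set up. The score threshold, the 50/50 randomization among high-scoring cases, the arm labels, and the function names are assumptions for illustration; the text above does not specify these design details, and the outcome values in the usage lines are toy numbers, not trial data.

```python
import random
from statistics import mean

THRESHOLD = 0.5  # hypothetical cutoff for "likely Spanish-speaking"


def assign_arm(score: float, rng: random.Random) -> str:
    """Assign a case to an arm; only high-scoring cases are randomized."""
    if score < THRESHOLD:
        return "general"  # outside the trial population in this sketch
    # Assumed 50/50 randomization between the language specialty team
    # and the usual assignment process.
    return "lst" if rng.random() < 0.5 else "control"


def difference_in_means(outcomes_by_arm, treated="lst", control="control"):
    """Estimate the effect as mean(treated) minus mean(control)."""
    return mean(outcomes_by_arm[treated]) - mean(outcomes_by_arm[control])


rng = random.Random(0)  # fixed seed so the assignment is reproducible
arm_low = assign_arm(0.1, rng)
arm_high = assign_arm(0.9, rng)

# Toy hours-to-interview values, invented purely to exercise the function.
toy_hours = {"lst": [10.0, 20.0], "control": [30.0, 40.0]}
effect = difference_in_means(toy_hours)
```

Because assignment among high-scoring cases is random, a plain difference in means between the two arms is an unbiased estimate of the effect of language matching on an outcome such as time to interview, same-day completion, or refusal.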
Based on the results and success of this trial, PHD expanded language matching to all of CICT, and the state of California is contemplating adoption in the statewide system.
Evidence Base. The intervention was designed with extensive consideration of the academic literature, referenced in the PNAS publication (Lu et al. 2021. “A Language Matching Model to Improve Equity and Efficiency of COVID-19 Contact Tracing.” PNAS.).