4 key reasons why open-source data fails AI in healthcare

IMO Health’s VP of Corporate Strategy discusses the shortcomings of open-source data in healthcare AI at HLTH 2024. Get the full scoop here.

At the 2024 HLTH conference, Andrei Naeymi-Rad, VP of Corporate Strategy at IMO Health, delivered an insightful talk on the pitfalls of relying on open-source data standards like the Unified Medical Language System (UMLS) and Observational Medical Outcomes Partnership (OMOP) for artificial intelligence (AI) in healthcare. His overarching message was clear: “Good enough is not good in healthcare.” 

Watch Naeymi-Rad’s full presentation here:

In a rush? Keep scrolling for four key takeaways that every healthcare leader should consider when addressing data quality challenges.  

1. Open-source standards struggle with clinical nuance

Open-source terminologies like UMLS often rely on synonymy architecture and basic mappings, both of which fail to capture the nuances of clinical language.  

“If I use an acronym like MI, that probably means Myocardial Infarction, right? But if I’m a pediatrician, MI can also mean Mitral Incompetence,” Naeymi-Rad said. “If I’m just using an open-source reference library operation to be the backbone of my data operations, this is a good example of where acronyms, eponyms, abbreviations will break that architecture and break it quite often.” 

The inability of an AI model to understand such context can lead to errors in clinical documentation, billing, and overall model performance.  
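The MI example can be made concrete with a small sketch. Everything below is a hypothetical illustration: the tables, specialty labels, and function names are invented for this example and are not UMLS content or any vendor API. The point is only that a flat synonym table has to commit to one meaning per abbreviation, while a lookup that takes clinical context into account can surface the alternatives.

```python
# Toy illustration only: these tables are made up for the example and are
# not drawn from UMLS or any real terminology service.

# Flat synonym table: each abbreviation maps to exactly one concept.
FLAT_SYNONYMS = {"MI": "myocardial infarction"}

# Context-aware table: candidate meanings listed per clinical specialty.
CANDIDATES_BY_SPECIALTY = {
    "MI": {
        "cardiology": ["myocardial infarction"],
        "pediatrics": ["mitral incompetence", "myocardial infarction"],
    }
}


def expand_flat(abbrev: str) -> str:
    """Return the single meaning the flat table knows about."""
    return FLAT_SYNONYMS[abbrev]


def candidates_in_context(abbrev: str, specialty: str) -> list[str]:
    """Return every plausible meaning for the specialty.

    Falls back to the flat meaning when the specialty is unknown.
    """
    by_specialty = CANDIDATES_BY_SPECIALTY.get(abbrev, {})
    return by_specialty.get(specialty, [FLAT_SYNONYMS[abbrev]])


if __name__ == "__main__":
    print(expand_flat("MI"))
    print(candidates_in_context("MI", "pediatrics"))
```

In this sketch the flat lookup can never return mitral incompetence, no matter who wrote the note, which is the failure mode the quote describes.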

2. Coding inaccuracies impact billing and patient care 

When physicians use standardized code systems like ICD-10-CM or SNOMED® by themselves, they often miss the specificity needed to accurately capture complex patient conditions. For example, a statement describing type 2 diabetes mellitus with stage 3B chronic kidney disease and long-term current use of insulin cannot be captured fully in ICD-10.  

“The only way to truly understand the full statement at scale is to make sure you’re taking the entirety of the statement as an understanding,” Naeymi-Rad said. “This is really, really important when you get into ambient documentation or understanding complex clinical conditions in multiple different areas of a statement.”  

Misrepresented data not only affects reimbursement but also compromises care prioritization and population health insights. 
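The diabetes example above can be sketched in a few lines. The codes shown reflect our reading of ICD-10-CM and should be verified against the current code set; the breakdown of the statement into "facts" and the helper function are assumptions made for illustration. The sketch shows that no single code carries the whole statement, so detail is lost whenever only part of it is coded.

```python
# Illustrative sketch: the fact labels and helper below are hypothetical.
# Code descriptions reflect our reading of ICD-10-CM; verify before relying on them.

# Clinical facts contained in the full statement.
REQUIRED_FACTS = {"type 2 diabetes", "CKD stage 3b", "long-term insulin use"}

# Which of those facts each code expresses on its own.
FACTS_BY_CODE = {
    # Type 2 diabetes mellitus with diabetic chronic kidney disease (no stage)
    "E11.22": {"type 2 diabetes"},
    # Chronic kidney disease, stage 3b
    "N18.32": {"CKD stage 3b"},
    # Long term (current) use of insulin
    "Z79.4": {"long-term insulin use"},
}


def missing_facts(codes: list[str]) -> set[str]:
    """Return the facts from the statement that the given codes do not express."""
    covered = set()
    for code in codes:
        covered |= FACTS_BY_CODE[code]
    return REQUIRED_FACTS - covered


if __name__ == "__main__":
    # Coding only the diabetes code drops the CKD stage and the insulin use.
    print(missing_facts(["E11.22"]))
    # All three codes together are needed to cover the statement.
    print(missing_facts(["E11.22", "N18.32", "Z79.4"]))
```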

3. Crosswalks perpetuate coding errors, jeopardize patient safety 

Crosswalks, which connect disparate terminology systems, result in miscoded and under-coded representations of patient populations. Even when combined with a skilled large language model (LLM), crosswalks fail to capture the appropriate level of specificity required for an accurate diagnosis statement.  

“If you’re not coding data appropriately at the front end, and you’re using crosswalks, and you’re under coding and miscoding your patient populations, you’re actually not providing the correct treatment protocols for your patients directly,” Naeymi-Rad said. “That can have an impact on decision support triggers.”  

Ultimately, crosswalks can create a data governance and data fidelity problem for large institutions and organizations that are trying to understand the criticality of their patient populations better.  
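One way to see how a crosswalk under-codes is to look for many-to-one mappings. The crosswalk below is a hypothetical toy, not a real mapping file, and the source term labels are invented; it simply shows distinct clinical statements collapsing onto one target code, after which the original distinction cannot be recovered from the code alone.

```python
from collections import defaultdict

# Hypothetical crosswalk for illustration: granular source terms on the left,
# a coarser target code on the right. Not taken from any real mapping file.
CROSSWALK = {
    "T2DM with CKD stage 3a": "E11.22",
    "T2DM with CKD stage 3b": "E11.22",
    "T2DM with CKD stage 4": "E11.22",
}


def collapsed_terms(crosswalk: dict[str, str]) -> dict[str, list[str]]:
    """Group source terms by target code, keeping targets shared by several terms."""
    by_target = defaultdict(list)
    for term, code in crosswalk.items():
        by_target[code].append(term)
    return {code: terms for code, terms in by_target.items() if len(terms) > 1}


if __name__ == "__main__":
    for code, terms in collapsed_terms(CROSSWALK).items():
        print(f"{code} stands in for {len(terms)} distinct statements: {terms}")
```

A check like this only flags where specificity is lost; it does not restore it, which is why coding accurately at the front end matters.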

4. CDI and population groups can’t remedy underlying data quality issues  

While Clinical Documentation Improvement (CDI) specialists and population groups can help mitigate data issues, they are not an effective solution to poor foundational data quality.  

“We all know these are challenges, but still we lean back on the open-source terminology services that are out there to fill in these gaps because we feel like it’s good enough,” Naeymi-Rad said. 

Relying on open-source data may seem cost-effective, but the financial and clinical costs add up over time. Open-source terminologies lack specificity and context, resulting in workflow inefficiencies, lost revenue, and subpar patient care.  

Addressing data quality at the source is essential. See how IMO Health fits into the equation by contacting us at sales@imohealth.com or 847-272-1242 to chat with a team member today.  

For a complimentary data quality assessment, click here.  

SNOMED and SNOMED CT® are registered trademarks of SNOMED International.
