Many healthcare organizations are looking to artificial intelligence (AI) tools to help move specific initiatives forward. From enhancing drug discovery to optimizing clinical trials to advancing disease phenotyping – there’s a lot of potential. However, for this potential to be realized, it’s crucial that AI solutions are able to understand the complex biomedical data these initiatives rely on. Without this foundational training, machines will struggle to interpret complex medical concepts accurately.
This is where specialized natural language processing (NLP) models and biomedical domain expertise are required.
Our latest white paper, NLP and generative AI in life sciences and precision medicine, explores the nuances of four key applications of this technology, demonstrating how sophisticated NLP models and deep biomedical knowledge are essential for extracting meaningful insights.
Only have time for an excerpt? Continue reading below.
Disease phenotyping and precision medicine
Understanding the patient journey for a particular disease is a powerful way to help assess the benefits, harms, and trajectory of medical treatments, deliver precision medicine, and improve outcomes. Massive amounts of detailed patient data are available in EHRs to support this analysis, but the data must be accurate, structured, and well organized before its value can be unlocked. Further, conducting meaningful research on patient populations requires assembling patient cohorts with similar disease and treatment profiles.
Challenges to creating and assessing patient journeys start with inconsistencies in EHR data and real-world evidence. When interoperability standards are implemented inconsistently, data becomes highly variable and lacks specificity, creating gaps that require a great deal of manual intervention and can drain clinical resources. In addition, free-text data in clinical notes, including pathology, radiology, and radiation therapy reports, often requires deep domain expertise to extract meaning.
Generative AI and NLP for disease phenotyping and precision medicine
NLP and generative AI are ideal tools for extracting clinical information from EHRs. However, general healthcare NLP models often fall short. To be effective and keep up with a knowledge base that grows and changes rapidly, NLP solutions must be trained on data optimized for specific medical domains. Models must understand and extract all possible genes, diseases, variants, and mutation patterns, identify disease associations between phenotype and variant, and characterize rare diseases and variants. Solutions must be able to normalize concepts to standard ontologies such as MedDRA and Medical Subject Headings (MeSH) and use pattern recognition to identify complicated categories of information such as diseases and symptoms caused by various gene-protein mutations.
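As a rough illustration of what normalizing concepts to a standard vocabulary means in practice, the Python sketch below maps different surface forms found in a note to one preferred term and identifier. It is a minimal dictionary-lookup example, not IMO Health's method: the SYNONYMS table, the normalize_mentions function, and the "EX:" identifiers are all hypothetical placeholders and are not real MeSH or MedDRA codes.

```python
import re

# Hypothetical synonym table: surface form -> (preferred term, placeholder ID).
# The "EX:" identifiers are illustrative only, NOT real MeSH or MedDRA codes.
SYNONYMS = {
    "heart attack": ("Myocardial Infarction", "EX:0001"),
    "myocardial infarction": ("Myocardial Infarction", "EX:0001"),
    "high blood pressure": ("Hypertension", "EX:0002"),
    "hypertension": ("Hypertension", "EX:0002"),
}

# Try longer surface forms first so multi-word phrases are preferred.
_FORMS = sorted(SYNONYMS, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(form) for form in _FORMS) + r")\b",
    re.IGNORECASE,
)


def normalize_mentions(text):
    """Return (matched text, preferred term, concept ID) tuples in order of appearance."""
    results = []
    for match in _PATTERN.finditer(text):
        preferred, concept_id = SYNONYMS[match.group(1).lower()]
        results.append((match.group(1), preferred, concept_id))
    return results


note = "Pt with history of high blood pressure, admitted after a heart attack."
for mention in normalize_mentions(note):
    print(mention)
```

A production system would draw its synonyms from the licensed ontology itself and would also need to handle negation, abbreviations, and context, which a plain lookup like this does not.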
With specialized clinical and biomedical NLP and generative AI, domain experts can use prompt engineering to extract insights from clinical notes and specialized reports, finding patient characteristics related to disease progression, trajectory, treatment, and procedures, as well as the dates associated with each. Understanding the journey across a population of patients is more complicated. It requires compiling and aligning data across multiple patients to create real-world evidence journeys. To compare disease and treatment progression, every patient needs a clear, age-based timeline from birth through each key milestone – the disease, symptoms, and conditions. With aligned timelines, researchers can evaluate progression and outcomes, and trigger screening and biomarker testing for individual patients.
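The age-based alignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the patients, dates, and milestone labels are invented for the example, and the helper names (age_at, build_timeline, ages_at_milestone) are hypothetical rather than part of any IMO Health product. Ages use a 365.25-day year, which is an approximation.

```python
from datetime import date


def age_at(birth, event_date):
    """Approximate age in years at event_date, using a 365.25-day year."""
    return round((event_date - birth).days / 365.25, 1)


def build_timeline(birth, events):
    """Turn (label, date) events into (age, label) pairs sorted by age."""
    return sorted((age_at(birth, when), label) for label, when in events)


def ages_at_milestone(timelines, label):
    """Collect each patient's age at a given milestone for side-by-side comparison."""
    return {
        pid: age
        for pid, timeline in timelines.items()
        for age, milestone in timeline
        if milestone == label
    }


# Invented example patients: (birth date, extracted events in arbitrary order).
PATIENTS = {
    "patient_a": (
        date(1960, 3, 1),
        [("first treatment", date(2015, 9, 1)), ("diagnosis", date(2015, 3, 1))],
    ),
    "patient_b": (
        date(1975, 6, 15),
        [("diagnosis", date(2020, 6, 15)), ("first treatment", date(2020, 8, 15))],
    ),
}

timelines = {
    pid: build_timeline(birth, events) for pid, (birth, events) in PATIENTS.items()
}

for pid, timeline in timelines.items():
    print(pid, timeline)
print(ages_at_milestone(timelines, "diagnosis"))
```

Putting every patient on the same age axis, rather than calendar dates, is what lets milestones from different decades be compared directly.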
An IMO Health example:
Extracting crucial treatment details from radiation oncology-specific EHRs
In a recent study, IMO Health (Melax Tech) scientists used NLP to extract free-text data from radiation oncology-specific EHRs, developing customizable modules for cancer-related information in pathology reports, including tumor size, tumor stage, and biomarkers. Based on data elements suggested by the College of American Pathologists, the study used 400 randomly selected pathology reports from cancer patients. For named entity recognition, it implemented regular expression-based, dictionary lookup-based, and machine learning-based approaches. For relation extraction, it developed rule-based, machine learning, and hybrid approaches. When evaluated against existing systems, the customized NLP pipeline achieved comparable performance with reduced production time and greater adaptability.
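To give a feel for the regular expression-based and dictionary lookup-based approaches mentioned above, here is a minimal Python sketch that pulls tumor size, a simplified TNM stage, and a few biomarkers out of a sentence. It is not the study's pipeline: the patterns, the BIOMARKERS dictionary, the extract_entities function, and the sample report text are all assumptions made for illustration, and real pathology reports are far more varied than these patterns allow for.

```python
import re

# Regex-based: sizes such as "2.3 cm" or "15 mm".
SIZE_RE = re.compile(r"\b\d+(?:\.\d+)?\s*(?:cm|mm)\b", re.IGNORECASE)

# Regex-based: a deliberately simplified TNM pattern such as "pT2 N0" or "T1c N1 M0".
STAGE_RE = re.compile(r"\bp?T[0-4][a-c]?\s*N[0-3][a-c]?(?:\s*M[01])?\b")

# Dictionary lookup-based: a tiny illustrative biomarker dictionary.
BIOMARKERS = {
    "ER": "estrogen receptor",
    "PR": "progesterone receptor",
    "HER2": "human epidermal growth factor receptor 2",
}
BIOMARKER_RE = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(BIOMARKERS, key=len, reverse=True))
    + r")\b"
)


def extract_entities(report):
    """Return (label, matched text) pairs ordered by position in the report."""
    found = []
    for label, pattern in (
        ("TUMOR_SIZE", SIZE_RE),
        ("TUMOR_STAGE", STAGE_RE),
        ("BIOMARKER", BIOMARKER_RE),
    ):
        for match in pattern.finditer(report):
            found.append((match.start(), label, match.group(0)))
    return [(label, text) for _, label, text in sorted(found)]


# Invented sample text, not taken from any real report.
report = (
    "Invasive ductal carcinoma, tumor size 2.3 cm. Stage pT2 N0. "
    "ER positive, PR positive, HER2 negative."
)
for entity in extract_entities(report):
    print(entity)
```

Rules like these are quick to write and easy to customize per institution, which is one reason they are often combined with machine learning models rather than replaced by them.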