In the first post in this series, we covered the building blocks of IMO Clinical AI – terminology, technology, and people. This article focuses on technology, especially the tools that aid in constructing and deploying natural language processing (NLP) pipelines. An NLP pipeline is a series of interconnected steps that convert text into a desired output for downstream analysis.1
The NLP development platform in IMO Clinical AI consists of an integrated development environment that manages the entire NLP pipeline development lifecycle.
Four steps to develop NLP pipelines/models
1. Data acquisition:
NLP, a sub-discipline of artificial intelligence (AI), is fundamentally about learning from free-text examples and extracting meaning from that text by recognizing the entities and relationships within free-text narratives. Data acquisition means obtaining textual data from various sources to support the creation of NLP pipelines. IMO Health has secure technical means and appropriate policies to acquire the requisite data.
2. Converting images and PDFs to text:
Free-text narratives in healthcare data reside largely in formats such as PDFs and images. The NLP development toolset uses various optical character recognition (OCR) methods to convert images and PDFs to text.
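As a rough illustration of this step, the sketch below routes a file either to a plain-text reader or to an OCR engine based on its extension. The function name `document_to_text`, the `OCR_SUFFIXES` set, and the pluggable `ocr` callable are assumptions made for this example; they are not part of IMO Clinical AI, and any real OCR library would be supplied by the caller.

```python
from pathlib import Path
from typing import Callable

# Assumed set of extensions that need OCR; adjust to the formats actually received.
OCR_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def document_to_text(path: str, ocr: Callable[[bytes], str]) -> str:
    """Return plain text for a file, calling `ocr` only for scanned formats.

    `ocr` is any callable that takes raw file bytes and returns text
    (hypothetical interface; wrap whichever OCR engine is in use).
    """
    file = Path(path)
    suffix = file.suffix.lower()
    if suffix == ".txt":
        # Already text: no OCR needed.
        return file.read_text(encoding="utf-8")
    if suffix in OCR_SUFFIXES:
        raw = ocr(file.read_bytes())
        # Collapse runs of spaces and line breaks into single spaces.
        return " ".join(raw.split())
    raise ValueError(f"unsupported file type: {suffix!r}")
```

Passing the OCR engine in as a parameter keeps the routing logic independent of any particular OCR method, so different engines can be swapped in per document type.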
3. Integrated Development Environment (IDE) for model training and NLP pipeline construction:
The NLP development platform in IMO Clinical AI also includes an IDE that gives users an intuitive user interface (UI) for constructing NLP pipelines. The IDE's functions fall broadly into the following categories:
- Text pre-processing: The IDE provides easy access to pre-processing techniques like tokenization, lemmatization, part-of-speech (POS) tagging, and much more.
- Feature engineering: The IDE also provides ways to extract relevant features from raw text and present them in a form suited to training machine learning (ML) models.
- Model training: Once the features are prepared, the IDE offers various deep learning algorithms and frameworks to train models on the processed data. This step involves selecting appropriate algorithms and tuning hyperparameters so that training yields a high-performing model.
- NLP pipeline construction and evaluation: After the pre-processing and feature engineering steps are complete, the IDE provides an easy-to-use, intuitive user interface that helps the user combine ML approaches, heuristic approaches, calls to external APIs, and much more into a functioning NLP pipeline. The IDE also has tools to evaluate and measure the pipeline’s performance.
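To make the four categories above concrete, here is a minimal, standard-library-only sketch of how they might fit together. It is not how the IMO Clinical AI IDE is implemented: a small naive Bayes classifier stands in for the deep learning models mentioned above, and the labels, training sentences, and function names are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Pre-processing: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def featurize(tokens):
    """Feature engineering: bag-of-words counts."""
    return Counter(tokens)


def train(examples):
    """Model training: multinomial naive Bayes counts (toy stand-in model)."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        word_counts[label].update(featurize(tokenize(text)))
        label_counts[label] += 1
    vocab = {word for counts in word_counts.values() for word in counts}
    return {"word_counts": word_counts, "label_counts": label_counts, "vocab": vocab}


def predict(model, text):
    """Score each label with add-one smoothing and return the best one."""
    feats = featurize(tokenize(text))
    total_docs = sum(model["label_counts"].values())
    best_label, best_score = None, -math.inf
    for label, n_docs in model["label_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + len(model["vocab"])
        score = math.log(n_docs / total_docs)
        for word, n in feats.items():
            if word in model["vocab"]:  # ignore words never seen in training
                score += n * math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def negation_rule(text, label):
    """Heuristic step: relabel a symptom when the text explicitly denies it."""
    if label == "symptom" and re.search(r"\b(denies|no)\b", text.lower()):
        return "negated"
    return label


def pipeline(model, text):
    """Pipeline construction: ML prediction followed by the heuristic rule."""
    return negation_rule(text, predict(model, text))


def accuracy(model, labeled):
    """Evaluation: fraction of examples the full pipeline labels correctly."""
    return sum(pipeline(model, t) == y for t, y in labeled) / len(labeled)


# Invented toy data, purely for illustration.
TRAIN = [
    ("patient reports chest pain", "symptom"),
    ("complains of severe headache", "symptom"),
    ("prescribed aspirin daily", "medication"),
    ("started metformin twice daily", "medication"),
]
HELD_OUT = [
    ("patient reports headache", "symptom"),
    ("patient denies chest pain", "negated"),
    ("started aspirin daily", "medication"),
]

model = train(TRAIN)
score = accuracy(model, HELD_OUT)
```

The point of the sketch is the shape rather than the model: each stage is a plain function, so a heuristic rule, a trained model, or a call to an external API can be combined or swapped without changing the rest of the pipeline, and the evaluation function measures the pipeline as a whole.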
4. Deployment:
Once the NLP pipeline has been constructed and its performance evaluated, the NLP development toolset provides a variety of methods to deploy it so that third-party applications can access it easily.
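One common way to make a pipeline accessible to other applications is to expose it over HTTP. The sketch below does this with Python's built-in `http.server` module. The `/annotate` route, the JSON request shape, and the `run_pipeline` placeholder are assumptions made for this example and do not describe IMO Clinical AI's actual deployment methods.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_pipeline(text: str) -> dict:
    """Placeholder for a constructed NLP pipeline (hypothetical)."""
    return {"text": text, "token_count": len(text.split())}


class PipelineHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/annotate":  # hypothetical route name
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            text = payload["text"]
            if not isinstance(text, str):
                raise TypeError("'text' must be a string")
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected a JSON body with a 'text' string")
            return
        body = json.dumps(run_pipeline(text)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the example quiet


def make_server(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Create the server; port 0 asks the OS for any free port."""
    return HTTPServer((host, port), PipelineHandler)
```

Calling `make_server(port=8000).serve_forever()` would start the service locally, after which a client could POST `{"text": "..."}` to `/annotate` and receive the pipeline output as JSON. A production deployment would also need authentication, transport encryption, and request logging, which this sketch leaves out.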
NLP pipeline development has a lifecycle of its own. IMO Clinical AI endeavors to provide NLP architects, data analysts, and clinical subject matter experts with an easy-to-use, integrated development environment that eases the journey through that lifecycle – helping customers develop and deploy NLP pipelines in service of their business objectives.