Total spending on AI-related drug discovery and development tools is expected to hit $1.3 billion in 2022, according to Boston Consulting Group. That is a substantial figure and, while research and discovery are a key part of the life sciences and pharmaceuticals value chain, data science, machine learning, and AI can play a valuable role across the entire chain.
From R&D and clinical development to manufacturing and supply chain operations to market launch and marketing, these technologies and techniques have clear potential to transform the industry as a whole. It’s not turnkey, though: organizations need a cross-organizational approach that drives collaboration and extracts business value across people, process, and technology.
Beyond key predictive analytics use cases (think early disease identification, loss of exclusivity, prescription optimization, and potential patient identification), pharmaceutical teams can improve other facets of the business by using data intelligently. For example, demand forecasting can be used both for general availability (what can be purchased at the pharmacy) and for managing clinical trials, enabling companies to distribute their products more efficiently. Marketing efforts can also be optimized, for example with a predictive model that scores target customers on their propensity to open an email.
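To make the demand forecasting idea concrete, here is a minimal sketch using simple exponential smoothing, one of the most basic forecasting techniques. It is only an illustration: the function name, the smoothing factor, and the weekly unit counts are all made up for this example, and a real forecast would account for trend, seasonality, and trial enrollment schedules.

```python
def exponential_smoothing_forecast(demand, alpha=0.5):
    """Return a one-step-ahead forecast using simple exponential smoothing.

    demand: historical demand values, oldest first (hypothetical data).
    alpha:  smoothing factor in (0, 1]; higher values weight recent
            observations more heavily.
    """
    if not demand:
        raise ValueError("demand history must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")

    # Start from the first observation, then blend in each new one.
    level = demand[0]
    for observed in demand[1:]:
        level = alpha * observed + (1 - alpha) * level
    return level


# Hypothetical weekly units dispensed for one product at one pharmacy.
weekly_units = [120, 135, 128, 150, 142, 160]
forecast = exponential_smoothing_forecast(weekly_units, alpha=0.5)
print(f"Forecast for next week: {forecast:.1f} units")
```

The same function could be applied per site to estimate how much product each clinical trial location is likely to need, although that use is an assumption here rather than something the article specifies.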
Beyond the trove of use cases that pharmaceutical companies can start with (and that can have impacts across the value chain), it’s critical for them to have a sound data governance strategy in place in order to remain compliant with regulatory requirements. Alongside our work helping multiple global pharmaceutical customers meet and maintain GxP compliance while using Dataiku DSS in production with medical data, Dataiku can help data teams at pharmaceutical companies understand their models and scale their machine learning efforts. Here are a few Dataiku features that enable this:
1. Interactive visual statistics: An interactive statistics worksheet in Dataiku provides a dedicated interface for performing exploratory data analysis (EDA) on datasets. Team members can summarize or describe data samples, draw conclusions from a sample dataset about a specific patient population, or visualize the structure of the dataset in a reduced number of dimensions.
2. Subpopulation analysis: This indicates whether a model is biased toward a particular population, which can be particularly useful when exploring clinical trials or patient responses to new drugs. If a certain chemical agent works well for some subpopulations and poorly for others, it may require modifications before it can be released to diverse patient groups.
3. Individual model prediction explanations: Organizations can effectively debug black-box models for accuracy and bias by describing which characteristics or features have the greatest impact on a model’s outcomes. These row-level explanations of why a model produces a given prediction can be obtained through APIs and generated both for models built from scratch and for models built with AutoML.
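The idea behind subpopulation analysis can be sketched in a few lines of plain Python: compute the same metric separately for each group and compare it with the overall figure. This is not Dataiku's implementation; the function name, the record fields (`age_band`, `predicted`, `actual`), and the data are hypothetical and exist only to illustrate the concept.

```python
from collections import defaultdict


def accuracy_by_subpopulation(records, group_key):
    """Return prediction accuracy for each value of group_key."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if record["predicted"] == record["actual"]:
            correct[group] += 1
    return {group: correct[group] / totals[group] for group in totals}


# Hypothetical model output: did the patient respond to treatment?
records = [
    {"age_band": "18-40", "predicted": 1, "actual": 1},
    {"age_band": "18-40", "predicted": 0, "actual": 0},
    {"age_band": "18-40", "predicted": 1, "actual": 1},
    {"age_band": "18-40", "predicted": 1, "actual": 0},
    {"age_band": "65+", "predicted": 1, "actual": 0},
    {"age_band": "65+", "predicted": 0, "actual": 1},
    {"age_band": "65+", "predicted": 1, "actual": 1},
    {"age_band": "65+", "predicted": 0, "actual": 0},
]

overall = sum(r["predicted"] == r["actual"] for r in records) / len(records)
per_group = accuracy_by_subpopulation(records, "age_band")

print(f"Overall accuracy: {overall:.3f}")
for group, accuracy in sorted(per_group.items()):
    print(f"  {group}: {accuracy:.3f}")
```

A noticeable gap between groups, as in this made-up data, is the kind of signal that would prompt a closer look at the model or the underlying data before relying on its predictions for every patient population.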
The sheer volume of data available in pharmaceuticals (for example, a healthcare database with patient and provider data from more than 300 million U.S. patients) may make pharmaceutical organizations hesitant to wholly embrace data science and AI. That hesitancy may stem from the highly regulated environment they operate in, from a lack of awareness of the tools that exist to transform complex, disparate datasets and use them for applications such as predictive modeling, or from both. At the same time, that volume is powerful evidence of the industry’s potential to use data to unlock efficiencies, automate processes, and accelerate the path from drug discovery to launch.