Optimizing efficiency gaps and TCO of predictive modeling initiatives

By JJ Jagannathan

April 10, 2015

We all agree that predictive modeling is a powerful tool that can help us make better business decisions in every segment of the P&C insurance value chain. Every day we come across industry statistics on how P&C carriers plan to increase their investment in predictive analytics and where they plan to invest. But the ‘how we do it’ is as important as the ‘what we do’: the efficiency with which we run a predictive analytics project matters more than being able to say “yes, we do predictive analytics because it is a recommended industry best practice”. When I stepped back to think about how we run predictive modeling initiatives today, I realized there is plenty we can do to reduce the total cost of ownership and improve the impact on business decisions. Over the last several years, I have watched multiple P&C insurance predictive analytics projects suffer from the following efficiency gaps:

  1. Time to get modeling data: The time it takes for modeling teams to pull policy, claims, billing, and external data from core systems and data warehouses to build the version 1 model far exceeds the time the teams actually spend on data preparation and model development. Pulling partial data, wrong data, and data from systems with completely different formats hurts both the quality of the model and the time it takes to create it. Most teams spend at least two to three months just extracting the raw transaction data, and those delays leave very little time for the more important work of validating target data availability, understanding sample bias, and identifying good data surrogates.

  2. Time to deploy the initial model: Once the teams overcome the data challenge, and even if they are fast in developing a version 1 candidate model that is ready for deployment, the next big delay is the long wait for IT resources. You might have the most sophisticated algorithm, but if you cannot operationalize your model quickly, it delivers no benefit to the organization. Even after the necessary IT resources are aligned, it typically takes about four months to deploy the initial model. And when predictive models are deployed in a model-hosting application silo that is completely disconnected from the core systems, they cannot take full advantage of the real-time data stream or the opportunity to embed analytics directly into the business workflow to make better decisions.

Modeling teams usually develop multiple models and test them against a holdout dataset to select the one that performs best and is easiest to implement (a minimal sketch of that comparison step follows this paragraph). Because of all the process constraints around model deployment, teams are forced to choose just one model for the initial rollout, and banking on a single model limits the probability of success.
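To make that selection step concrete, here is a minimal sketch in Python with scikit-learn. The candidate algorithms, the synthetic data, and the AUC metric are illustrative assumptions rather than a prescription; the point is simply that every candidate is scored against the same holdout set:

```python
# Minimal sketch of the holdout-comparison step described above.
# The data here is synthetic; in practice X and y would come from the
# carrier's policy/claims modeling dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Hold out 30% of the data purely for comparing candidate models.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "gbm": GradientBoostingClassifier(),
    "random_forest": RandomForestClassifier(n_estimators=200),
}

for name, model in candidates.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_hold, model.predict_proba(X_hold)[:, 1])
    print(f"{name}: holdout AUC = {auc:.3f}")
```

In practice the ‘easiest to implement’ criterion matters as much as the holdout score, which is exactly why deployment constraints end up dictating the choice.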

You can also find stray models within the organization that never get deployed and integrated into the regular business workflow, but are considered to be in ‘production’ and are run daily from an expert’s personal desktop. This behavior is primarily driven by the process hassles around model deployment and internal system limitations.

  3. Time to enhance the model: After the version 1 model has been in production for some time, and based on its performance in the field, the project teams will decide to enhance it: incorporating gap data (the data needed for refreshing or rebuilding the model) and additional external datasets, adjusting cut points, overlaying new business rules, or adjusting model coefficients. The teams then face one more round of delays in getting IT resources before the enhanced version 2 model can be deployed. The model refresh, enhancement, and redeployment effort typically takes about 6-8 months to complete, yet plenty of carriers choose to do it only once a year.

  4. Time to learn model performance results: In most cases, business leadership and the modeling teams have no view into the model’s performance in the field for several months, sometimes up to a year (and I am not talking about rating models here). Modeling teams try to share as much information as possible with the business teams, but often they themselves are stuck waiting for access to the operations data. Rolling out one model during the initial deployment and then waiting for answers extends your learning curve.

Moving from Predictive Analytics 1.0 to Predictive Analytics 2.0

Compressing the time-to-action in the four areas outlined above is critical to lowering the total cost of delivering a predictive analytics project and to making better business decisions.

Here are some options to consider:

  • Look at secure and reliable public or private cloud-hosted analytics platforms that can be integrated with your core systems and data warehouses to overcome data extraction and model deployment delays. An instant-on predictive analytics capability is required to compete effectively in the market, and your IT leaders can help you select the right solution architecture.

  • Deploy predictive models tightly integrated with your core systems to take advantage of the real-time data stream and make your analytics more actionable. Having the right data at the right time is critical, and embedding predictive models within the core systems gets you closer to that optimized state.

  • Eliminate stray models run from an expert’s personal desktop for production usage and integrate them with the mainstream business workflow. Stray models are single points of failure, and wrong business decisions driven by human error can cause significant financial impact and an unpleasant customer experience.

  • Create a data-driven testing culture and the ability to vet multiple models at the same time in limited pilot rollouts to identify the best model for wider production usage (see the routing sketch after this list).

  • Democratize the information on model performance results so all stakeholders have a full view of the predictive analytics program’s performance. Failing fast and learning quickly is what we need here. Tracking model performance is much easier when the model is integrated with the core systems workflow, because accessing the data and calculating the results is straightforward (a monitoring sketch also follows this list).

  • Include the right external datasets in your version 1 modeling dataset; don’t rely on internal data alone and push external data evaluation to a later phase. The right external datasets can boost your version 1 model’s performance and help create a competitive differentiator.
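To make the multi-model pilot idea above concrete, here is a minimal routing sketch in Python. The model names, traffic shares, and the idea of hashing on a policy number are all illustrative assumptions; the point is that assignment is deterministic, so each policy is scored consistently by the same candidate throughout the pilot:

```python
# Sketch: route a small, deterministic share of live transactions to
# challenger models so several candidates can be vetted side by side.
# Model names and routing shares below are illustrative assumptions.
import hashlib

ROUTING = [                   # (model name, cumulative share of traffic)
    ("challenger_a", 0.05),   # 5% pilot
    ("challenger_b", 0.10),   # next 5% pilot
    ("champion", 1.00),       # everyone else stays on the champion
]

def assign_model(policy_number: str) -> str:
    """Deterministically map a policy to a model so repeat scoring is stable."""
    digest = hashlib.md5(policy_number.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF   # uniform value in [0, 1]
    for name, cutoff in ROUTING:
        if bucket <= cutoff:
            return name
    return "champion"

# Example: the same policy always lands on the same model.
print(assign_model("POL-000123"))
```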
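And on democratizing performance results: once scores and outcomes flow through the core systems, tracking performance can be a routine calculation rather than a year-long wait. A hedged sketch, assuming a table of scored transactions with hypothetical column names:

```python
# Sketch: tracking a deployed model's discrimination month over month
# from operational data. Column names are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def monthly_auc(scored: pd.DataFrame) -> pd.Series:
    """AUC of the model's scores against realized outcomes, by month."""
    by_month = scored.groupby(scored["score_date"].dt.to_period("M"))
    return by_month.apply(
        lambda g: roc_auc_score(g["outcome"], g["predicted_prob"])
    )

# Synthetic stand-in for a table of scored transactions and outcomes.
rng = np.random.default_rng(0)
scored = pd.DataFrame({
    "score_date": pd.to_datetime("2015-01-01")
                  + pd.to_timedelta(rng.integers(0, 180, 2000), unit="D"),
    "predicted_prob": rng.random(2000),
})
scored["outcome"] = (rng.random(2000) < scored["predicted_prob"]).astype(int)

print(monthly_auc(scored))
```

A visible month-over-month drop in a metric like AUC is an early signal that the model needs the refresh discussed above, long before a yearly review would catch it.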

By no means is this a comprehensive list of recommendations, but it is a start. Fixing efficiency gaps in predictive analytics projects is about ‘getting a solid foundation in place’ before we go after bigger business problems and more powerful analytic techniques. I invite readers to share what has worked well for you in the past and what we should be doing as an industry.

A version of this article previously appeared in ITA Pro Online Buyer's Guide.