Machine Learning and Insurance

Machine learning applied to insurance data

For the insurance sector, we see machine learning as a fundamental game-changer since most insurance companies today are focused on three main objectives: improving compliance, improving cost structures and improving competitiveness. Machine learning can form at least part of the answer to all three.

Improving compliance: Today’s machine learning algorithms, techniques and technologies can be used to review, analyze and assess information in pictures, videos and voice conversations. One immediate benefit, for example, is the ability to better monitor and understand interactions between customers and sales agents in order to improve controls over mis-selling of products.

Improving cost structures: With a significant portion of an insurer’s cost structure devoted to human resources, any shift towards automation should deliver significant cost savings. Using machine learning, insurers could cut their claims processing time from a number of months to a matter of minutes.

Improving competitiveness: While reduced cost structures and improved efficiency can lead to competitive advantage, there are many other ways that machine learning can give insurers the competitive edge, including product, service and process innovation.

Insurers overcome cultural differences to embrace artificial intelligence

Insurers have been slow to adopt machine learning in large part due to a culture of not being ‘early adopters’ of new technologies and approaches. This risk-averse culture also dampens the organization’s willingness to experiment and fail in its quest to uncover new approaches.

Insurance organizations also suffer from a cultural challenge common in information-intensive sectors: data hoarding. Fortunately, many companies are now keenly focused on moving towards a ‘data-driven’ culture that rewards information sharing and collaboration and discourages hoarding.

Starting small with machine learning

The first thing insurers should realize is that this is not an arms race. The winners will be the ones that take a measured and scientific approach to building up their machine learning capabilities and capacities and – over time – find new ways to incorporate machine learning into ever-more aspects of their business.

Insurers may want to start small. Our experience and research suggest that – given the cultural and risk challenges facing the insurance sector – insurers will want to start by developing a ‘proof of concept’ model that can safely be tested and adapted in a risk-free environment.

Recognizing that machines excel at routine tasks and that algorithms learn over time, insurers will want to focus their early ‘proof of concept’ efforts on those processes or assessments that are widely understood and add low value. The more decisions the machine makes and the more data it analyzes, the more prepared it will be to take on more complex tasks and decisions. Later, business leaders can start to think about developing the business case for industrialization, along with appropriate governance, monitoring and system management.

At KPMG, we have worked with a number of insurers to develop their ‘proof of concept’ machine learning strategies over the past year. We can say with absolute certainty that the battle of machines in the insurance sector has already started, and those that remain on the sidelines will suffer as they stand by and watch competitors find new ways to harness machines to drive increased efficiency and value.

Machine Learning & its impact on the future for Insurance

The interest in machine learning and the associated appetite to drive business outcomes from such investments continues to build. I’ve been talking to many insurance organisations over the past 18 months around machine learning and four consistent areas tend to arise as organisations grapple with the application and value of machine learning.

As 2017 gets well underway, I thought it prudent to share and gather opinions and experiences in the insurance industry, and I’ve also summarized these points of view as part of Louise Matthews’ ‘Five Minutes with….’ video series.

First and foremost, machine learning WILL change the way insurers do business. The insurance industry is founded on forecasting future events and estimating the value/impact of those events, and it has used established predictive modeling practices, especially in claims loss prediction and pricing, for some time now. With big data and new data sources such as sensors/telematics, external data sources, digital (interactions), social and Web (sentiment), the opportunity to apply machine learning techniques across new areas of insurance operations has never been greater.

Machine Learning has now become an essential tool for insurers and it is used extensively across the core value chain to understand risk, claims and customer experience. Specifically, it is enabling insurance companies to yield higher predictive accuracy, as it can fit more flexible/complex models. As opposed to traditional statistical methods, machine learning takes advantage of the power of data analytics and is capable of computing seemingly unrelated datasets whether structured, semi-structured or unstructured.

By way of an example, predictive models based upon machine learning now take into consideration:

  • Structured data: type of loss, amount of loss, physician ID, etc.
  • Text: Notes, diaries, medical bills, accident reports, depositions, social data, invoices, etc.
  • Spatial, graph: accident location, work location, relationship of parties (physician, claimant, repair facilities), etc.
  • Time series: sequence of events/actions, claim date, accident date, duration between events/action, etc.
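To make the list above a little more concrete, here is a minimal sketch of how one claim record might be flattened into numeric features before it reaches a model. The `Claim` fields, the keyword list and the feature names are all illustrative assumptions rather than a real insurer schema, and graph or spatial features are left out for brevity.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Claim:
    # Structured fields (hypothetical names, not a real schema)
    loss_type: str
    loss_amount: float
    physician_id: str
    # Unstructured text, e.g. adjuster notes
    notes: str
    # Dates used to derive time-based features
    accident_date: date
    claim_date: date


# Hypothetical keywords an analyst might search for in notes
KEYWORDS = ("attorney", "prior injury", "cash")


def to_features(claim: Claim) -> dict:
    """Flatten one claim into a dictionary of numeric features."""
    text = claim.notes.lower()
    features = {
        "loss_amount": claim.loss_amount,
        # Time-derived feature: days between the accident and the claim
        "days_to_report": (claim.claim_date - claim.accident_date).days,
    }
    # Text-derived features: one occurrence count per keyword
    for keyword in KEYWORDS:
        name = "note_mentions_" + keyword.replace(" ", "_")
        features[name] = text.count(keyword)
    return features


claim = Claim(
    loss_type="auto",
    loss_amount=4200.0,
    physician_id="P-001",
    notes="Claimant mentioned a prior injury. Attorney retained.",
    accident_date=date(2017, 1, 3),
    claim_date=date(2017, 1, 24),
)
print(to_features(claim))
```

In practice the text features would more likely come from a proper text-mining step than from keyword counts; the point here is only that structured, text and time-based inputs end up side by side in one feature record.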

Now more than ever, insurers have the ability to evaluate mass amounts of underwriting/claims notes and diaries (unstructured data), in addition to more standard documentation.

Pricing risk, estimating losses and monitoring fraud are critical areas that machine learning can support. Insurers have introduced machine learning algorithms primarily to handle risk similarity analytics, risk appetite and premium leakage. However, it is also widely used to help predict the frequency/severity of claims and to manage expenses, subrogation (general insurance), litigation and fraud.

One of the most impactful machine learning use cases is the ability to learn from audits of closed claims, as for the very first time leakage becomes controllable by the insurer. Claim audits are traditionally a manual process; however, machine learning techniques provide an uplift in the ability to learn from them by applying enhanced scoring and process methods throughout the claims lifecycle.

Those claim handling algorithms can also be used to monitor and detect fraud; however, one of the limiting factors may be the number of claims fraud cases/instances an insurance company has, as fraud datasets are fundamental for both traditional and machine learning models.

I’m often asked if machine learning can deliver a tangible decline in fraud rates and I do believe it can have an impact on earlier identification, or ‘counter-fraud’ techniques. The key element is to reduce the false positives and to apply machine learning algorithms to help determine which claims are potentially fraudulent vs. those that are legitimate.

Insurance companies applying this technique are reducing fraud in two respects: earlier identification of the fraud, and allocating resource time to claim fraud investigation rather than spending it on valid claims. This also increases customer satisfaction, as valid claims are paid faster.
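The trade-off between catching fraud and limiting false positives can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: the scores and labels below are invented, and the `threshold` parameter stands in for whatever cut-off an insurer would apply to a model's fraud score.

```python
def confusion_counts(scores, labels, threshold):
    """Count true/false positives and negatives at a given score cut-off."""
    tp = fp = fn = tn = 0
    for score, is_fraud in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_fraud:
            tp += 1   # fraud correctly flagged
        elif flagged and not is_fraud:
            fp += 1   # valid claim wrongly flagged (false positive)
        elif not flagged and is_fraud:
            fn += 1   # fraud missed
        else:
            tn += 1   # valid claim correctly passed
    return tp, fp, fn, tn


def precision_recall(scores, labels, threshold):
    """Precision: share of flagged claims that are fraud. Recall: share of fraud caught."""
    tp, fp, fn, _ = confusion_counts(scores, labels, threshold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented model scores and ground truth (1 = fraud, 0 = valid)
scores = [0.95, 0.80, 0.65, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

for threshold in (0.5, 0.7):
    print(threshold, confusion_counts(scores, labels, threshold),
          precision_recall(scores, labels, threshold))
```

On this toy data, raising the cut-off from 0.5 to 0.7 removes the one false positive but also misses one fraudulent claim, which is the balance investigators have to strike.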

Nothing evidences the impact of a technology more than how it is applied in the real world, and we are seeing such applications in insurance fraud. Using machine learning, insurers can load claims data (whether structured, semi-structured or unstructured) into a huge repository, often called a “data lake”. This method differs from traditional predictive models, which only leverage structured data. Claims notes, diaries and documents are key in discovering fraud and developing fraud models. In the case of fraud detection, the procedure would consist of:

  • Learning Phase: where you learn from “training data”, i.e. claims known to be fraudulent and claims known to be valid. It consists of pre-processing (normalization, dimension reduction, image processing if you are using photos, aerial images, etc.), learning (supervised, unsupervised, minimization, etc.) and error analysis (precision, recall, overfitting, test/cross-validation, etc.).
  • Prediction Phase: here the model from the learning phase is applied to new data and deployed to detect and flag fraud.
  • Continuous Learning Phase: it is key to continuously recalibrate your models with new data and behaviors.
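The three phases can be sketched end to end in plain Python. To keep the example self-contained, it uses a very simple nearest-centroid classifier as a stand-in for whatever model an insurer would really train. The two features (loss amount, days to report) and all the data are invented for illustration.

```python
from statistics import mean, pstdev


def fit_scaler(rows):
    """Pre-processing: learn a per-column mean and standard deviation."""
    return [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]


def scale(row, scaler):
    """Normalize one row with the learned statistics."""
    return [(x - m) / s for x, (m, s) in zip(row, scaler)]


def train(rows, labels):
    """Learning: compute one centroid per class in normalized space."""
    scaler = fit_scaler(rows)
    scaled = [scale(r, scaler) for r in rows]
    centroids = {}
    for label in set(labels):
        members = [r for r, l in zip(scaled, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return scaler, centroids


def flag(row, model):
    """Prediction: return the class whose centroid is closest."""
    scaler, centroids = model
    point = scale(row, scaler)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(point, centroids[label])),
    )


# Learning phase on invented training data (1 = fraud, 0 = valid)
train_rows = [[1000.0, 2.0], [1500.0, 3.0], [1200.0, 1.0], [900.0, 4.0],
              [9000.0, 30.0], [8000.0, 25.0], [9500.0, 40.0]]
train_labels = [0, 0, 0, 0, 1, 1, 1]
model = train(train_rows, train_labels)

# Error analysis on a small held-out set
holdout_rows = [[1100.0, 2.0], [8700.0, 35.0]]
holdout_labels = [0, 1]
correct = sum(flag(r, model) == y for r, y in zip(holdout_rows, holdout_labels))
print("holdout accuracy:", correct / len(holdout_rows))

# Prediction phase: score a new, unseen claim
print("new claim flagged as:", flag([1300.0, 3.0], model))

# Continuous learning phase: refit once newly investigated claims are labeled
model = train(train_rows + holdout_rows, train_labels + holdout_labels)
```

The final line is the simplest form of recalibration: retraining from scratch on the enlarged dataset. A production setup would add monitoring and a schedule for when to do this.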

In addition to machine learning, the usage of Graph Analytics is also rapidly becoming popular because of its ability to visualise fraud patterns.

The usage of Graph Analytics with Apache Spark/GraphX is a newer method being leveraged, as it enables the usage of neural networks and social networks, which is key in claims fraud analysis. This method is becoming quite popular vs. traditional claims scoring or business rules, as those methods (considered a “flagging model”) may result in too many false positives.

A Graph Analytics technique can help you understand the data relationships and is also used for investigating individual claims fraud cases. This method allows insurance companies to more quickly visualize fraud patterns vs. traditional scoring models.
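As a rough illustration of the idea, the sketch below links claims to the parties involved and groups claims that are connected through shared physicians or repair shops. It uses plain Python rather than Spark/GraphX, and the claims, party names and the `claim_clusters` helper are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical claims, each listing the parties involved as (kind, id) pairs
claims = {
    "C1": [("claimant", "A"), ("physician", "P1"), ("shop", "S1")],
    "C2": [("claimant", "B"), ("physician", "P1"), ("shop", "S2")],
    "C3": [("claimant", "C"), ("physician", "P2"), ("shop", "S2")],
    "C4": [("claimant", "D"), ("physician", "P3"), ("shop", "S3")],
}


def build_graph(claims):
    """Build an undirected graph with claim nodes linked to party nodes."""
    graph = defaultdict(set)
    for claim_id, parties in claims.items():
        claim_node = ("claim", claim_id)
        for party in parties:
            graph[claim_node].add(party)
            graph[party].add(claim_node)
    return graph


def claim_clusters(claims):
    """Group claims that are connected through any shared party."""
    graph = build_graph(claims)
    seen = set()
    clusters = []
    for claim_id in claims:
        start = ("claim", claim_id)
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        cluster = []
        # Breadth-first search over the connected component
        while queue:
            kind, ident = node = queue.popleft()
            if kind == "claim":
                cluster.append(ident)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(sorted(cluster))
    return clusters


print(claim_clusters(claims))
```

Here C1, C2 and C3 end up in one cluster because C1 and C2 share a physician and C2 and C3 share a repair shop, while C4 stands alone. A large cluster is not proof of fraud, but it is the kind of pattern an investigator might want to look at first.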

Using machine learning for insurance pricing optimization

Wednesday, March 29, 2017

By Kaz Sato, Staff Developer Advocate, Google Cloud

AXA, the large global insurance company, has used machine learning in a POC to optimize pricing by predicting “large-loss” traffic accidents with 78% accuracy.

The TensorFlow machine-learning framework has been open source since just 2015, but in that relatively short time, its ecosystem has exploded in size, with more than 8,000 open source projects using its libraries to date. This increasing interest is also reflected by its growing role in all kinds of image-processing applications (with examples including skin cancer detection, diagnosis of diabetic eye disease and even sorting cucumbers), as well as natural-language processing ones such as language translation.

We’re also starting to see TensorFlow used to improve predictive data analytics for mainstream business use cases, such as price optimization. For example, in this post, I’ll describe why AXA, a large, global insurance company, built a POC using TensorFlow as a managed service on Google Cloud Machine Learning Engine for predicting “large-loss” car accidents involving its clients.

Understanding the use case

Approximately 7-10% of AXA’s customers cause a car accident every year. Most of them are small accidents involving insurance payments in the hundreds or thousands of dollars, but about 1% are so-called large-loss cases that require payouts over $10,000. As you might expect, it’s important for AXA adjusters to understand which clients are at higher risk for such cases in order to optimize the pricing of its policies.

Toward that goal, AXA’s R&D team in Japan has been researching the use of machine learning to predict if a driver may cause a large-loss case during the insurance period. Initially, the team had been focusing on a traditional machine-learning technique called Random Forest. Random Forest is a popular algorithm that uses multiple Decision Trees (such as possible reasons why a driver would cause a large-loss accident) for predictive modeling. Although Random Forest can be effective for certain applications, in AXA’s case, its prediction accuracy of less than 40% was inadequate.
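For readers unfamiliar with the technique, here is a deliberately simplified sketch of the Random Forest idea: many small trees, each trained on a random resample of the data, vote on the answer. It is not AXA's model. Each "tree" is reduced to a single split (a decision stump), and the two numeric features and their values are invented toy data.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Find the single threshold on one feature with the fewest errors."""
    values = sorted({row[feature] for row in rows})
    majority = Counter(labels).most_common(1)[0][0]
    # Fallback when no split is possible: always predict the majority class
    best = (len(labels) + 1, None, majority, majority)
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
        right = [l for r, l in zip(rows, labels) if r[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(l != left_label for l in left)
                  + sum(l != right_label for l in right))
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return feature, threshold, left_label, right_label


def stump_predict(stump, row):
    feature, threshold, left_label, right_label = stump
    if threshold is None:
        return left_label
    return left_label if row[feature] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))           # random feature choice
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest


def forest_predict(forest, row):
    """Majority vote across all stumps."""
    return Counter(stump_predict(s, row) for s in forest).most_common(1)[0][0]


# Invented toy data: two numeric features, label 1 = large-loss case
rows = [(50, 2), (45, 3), (55, 1), (60, 4), (20, 10), (22, 12), (19, 15), (25, 11)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
print(forest_predict(forest, (65, 0.5)), forest_predict(forest, (18, 16)))
```

A real Random Forest grows full decision trees and considers a random subset of features at every split, but the resample-then-vote structure is the same.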

In contrast, after developing an experimental deep learning (neural-network) model using TensorFlow via Cloud Machine Learning Engine, the team achieved 78% accuracy in its predictions. This improvement could give AXA a significant advantage for optimizing insurance cost and pricing, in addition to the possibility of creating new insurance services such as real-time pricing at point of sale. AXA is still at the early stages with this approach — architecting neural nets to make them transparent and easy to debug will take further development — but it’s a great demonstration of the promise of leveraging these breakthroughs.

How does it work?

AXA created a cool demo UI for the test bed. Let’s look at the details of its neural-network model that achieved the improvement.

AXA’s deep learning model demo UI

On the left side, you can see about 70 values used as input features, including the following:

  • Age range of the driver
  • Region of the driver’s address
  • Annual insurance premium range
  • Age range of the car

AXA entered these features into a single vector with 70 dimensions and put it into a deep learning model in the middle. The model is designed as a fully connected neural network with three hidden layers, with a ReLU as the activation function. AXA used data in Google Compute Engine to train the TensorFlow model, and Cloud Machine Learning Engine’s HyperTune feature to tune hyperparameters.
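To give a feel for the shape of such a network, here is a minimal forward-pass sketch in plain Python rather than TensorFlow. Only the 70-dimensional input, the three hidden layers and the ReLU activation come from the description above. The hidden-layer widths, the sigmoid output and the random weights are assumptions made for illustration, and no training is performed.

```python
import math
import random

INPUT_DIM = 70              # from the text: about 70 input features
HIDDEN_SIZES = [32, 16, 8]  # assumed widths; the source does not state them


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Create random weights (He-style scaling) and zero biases for one dense layer."""
    spread = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, spread) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, layer, activation):
    """Fully connected layer: weighted sum plus bias, then the activation."""
    weights, biases = layer
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def build_model(seed=0):
    rng = random.Random(seed)
    sizes = [INPUT_DIM] + HIDDEN_SIZES + [1]
    return [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]


def predict_large_loss(model, features):
    """Run the 70-dimensional feature vector through the network."""
    activations = features
    for layer in model[:-1]:
        activations = dense(activations, layer, relu)        # three hidden layers
    (probability,) = dense(activations, model[-1], sigmoid)  # assumed output layer
    return probability


model = build_model()
example = [random.Random(1).random() for _ in range(INPUT_DIM)]
print(predict_large_loss(model, example))
```

Because the weights are random, the printed number is meaningless as a prediction. In a real setup the weights would be learned from historical policies and claims, and choices such as the layer widths are the kind of hyperparameter a tuning service would search over.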

The following is the end result. The red line shows the accuracy rate with the deep learning model (78%).

Test results for POC

TensorFlow on business data

AXA’s case is one example of using machine learning for predictive analytics on business data. As another example, recently DeepMind used a machine-learning model to reduce the cost of Google data-center cooling by 40%. The team entered numerical values acquired from IoT sensors in Google data centers (temperatures, power, pump speeds, setpoints and so on) into a deep learning model and got better results than the existing approach.

