Machine learning model speaks if you listen: Explainability

Tushar Tiwari
Published in Nerd For Tech · 7 min read · Jun 30, 2021

Automation has been on the rise over the last decade, but even today the final call usually rests with a human. There is no doubt, though, that we are making data-driven and well-informed decisions.

Almost all of us use machine learning and AI products, knowingly or unknowingly (examples: YouTube recommendations for your wish-list holiday destinations 🏝 🏕, or a home loan application screened by a bank 🏦).

Problem with Black box models

Now let’s assume that you are a branch manager at a bank and you receive a large number of home loan applications every week. To help you, the bank’s technical team has given you a model.

It’s a black-box model: it just takes the input and gives a decision to approve or reject the loan application.

Bank scenario

The model has good accuracy and is good for the business, but there are two concerning questions here:

  • Why was a particular loan application rejected? This is required to inform the customer.
  • Why was a loan approved? This is required to justify the credibility of the applicant.

Explainability comes to the rescue

For inherently interpretable models such as logistic regression, the global feature importance is known from the weights of the features. But for a single data point (x_q), we need to assume that x_q follows the global structure. In the real world, we often require more complex models like deep neural networks, and this kind of direct feature importance calculation is not possible for them.
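To make this concrete, here is a minimal sketch of reading global feature importance from a linear model's weights (the synthetic data and the loan-style feature names are only for illustration):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for loan data with three illustrative features
X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
model = LogisticRegression().fit(X, y)

# One weight per feature: the sign gives the direction of the effect and the
# magnitude gives the global importance (assuming comparable feature scales)
for name, weight in zip(["age", "salary", "credit_history"], model.coef_[0]):
    print(f"{name}: {weight:+.3f}")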

SHAP stands for SHapley Additive exPlanations. It is one of the most widely used methods for explainability and answers the “Why” question about a prediction.

SHAP has its roots in cooperative game theory, where each feature is considered as a “player” and the performance of the model is the “payout”.

SHAP is model agnostic: it is independent of the underlying model used for making the prediction. This makes it quite valuable for models that are not inherently interpretable (like deep neural networks).

A useful mapping to keep in mind

  • Players correspond to features.
  • Payout corresponds to the final model performance.
  • The game corresponds to a single data point (a single loan application).

Breakdown of what SHapley Additive means

  1. SHapley value: the Shapley value of a feature is its contribution to the payout, weighted and summed over all possible feature combinations.
Source: interpretable-ml-book/shapley

Later in the blog, I will show how to calculate the SHAP value of a feature.

2. For a given observation x₀, the sum of the SHAP values of all the features equals the difference between the prediction for x₀ and the average prediction.
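A toy numeric check of this additive property, using the numbers from the loan example in the next section (expressed in percentage points):

average_prediction = 40.0  # the bank's average approval probability
prediction_for_x0 = 68.0   # the model's prediction for this applicant

# Illustrative SHAP values per feature (from the example below)
shap_values = {"age": 20.0, "credit_history": 25.0, "salary": -17.0}

# The SHAP values of all features sum to the prediction minus the average
assert sum(shap_values.values()) == prediction_for_x0 - average_prediction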

Intuitive explanation by example

Let’s assume there are only 3 factors (age, salary, and credit history) needed for predicting the outcome of a loan application.

Features for loan applicants

Consider an applicant who is 27 years old, has an annual income of INR 6,00,000, and has no default in their credit history. Our model predicts that the bank should approve the loan with a probability of 68%. On average, the bank approves 40% of loan applications.

How much has each feature contributed to the prediction compared to the average prediction?

The features age, salary, and credit history have worked cooperatively to make the prediction. We need to figure out the contribution of each feature towards the difference between the predicted value and the average value.

Here it is 68% - 40% = 28%.

A possible explanation could be as follows:
The young age of 27 years added 20% and the clean default history added 25%. On the other side, the average salary led to a reduction of 17%. In total: 20 + 25 - 17 = 28%.

The 3 features can form coalitions in 2^3 = 8 ways.
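Enumerating those coalitions with itertools makes the count easy to verify:

from itertools import combinations

features = ["age", "salary", "credit_history"]
coalitions = [set(c) for r in range(len(features) + 1)
              for c in combinations(features, r)]

for coalition in coalitions:
    print(coalition or "{} (empty coalition)")
print(len(coalitions))  # 8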

Calculation of Shapley values

Each node has two rows:

  • 1st row: Features involved.
  • 2nd row: Prediction of the probability of approving the loan.

What is the Marginal Contribution (MC) of a feature?

At the root node, the model with no features simply predicts the average probability of loan approval (40%) among all the data points. If we move to the node with credit history as the only feature, the prediction for x₀ becomes 65%.

Therefore the Marginal Contribution (MC) of credit history at this node is 65% - 40% = 25%.

To find the SHAP value of credit history in the overall model, we need to find the marginal contribution at each node where credit history is present.

SHAP value for credit.
Focus on the path where credit history is a feature.

What are the weights of the Marginal Contributions (MC)?

  • All the weights corresponding to marginal contributions to models with the same number of features should be the same. E.g., if the weight of the MC of age given {age, salary} is W1 and the weight of the MC of salary given {age, salary} is W2, then W1 = W2.
  • The weight of an MC to an f-feature model is equal to the reciprocal of the number of possible MCs to f-feature models. Putting these rules together gives the exact computation sketched below.
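Here is a minimal sketch of the exact computation. The value function v(coalition) is assumed to return the model's prediction when only the features in the coalition are present; only the 40%, 65%, and 68% values below come from the example above, the remaining coalition predictions are made up for illustration:

from itertools import combinations
from math import factorial

def shapley_value(feature, features, v):
    # phi_f = sum over coalitions S without f of
    #         |S|! * (n - |S| - 1)! / n! * (v(S + {f}) - v(S))
    n = len(features)
    others = [f for f in features if f != feature]
    phi = 0.0
    for r in range(len(others) + 1):
        for subset in combinations(others, r):
            s = frozenset(subset)
            weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
            phi += weight * (v(s | {feature}) - v(s))
    return phi

# Hypothetical coalition predictions (in %); only 40, 65, and 68 are from the article
toy_v = {
    frozenset(): 40.0,
    frozenset({"credit_history"}): 65.0,
    frozenset({"age"}): 55.0,
    frozenset({"salary"}): 35.0,
    frozenset({"age", "credit_history"}): 70.0,
    frozenset({"age", "salary"}): 50.0,
    frozenset({"salary", "credit_history"}): 60.0,
    frozenset({"age", "salary", "credit_history"}): 68.0,
}

features = ["age", "salary", "credit_history"]
for f in features:
    print(f, shapley_value(f, features, lambda s: toy_v[frozenset(s)]))
# By the efficiency property, the three values sum to 68 - 40 = 28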

While calculating the Shapley values, we replace the values of the features that are not present in the coalition with values for those features taken from our dataset. For example, if age is missing, we try out the different age values present in our dataset combined with the other features that are present.

Note: The computation time increases exponentially as the number of features increases.

Finding the exact Shapley values is an NP-hard problem. In the real world, for most practical purposes, an approximation does the job well.
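One common approximation is permutation sampling (in the spirit of what the SHAP library's Permutation explainer does): sample random feature orderings, splice the data point of interest into random background rows, and average the marginal contributions. A rough sketch, where predict, x0, and X_background (a sample of the training data as a NumPy array) are assumed to exist:

import numpy as np

def approx_shap(predict, x0, X_background, feature_idx, n_samples=1000, seed=0):
    rng = np.random.default_rng(seed)
    n_features = x0.shape[0]
    total = 0.0
    for _ in range(n_samples):
        order = rng.permutation(n_features)
        pos = np.where(order == feature_idx)[0][0]
        # Start from a random background row; features that precede
        # feature_idx in the ordering take their values from x0
        z = X_background[rng.integers(len(X_background))]
        with_f, without_f = z.copy(), z.copy()
        for j in order[:pos]:
            with_f[j] = x0[j]
            without_f[j] = x0[j]
        with_f[feature_idx] = x0[feature_idx]
        total += predict(with_f.reshape(1, -1))[0] - predict(without_f.reshape(1, -1))[0]
    return total / n_samples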

Explanations made using SHAP always consider all the features; it does not offer selective feature explanations.

Pros:

  1. It satisfies all the properties of a fair payout: Efficiency, Symmetry, Dummy, and Additivity. It is the only method with such a solid theoretical base.
  2. It can produce explanations not just against the average of the whole dataset, but also against a subset of points, and even a single data point.

Cons:

  1. It takes a lot of time to compute the exact Shapley values, and the cost grows quickly with the number of features.
  2. It requires access to the data to calculate the Shapley values of new points.
  3. There is no prediction model; it just returns Shapley values per feature. We cannot form statements like “a change in input by x points will result in a change in prediction by y points.”

The code example in Python

Let’s install the SHAP library using pip.

!pip install shap

Importing the required libraries

import shap
import sklearn
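The full notebook is not shown here, so as a minimal setup sketch (the Iris data and the random forest are assumptions, chosen because the later plots reference petal length):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Train a simple tree-based classifier for the explainer to inspect
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)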

Let’s begin with the heart of the SHAP library.

explainer = shap.Explainer(model)
explainer
#output
<shap.explainers._tree.Tree at 0x7f33ff1e3f10>

The Explainer class is the primary interface of the SHAP library. When creating the explainer object, if no algorithm is specified it is set to “auto”, which means it selects the best explainer according to the model used. There are different explainers available, like Permutation, Additive, Exact, Tree, etc., and the SHAP values are calculated using the corresponding method.
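You can also choose the explainer explicitly instead of relying on “auto”; for example (a sketch, assuming the tree model and X_train from the setup above):

# TreeExplainer is the fast, exact method for tree ensembles
tree_explainer = shap.TreeExplainer(model)

# KernelExplainer is a model-agnostic fallback: it only needs a prediction
# function and some background data
kernel_explainer = shap.KernelExplainer(model.predict_proba, X_train.iloc[:50, :])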

Here the model parameter can be any type of model, a regressor or a classifier, and it supports scikit-learn, XGBoost, LightGBM, CatBoost, PySpark, and TensorFlow models.

shap_values = explainer.shap_values(X_train)  # compute SHAP values before plotting
shap.summary_plot(shap_values, X_train)
Output of shap.summary_plot()

Features are ordered by their level of importance in the classification task. We can see that petal length is the most important feature here globally.

shap.initjs()  # enables the JavaScript display when using a Jupyter notebook or Colab
shap_values = explainer.shap_values(X_test)
shap.force_plot(explainer.expected_value[0], shap_values[0], X_test)
shap.force_plot() Output
shap.initjs()
shap_values = explainer.shap_values(X_test.iloc[0, :])
shap.force_plot(explainer.expected_value[0], shap_values[0], X_test.iloc[0, :])
plot for a single data point

Here 0.1322 is the base value, and the feature petal length has pulled the value towards the lower end; the final prediction is -2.33.
Note that this is the influence of the petal length feature for a single data point only.

Colab Notebook for code used.

For curious minds

If you want a completely different point of view and have 10 minutes of time, do watch ‘Please Stop Doing “Explainable” ML — Cynthia Rudin’.
