Understanding the Confusion Matrix in a Simplified Way
I hope that today your confusion about the confusion matrix will be cleared up. When I started learning data science, I was just as confused about the confusion matrix as you are now, so I will break this topic down into sections.
1. Why do we need a confusion matrix?
For example, suppose you have built models using various algorithms and each one reaches some accuracy, but now you are unsure which model to use. Can you rely on accuracy alone?
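Accuracy alone can hide exactly the mistakes you care about. Here is a minimal sketch with invented labels (not taken from the Kaggle dataset), assuming 8 of 10 test customers stay (0) and 2 leave (1). A naive "model" that always predicts "stays" still scores 80% accuracy while catching none of the customers who left.

```python
# Hypothetical test labels: 1 = customer left the bank, 0 = customer stayed.
# These values are invented for illustration, not taken from the Kaggle dataset.
actual = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# A naive "model" that always predicts the majority class (stayed).
predicted = [0] * len(actual)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Count how many of the customers who left were actually flagged by the model.
caught_leavers = sum(1 for t, p in zip(actual, predicted) if t == 1 and p == 1)

print(f"Accuracy: {accuracy(actual, predicted):.0%}")             # Accuracy: 80%
print(f"Leavers caught: {caught_leavers} of {actual.count(1)}")  # Leavers caught: 0 of 2
```

A high accuracy number says nothing about *which* predictions were wrong. The confusion matrix breaks this down.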
2. What is a confusion matrix?
Let’s understand it through an example, “Predicting Churn for Bank Customers”. You can download the CSV from the link below:
Link: https://www.kaggle.com/adammaus/predicting-churn-for-bank-customers
This is what our dataset looks like:
Dataset screenshot
Let’s say you have trained your model on the training set using different algorithms. When you evaluate it on the test set, four scenarios can occur:
- The user left the bank, and the model also predicted that the user would leave.
- The user didn’t leave the bank, but the model predicted that the user would leave.
- The user didn’t leave the bank, and the model predicted that the user would not leave.
- The user left the bank, but the model predicted that the user would not leave.
If you understood the statements above, we are almost done. Now we have to label them using the following convention:
If the model predicts that the user will leave the bank, the prediction is Positive; if it is correct, it is True, otherwise False.
If the model predicts that the user will not leave the bank, the prediction is Negative; if it is correct, it is True, otherwise False.
- The user left the bank, and the model also predicted that the user would leave (“True Positive”).
- The user didn’t leave the bank, but the model predicted that the user would leave (“False Positive”), also called a “Type I error”.
- The user didn’t leave the bank, and the model predicted that the user would not leave (“True Negative”).
- The user left the bank, but the model predicted that the user would not leave (“False Negative”), also called a “Type II error”.
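The four labels can be assigned mechanically from each (actual, predicted) pair. This is a minimal sketch assuming, as in this article, that 1 (the customer left) is the positive class; the helper `outcome` and the sample labels are hypothetical and exist only for illustration.

```python
from collections import Counter

def outcome(actual, predicted):
    """Label one prediction, treating 1 (customer left the bank) as the positive class."""
    if predicted == 1:
        # Model said "will leave": True Positive if correct, otherwise False Positive (Type I error).
        return "TP" if actual == 1 else "FP"
    # Model said "will not leave": True Negative if correct, otherwise False Negative (Type II error).
    return "TN" if actual == 0 else "FN"

# Hypothetical test-set labels (1 = left the bank, 0 = stayed).
actual    = [1, 0, 0, 1, 1, 0, 0, 1]
predicted = [1, 1, 0, 0, 1, 0, 0, 0]

counts = Counter(outcome(a, p) for a, p in zip(actual, predicted))
print(counts["TP"], counts["FP"], counts["TN"], counts["FN"])  # 2 1 3 2
```

Every prediction falls into exactly one of the four buckets, so the four counts always add up to the size of the test set.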
We treat “the model predicted the user will leave the bank” as positive because, in the dataset above, Exited=1 means the customer left the bank.
Here are the statements above in matrix format:
Confusion matrix (model trained using SVM)
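In the figure above, green boxes are correct classifications and red boxes are misclassifications. You can now compare the confusion matrices of your models and decide which one is better. For example, if missing a customer who is about to leave costs the bank more than a false alarm, prefer the model with fewer false negatives.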
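Here is a minimal sketch of such a comparison in plain Python. The two models and their predictions are invented for illustration. Both reach the same accuracy, but their confusion matrices reveal very different mistakes: Model A misses most customers who leave (false negatives), while Model B wrongly flags customers who stay (false positives). The helper `confusion_matrix_2x2` is hypothetical and uses rows for actual labels and columns for predicted labels.

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]: rows are actual labels (0, 1), columns are predicted labels (0, 1)."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1  # labels must be 0 or 1
    return matrix

# Hypothetical labels and predictions from two models (1 = left the bank, 0 = stayed).
actual  = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
model_a = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]  # rarely predicts "will leave"
model_b = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]  # flags many customers as leaving

for name, preds in [("Model A", model_a), ("Model B", model_b)]:
    (tn, fp), (fn, tp) = confusion_matrix_2x2(actual, preds)
    acc = (tp + tn) / len(actual)
    print(f"{name}: accuracy={acc:.0%} TN={tn} FP={fp} FN={fn} TP={tp}")
# Model A: accuracy=70% TN=6 FP=0 FN=3 TP=1
# Model B: accuracy=70% TN=3 FP=3 FN=0 TP=4
```

If you use scikit-learn, `sklearn.metrics.confusion_matrix(y_true, y_pred)` returns the same layout for labels 0 and 1, so calling `.ravel()` on its result gives `tn, fp, fn, tp`.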
I hope this has cleared up your confusion about the confusion matrix.