In the last post, we briefly explored the bank marketing data set from UCI Machine learning repository:
https://archive.ics.uci.edu/ml/datasets/Bank+Marketing
In this post, lets explore some simple classification techniques.





So, this was my attempt with the bank marketing data set. Hope you are able to take away something positive from it today.
Please share your thoughts or critique...have discussions if you will. Just don't forget to leave a comment. :)
https://archive.ics.uci.edu/ml/datasets/Bank+Marketing
In this post, lets explore some simple classification techniques.
Naive Rule:
As we mentioned in the previous post, the response rate in this data set is 11.7%. The naive rule here would be to classify all customers as non-subscribers as 88.3% customers in the training set were non-subscribers. Obviously, we are not going to use this for modeling but we may use this later for evaluating the performance of other classifiers.My take on performance consideration:
Before we move to building more models, I want to talk a little bit more about the issue of unequal classes. Here we are dealing with asymmetric classes i.e. it is more important to predict correctly a potential subscriber than a non-subscriber. Why? Because the cost of making a call is likely much lower than misclassifying a potential subscriber as non-subscriber. In other words, we want high sensitivity and low false negative ratio (high negative pred value).Naive Bayes:
Naive Bayes uses Bayes theorem to compute the (conditional) probability that a record belongs to an output class given a set of predictor variables.
Decision Tree:
We used CART with a cost matrix. We assumed a FN/FP cost ratio of 10:1 in the final model though we also tried other ratios randomly. Anyway, the tree produced with using a cost matrix was clearly much better than without one as we suspected.Performance Results:
Gains Chart:
Using CART decision tree model, we got lifts of 2.9 and 2.3 in the first and second deciles respectively. The lift curve falls apart beyond the 5th decile though, clearly there's a lot of room for improvement.So, this was my attempt with the bank marketing data set. Hope you are able to take away something positive from it today.
Please share your thoughts or critique...have discussions if you will. Just don't forget to leave a comment. :)
No comments:
Post a Comment