Machine Learning versus Statistical Modeling Through a Practical Marketing Lens
Changes brought on by Machine Learning, AI, and Deep Learning have pushed the industry forward. However, statistical modeling remains an important feature of the market research process.
Recently, terms like Machine Learning, Deep Learning and, increasingly, Artificial Intelligence have permeated the marketing world, often talked about as more recent and hence better alternatives to statistical modeling. Artificial Intelligence especially comes with an aura of somehow performing magically without human intervention. As a disclaimer, any blog on these topics needs to vastly simplify things, as these are all massive topics, but let’s look at it through a practical marketing lens.
Techniques that would probably be classified as Machine Learning include Decision Trees, Random Forests, Neural Networks and Deep Learning. Techniques that fall under traditional Statistical Modeling include linear regression, logistic regression, etc. There are a few differences between the two approaches that most (academics, data scientists, etc.) would agree on. Statistical Modeling is based on the specification of an explicit model (e.g. a linear function, or a logistic function) along with some distributional assumptions that give the estimators some nice properties. In his classic paper Statistical Modeling: The Two Cultures (2001), Leo Breiman states that in statistical modeling, we care what is in the box (e.g. a linear regression model). With Machine Learning methods, we do not have such an explicit pre-defined model structure. In this blog we will compare the two approaches through a practical marketing lens.
Most (not all) predictive analytics applications in marketing require (1) prediction accuracy, (2) understanding and credibility, and (3) simulation (looking at what-if scenarios).
Prediction accuracy is the degree to which a model can predict, ideally in new situations (e.g. time periods after the period on which the model was based). Understanding is the ability to interpret the model, preferably easily, in such a way that actionable recommendations can be made. Models also need to be viewed as credible: does the model seem intuitive? For example, the presence of strange effects can jeopardize credibility. Simulation refers to the ability to define new scenarios and have the model calculate the likely business result. These requirements are sometimes at odds with each other, and Machine Learning and Statistical Modeling differ on these criteria.
How do statistical modeling and machine learning fare and compare on these criteria? Let’s look at each in turn.
Quite a few papers exist in which researchers have compared the predictive performance of various statistical models with various machine learning approaches. For example (and this is just one example), in a paper published in the Journal of Economic Perspectives, “Machine Learning: An Applied Econometric Approach”, the authors compare linear regression with Random Forest. The out-of-sample predictive accuracy for the Linear Regression model was about 41% and for the Random Forest it was about 45%. In and of itself that is not a huge difference, but in some applications these types of differences matter a lot. However, it is very difficult to find a method (within Statistical Modeling, or within Machine Learning) that consistently outperforms competing alternatives. Some authors have said it is the nature of the data that will determine which model performs best, but this is not something that can always be determined prior to the start of the analysis.
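To make "out-of-sample" concrete, here is a minimal sketch of a holdout comparison using only the Python standard library. Everything in it is an assumption made for illustration: the data are synthetic (a made-up spend and sales relationship that flattens out), the names are hypothetical, and a simple k-nearest-neighbor regressor stands in for the flexible Machine Learning method. It is not the Random Forest from the paper and does not reproduce its results. Both models are fit on one part of the data and scored, using R-squared, on rows they never saw.

```python
import math
import random
from statistics import mean

def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    y_bar = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - y_bar) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def fit_linear(xs, ys):
    """Ordinary least squares with one predictor; returns a predict function."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def fit_knn(xs, ys, k=15):
    """k-nearest-neighbor regression: average the k closest training outcomes."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean([y for _, y in nearest])
    return predict

# Synthetic data (assumption): sales rise with spend and then level off.
random.seed(42)
spend = [random.uniform(0, 10) for _ in range(300)]
sales = [10 * (1 - math.exp(-s / 2)) + random.gauss(0, 1) for s in spend]

# Fit on the first 200 rows, score on the 100 rows the models never saw.
train_x, test_x = spend[:200], spend[200:]
train_y, test_y = sales[:200], sales[200:]

linear = fit_linear(train_x, train_y)
knn = fit_knn(train_x, train_y)

r2_linear = r_squared(test_y, [linear(x) for x in test_x])
r2_knn = r_squared(test_y, [knn(x) for x in test_x])
print(f"holdout R^2  linear: {r2_linear:.2f}  k-NN: {r2_knn:.2f}")
```

Which model scores higher here depends on how the synthetic data were generated, which is the point made above: the ranking is a property of the data as much as of the method.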
Understanding and credibility
A difference between Machine Learning and Statistical Modeling is the explicitness of the underlying model. For example, in both linear and logistic regression the underlying model is straightforward and easy to interpret, and the results often make intuitive sense. In Machine Learning this benefit gets lost quickly. Decision trees, for example, can become unwieldy, with lots of nonsensical branching, and so become very hard to interpret. Often they include “effects” that make no intuitive sense and have no utility beyond seeming to help with the prediction. Random Forests (sets of Decision Trees) or ensemble models (averaging across different types of models) are even worse. It is almost impossible to wrap one’s head around the data generating mechanism, and hence actionability becomes cumbersome. Even when the sole purpose is prediction, not understanding, such nonsensical “effects” can scare off marketers from using this type of model.
There are some situations where one might prefer Statistical Modeling over Machine Learning or vice versa:
Scenario 1. Business needs to or wants to understand the model
In marketing research, we often encounter what we refer to as driver models. For example, we want to predict overall satisfaction (or some similar measure, e.g. likelihood to recommend). The study might have a set of potential independent variables, up to as many as twenty. These twenty variables, measuring different components of the product or the service, have been selected because the business expects them to have an impact on overall satisfaction (or likelihood to recommend, etc.). In addition, the business might not primarily be interested in prediction. It may want to know which variables are most predictive and, ultimately, which variables are causing satisfaction to go up - a question that cannot be fully answered with driver models. We could still apply Machine Learning, but in this scenario Statistical Modeling may be the better way to go.
Scenario 2. Business needs to predict and does not need to understand the model
There are business problems in which the business mostly cares about the prediction, not necessarily what the data generating mechanism is. In his paper, Breiman reports on a commercial consulting project he took part in. The data set included thousands of potential independent variables. The goal was to predict whether a compound contained chlorine. His paper doesn’t reveal all the details of his modeling, but according to Breiman, Statistical Modeling failed, while Decision Trees (a Machine Learning approach) worked well. Of course, there is much more we can say about this topic.
After prediction accuracy and credibility, the third requirement for predictive analytics applications in marketing is simulation, i.e. the ability to run what-if scenarios. With Statistical Models, simulation is not a problem - we simply run the models with different values of the independent variables and compare the results. With Machine Learning, we can sometimes run into challenges. For example, in Decision Trees the analysis is based on splits, e.g. at some level the tree may split on satisfaction: respondents who scored an 8 or 9 versus those who scored less than 8. This means that any simulation involving an improvement that stays within the 1-7 range will show zero impact. One could argue that this is because the model shows that only a transition from 7 to 8 makes a difference. However, this is simply not realistic, and would for obvious reasons be a hard sell to marketing managers.
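The split problem described above can be reproduced in a few lines. The sketch below is illustrative only: the data are synthetic (spend is assumed to rise steadily with a 1-9 satisfaction score, with an extra jump at 8), all names are hypothetical, and the "tree" is deliberately the smallest possible one, a single split chosen to minimize squared error. It then runs the same what-if scenario, moving a respondent from a 5 to a 6, through both models.

```python
import random
from statistics import mean

def fit_linear(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope

def fit_stump(xs, ys):
    """One-split regression tree: pick the threshold with the lowest squared error."""
    best = None
    for t in sorted(set(xs))[1:]:  # candidate split: x < t versus x >= t
        low = [y for x, y in zip(xs, ys) if x < t]
        high = [y for x, y in zip(xs, ys) if x >= t]
        low_mean, high_mean = mean(low), mean(high)
        sse = (sum((y - low_mean) ** 2 for y in low)
               + sum((y - high_mean) ** 2 for y in high))
        if best is None or sse < best[0]:
            best = (sse, t, low_mean, high_mean)
    return best[1], best[2], best[3]

# Synthetic survey (assumption): 20 respondents at each satisfaction score 1..9.
random.seed(0)
scores, spend = [], []
for score in range(1, 10):
    for _ in range(20):
        scores.append(score)
        jump = 15 if score >= 8 else 0
        spend.append(10 + 2 * score + jump + random.gauss(0, 1))

intercept, slope = fit_linear(scores, spend)
threshold, low_mean, high_mean = fit_stump(scores, spend)

def predict_linear(score):
    return intercept + slope * score

def predict_stump(score):
    return low_mean if score < threshold else high_mean

# What-if scenario: a respondent improves from a 5 to a 6.
linear_lift = predict_linear(6) - predict_linear(5)
stump_lift = predict_stump(6) - predict_stump(5)
print(f"split at {threshold}; lift from 5 to 6  linear: {linear_lift:.2f}  tree: {stump_lift:.2f}")
```

Because 5 and 6 fall on the same side of the split, the tree returns the same prediction for both and the simulated lift is exactly zero, while the regression credits the improvement with its slope. A deeper tree would add more steps but would still be flat between them.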
Quite a few papers have compared various statistical and machine learning approaches, and there is not one method that consistently comes out on top. It may be that a given method’s performance depends more on the specific characteristics of the data than on the specific features of the analysis method. From this author's point of view, Machine Learning and Statistical Modeling both have unique strengths, and it is fine to use these approaches in combination. For example, in a situation in which you believe interaction effects might exist, it makes sense to first run a Decision Tree and then include any identified interaction effects in a regression model. Both approaches need to be applied with care and skill. This blog hasn’t addressed what makes Deep Learning unique, which will be a topic for a future blog.
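As a rough illustration of combining the two approaches, the sketch below uses synthetic data with two hypothetical yes/no drivers (price_ok and service_ok) and an outcome built, by assumption, to contain an interaction. The tree step is mimicked by hand rather than fitted: with two yes/no drivers, a two-level tree that splits on one and then the other ends in four leaves, and comparing the effect of service across the two price branches hints at an interaction. That product term is then added to an ordinary least squares regression, solved here with a small helper so that no external library is needed.

```python
import random
from statistics import mean

def ols(rows, ys):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, ys))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for i in range(k):
            if i != col:
                factor = a[i][col] / a[col][col]
                a[i] = [v - factor * p for v, p in zip(a[i], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Synthetic respondents (assumption): satisfaction gets an extra boost
# only when both price and service are rated as fine.
random.seed(1)
data = []  # (price_ok, service_ok, satisfaction)
for price_ok in (0, 1):
    for service_ok in (0, 1):
        for _ in range(50):
            y = (5 + price_ok + service_ok
                 + 3 * price_ok * service_ok + random.gauss(0, 0.3))
            data.append((price_ok, service_ok, y))

# Step 1: leaves of a two-level tree (split on price_ok, then service_ok).
leaf = {(a, b): mean([y for p, s, y in data if (p, s) == (a, b)])
        for a in (0, 1) for b in (0, 1)}
service_effect_low = leaf[(0, 1)] - leaf[(0, 0)]   # branch price_ok = 0
service_effect_high = leaf[(1, 1)] - leaf[(1, 0)]  # branch price_ok = 1
tree_gap = service_effect_high - service_effect_low

# Step 2: regression with the product term the tree pointed to.
rows = [[1, p, s, p * s] for p, s, _ in data]
ys = [y for _, _, y in data]
b0, b_price, b_service, b_interaction = ols(rows, ys)
print(f"tree gap: {tree_gap:.2f}  interaction coefficient: {b_interaction:.2f}")
```

In this fully crossed two-by-two design the interaction coefficient equals the gap between the two branches, so the regression restates what the tree found in a form that is easier to report and to simulate with. With more drivers or continuous scales the two would no longer coincide, and the tree would serve only as a screening step.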
Breiman, L. (2001). Statistical Modeling: The Two Cultures. Statistical Science, Vol. 16, No. 3, pp. 199-215