Handcrafted model baseline

My current model for dataset-1 is handcrafted, but calibrated on the data. I won’t say too much about this method, except that it is very opinionated. In the Bayesian spirit, I bring domain knowledge into the mix and let the data update my priors. Every part of the model can be visualized and explained.

Because it incorporates domain knowledge, my model has an advantage over machine learning models. However, machine learning models have the advantage of automatically discovering structure and interactions that I did not consider. One purpose of this blog is to learn whether machine learning can improve on my model, or whether it merely overfits the data.

As a baseline for comparison, here is the FVR plot for my dataset-1 model. I used only 5 of the 31 available features. The red dashed line is a visual guide for the ideal model (slope = 1):

On out-of-sample data, my strongly positive predictions did pretty well, but my negative predictions were way off: