A change of direction
After attempting to use regression modelling to predict the popularity of a song using multiple attributes, it became clear that this was not going to happen. Perhaps the diversity of different song compisitions and the subjective nature of people's taste in music means that their are just too many independent (and interdependent) variables to model.
At this point my thought is to change to some classification machine learning models. After doing a bit more analysis of the data in Tableau it seems that binning the popularity scores is probabaly the best way to go. Roughly 30% of the songs I have data on have a popularity rating of 0.
Using Pandas I quickly added a new column to my database to group songs into having a popularity score of 0, or above 0.
- Songs with popualarity >0: 44,846
- Songs with popularity = 0: 29,232