I spent this past summer at the Greatest Good, a consulting boutique headed by Steve Levitt and other Chicago economists. I got to work with transaction-level data for various companies. It was the first time I had properly played with large datasets, and it set me on course to pursue a career in data science.
Being able to squeeze meaning from data is not simply a science, but also a craft. My summer gave me a first glimpse at what may be useful guiding principles and I wanted to share them with you:
Always browse your raw data to see what’s actually there; you might be surprised. Scatter plot your data and residuals. Use histograms or box-and-whisker plots to visualize distributions. Summary stats never tell the whole story.
Aggregate data to the right level. Abstract too much and you lose information. Look too closely and you lose the bigger picture.
Build a simple model first, then dig further. The simple model shows you where further digging is likely to pay off, so your time is better spent.
Think really hard about your assumptions. Never forget them. Test them if possible. Make sure your assumptions don’t weaken your analysis.
Have a story to tell. Don’t just recite numbers. This forces you to think harder about what’s actually going on, and it allows you to communicate better.
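To make the first two principles concrete, here is a minimal sketch in Python using only the standard library. The transaction records, the store and day labels, and the helper names `text_histogram` and `aggregate` are all invented for illustration; none of it comes from the data I actually worked with. It shows how a single outlier can distort the mean while a crude histogram exposes it, and how the same records tell a different story depending on the level you aggregate to.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical transaction records: (store, day, amount). Invented for illustration.
transactions = [
    ("A", "Mon", 10.0), ("A", "Mon", 12.0), ("A", "Tue", 11.0),
    ("A", "Tue", 500.0),  # one outlier that the mean alone would hide
    ("B", "Mon", 20.0), ("B", "Tue", 22.0), ("B", "Tue", 21.0),
]

def text_histogram(values, bin_width):
    """Count values per bin and return {bin_start: count}, sorted by bin start."""
    counts = defaultdict(int)
    for v in values:
        counts[int(v // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))

def aggregate(records, key_index):
    """Sum amounts grouped by the field at key_index (0 = store, 1 = day)."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key_index]] += rec[2]
    return dict(totals)

amounts = [t[2] for t in transactions]

# Summary stats: the mean is pulled far above the typical transaction.
print("mean:", mean(amounts), "median:", median(amounts))

# Browsing the distribution shows six small values and one very large one.
print("histogram:", text_histogram(amounts, 100))

# The same data at two aggregation levels.
print("by store:", aggregate(transactions, 0))
print("by day:", aggregate(transactions, 1))
```

In this toy data the by-store totals suggest store A is much bigger than store B, while the by-day totals suggest Tuesdays are much busier than Mondays. Both impressions come from the same single large transaction, which only shows up once you look at the raw values.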