In a previous post, we introduced the role of the Data Connector, a future analyst role that is focused on connecting to data, preparing that data, and making sure that insights are delivered as quickly as possible. Today, we'll talk about a second role: the Data Modeler.
These roles are taken from our recent guidebook, The Analyst of the Future, which you can download here.
Do you have what it takes to be a Data Modeler?
You probably already have training or experience as a statistician, a computer scientist, a financial analyst, a mathematician, or a member of an analytics department. You might code in Python or R, but you might not – the barriers to entry for becoming a Data Modeler are decreasing because of new software that allows you to build machine learning models with graphical interfaces. Still, you will need an advanced understanding of statistics and data science concepts and methodology in order to become a Data Modeler.
You will be in charge of building predictive models, turning those models into a product or a service, and then putting them into production. You will create checks and metrics for monitoring these models, because there will be a huge number of them in production! You will be a master of machine learning models and the frameworks used to validate their quality.
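To make "validating model quality" concrete, here is a minimal sketch of k-fold cross-validation with scikit-learn. The synthetic dataset, the choice of logistic regression, and the ROC AUC metric are all assumptions made for illustration; a real project would substitute its own data, model, and metric.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data stands in for a real dataset (an assumption for illustration).
X, y = make_classification(
    n_samples=500,
    n_features=20,
    n_informative=5,
    n_clusters_per_class=1,
    random_state=0,
)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: each fold is held out once for scoring
# while the model is trained on the remaining four folds.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
mean_auc = scores.mean()

print("AUC per fold:", scores.round(3))
print("Mean AUC:", round(mean_auc, 3))
```

The same per-fold scores can double as a simple monitoring baseline: if a model's score on fresh production data falls well below the range seen here, that is a signal to investigate.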
You will apply your creativity in feature engineering: using mathematical techniques to select and combine the right variables and feed them into the right model. This will often require you to cut an enormous set of variables down to something more manageable.
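As a rough sketch of what reducing the number of variables can look like in practice, the example below chains two common scikit-learn steps: univariate feature selection (`SelectKBest`) followed by principal component analysis (`PCA`). The synthetic data and the specific counts (100 variables, keep 20, compress to 5) are arbitrary assumptions chosen for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 100 candidate variables, only a few of which carry signal (synthetic, for illustration).
X, y = make_classification(
    n_samples=300,
    n_features=100,
    n_informative=8,
    random_state=0,
)

reducer = make_pipeline(
    StandardScaler(),                         # put all variables on a comparable scale
    SelectKBest(score_func=f_classif, k=20),  # select: keep the 20 highest-scoring variables
    PCA(n_components=5),                      # combine: project them onto 5 components
)

X_reduced = reducer.fit_transform(X, y)
print(X.shape, "->", X_reduced.shape)
```

When the reduced features feed a predictive model, it is usually safer to append the model to the same pipeline so that selection is refit inside each cross-validation fold rather than on the full dataset.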
In short, you will be the go-to person on your team for all things math, stats, and algorithms – and also for knowing how to use different types of data in the many models at your fingertips.
Here are some resources that should aid you in your quest:
- Andrew Ng’s Coursera/Stanford course on machine learning is basically a requirement!
- Anand Rajaraman and Jeffrey Ullman’s book (or PDF), Mining of Massive Datasets, for some advanced but very practical use cases and algorithms
- A more theoretical, but clear and comprehensive, textbook: The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman
- Oxford professor Nando de Freitas’s 16-episode deep learning lecture series on YouTube
- Open-source machine learning libraries, such as scikit-learn (and its great user guide), Keras, TensorFlow, and MLlib
- Python Machine Learning: a practical guide built around scikit-learn. “This was my first machine learning book, and I owe a lot to it,” says one of our senior data scientists
- Try your first Kaggle data science challenge!
Discover the other future analyst roles in our guidebook, The Analyst of the Future.