What does the future look like for data and business analysts? From our perspective, it's exceedingly bright, because if you're an analyst today, you're in a better position than anyone else to become the Analyst of the Future.
We've identified three distinct future analyst roles, and today we're going to present the first one: the Data Explorer. These roles are taken from our recent guidebook, The Analyst of the Future, which you can download here.
How do you know if you might make a good Data Explorer? Well, you probably come from a background where you used some combination of Excel, Access, SQL, SAS, or Alteryx. You might code, but you might not – new graphical tools are allowing you to accomplish many tasks that once required coding.
Your new role will probably require a lot more creativity than your current role. You will need to identify and connect to new data sources, merge and prepare the data, and build production-ready data pipelines. The products you’ll be helping to build are meant to run in production, so you’ll be obsessed with automation and reproducibility. You’ll be the local expert on the details of the data: when a new data source is added, you’ll know what fields it contains and which new features you might be able to engineer from it. You will also keep your eyes open for new open data sources that you could use to enrich your internal data. And although a good portion of feature engineering will be done by the Data Modeler (whom we'll introduce to you shortly), you will be in charge of engineering features like KPIs, which require your deep familiarity with the business implications of the data.
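To make the "merge, prepare, and engineer a KPI" idea a little more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the two tiny CSV samples, the column names, the function names, and the choice of "average order value by region" as the KPI are all hypothetical, and a real pipeline would read from actual sources and probably use a dedicated tool or library.

```python
import csv
import io

# Hypothetical sample data standing in for two real data sources.
CUSTOMERS_CSV = """customer_id,region
1,North
2,South
3,North
"""

ORDERS_CSV = """order_id,customer_id,amount
100,1,20.0
101,1,40.0
102,2,15.0
103,3,25.0
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_orders_with_customers(orders, customers):
    """Attach each order's region by looking up its customer_id."""
    region_by_customer = {c["customer_id"]: c["region"] for c in customers}
    merged = []
    for order in orders:
        region = region_by_customer.get(order["customer_id"])
        if region is None:
            # Skip orders whose customer is unknown (a simple cleaning rule).
            continue
        merged.append({"region": region, "amount": float(order["amount"])})
    return merged


def average_order_value_by_region(merged):
    """Engineer a simple KPI: mean order amount per region."""
    totals, counts = {}, {}
    for row in merged:
        region = row["region"]
        totals[region] = totals.get(region, 0.0) + row["amount"]
        counts[region] = counts.get(region, 0) + 1
    return {region: totals[region] / counts[region] for region in totals}


def run_pipeline(customers_text, orders_text):
    """Run every step in order so the same inputs always give the same KPI."""
    customers = read_rows(customers_text)
    orders = read_rows(orders_text)
    merged = merge_orders_with_customers(orders, customers)
    return average_order_value_by_region(merged)


if __name__ == "__main__":
    print(run_pipeline(CUSTOMERS_CSV, ORDERS_CSV))
```

Wrapping the steps in a single `run_pipeline` function is one simple way to get the reproducibility mentioned above: the whole process can be rerun, unchanged, whenever the underlying data is refreshed.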
In some ways, the Data Explorer already exists in the form of the Data Engineer role. Still, we think that Data Explorers could have a much broader set of responsibilities than those currently ascribed to Data Engineers.
You’ll still need to be familiar with machine learning algorithms, and you’ll probably need a firm grasp of data architecture concepts, such as distributed computation.
Here are some resources to help you on your adventure:
- You’ll still be doing a lot of work in Excel, so why not do it a lot better after watching Trello founder Joel Spolsky’s excellent and entertaining video, “You Suck at Excel”?
- This online course from the European Data Portal will introduce you to the basics of data cleaning.
- This series of blog posts on data science basics, which we wrote here at Dataiku, does a good job of explaining Hadoop.
- Should you learn to code? It’s not the worst idea. If you do, Python and R are the most useful for data analysts (and we tend to recommend Python over R, but both are “first class citizens,” as we like to say). Try this Udacity course on Python and this other one on machine learn