This is a guest blog post by Dataquest, an online learning platform that teaches data science skills by having students work with real datasets.
There’s been a lot of hype around the data scientist role in recent years, and it’s definitely true that there’s a lot to get excited about: high salaries, growing demand, the potential for interesting work, and did I mention high salaries? This hype has led to a growing number of data science certificate programs and bootcamps aimed at preparing people for these positions.
However, while it may seem like a dream job on paper, I've found that there are still many misconceptions about the role itself. Here are some things that people don’t tell you about being a data scientist:
Data Science Isn’t as Glamorous as It Is Made Out to Be
When many people think of data science, they think of machine learning. In reality, though, data scientists spend a lot of time just gathering, cleaning, and preparing data, leading some to jokingly refer to their job as “data janitor” instead of data scientist. By some estimates, data cleaning and preparation can take up to 80% of a data scientist’s time, so if you expect to become a data scientist and only work on machine learning tasks, you’re in for a rude awakening.
It's Marketed as Conducive to Craftsmanship, but It's Often Quick and Dirty
Because most data science that is done in practice exists to make money, things often need to happen very quickly. You’ll often have to produce answers or solutions by a specific deadline set by the business and be expected to work within the boundaries of frameworks like Agile. Many times, this entails doing far from your best work, even if you have a very clear idea of how to do much better.
Data Science Can Be More of an Art Than a Science
Data science actually involves quite a bit of abstract, creative thinking and depends heavily on business knowledge. Many non-practitioners don't realize that working with data means making choices that are sometimes subjective. For example, suppose you need to handle missing data in a dataset. Is it best to drop rows, label the values as missing, or use any of a number of different imputation techniques? Sometimes this problem requires technical expertise, but the first thing to do is ask why the data is missing. If you don't know, check with domain experts to guide your decision making. The "right" way to handle missing data depends on the context.
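To make those three options concrete, here is a minimal sketch in plain Python. The toy records, the "age" field, and the helper names (drop_missing, flag_missing, impute_mean) are all hypothetical and made up for illustration; mean imputation stands in for the many imputation techniques you could pick, and is not a recommendation.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
rows = [
    {"customer": "a", "age": 34},
    {"customer": "b", "age": None},
    {"customer": "c", "age": 52},
    {"customer": "d", "age": None},
]


def drop_missing(records, field):
    """Option 1: drop every row where the field is missing."""
    return [r for r in records if r[field] is not None]


def flag_missing(records, field):
    """Option 2: keep every row and add a boolean column marking the gaps."""
    return [{**r, f"{field}_missing": r[field] is None} for r in records]


def impute_mean(records, field):
    """Option 3: fill gaps with the mean of the observed values."""
    observed = [r[field] for r in records if r[field] is not None]
    fill = mean(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


dropped = drop_missing(rows, "age")
flagged = flag_missing(rows, "age")
imputed = impute_mean(rows, "age")

print(len(dropped), "rows left after dropping")
print(flagged[1])
print(imputed[1])
```

Notice that each option quietly changes the dataset in a different way: dropping halves this sample, flagging keeps the gaps visible to whatever comes next, and imputing invents values that were never observed. None of the three tells you why the data was missing in the first place, which is the question to take to your domain experts.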
It Requires Strong Communication and Collaboration Skills
When you think of a data science project, you may picture a lone data scientist working independently to solve a problem. In reality, data scientists work in teams and must collaborate with both technical and non-technical stakeholders. Since domain knowledge is so important, knowing how to communicate effectively with sometimes non-technical domain experts is crucial. This means gaining an understanding of the business applications before and during a project and then knowing how to communicate the results in a way that your particular audience can understand. The unfortunate reality is that even if you’ve produced something really valuable and/or innovative, your work will be meaningless unless you can communicate it effectively to others.
The Data Science Field Is Ill-Defined and Constantly Changing
The data scientist position is still very ill-defined, so the work a data scientist does at one company might be very different from the work at another. Some companies also use job titles like "data analyst" and "data scientist" interchangeably, which makes the definition even more nebulous. On top of that, because data science is still a relatively new field, it’s constantly changing. For practicing data scientists, this is both a blessing and a curse. There are many opportunities for growth, but there also aren’t many opportunities for downtime to keep up with all the changes. Ultimately, the field is too large for any one person to master while also keeping up with their day job.
Conclusion
Whether it's time constraints, lack of resources, messy data, or uninformed stakeholders, data science can definitely have its frustrating moments. At the end of the day, the position has both its perks and its frustrations, just like any other job. However, if you have clear expectations of the role and are prepared to handle these frustrating aspects, then it really can be one of the most satisfying jobs.
About the Author
Julie Chipko is the Python Team Lead at Dataquest and is addicted to all things data. She enjoys masquerading as a New Yorker and eating massive amounts of dark chocolate.