Insights From Data Professionals: Challenges & Misconceptions

Dataiku Company, Scaling AI Lynn Heidmann

Data science is a team sport that needs to be collaborative to be successful, but data leaders and practitioners often disagree on where exactly responsibility for data quality and data science activation is housed. Without clearly defined and communicated data responsibilities, frustration (and low data ROI) is inevitable.

To isolate common sources of friction and top challenges, we surveyed more than 400 data scientists, engineers, analysts, and other data professionals at our EGG conference. Explore the white paper that delves into what they had to say, complete with suggested further reading on common challenges and data miscommunications.

Top Data Challenges

While data responsibility may be contentious, data practitioners and leaders agree on the top data problems. "Data cleaning and/or wrangling" was the top data challenge across all industries, more than 30% more pressing than the second-ranked challenges, a tie between “connecting to data” and “deploying models into production.” In other words, the top data problems are not deciding which cutting-edge model to use, or even how best to collaborate between teams and stakeholders. Instead, data teams are stuck spending their time making their data manageable before all else.

Data cleaning is an eminently solvable problem, but when teams scale quickly it often gets brushed over in favor of developing shiny POCs. Without sturdy foundations, data teams are liable to lag behind when they have to clean (and reclean) their data on a regular basis. Coupled with the second-ranked problems of connecting to data and deploying models into production, this makes it very challenging to get any of the potential value from your data: you cannot take the time to fine-tune and customize your models if you need to worry about accessing the data in the first place!

Who We Learned From

We surveyed over one hundred data professionals across industries to glean insights into the state of data collaboration and machine learning. While they share similar key challenges in their data strategy, each industry has slightly different concerns. Read our free white paper to explore distinctions down to the industry level.

EGG 2019 survey infographic about the respondents
