The data science community is in a period of rapid change, and the upcoming documentary Data Science Pioneers - made by and for data scientists - will be the first to capture both this excitement and uncertainty. Ahead of the documentary's highly anticipated release, we sat down with Florian Douetteau, CEO of Dataiku, to discuss how the role of data scientist has changed and how to maintain a supportive and open community grounded in understanding, collaboration, and shared knowledge.
The Evolving Role of Data Scientists
Claire: In the course of your career, how do you think the role of data scientist has evolved?
Florian: So, I think that in the last ten years data scientists have evolved from being something that was a curiosity for most people, like "What is a data scientist?" to something that is more structured, like "What kind of data scientist are you?" And I guess it is related to an evolution of the way you work on data projects, which is less and less as individuals and more and more as a team.
Claire: Now you've commissioned this documentary that highlights data scientists in all sorts of teams and industries and companies, from the very large to the very small. Are there any sort of traits or characteristics of their work that you think unify the field?
Florian: I think it is very fragmented in terms of types of data scientists, and the reasons why one gets into data science. Data science is a fairly diverse job profile. I think that something that unifies data scientists right now is that they have some kind of analytics mindset. They also think about data science in terms of relevance, meaning: how will this data or model actually be used? What do I need to do in order to get an impact? And what kind of impact will there be from data science? I think this question of impact and governance in data science is on the mind of every data scientist, and in a way, that's unifying them right now.
On Data Governance and Ethics
Claire: The documentary does talk a fair bit about ethical governance issues facing data scientists, but a lot of them, even if they find this exciting, don't necessarily have this sort of training. Do you have any sort of recommendations or advice for people who are trying to be very aware of governance issues?
Florian: Training in data science governance is not something you usually get at school, because it's not about ethics proper, or philosophy; it's about something that is new and will actually be applied in real life. You can't really get trained for it at school, but you can certainly read about it.
Most data scientists went into data science because they were curious, and wanted to learn. I guess that's just what you still need to do. It's nothing too far from learning new technical skills, or the latest framework. I think by getting curious, and reading The New York Times and The Economist on these issues and talking with peers about them, the conversation will move forward.
Claire: Does this documentary fit in with the sort of resources that can help people learn these things?
Florian: Data scientists lack a perspective on the work of other data scientists that goes beyond a technical presentation or another Meetup. We wanted to explore how data scientists perceive their work, their impact, and the evolution of their careers. That is why we supported this documentary: to help data scientists build their own perspective about the job of being a data scientist, both as a new role and as a role with impact on society.
Building an Open and Inclusive Data Science Community
Claire: You mentioned meetups. Do you think there is a community of data scientists?
Florian: I think there is. There's a data scientist community that is around technology and some best practices, and what are the latest trends and so on. There is a start of a community about these impact topics. I guess the question of the data scientist community is how does it integrate into the broader data or analytics community, which involves everyone who is not today a data scientist but would like to be a data scientist or work on data science in their field. For me, the question of the data science community is not its core community of actual data scientists, but is it the right community?
Claire: Do you think there are barriers to entry to this community, or do you think it's pretty welcoming?
Florian: I do think those are fairly open spaces, meaning that if you can navigate through this community, which is both online and offline, depending on your level of expertise, where you are on your journey, you can find resources from training to competition lead-ups, getting to something very hands-on, things more high level to get a new perspective. I think this does exist if you really want to be a part of it. I think the path for data scientists is less clear. In five years what do I become? For lots of data scientists, it's something new, just because the job is new, so this part of the community is less structured compared to other work communities.
Claire: It sounds like a lot of this is everyone learning as they go along. Do you have any ideas on how people already in the industry might be able to strengthen this community?
Florian: Technologists, on the whole, don’t care about competition that much; they focus on the next challenge and how they can overcome it together. The data science ecosystem is dominated by open source, collaborative tools, and libraries that enable all sorts of developers to contribute to solving these problems.
But as the field matures, there’s this risk that it will become siloed around big businesses and that the spirit of innovation will be superseded by a preoccupation with profit. We want to ensure that at the end of the day, we’re still tag-teaming these pressing issues like data ethics and the latest machine learning methods because data science impacts everyone, not just people buying and selling the tech. The data science community is still quite open and accessible, online and offline. That’s why we supported this documentary: because we want to maintain the camaraderie, collaboration, and democratization that data science is fueled by.