Methodology & Functionality in Differing Data Science Roles

Scaling AI Joy Looney

In this Banana Data Podcast episode, "Methodology & Functionality in Differing Data Science Roles," our hosts share the rundown on data science roles, so you will no longer be in the dark about behind-the-scenes data science happenings. Speaking with a Dataiku solutions engineer, the hosts offer listeners special insights into how data science and engineering teams contribute at a broader organizational level.

 

 

Prefer your favorite tunes over podcasts? That's fine. Here's a full transcript of the podcast, so you won't miss out! 

Read the Episode 

Corey: Hello everyone and welcome to the fantastic fourth episode of the Banana Data podcast. I'm your cohost Corey Strausman. I am a community manager at Dataiku. This season, we're focusing on humanizing data and AI. So, of course, it's natural that I would like to introduce our co-host, the superhuman CPM. 

CPM: Giving me way too much credit there, but I guess we're the fantastic four, we're doing a good job. 

Corey: This is eventually going to become a movie podcast, folks. So, if you're a subscriber, just hold on. We'll eventually get there. We'll get through this data science stuff first, but you know why we're all here. So, CPM, last week's episode with Matt was really awesome, don't you think?

CPM: Yeah. Last week we talked with Matt Dorros of Wayfair about how data scientists and citizen data scientists collaborate together to deliver value to a wide range of audience members, whether they're technical or non-technical. And today, we also have another fantastic special guest host, Emma Irwin, a solutions engineer here at Dataiku, who similarly works with audiences along that wide range of data experience. So Emma, why don't you introduce yourself? 

Emma: Yes, I would love to. Hello everyone. As mentioned, my name is Emma Irwin. I'm located in Toronto, Canada. So, we are officially an international podcast here, and I'm a solution engineer at Dataiku.

What is a solution engineer? You may have no idea. I know that when I was a little girl growing up, I definitely didn't put it beside princess or fireman, for example. But it's a really important role here. Because we are a software company, obviously, we are trying to explain how great the software is to other people, specifically, our prospective clients who have a range of understanding and skill sets of their own.

So, it's comprised of three main pillars, that'd be sales, obviously, data science here at Dataiku, then understanding of hardware and infrastructure systems. At the end of the day, I'd say it's really about working with our potential clients to build their interest, their expertise, and some insightful projects that encourage them to purchase Dataiku and then work with us as a partner moving forward. So, it's a lot of fun! 

CPM: So, Emma, it sounds like there are so many different things involved with this role. How do you manage all of that? 

Emma: It is definitely a balancing act. Luckily, we have an amazing team to work with, resources I can always pull in, and it's an exciting puzzle to figure out the perfect combination of individual users’ needs and their business's end goals.

Corey: Emma's kind of in a perfect situation because she gets to balance out the technical side with the storytelling side. We focused on that in an earlier episode, we're talking about how we're trying to humanize data and trying to democratize it and make it more available for wider audiences.

So Emma's not only focusing on being able to tell a story. She's also able to show off the technical capabilities that data platforms and data science offers not only to the enterprise but also to specific individuals. 

Emma: Yeah, and it's a ton of fun. Honestly, I think the biggest win for me is when we go into a new customer's account and they just recognize how many skilled individuals they have in their own organization that they had no access to before. Obviously democratization is a bit of a buzzword, but it is really critical for bringing out the best in everyone. 

Corey: We love wins here at the Banana Data podcast, so we want our audience to win by making sure that you subscribe to the Banana Data podcast. It's on Apple. It's on Spotify. It's wherever you listen to podcasts!

So, we talked about how fun and how rewarding it is to be able to get those wins, mix those pitches, explain, tell a story, and show off the technical capabilities, but let's pivot to the strategy. Let's focus on how we're able to achieve that and the strategic focus behind it. 

CPM: Generally speaking, there are two different approaches that a company can take with its data strategy, often described as either defensive or offensive. I think this can fall into a bunch of different buckets, whether it's methodology, skill set, or business objectives. There's not necessarily one preferred method forward, but I think, similar to Emma's role, it's about finding that balance between the two.

Corey: So, in other words, when Emma is choosing a strategy, it's more or less like we're in a Raptors game or something like that. And you have to choose whether or not we want to try to be more defensive minded. We want to put up those buckets. You want to get those rebounds. You want to box out, you know, we want it to be in a zone.

CPM: So we went from a movie podcast to a data podcast to now a sports podcast? 

Corey: You know, whatever our fans want, if they subscribe, we're going to give them what they came here for. 

CPM: Well, we love to please, that's for sure. Well, Emma, why don't you tell us a little bit about the difference between a defensive and an offensive strategy when it comes to data science. 

Emma: Yeah, absolutely. So, it definitely depends on the company and what's going to be best for them. I don't think I would be able to sit here and say that there's one specific way forward, but a key difference between the two is your end objective. If you're in a defensive position, you're likely going to be considering things like data security or data privacy more frequently, along with regulatory compliance and governance. All critical capabilities, but not necessarily the sexy thing you normally think of when you consider what big data and AI make possible.

That's really where an offensive strategy comes in — improving your competitive position or, you know, the big moneymaker, profitability. I think that's really where you have to make a choice because both can be beneficial, but it depends on your end goal. 

CPM: I think that's a really fantastic point. Think about companies that deal with security data science. One end of the spectrum (more defensive) could be preventing fraud in the first place and being able to identify the different features that correlate with fraud and being able to identify them early and take action to prevent them. That's definitely more defensive. But, on the other end of the spectrum, optimizing rewards programs for people who hold credit cards, different types of credit cards, that's more of an offensive strategy towards optimizing the revenue of the company, profitability, or their customer satisfaction and retention. I think both of those are important for a company that is working with that type of sensitive data, people's money, so on and so forth. I definitely wouldn't claim that one is more important than the other. It sort of depends upon the business objective at the time. 

Emma: And I think sometimes you'd be surprised, too. When you go into a project that is meant to increase profitability, it doesn't always end up bringing in more money than you would have saved with a defensive method. So sometimes it requires a balance. Again, please, no one take a shot for every time I say balance throughout this, but it is key, so...

CPM: You know, we'll have to find a balance between different vocabulary words too, so it's a learning process here. 

Corey: I'm going to balance this out by asking you a question — so, whether it's looking at the offensive or defensive. For example, with offensive, if you're analyzing for the future, talk about the personas that you're working with or working towards, because if you're meeting with a company you're not just pitching a building; you're not pitching a boardroom; you're pitching individuals that are going to have different backgrounds, different needs, different experiences. Based on their technical know-how, how do you know when to focus more on the technical aspects, as opposed to the storytelling aspects of “this is what it can produce” as opposed to “this is how something functions?”

Emma: I think this is one of the key challenges in a role like mine because there is sort of a tipping point where individuals who do day-to-day tasks care more about how we're going to enable them to do those tasks rather than the end goal of all of the tasks put together. And, I think that's totally natural, right? If I gave you something that made your everyday more fun, easier, or let you off a little bit earlier from work, you would probably care a lot more than if I said, “Your company is going to get X number of dollars.” Whereas, people in higher positions will have more of a big picture type of view or be incentivized to get those larger initiatives pushed through.

So, everyone loves a good story, but some people care more about the individual words on the page. 

CPM: Well, I think that definitely speaks volumes. And, it's sort of related to the different types of stakeholders involved. We talked about, last week, subject matter experts versus data scientists and those being two different types of stakeholders that add value to a project here. In this case, whether it's being a solutions engineer for a product and illustrating to data analysts how this can help them analyze the past and have more of a defensive strategy in terms of their methodology, or an offensive strategy for the data scientists who are predicting the future. You know, there's a sort of consolidated location where they can all get that job done. It doesn't matter whether they're focusing on that defensive or offensive strategy, because they're both important to the business and, you know, eventually reporting those insights up to those stakeholders.

Corey: Well, CPM, Emma, you know, we happen to be joined today by a data scientist and an engineer. So we've been talking about methodology, but let's talk about function. Do you guys consider yourselves, in your day-to-day and your overall roles, generalists or specialists? Is a data scientist more of a specialist or a generalist? Is a solution engineer more of a generalist or a specialist? Am I really just sort of confusing the two? Like, how would you guys see yourselves performing in those roles?

CPM: I think that's a great question. And, you know what, it honestly depends. I think data scientists could be generalists or specialists depending on the role, because there are benefits and detractions to each, right? A data scientist who's a generalist might be able to offer a lot of value across many different realms or many different industries, but that might only go so far. So, maybe they're able to make some humanization efforts with interpreting their data, but it definitely won't be as deep as someone who has that subject matter expertise (like we talked about last week).

But, for someone who's a specialist, maybe you're a data scientist who works in the finance industry, and you know so much about the financial market, maybe you studied finance in college as well, you're definitely going to be able to take it 10 steps further than a generalist. However, you might be limited to just that industry. There's benefits and detractions to both and it sort of just depends what type of data scientist you want to be.

Emma: And, I think to that point, a lot of the time, it's about nurturing skills that you think will either best serve yourself or your company. I know that I came to Dataiku with a lot more infrastructure knowledge than I did data science. So in my current role, I had to lean on the data scientists for their specific knowledge and specialization.

And as my skills have increased, I've been able to handle more of the generalized tasks and bring them in when we have a client who's asking for something either in a certain industry or with a certain use case that would really put their skills to the best use. Which has been cool to see, because when you really let someone with the expertise loose on a project, then you get to see them really flourish.

CPM: So Emma, the next time my Spark or Kubernetes breaks, I know who to blame, right? Where to come for help?

Emma: I like the second option. 

Corey: Yeah. I mean, you guys talk about sort of the collaboration, you know, it sounds like someone might have certain skill sets that complement someone else's, regardless of the formal position. Would you agree that it's more about background and skill set rather than whatever your formal title is and the expectation?

Emma: I think that's a really good way to look at it, and I think it's about being cognizant of your skills as well, knowing the limitations and potentially knowing your own biases. Because if you come from a certain industry and there are a set of expectations that are either about the data or about the end user of the data or the model that you build, then you might be incorrectly addressing a new area if you do choose to move. So there are going to be balancing acts regardless of the title. 

CPM: Yeah, and maybe another spin on this is that when you expect someone, whether they're a data scientist, data engineer, or data analyst (it doesn't matter), to take on a lot of different projects or tasks that have to do with the data space, they have to be motivated to gain the knowledge that they don't have, but also fine-tune the skills that they currently have. I think part of the onus is on the data engineering or science team, or the manager, to make space and identify those relevant challenges to help those individuals and teammates grow in those specialized ways. But on the flip side, it is partly on the individual to take initiative and seek to stretch themselves a little bit further.

Again, as always, there's a balancing act to be had there for doing too much versus not enough. But, you know, I think that's an essential skill set that no matter who you are, you know, in the tech industry, that's a skill set you should hone.

Corey: It sounds like when it comes to being able to help people learn how to use a product or applied data science that collaboration, again, is key.

You know, we don't want engineers siloed and away in their own little sphere, and we don't want data scientists acting independently or not collaborating with these other important players. It's just an important part because, ultimately, to use another Hollywood reference, we don't want a Jets and Sharks thing.

This isn't “West Side Story.” So we don't want engineers going up against the data scientists. We want them to be able to work together to find a solution or best practice, and generally just make people happy.

CPM: You know what that reminds me of? One of my favorite shows growing up, which was “So You Think You Can Dance.” What they used to do is pair together two dancers that had completely opposite styles and that actually molded such fantastic chemistry between these two individuals, because they were able to bring something completely different to the table. That 100% applies here. 

Whether the data scientist is a subject matter expert or a solutions engineer, it absolutely applies, no matter who you are. You have a completely different set of experiences, backgrounds, and skill sets, whether you're a specialist or a generalist, whether you're looking towards the past or analyzing the future, whether you're focusing on a defensive or offensive strategy. It doesn't matter. All of that applies and is important, and the only way you tap into those is by virtue of that collaborative effort.

Corey: In my day-to-day role here, I don't really get to deal with customers. So I want to live vicariously through you guys since you're dealing with potential customers, users, you know, important power players within these organizations all the time. So, based on the methodology and function, without revealing too many details, can you guys think of an example, off the top of your head, where you can talk about the offensive versus the defensive — the collaboration and kind of playing those important roles to be able to ensure that you're able to democratize AI? 

Emma: Yeah, definitely. I can talk about one of our existing customers who has just actually increased the number of users on the platform because they had found they really had two different populations. They had a team of data scientists who were already using Dataiku for what we would consider to be offensive projects. They were hired for the sort of money-making and exciting future-looking work. But they were getting bogged down with this ticket system in which their analysts and others were able to submit requests for assistance and, over the course of about six months, they had 280 tickets for day-to-day tasks (what we would consider defensive) that needed to get done or these analysts couldn't move forward.

And so, what they decided is that it would actually be way more helpful to have everyone working together and being able to build out a full kind of data story, if you will: a whole pipeline, where the defensive tasks are getting completed, the analysts are unstuck and unblocked from their own kind of subject matter expertise in that area of the data, with the help of Dataiku, obviously, and then the data scientists who were working on some of the new projects could devote more of their time to them. So, it was a little bit about upskilling. It was a little bit about access to opportunity and to data and expertise, and then a lot about collaboration. So Corey, even though you don't see the customers, you clearly know them.

CPM: I love that story. It's sort of alleviating some of the gap-filling that the analysts may have been doing to allow them to focus on the more important endeavors. I'm recalling a project that I worked on a long time ago with some architects, way back at the start of my career, but I think the analogy applies here. They were mixing a lot of different types of compounds in their concrete to build the foundation of an actual tower of sorts and they didn't know which one was going to be most optimal. And here it was — the blend of their subject matter expertise in architecture and my knowledge of experimental design that really helped in that collaborative effort. 

It was obviously something very important because, you know, you don't want to start to build a tower and then find that when it's halfway done, things are cracking at the bottom and then, you know, things are going to fall through and the project will fail. Clearly, I don't know anything about architecture, but I can certainly analyze the data and infer how that would scale with some predictive modeling and also be able to inform what the type one errors are versus the type two error risk and things like that. The architects don't know what that means necessarily, but we're able to pair that information together to ensure the success of the project ultimately. 

Corey: Well thanks for both of those examples. Now I feel sufficiently educated — like I could just go out there and start pitching this stuff. So I want to thank you both for participating. Emma, you killed it! She was a little nervous coming on today, but I bet she did a great job. Emma, since you've made this podcast international, your queen would be proud. I want to thank you for being a friend of the pod. CPM, do you want to tell us a little bit about our next podcast? 

CPM: Yeah. So join us in two weeks where we'll be talking to Jeremie Harris of Towards Data Science and discussing the emerging challenges in AI and data science. We hope to see you there! 

Corey: Thank you, everyone, and remember to subscribe to the Banana Data podcast wherever you listen to podcasts.

Emma: Bye team.
