4 Do's and Don'ts of Hiring and Upskilling for AI Talent

By Lynn Heidmann

Why is attracting the right talent so critical to organizations? A recent study of more than 600,000 researchers, entertainers, politicians, and athletes found that high performers are 400% more productive than average ones. Similar studies in business not only show similar results but also reveal that the gap rises with a job’s complexity. 

That means in highly complex occupations — including the information- and interaction-intensive work of data scientists, data analysts, software developers, as well as data executives, data team managers, and the like — high performers are an astounding 800% more productive.

Data science, machine learning, and AI are becoming table stakes for businesses today, with more and more companies realizing they must leverage them or risk falling behind the competition. It’s become increasingly clear that the path to success lies in Everyday AI: AI embedded throughout a company's operations.

However, successfully enabling Everyday AI is often easier said than done; setting up a healthy center-of-excellence model with a centralized team of data experts that effectively collaborate with, educate, and enable subject matter experts on the business side is an undertaking that requires a combination of good hiring for key roles plus effective upskilling of existing staff. 

Top challenges companies face when starting on (or ramping up) their AI maturity include:

  1. A lack of specificity around the business’ AI needs. Insufficient planning or assessment of what the company’s data needs actually are before staffing up can result in mis-hiring, which in turn causes imbalances that lead to inefficiencies or turnover. 
  2. Difficulty hiring data talent. Top data and tech talent is more in demand than ever, and one out of three businesses cites limited skills and talent as a top AI challenge. The difficulty stems not just from finding enough qualified candidates, but also from educating existing staff on effective interview techniques so they can properly evaluate whether data talent will really bring the right skills to the table.
  3. A lack of formal upskilling programs. If there’s one important takeaway from this article, it’s that upskilling is critical to AI staffing, and it will likely be the most important source of valuable data talent at the organization. Yet it is woefully overlooked at most companies. Building formal, active, continuous learning on AI into employee education programs allows organizations to quickly access high-performing talent and shape that talent to fit the emerging needs associated with scaling AI.

Now that you definitely understand the importance of harnessing good talent and you know the top challenges associated with doing so, how should you and your organization effectively approach hiring and upskilling for AI in practice?

Do Know What You're Looking For

Organizations are increasingly realizing and acknowledging that the data science process itself doesn’t solely revolve around data scientists — you need individuals with different skill sets to support each step of the AI lifecycle. Thus, in order to go beyond the AI hype and successfully implement a comprehensive and sustainable AI strategy, you need to be able to dissect each part of the AI model lifecycle, translate it into concrete organizational resources and needs, and then map those needs to the different data profiles available. 

There is no universally perfect data talent, but there is talent out there that is perfect for each organization. Building successful AI projects requires people with a range of skill sets, including, at a high level, at least a few of the following profiles:

[Figure: chart about job roles]

Do Find Balance

Achieving the proper balance between all the different data profiles is critical to an efficient data practice at the organization overall. Hiring too much of one profile and not enough of another can cause bottlenecks in processes and frustration all around. For example:

  • An organization that hires too many data scientists out of the gate but doesn’t have enough data architects to build and maintain a database architecture that allows it to continuously deploy, improve, and scale machine learning models in production may end up with frustration on both sides, as neither group sees the real-life business impact of its work.
  • Not having enough data leaders or managers can cause communication with the business and clear prioritization of projects to crumble, resulting in data scientists or analysts working on their own. This, in turn, could also mean missed opportunities for reuse across data projects.

Ultimately, finding a good mix of data professionals that is the right balance for the business is key to staff retention. A well-oiled machine means happier employees, with fewer people having to perform tasks outside of their skill or comfort zones.

Do Hire Diversely for Responsible AI

We worked on having a very diverse team with 20-25 data scientists, and after 25, the funny thing is that it worked on its own — we didn’t have to look for people actively, they came to our team. Dataiku helped as well because to get a new colleague up and running used to take 2-3 months, and now it’s 3-4 days. When we ask why data scientists came to Rabobank, we hear ‘We can start working on a project within the first month, and that’s cool!’

-Martin Leijen, Business Architect Data Wholesale and Rural at Rabobank

A key question with regard to the future of data teams, data roles, and even AI as a whole is whether AI should be inclusive (that is, encompassing all different types of people across roles working together toward a common goal) or exclusive (siloed to specific and specialized teams to get the job done more precisely and efficiently).

While restricting data and AI operations to highly specialized, agile teams using complex technologies may result in quicker delivery of a new product when the main goal is to disrupt the market through a technological advantage, we at Dataiku believe that any long-term AI strategy within an organization is bound to become more inclusive, as businesses opt for scalability, sustainability, and the democratization of data processes.

Today, inclusive AI means several things:

  1. Collaboration on AI projects between different people with different profiles, strengths, and educational backgrounds; by nature, this also usually means different departments working together to achieve a common goal.
  2. But more broadly than simple collaboration for one particular project, it also translates into the wider infusion of AI processes throughout an organization — a complete transformation in the way of working.
  3. Lastly, but perhaps most importantly today, inclusive AI has also started to take on a slightly different meaning, one that goes beyond the way businesses and companies work internally. It deals with issues like bias, responsibility, interpretability, and fairness.

Ultimately, inclusive AI will allow organizations to adopt Everyday AI more easily, with the use of data distributed through all teams, lines of business, and profiles (technical or not) at the business. This is the key to unlocking scalable AI.

Responsibility is another key factor to consider when building a sustainable AI staffing strategy. Responsible AI continues to be a hot topic in the data science and machine learning space, and while it contains many dimensions (including sustainability and reliability of AI-augmented processes as well as governability), one of the most talked-about aspects is that of accountability. That is, ensuring that models are designed and behave in ways aligned with their purpose and that they don’t introduce any risk via unintended — or overlooked — biases. 

Don't Wing It

Before hiring, consider exactly what the needs of the business are and which types of data profiles would add the most value. Making this type of decision before creating a job posting will be beneficial in listing specific skills and honing interview questions. Some key questions to consider for understanding the organization’s immediate needs when it comes to staffing AI projects are:

  • What are the first projects the organization will tackle? (See our "Tackle the Right AI Projects for the Best ROI" flipbook for guidance on project selection and prioritization.)
  • What are the final, expected outputs of these projects?
    • Smaller-scale (e.g., dashboards or analytics for internal use, more geared toward a self-service analytics initiative)?
    • Operationalized models in production impacting a large part (or parts) of the business?
  • Is data for the projects readily available, or will part of the projects themselves be around finding and mining new data sources?

Ideally, for a complete mapping of staffing needs, data leadership would fill out a rubric with many of the same questions applied to each individual use case on the road map. The final product might look something like this (though columns should be customized for the industry or particular business):

[Figure: staffing needs rubric]

Don't Look for Unicorns

In addition to attracting the right talent, it’s also important to break down the qualities the business needs for each data profile and interview specifically for those skills. Despite this knowledge, many companies still look to hire "data unicorns," that is, supernatural all-in-one data wizards who possess the entire range of skills that the organization needs.

Not only is this an expensive and unrealistic strategy (fancy data scientists with Ph.D.s who probably get closest to this description tend to be unavailable, as 80% of them are already taken by Google), but upon mapping out what the business needs and the skills required to fill those needs, it’s probably unnecessary as well. Now that you aren’t looking for unicorns, what should you look for in a data scientist candidate? 

Here is a checklist of skills for a top-notch data scientist; some might be applicable to other profiles, as well. It’s up to your business to list the qualities and qualifications required for each role:

  1. A Good Data Scientist Communicates Effectively to Business Users: The harsh reality is that statistics are complex. A data scientist has no hope of enlightening the average business user with an Excel file. To let the data tell a story, a data scientist needs a veritable Swiss Army knife of presentation skills to convey results persuasively to anyone. This can range from the most mundane (a PowerPoint presentation) to the most exotic (multimedia storytelling using interactive JavaScript visualizations built with the D3 library).
  2. A Good Data Scientist Knows Your Business: A data scientist needs an overall understanding of the key challenges in your industry and, consequently, your business. They must be familiar with the industry's financial ratios to rapidly assess whether there is a potential gain and its order of magnitude, and then find inspiration before taking their next breath. Another characteristic of a true data scientist is a fascination with the subjects that will have the greatest impact, not with the problems in themselves. A data scientist is not a scientist in the traditional sense; it’s not the quest for truth that drives them, but the process of uncovering it.
  3. A Good Data Scientist Understands Statistical Phenomena: Data scientists must be able to correctly interpret statistics: is a result representative or not? This takes an understanding of statistics that allows the data scientist to assert, with authority, why 3% is statistically significant for certain cases but means nothing for others. This skill is key since the majority of stats we analyze contain statistical bias that needs correcting.
  4. A Good Data Scientist Makes Efficient Predictions: The data scientist must have a broad knowledge of algorithms to select the right one, and moreover, know which features to adjust to best feed the model. There is often a certain degree of creativity involved here; as a painter uses color to convey depth, a data scientist must know how to combine different data so they complement each other.
  5. A Good Data Scientist Provides Production-Ready Solutions: Today's data scientists need to provide services that can run daily, on live data. What's new here is that historically, back-office models built by BI or data mining teams were often re-written by technical teams for real-time production environments. Nowadays, a recommender system cannot withstand a rewrite before being put online.
  6. A Good Data Scientist Can Work on a Mass Scale: A data scientist must know how to handle multi-terabyte datasets to build a robust model that holds up in production. They must not be afraid of datasets with a 12-digit file size. In practice, this means having a good idea of computation time, of what can be done in memory, and of what, on the other hand, requires Hadoop and MapReduce.
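
To make the third point in the checklist concrete, here is a minimal Python sketch of why the same 3-point difference can be significant in one case and meaningless in another. It is an illustration under stated assumptions, not a method from this article: the conversion counts are hypothetical, the function name `two_proportion_p_value` is made up for this example, and the test used is a standard pooled two-proportion z-test with a conventional 0.05 threshold.

```python
import math


def two_proportion_p_value(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value for a difference between two proportions (pooled z-test)."""
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both groups share one true rate.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical numbers: the rate goes from 10% to 13% in both scenarios.
small_sample = two_proportion_p_value(20, 200, 26, 200)
large_sample = two_proportion_p_value(2000, 20000, 2600, 20000)

ALPHA = 0.05  # conventional significance threshold, assumed for this sketch
print(f"200 users per group:    p = {small_sample:.3f}, significant: {small_sample < ALPHA}")
print(f"20,000 users per group: p = {large_sample:.3g}, significant: {large_sample < ALPHA}")
```

With 200 users per group, the 3-point lift is well within what chance alone could produce; with 20,000 per group, the same lift is very unlikely to be noise. A candidate who can explain that distinction to a business user is showing exactly the skill described above.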

Looking Ahead

Staffing for data and AI initiatives is no easy task, but with a comprehensive approach to understanding the different data profiles and skill sets, mapping them to organizational needs at every stage of the AI lifecycle, and combining targeted hiring with upskilling, companies can start building toward sustainable Everyday AI.

Data teams are complex, nuanced organizations with different kinds of people using different tools yet all working toward the same ultimate end goal. If the data team is not a well-oiled machine, the end goal (implementing a scalable and sustainable AI strategy) will suffer. The challenge of handling future growth must be balanced with the reality of hiring and upskilling team members with diverse profiles and skill sets that are appropriate for your business model.

Data science, machine learning, and AI platforms are a clear win for data teams and, when implemented the right way, can provide a solution to many of the data team challenges, as well as serve as a foundation for building an inclusive and sustainable AI and data democratization strategy. Creating a collaborative AI ecosystem for your organization with a platform such as Dataiku enables organizations to take full advantage of both newly sourced and upskilled talent. 

With the ease of use and accessibility designed into the core of its capabilities, Dataiku allows you to safely and effectively scale AI efforts through every dimension of your organization, saving time and resources in the pursuit of your business goals. When utilizing an interface such as Dataiku’s, users find that communications are streamlined and centralized with visualization tools, easily shareable models, and organized project dashboards. Your AI talent will be complemented and fully supported by the role that Dataiku plays and collaboration will become organic to your business processes.
