Making Sense of Startup Ecosystem Data

When it comes to data on startups, Startup Genome is the gold standard — their yearly report on the startup ecosystem is well-respected (not to mention well-cited). But how are they able to find and make sense of data in order to produce it?


We spoke to Munish Malhotra, Director of Analytics and Data Science at Startup Genome, to learn more. Here’s what we found out about their challenges (hint: they might sound familiar…).


They Have to Fill in the Blanks

The nature of the business is that structured, readily available datasets don’t necessarily exist (in fact, this is the case at least 20 percent of the time for Startup Genome). And when they do find data, it’s often incomplete. So just like other enterprises making their way in the age of AI, data quality is an issue (our survey of more than 50 CDOs showed that it is one of the top data issues worldwide).

If data doesn't exist, the Startup Genome team starts from a blank slate.

That means they have to put data through a set of business rules in order to fill in the missing information. For example, the first step might be to manually hunt for any missing data, and the second might be to produce a standardized estimate of whatever is still missing.
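To make the two-step idea concrete, here is a minimal Python sketch of that kind of rule chain. It is only an illustration, not Startup Genome's actual pipeline: the `fill_missing` function, the `employees` field, the record layout, and the choice of the median as the "standard estimate" are all assumptions made for this example.

```python
from statistics import median


def fill_missing(records, field, manual_lookup):
    """Fill gaps in `field` using two ordered rules (hypothetical example).

    Rule 1: use a manually researched value if one exists for the record.
    Rule 2: otherwise fall back to a standard estimate, here the median
            of the values that were actually reported.
    Each row is tagged with where its value came from.
    """
    reported = [r[field] for r in records if r.get(field) is not None]
    fallback = median(reported) if reported else None

    filled = []
    for record in records:
        row = dict(record)  # copy so the input records are not mutated
        if row.get(field) is not None:
            row[field + "_source"] = "reported"
        elif row["name"] in manual_lookup:
            row[field] = manual_lookup[row["name"]]
            row[field + "_source"] = "manual"
        else:
            row[field] = fallback
            row[field + "_source"] = "estimated"
        filled.append(row)
    return filled


# Invented sample records, for illustration only.
startups = [
    {"name": "A", "employees": 10},
    {"name": "B", "employees": None},
    {"name": "C", "employees": 30},
    {"name": "D", "employees": None},
]
manual_research = {"B": 12}  # values found by hand for some gaps

filled = fill_missing(startups, "employees", manual_research)
for row in filled:
    print(row["name"], row["employees"], row["employees_source"])
```

Recording the source of each value alongside the value itself is a design choice worth keeping in any real version: it lets later analysis treat reported, hand-collected, and estimated figures differently.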

They Must Always Consider the Context

When doing data analysis, the team at Startup Genome has to minimize bias and be able to consider the context of their data in order to truly draw meaning from it. For example, if there is data on how many engineers are graduating in a region, they need to be able to determine how it is relevant and whether there is a correlation between that data and startups.

Maybe in some cities, there are lots of graduates, but not a lot of startups. However, in context, that doesn’t necessarily mean there isn’t a correlation. This could be because those cities don’t provide the resources and support for recent graduates to work in startups.
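A small Python sketch can show why context matters here. Every number below is invented purely for illustration (none of it is Startup Genome data), and the `has_support` flag is a hypothetical stand-in for "the city provides resources and support for graduates to work in startups". The sketch computes a Pearson correlation between graduates and startups across all cities, then again within each group.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# (engineering graduates, startups, has_support): invented toy values.
cities = [
    (1000, 50, True),
    (2000, 100, True),
    (3000, 150, True),
    (4000, 200, True),
    (1500, 5, False),
    (2500, 10, False),
    (3500, 15, False),
    (4500, 20, False),
]


def correlation_for(rows):
    graduates = [g for g, _, _ in rows]
    startups = [s for _, s, _ in rows]
    return pearson(graduates, startups)


pooled_r = correlation_for(cities)
supported_r = correlation_for([c for c in cities if c[2]])
unsupported_r = correlation_for([c for c in cities if not c[2]])

print("all cities:     ", round(pooled_r, 2))
print("with support:   ", round(supported_r, 2))
print("without support:", round(unsupported_r, 2))
```

In this toy data, the pooled coefficient comes out far lower than the coefficient within either group, because cities without support have many graduates but few startups. Splitting by context recovers the relationship that the pooled figure hides.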
