Making Sense of Startup Ecosystem Data

Lynn Heidmann

When it comes to data on startups, Startup Genome is the gold standard: their yearly report on the startup ecosystem is well respected (not to mention widely cited). But how do they find and make sense of the data that goes into it?


We spoke to Munish Malhotra, Director of Analytics and Data Science at Startup Genome, to learn more. Here's what we found out about their challenges (hint: they might sound familiar…).


IN-DEPTH: HOW STARTUP GENOME DOES IT

They Have to Fill in the Blanks

The nature of the business is that structured, readily available datasets don't necessarily exist (in fact, for Startup Genome this is the case at least 20 percent of the time). And when they do find data, it's often incomplete. So, just like other enterprises making their way in the age of AI, they have to deal with data quality issues (our survey of more than 50 CDOs showed that data quality is one of the top data issues worldwide).

If data doesn't exist, the Startup Genome team starts from a blank slate.

That means they have to run the data through a set of business rules in order to fill in the missing information. For example, the first step might be to manually hunt for any missing data, and the second might be to create a standard estimate for whatever is still missing.
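To make the two-step idea concrete, here is a minimal sketch of a tiered fill, not Startup Genome's actual pipeline. It assumes one numeric value per city, a hypothetical `manual_lookup` dictionary standing in for values found by hand, and, as an assumption, the median of the observed values as the "standard estimate". It also records which rule supplied each value, so estimated numbers can be flagged later.

```python
from statistics import median
from typing import Optional


def fill_missing(
    records: dict[str, Optional[float]],
    manual_lookup: dict[str, float],
) -> tuple[dict[str, float], dict[str, str]]:
    """Fill gaps in two passes and record which rule supplied each value."""
    observed = [v for v in records.values() if v is not None]
    # Assumed "standard estimate": the median of the values that were actually observed.
    standard_estimate = median(observed) if observed else 0.0

    filled: dict[str, float] = {}
    source: dict[str, str] = {}
    for city, value in records.items():
        if value is not None:
            filled[city], source[city] = value, "observed"
        elif city in manual_lookup:
            # Rule 1: a value someone found by hand (hypothetical lookup table).
            filled[city], source[city] = manual_lookup[city], "manual"
        else:
            # Rule 2: fall back to the standard estimate.
            filled[city], source[city] = standard_estimate, "estimated"
    return filled, source


# Hypothetical startup counts per city; None marks a missing data point.
startups = {"City A": 120.0, "City B": None, "City C": 80.0, "City D": None}
filled, source = fill_missing(startups, manual_lookup={"City B": 95.0})
# City B comes from the manual lookup; City D gets the median of 120.0 and 80.0 (100.0).
```

Keeping the `source` labels alongside the filled values is a design choice: it lets an analyst report how much of a figure rests on estimates rather than observations.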

They Must Always Consider the Context

When doing data analysis, the team at Startup Genome has to minimize bias and consider the context of their data in order to truly draw meaning from it. For example, if there is data on how many engineers graduate in a region, they need to determine how it is relevant and whether there is a correlation between that data and startups.
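One common way to check "whether there is a correlation" is the Pearson correlation coefficient. The sketch below computes it from scratch so the formula is visible; the graduate and startup counts are invented for illustration, and Pearson is just one possible measure, not necessarily the one Startup Genome uses.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized since the n terms cancel out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: engineering graduates and startups in four regions.
graduates = [1000.0, 2000.0, 3000.0, 4000.0]
startups = [10.0, 20.0, 30.0, 40.0]
r = pearson(graduates, startups)  # close to 1.0 for this perfectly linear toy data
```

A coefficient near +1 or -1 suggests a strong linear relationship, and one near 0 suggests none. On its own, though, it says nothing about why the two series move together.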

Maybe some cities have lots of graduates but not a lot of startups. In context, however, that doesn't necessarily mean there isn't a correlation: those cities may simply not provide the resources and support that recent graduates need to work at startups.
