Imagine waking up one morning with the dream of creating a big data startup. It's easy!
To do big data, you generally just need:
- An idea for a process that could be automated
- Some "Data Scientists" to come up with an algorithm
- Data, and lots of it
You already have the basic idea. As for the "Data Scientists," you can usually find them among your co-founders (failing that, at excellent schools such as Berkeley or NYU).
But what about finding the data? There are several competing strategies to consider when creating your startup...
Strategy 1: The grasshopper - Trusting open data
The grasshopper is optimistic: he tells himself that the data must be available somewhere, probably for free. He relies on open data, current or future, to provide his service.
This Open Data strategy can pay off in the financial or transport markets. For example, startups can take freight transport data and cross-reference it with cargo and market price data to provide highly relevant information to industry professionals.
The main drawback of the Open Data approach is the limited scope of open data. For both ethical and economic reasons (which, for once, align), open data falls short when you want to learn specific things about a person, a product, or an address. In short, the most useful data is private (fortunately) and paid (unfortunately).
Strategy 2: The spider - Building your network
The spider is meticulous. No data? Never mind, says the spider, let's go and find it. The spider weaves a network of points from which he can capture data, often starting with the smallest and gradually moving on to bigger ones. He builds all the access points and all the connectors that let each player supply him with its data and use his service.
Most online marketers take this approach: it means placing your "tracker" (a component that captures traffic on a third-party site) all over the web, so as to collect as much data and build as large a network as possible. Now that we are turning to mobile applications, we no longer talk about "trackers" but about SDKs (Software Development Kits).
In the physical world, Lokad has built a network that extends to supermarket cash registers, offering simple connectivity to the vast majority of specialized point-of-sale and inventory management software; this enabled it to develop an effective solution for supermarket inventory management.
If it succeeds, the spider's approach makes it possible to build startups that scale very quickly.
However, such startups are exposed to two risk factors:
- The fragility of the web (a big player can decide to "change the game" and disrupt everything)
- The lack of a fat enough insect (big, interesting clients remain out of network range)
Strategy 3: The fox - Hunting the "big group"
The fox seeks out "Big Data" where it is: in large businesses where "Big Data" is well fed!
The fox proceeds in several steps:
- First, he proposes a possible solution to a problem (e.g., reducing fraud, lowering ad buying costs, improving the performance of email marketing campaigns, optimizing raw material purchasing costs, etc.)
- Then he collects the customer's data and tries to solve the problem.
- Lastly, he uses the knowledge gained from this first customer to solve other customers' problems more easily.
The fox has a hard life: for his first pitch to succeed, he must convince people that he can solve a problem he has never solved before! To do this, he must stir the desires of the powerful (charm the group's top executives), flaunt his strength (talk about algorithms, professors, PhDs), and show himself off in all his finery (polished charts and design).
So the "Big Data" fox has a thankless role: he often has to convince before he has had a chance to prove himself on real data!
When the fox succeeds, he knows he can solve a real problem that has immediate value for a prospective buyer.
However, his role is a very pernicious one: the product, service, or model he has painstakingly built for his first big client is often meant to be sold to smaller customers that compete with it; in effect, the big customer ends up teaching the fox how to better serve the smaller one.
This means the big group that plays along can end up indirectly helping its competitors obtain services at a better price, and playing the fall guy.
Strategy 4: The tool maker
Tool makers take a very social approach to Big Data: since everyone wants to do Big Data, let's sell them the tools to do it!
The market for Big Data tools is highly competitive, ranging from very small players (young startups, open source projects) to very large ones (Google, Microsoft, Oracle, etc.), but it attracts a great deal of investment (several billion invested in 2014 in building Big Data tools).
The main question in the market for "Big Data" tools is the same as for pickaxes during the Gold Rush: how sustainable is it? Will the need for tools remain strong and lasting once the rush is over?
At Dataiku, we believe it will, precisely because, when it comes to data, we are still in the Stone Age: the pickaxe hasn't been invented yet! Building tomorrow's intelligent, responsible, context-aware computer systems will require tools that are more manageable and efficient.
Let's make pickaxes. And maybe someday we'll make jackhammers.
About Florian Douetteau
Florian is Dataiku's CEO and one of our four co-founders. Dataiku develops the Data Science Studio (DSS) software.