Enterprises have a wealth of data at their fingertips and are eager to accelerate data innovation. But data governance can frequently stand in the way, posing issues that are both profound and vexing.
Dataiku CEO Florian Douetteau asked Immuta CEO Matthew Carroll about the relationship between data governance and machine learning, and about their impact on the future of business intelligence.
Florian Douetteau (FD): Hi Matt. Implementing the right security patterns on big data platforms is a concern for many companies. Some companies feel that there's a trade-off, and an optimal balance to strike, between security, performance, and complexity. What's your take on that?
Matt Carroll (MC): There’s a joke in the cybersecurity world that the most secure computer is one that doesn’t turn on. So the idea of a trade-off is something we spend a lot of time thinking about. The problem with this line of thinking, though, is that in big data environments where the stakes are so high, and where our customers literally can’t afford mistakes, it really can’t be an either/or scenario. We need to have both full utility and full security. More specifically, we think of good governance and good security as one and the same: for example, making sure only the right people can access the right data for the right reasons, no matter how complex the environment. That’s the problem we built Immuta to solve. All too often, the trade-off between utility and compliance slows things down to the point of diminishing returns on data science programs. By baking compliance into an access platform, we can eliminate that friction for large organizations.
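Carroll’s description of governance as making sure “only the right people can access the right data for the right reasons” amounts to a policy check that considers both who is asking and why. The sketch below is a hypothetical, deny-by-default illustration of that idea; the `User`, `Policy`, and `can_access` names and the group/purpose model are assumptions made for this example, not Immuta’s API or policy engine.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical user model: a name plus the groups the user belongs to."""
    name: str
    groups: frozenset[str]


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: which groups may use a dataset, and for which purposes."""
    dataset: str
    allowed_groups: frozenset[str]
    allowed_purposes: frozenset[str]


def can_access(user: User, dataset: str, purpose: str, policies: list[Policy]) -> bool:
    """Deny by default. Allow only if a single policy matches the dataset,
    at least one of the user's groups (the right people), and the
    declared purpose (the right reasons)."""
    for policy in policies:
        if (
            policy.dataset == dataset
            and not user.groups.isdisjoint(policy.allowed_groups)
            and purpose in policy.allowed_purposes
        ):
            return True
    return False


if __name__ == "__main__":
    policies = [
        Policy("claims", frozenset({"actuaries"}), frozenset({"fraud_detection"})),
    ]
    alice = User("alice", frozenset({"actuaries"}))
    bob = User("bob", frozenset({"marketing"}))

    print(can_access(alice, "claims", "fraud_detection", policies))  # True
    print(can_access(alice, "claims", "ad_targeting", policies))     # False: wrong purpose
    print(can_access(bob, "claims", "fraud_detection", policies))    # False: wrong group
```

Requiring a declared purpose in addition to group membership is what separates this from plain role-based access control. A production system would also need auditing and centralized policy management, which this sketch leaves out.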
FD: "Data leak" is a new term that is bound to become mainstream in the public media. In the last few years, we've heard lots of spectacular leak stories, but the leakage of individual data through badly designed predictive algorithms concerns many security researchers. Do you think this risk is serious, and what can companies do about it?
MC: Yes, it’s very serious. There’s some good research we’re following in the academic community about algorithmic liability, some of it showing that you can actually take an ML model and understand the data it was trained on. Our focus here has been on building customer-friendly anonymization techniques into our product to guard against this type of danger. For example, we’re focused on differential privacy, which provides mathematical guarantees that no single data point determines the outcome of any given model. That means that no single data point, like a social security number or any unique ID, could ever be discovered, and it also means that the models are actually more accurate. We’re really excited about this feature, which will be available in our product this summer, because it’s one of the many cases where good governance translates directly into good data science.
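Differential privacy is the one concrete technique named in this answer. As a minimal sketch, assuming a simple counting query over a toy dataset, the classic Laplace mechanism below adds noise with scale equal to the query’s sensitivity divided by epsilon, so adding or removing any one record changes the probability of any output by at most a factor of e^epsilon. The function names and data are hypothetical; this is a textbook illustration, not Immuta’s implementation.

```python
import random
from typing import Callable, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials
    with mean `scale` (expovariate takes the rate, i.e. 1 / mean)."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(
    records: Sequence[dict],
    predicate: Callable[[dict], bool],
    epsilon: float,
    rng: random.Random,
) -> float:
    """Epsilon-differentially private count via the Laplace mechanism.

    A counting query has sensitivity 1: adding or removing one person's
    record changes the true count by at most 1, so Laplace noise with
    scale 1 / epsilon bounds how much any single record can shift the
    distribution of the released value.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy dataset; field names are illustrative only.
    patients = [{"age": a, "diabetic": a % 3 == 0} for a in range(20, 80)]
    for eps in (0.1, 1.0, 10.0):
        noisy = private_count(patients, lambda r: r["diabetic"], eps, rng)
        print(f"epsilon={eps}: noisy count = {noisy:.1f}")
```

Smaller values of `epsilon` add more noise and give a stronger privacy guarantee, so choosing epsilon is a policy decision as much as a technical one.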
FD: You are the epitome of someone who can succeed in the public sector and then become a star in the private sector. What do you think both sectors can learn from each other in the tech space?
MC: That’s a loaded question! There are many different types of public sector work, but in the area I spent time in, the work was high-stakes and involved some very sensitive data. So we couldn’t afford any mistakes on the technical side, and we couldn’t afford failure on the execution side. To be honest, I haven’t had to do much translation: I still operate that way, as does everyone at Immuta. When you care about what you do, and you understand why it’s important, it doesn’t matter whether you’re working in the intelligence community, for a startup, for a Fortune 500 company, or for an NGO.
FD: I recently heard someone from your team mention that software is pushing law into new areas. One aspect that still intrigues me is the legal framework for data and predictive models. Almost all data used today carries indirect ownership claims or rights from many providers (very often individuals). That's bound to create conflicts in the future. Do companies today have the means to really track and understand where their data, and their models, are coming from?
MC: Most companies don’t, unfortunately. But that’s a direct product of the complex data landscape you’re talking about, where the “three V’s” of data (volume, variety, and velocity) keep expanding. So companies frequently don’t know all they need to know about the data they’re sitting on top of; they just know there’s value in it. New laws like the EU’s GDPR, which carries fines of up to 4 percent of global annual revenue, are making enterprises more aware of new types of liability and helping to make this a priority for them. Similar laws are coming into place in China and elsewhere. Our legal folks are tracking these developments closely and making sure our product is aligned with this evolving legal landscape. The one big point I’d make, though, is that good data governance makes data science teams more agile and actually improves their work. So you don’t need to be worried about all these new (and sometimes scary) data laws to care about good governance.
FD: Immuta has gained a lot of traction as a rock-solid way to build a secure data governance platform. When is the right time for a company to realize that it needs you?
MC: Yesterday! More seriously, though, a lot of companies don’t realize how much friction they’re operating under, or how much ROI they could gain by embedding good governance and access policies into the way data science teams access data. So I’d say that any large enterprise that’s serious about data science should take a good, hard look at how its data scientists access data, and at how its governance policies get in the way. Nine times out of ten, removing that friction through our platform meets a need the company has long had, and it quickly translates into value.
Matthew Carroll serves as President and CEO of Immuta, the unified data platform for the world’s most secure organizations. Carroll has spent the past decade analyzing some of the world's most complex datasets while supporting multiple US Intelligence Community customers. He also served as an officer in the US Army, including overseas service in Iraq and Afghanistan, and graduated from Brandeis University with a Bachelor of Science in Chemistry, Biochemistry and Biology.
Florian Douetteau is the Chief Executive Officer of Dataiku. Florian started his career at Exalead, an innovative search engine technology company, where he led an R&D team until the company was acquired by Dassault Systèmes in 2010. He was then CTO at IsCool, a European leader in social gaming, where he managed game analytics and one of the largest cloud setups in Europe. Florian has also served as Lead Data Scientist at several companies, including Criteo, the European advertising leader.