With data and analytics teams growing larger and working on more collaborative projects, security and governance are becoming bigger issues.
Who is allowed to use what data? And what are they allowed to do with it? Big data brings Hadoop clusters, which rely on Kerberos, ACLs, and LDAP to manage rules and access rights. But what are those exactly, and how do they work?
For those of us who work in analytics and data, this means we're more likely to find ourselves in conversations with IT professionals where the vocabulary quickly gets technical. For that reason, we've created a glossary of key terms that you can refer to the next time you find yourself discussing Hadoop security.
What is YARN, and what is a metastore? What is the difference between Sentry and Ranger? (Hint: not much.) What does Apache Knox do? What do we mean by impersonation? And what do Hive, Pig, Impala, Tez, Spark, and MapReduce all have in common?
In the coming weeks, we'll follow up with some thoughts about best practices in Hadoop security and governance. For now, study up so you can thrive in your next discussion on Hadoop security.