This is a repost of an article by Conor Jensen, Dataiku's VP of Data Science - Americas. The original article can be found on Enable Architect.
Data architecture is a pivotal element of Enterprise AI. According to Gartner, “Data architecture is returning with vengeance as recent cloud practices have begun to encounter the systems design, data management, and application portfolio issues reminiscent of the 1990s. ... Data will be even more distributed, increasing the demand for a central governing authority within the office of the chief data officer. However, the approach should allow for agility and adaptability in data management. Data architecture decisions will require a level of discipline for establishing viable options for implementation, deployment, and cost management.”*
But who are the ones behind the curtain actually making those processes a reality? Enter: data architects (and the various iterations of their role). In this blog post, we’ll break down how the functions of each architect differ and how their work impacts the business.
1. Data Architect
Traditional data architects are responsible for understanding the overall enterprise architecture and ensuring that it meets the requirements for data needs from across the business. They are typically involved in defining how data will be collected, stored, and consumed. Some of their additional tasks include:
- Adapting and refining data flow management (knowing where data is going to and coming from) and the data storage strategy
- Controlling access to data (and who should have access to what)
- Data modeling and data integration (collaborating with data scientists and members of the IT team when necessary and automating whenever possible)
- Maintaining solid know-how in data governance and data warehousing
- Nimbly evolving the organization’s architecture as needs shift (and being the go-to expert when it comes to implementing any new architectures)
Hire a data architect if you’re looking to conceptualize and build a robust data infrastructure from the ground up that fits your unique business needs, develop and refine data policies and procedures, and make thoughtful recommendations and proposals for changes to the existing architecture. Notably, you’ll want them to have programming experience (Python, R, SQL), an understanding of various types of database systems and functions such as data warehousing and data lakes, and, of course, the ability to collaborate well across different teams.
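To make one of the tasks above concrete, controlling who should have access to what can be sketched as a simple mapping from roles to datasets. This is only a minimal illustration: the role names, dataset names, and the `can_read` helper are hypothetical, and a real organization would typically rely on the access controls built into its database or data platform.

```python
from typing import Dict, Set

# Hypothetical roles and dataset names, for illustration only.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "data_scientist": {"sales", "web_traffic"},
    "finance_analyst": {"sales", "payroll"},
}


def can_read(role: str, dataset: str) -> bool:
    """Return True if the given role is allowed to read the dataset."""
    # Unknown roles get an empty permission set, so access is denied by default.
    return dataset in ROLE_PERMISSIONS.get(role, set())


print(can_read("data_scientist", "sales"))
print(can_read("data_scientist", "payroll"))
```

Denying access by default for unknown roles is a deliberate choice in this sketch: it means a new role sees nothing until someone explicitly grants it a dataset.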
2. Machine Learning Architect
As MLOps has moved beyond a buzzword in the data science community to a concept that is required for successful management of machine learning lifecycles at scale, it has given rise to a new, trendy architect: the machine learning architect. Most of the time, AI is a service that will be used by other applications. Machine learning architects typically handle not only tasks related to data ingestion and integration, but also how machine learning models will be served to applications and services across the entire information system, which makes them an important element of any sound MLOps practice. They are responsible for:
- Defining the right strategy for exposing each AI model or service (as a batch service, a library, a REST service, a stream consumer, and so on)
- Ensuring a scalable and flexible environment for model pipelines
- Communicating with data and business teams to study new technologies that improve machine learning model performance in production
- Collaborating with groups across the enterprise, such as data scientists, data engineers, and DevOps to ensure optimal performance of productionized models, and business users regarding the relevance of the models and the need to build new ones or evolve existing ones
- Maintaining a holistic view of dependencies and resource allocation in order to keep a pulse on the system and identify any bottlenecks for future improvements
Hire a machine learning architect if you’re looking to rapidly scale your MLOps strategy, navigate the transformations and challenges associated with moving models to production and monitoring their performance there, and maintain a holistic vision of the machine learning model lifecycle. This person must have a deep understanding of all things machine learning (from design to monitoring) and data integration, as well as strong programming skills.
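The exposure strategies mentioned above (a library, a batch service, a REST service, a stream consumer) can be illustrated with a small sketch in which one model sits behind four different entry points. Everything here is an assumption made for illustration: the `WEIGHTS` table stands in for a trained model, the function names are hypothetical, and the REST handler omits the web framework that would normally wrap it.

```python
import json
from typing import Dict, Iterable, Iterator, List

# Hypothetical stand-in for a trained model: a fixed linear scorer.
WEIGHTS: Dict[str, float] = {"age": 0.5, "visits": 2.0}


def predict(features: Dict[str, float]) -> float:
    """Library-style exposure: callers import this function and call it directly."""
    return sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())


def predict_batch(rows: List[Dict[str, float]]) -> List[float]:
    """Batch-style exposure: score a whole dataset in one scheduled run."""
    return [predict(row) for row in rows]


def handle_rest_request(body: str) -> str:
    """REST-style exposure: JSON in, JSON out (web framework omitted)."""
    features = json.loads(body)
    return json.dumps({"prediction": predict(features)})


def consume_stream(messages: Iterable[str]) -> Iterator[float]:
    """Stream-style exposure: score each JSON message as it arrives."""
    for message in messages:
        yield predict(json.loads(message))


print(predict({"age": 2, "visits": 3}))
print(handle_rest_request('{"age": 2, "visits": 3}'))
```

Keeping the scoring logic in a single function and layering the entry points on top of it means the choice of exposure strategy can change per consumer without retraining or duplicating the model.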
3. Enterprise Information Architect
According to Gartner, “success as an enterprise architect depends on establishing the foundation for a sound approach from the beginning. This requires a firm grasp on best practices, avoiding worst practices, and an understanding of how to deliver value and communicate vision, goals, and objectives.”** The role typically involves:
- Overseeing the enterprise architecture across various data types (structured, unstructured, etc.)
- Advocating for the creation of a corporate data policy or information management strategy, outlining the nuts and bolts of data ownership, data stewardship, audit requirements, and service-level agreements that must be adhered to
- Staying abreast of the latest trends in regulations and compliance, including but not limited to data privacy laws, guidelines for specific verticals, and so on
- Developing processes to ensure proper governance, security, and data quality
- Having a clear, documented plan to more effectively collaborate with business and IT counterparts
Hire an enterprise information architect if you’re looking to drive adoption and ensure adherence to established enterprise information architecture principles, guidelines, and standards and effectively develop and maintain data architecture blueprints (in a way that’s easily understandable by everyone involved).
4. IT/Cloud Architect
Although IT architects are not involved in the actual data science projects, their job is to make sure the systems that data teams are using are working and connected to all the data sources and analytics services the team needs for their work. Their primary functions include:
- Making sure the backend infrastructure is well-conceived, so data teams across the company can effectively leverage data
- Keeping pace with demand as more people begin using data, ensuring systems function as intended, and maintaining security
- Overseeing IT security and compliance to prevent data loss, malware infections, and legal and compliance issues from bogging down data projects
- Monitoring the availability of computer system resources, especially data storage (cloud storage) and computing power, without active management from users
- Monitoring major changes to the cloud services that data science projects depend on
Hire an IT/cloud architect if you’re looking for someone less focused on pure data and machine learning (you already have a team of data scientists for that) and more focused on seamless service interoperability and cloud security (you have multiple cloud services to get data from and push data into, want to vet identity and authorization, etc.). If data democratization is skyrocketing within your organization and you are rapidly breaking down data silos, the IT/cloud architect will help teams work in their optimal configuration, maintain a bird’s-eye view of who is using data, and support elasticity and resource optimization.
Looking Ahead
When it comes down to it, these various types of architects share an ability to understand and communicate how data propels the business forward, the know-how to navigate disparate data sources (including their structure, contents, where they’re traveling to or from, and their significance), and in-depth knowledge of data tools and technologies.
While there is no industry-standard, “one-size-fits-all” definition for these unique stakeholders, who really know how to walk the line between data intricacies and the business side of things, we hope that these snapshots help you effectively gauge what exactly your business (and specifically your data architecture team) is looking for.
*Gartner, Top 10 Trends in Data and Analytics 2020, May 2020
**Gartner, The Enterprise Architecture Leader’s First 100 Days, August 2020