Data Mesh Architecture in Cloud-Based Data Warehouses



Data is the new black gold in business. In this post, we explore how shifts in technology, organizational processes, and people are critical to achieving the vision of a data-driven company that deploys a data mesh architecture in cloud-based warehouses like Snowflake and Azure Synapse.

The true value of data comes from the insights it yields, yet that data is often siloed and spans structured, semi-structured, and unstructured storage formats at terabyte and petabyte scale. Data mining helps companies gather reliable information, make informed decisions, reduce churn, and increase revenue.

Every company could benefit from a data-first strategy, but without effective data architecture in place, companies fail to achieve data-first status.

For example, a company’s Sales & Marketing team needs data to optimize cross-sell and up-sell channels, while its product teams want cross-domain data exchange for analytics. The entire organization wishes there were a better way to source and manage data for needs such as real-time streaming and near-real-time analytics. To address the data needs of these teams, the company needs a paradigm shift: fast adoption of a Data Mesh Architecture that is scalable and elastic.

Data Mesh architecture is a shift in technology as well as in organization, processes, and people.

Before we dive into Data Mesh Architecture, let’s understand its 4 core principles:

  1. Domain-oriented decentralized data ownership and architecture
  2. Data as a product
  3. Self-serve data infrastructure as a platform
  4. Federated computational governance

Big data is about Volume, Velocity, Variety, and Veracity. The first principle of Data Mesh is founded on decentralization: responsibility is distributed to the subject-matter experts (SMEs) and domain experts who own the big data framework.

[Diagram: the 4 core principles of Data Mesh and the distribution of responsibility at a high level, shown for Azure and for Snowflake.]

On both Azure and Snowflake, each team is responsible for its own domain, and data is decentralized and shared with other domains for data exchange and data as a product.

Each domain’s data is decentralized in its own cloud data warehouse. This model applies to all cloud data warehouses, such as Snowflake, Azure Synapse, and AWS Redshift.

A cloud data warehouse is built on top of cloud infrastructure such as AWS, Azure, or Google Cloud Platform (GCP), which allows compute and storage to scale independently. These data warehouse products are fully managed and provide a single platform for data warehousing, data lakes, data science teams, and data sharing with external consumers.

Snowflake’s data storage is backed by cloud storage from AWS S3, Azure Blob, and Google Cloud Storage, which makes it highly scalable and reliable. Snowflake is distinctive in its architecture and data sharing capabilities. Like Synapse, it is elastic and can scale up or down as the need arises.

In moving from legacy monolithic data architecture to more scalable and elastic data modeling, organizations can connect decentralized, enriched, and curated data to make informed decisions across departments. With a Data Mesh implementation on Snowflake, Azure Synapse, AWS Redshift, and similar platforms, organizations can strike the right balance between allowing domain owners to easily define and apply their own fine-grained policies and having centrally managed governance processes.
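To make that balance concrete, here is a minimal Python sketch of federated governance. It is not tied to Snowflake, Synapse, or Redshift: the `DataProduct` class, the policy functions, the roles, and the column names are all hypothetical and exist only for illustration. The idea it shows is that a read is allowed only when every centrally managed policy and every domain-owned policy agrees.

```python
from dataclasses import dataclass, field
from typing import Callable

# A policy receives (role, column) and returns True if the read is allowed.
Policy = Callable[[str, str], bool]


@dataclass
class DataProduct:
    """A domain-owned data set published for other domains (hypothetical model)."""
    name: str
    domain: str
    columns: list[str]
    domain_policies: list[Policy] = field(default_factory=list)


def no_pii_for_analysts(role: str, column: str) -> bool:
    """Central rule (assumed for this sketch): analysts never read PII columns."""
    pii_columns = {"email", "phone"}
    return not (role == "analyst" and column in pii_columns)


# Centrally managed governance: applies to every data product in the mesh.
CENTRAL_POLICIES: list[Policy] = [no_pii_for_analysts]


def can_read(product: DataProduct, role: str, column: str) -> bool:
    """Grant access only if all central and all domain policies agree."""
    if column not in product.columns:
        return False
    policies = CENTRAL_POLICIES + product.domain_policies
    return all(policy(role, column) for policy in policies)


def sales_hides_discounts(role: str, column: str) -> bool:
    """Domain rule (assumed): only sales managers may read the discount column."""
    return not (column == "discount" and role != "sales_manager")


# The sales domain publishes a product and attaches its own fine-grained rule.
orders = DataProduct(
    name="orders",
    domain="sales",
    columns=["order_id", "email", "discount"],
    domain_policies=[sales_hides_discounts],
)

print(can_read(orders, "analyst", "order_id"))        # allowed by both layers
print(can_read(orders, "analyst", "email"))           # blocked by the central rule
print(can_read(orders, "sales_manager", "discount"))  # allowed by the domain rule
```

In a real warehouse these checks would typically be expressed through the platform's own access-control features rather than application code. The sketch only illustrates how the two layers of policy combine.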



How to Develop a Data Retention Policy



by Steven Fiore

We help organizations implement a unified data governance solution that helps them manage and govern their on-premises, multi-cloud, and SaaS data. The data governance solution will always include a data retention policy.

When planning a data retention policy, you must be relentless in asking the right questions that will guide your team toward actionable and measurable results. By approaching data retention policies as part of the unified data governance effort, you can easily create a holistic, up-to-date approach to data retention and disposal. 

Ideally, any group that creates, uses, or disposes of data in any way will be involved in data planning. Field workers collecting data, back-office workers processing it, IT staff responsible for transmitting and destroying it, Legal, HR, Public Relations, Security (cyber and physical), and anyone in between with a stake in the data should be involved in planning data retention and disposal.

The first step is to understand what data you have today. Thanks to decades of organizational silos, many organizations don’t understand all the data they have amassed. Conducting a data inventory or unified data discovery is a critical first step.  
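As a rough illustration of what a data inventory can capture, the Python sketch below groups discovered data assets by owner and flags any asset that has no owner or no category yet. The `DataAsset` fields, the sample asset names, and the locations are hypothetical. A real inventory would usually come from a discovery or catalog tool rather than a hand-written list.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataAsset:
    """One entry in the inventory (all fields are illustrative assumptions)."""
    name: str
    location: str
    owner: Optional[str] = None      # team accountable for the data, if known
    category: Optional[str] = None   # e.g. "invoices", "server_logs"


def summarize_inventory(assets):
    """Group asset names by owner and list assets that still need review."""
    by_owner = defaultdict(list)
    needs_review = []
    for asset in assets:
        by_owner[asset.owner or "UNOWNED"].append(asset.name)
        if asset.owner is None or asset.category is None:
            needs_review.append(asset.name)
    return dict(by_owner), needs_review


# Made-up sample entries, only to show the shape of the result.
assets = [
    DataAsset("crm_contacts", "warehouse/sales", owner="Sales", category="customer"),
    DataAsset("web_logs", "blob/logs", owner="IT"),
    DataAsset("old_exports", "fileshare/misc"),
]

by_owner, needs_review = summarize_inventory(assets)
print(by_owner)
print(needs_review)
```

Assets that land in the review list are the ones most likely to reflect the organizational silos described above, since nobody has claimed or classified them yet.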

Next, you need to understand the requirements of the applicable regulation or regulations in your industry and geographical region so that your data planning and retention policy addresses compliance requirements. No matter your organization’s values, compliance is required and needs to be understood.

Then, businesses should identify where data retention may be costing the business or introducing risk. Understanding the risk and inefficiencies in current data processes may help identify what should be retained and for how long, and how to dispose of the data when the retention expires.
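The questions of what to retain, for how long, and how to dispose of it can be written down as a simple retention schedule. The Python sketch below assumes one rule per data category. The category names, retention periods, and disposal methods are invented for illustration and are not regulatory guidance.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    """How long a category is kept and what happens when the period ends."""
    category: str
    retain_days: int
    disposal: str


# Illustrative values only; real periods depend on your regulations and goals.
RULES = {
    "server_logs": RetentionRule("server_logs", 90, "delete"),
    "invoices": RetentionRule("invoices", 7 * 365, "archive_then_delete"),
}


def disposal_due(category: str, created: date, today: date) -> Optional[str]:
    """Return the disposal action if retention has expired, otherwise None.

    Raises KeyError for a category with no rule, so uncategorized data is
    surfaced for review instead of being silently kept or destroyed.
    """
    rule = RULES[category]
    expires = created + timedelta(days=rule.retain_days)
    return rule.disposal if today >= expires else None


print(disposal_due("server_logs", date(2024, 1, 1), date(2024, 3, 1)))   # still retained
print(disposal_due("server_logs", date(2024, 1, 1), date(2024, 3, 31)))  # retention expired
```

Keeping the schedule in one place like this makes it easier to review with Legal and other stakeholders, and to revise as the policy evolves.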

If the goal is to increase revenue or contribute to social goals, then you must understand which data affords that possibility, and how much data you need to make the analysis worthwhile. Machine Learning requires massive amounts of data over extended periods of time to increase the accuracy of the learning, so if machine learning and artificial intelligence outcomes are key to your revenue opportunity, you will require more data than you would need to use traditional Business Intelligence for dashboards and decision making.


What types of data should be included in the data retention policy?

The types of data included in the data retention policy will depend on the goals of the business. Businesses need to be thoughtful about what data they don’t need to include in their policies. Retaining and managing unneeded data costs organizations time and money – so identifying the data that can be disposed of is important and too often overlooked.

Businesses should consider which innovation technologies are included in their digital roadmap. If machine learning, artificial intelligence, robotic process automation, and/or intelligent process automation are on your technology roadmap, you will want a strategy for data retention and disposal that will feed the learning models when you are ready to build them. Just as machine learning can influence data retention policies, the Internet of Things can affect what data is included, since it tends to create enormous amounts of data. Robotic or intelligent process automation is another example: understanding which data is most essential to highly repeatable processes could dictate what data is held and for how long.

One final consideration is non-traditional data sources and whether they should be included. Do voice mails or meeting recordings need to be included? What about pictures that may be stored along with documents? Security camera footage? IoT or server logs? Metadata? Audit trails? The list goes on, and the earlier these types of data are considered, the easier they will be to manage.

Avoid these pitfalls

The paradox is that the two biggest mistakes organizations make when building a data retention policy are not taking enough time to plan and taking too much time to plan. Spending too much time planning can lead to analysis paralysis, letting a data catastrophe occur before a solution can be implemented. One way to mitigate this risk is to take an iterative approach so you can learn from small issues before they become big ones.

A typical misstep by organizations when building a data retention policy is that they don’t understand their objectives from the outset. Organizations need to start by clearly stating the goals of their data policy, and then build a policy that supports those goals. As discussed above, company goals and data policies are closely linked.

One other major pitfall organizations fall into when building a data retention policy is that they don’t understand their data, where it lives, and how it’s interrelated. Keeping data unnecessarily is as bad as disposing of data you need, and in highly siloed organizations, data interdependencies might not surface until needed data is suddenly missing or data that should have been disposed of surfaces in a legal discovery. This is partially mitigated by bringing the right people into the planning process so that you can understand the full picture of data implications in your organization.

In closing

The future of enterprise effectiveness is driven by advanced data analytics and insights. Businesses of all sizes are including data strategies in their digital transformation roadmap, which must include data governance, data management, business planning and analysis, and intelligent forecasting. Understand your business goals and values, and then build the data retention policies that are right for you.

We are here to help.


Using Data to Improve Patient Outcomes


Can predictive analytics in healthcare change patient outcomes?

It’s no secret that technology is making its mark in the healthcare industry. From surgery rooms to at-home care, technology is being applied in ways that push healthcare forward. Within the past year, companies such as Google and Microsoft have begun stepping into the healthcare field. And it doesn’t stop there: hospitals such as Johns Hopkins have also joined the movement. But why now?

At Valence we’ve seen first-hand what technology can bring to the table for a patient’s care. Whether it’s pain management through Virtual Reality, training for medical professionals, or quicker EMR workflows, technology has solved many pain points for the healthcare industry and there are no signs of slowing down. Over the years, healthcare has shifted to a more predictive approach. With this perspective doctors can focus on preventive measures with a goal of fewer hospital trips and better long-term care for the patient. This new approach has only been made possible by the large amount of data available at our fingertips and the birth of predictive analytics.

Let’s talk about predictive analytics in healthcare.

Predictive analytics in healthcare uses data to help predict outcomes. Whether it’s for healthcare or environmental purposes, there is one common goal: to prevent negative outcomes. This approach is extremely powerful, but there is an existing technology that can take it further: Artificial Intelligence. By merging the two, we can truly harness the power of data to improve people’s health.

Today, artificial intelligence is being used to help doctors diagnose patients. Drawing from a patient’s family history or medical images, AI can be applied in different scenarios. For example, an artificial intelligence diagnostic device is helping doctors diagnose patients with a specific eye disease. Given an uploaded high-resolution picture, the device can interpret the image and produce results on its own. While artificial intelligence can assist with individual patients, its biggest advantage is its ability to work with machine learning, in which it can analyze a large amount of data, learn, and adapt. It can take data from thousands of patients, analyze their medical history, and make predictions on a much larger scale.

The integration of artificial intelligence and predictive analytics is transforming patient care on a small and large scale. It’s making value-based care attainable while keeping the patient at the heart of it all. At Valence we understand the technology of Machine Learning and the potential it will bring to your organization. Whether you are involved in healthcare, retail, manufacturing, or more, Artificial Intelligence can be applied to many industries. The time for artificial intelligence is now, so what will you do with it? Contact us, and we’ll start you off with a demo to show how remarkable this technology can be!