As a popular adage has it, data scientists spend 80% of their time preparing the data and only 20%...
With dbt’s recent announcement, Minerva 2.0, and Transform’s launch, the metric store has become a hot topic for companies that want to make data self-serve and more accurate for their internal data consumers. The metric store is also relevant to data tool providers like Kaldea because it can change the way we provide catalog and discovery functions to our customers.
Metric Store, the ins and outs
What is a Metric Store?
A metric store is where you define and manage metrics for your business so that teams can easily pull and visualize the data they need with a high probability of consistency and accuracy. It resolves the confusion that arises when teams add their own definitions and calculations of a single metric and one table fans out into many competing versions. In this metric Tower of Babel, data scientists and analysts spend a significant amount of time working out the definition of the metric itself, which lengthens the discovery process. A metric store also lets you avoid continuously creating new and redundant tables. In short, the metric store provides a single source of truth for each metric and facilitates much more efficient and effective analysis.
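To make the "single source of truth" idea concrete, here is a minimal Python sketch of a metric registry. It is not how Minerva, Transform, or dbt work internally; the `Metric` and `MetricStore` classes, the `fct_events` table, and the `weekly_active_users` metric are all hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One agreed-upon definition of a business metric (illustrative fields)."""
    name: str
    table: str        # foundation table the metric is computed from
    expression: str   # SQL aggregate expression


class MetricStore:
    """In-memory registry that keeps exactly one definition per metric name."""

    def __init__(self) -> None:
        self._metrics: dict[str, Metric] = {}

    def register(self, metric: Metric) -> None:
        # Reject a second definition so teams cannot silently diverge.
        if metric.name in self._metrics:
            raise ValueError(f"metric {metric.name!r} is already defined")
        self._metrics[metric.name] = metric

    def query(self, name: str, dimensions: tuple[str, ...] = ()) -> str:
        """Render the SQL for a metric, optionally grouped by dimensions."""
        metric = self._metrics[name]
        select = ", ".join([*dimensions, f"{metric.expression} AS {metric.name}"])
        sql = f"SELECT {select} FROM {metric.table}"
        if dimensions:
            sql += " GROUP BY " + ", ".join(dimensions)
        return sql


store = MetricStore()
store.register(Metric("weekly_active_users", "fct_events", "COUNT(DISTINCT user_id)"))
print(store.query("weekly_active_users", dimensions=("country",)))
# SELECT country, COUNT(DISTINCT user_id) AS weekly_active_users FROM fct_events GROUP BY country
```

Every team that asks for `weekly_active_users` gets the same expression, whichever dimensions they slice by, and a competing definition fails loudly at registration time instead of surfacing later as two dashboards that disagree.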
Should you invest in a Metric Store?
The metric store is not a new concept. In fact, many companies already operate some form of a metric store. Today there are certainly more open source projects (e.g. Minerva) and tools (dbt, Transform, etc.) that allow you to adopt one. However, the real cost of investing in and operating a metric store has not changed: an intense up-front investment to define and support metrics and to create a new approval process. While a metric store is a clean approach for the most frequently used metrics, decision-makers should be careful not to commit to it as a one-size-fits-all solution for producing insight across data, business, and product.
Investing in a metric store means the following.
- You are going to spend the time to pre-define many of your key metrics
- You are going to set a new data analysis and metric creation process that guarantees consistency and accuracy
- You will build a foundational table to support the metric store
To support a metric store, you are creating a team-wide collaboration process and protocol (🚨) that requires a review of every change made to the metric store. From an operational point of view, ad-hoc analysis that requires a new metric can take much longer because the metric now has to pass an approval process. So if you work as a matrix team (product team + DS/DA), you have an additional process to go through to get things done.
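The overhead of such a review step can be sketched in a few lines of Python. This is only an illustration of the kind of gate described here, not the approval mechanism of any particular tool; `ChangeRequest`, `apply_change`, and the reviewer names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    """A proposed change to one metric definition (illustrative structure)."""
    metric_name: str
    new_expression: str
    author: str
    approvers: set[str] = field(default_factory=set)

    def approve(self, reviewer: str) -> None:
        # A change must be reviewed by someone other than its author.
        if reviewer == self.author:
            raise PermissionError("authors cannot approve their own change")
        self.approvers.add(reviewer)


def apply_change(
    definitions: dict[str, str],
    request: ChangeRequest,
    required_approvals: int = 1,
) -> bool:
    """Apply the change only if enough reviewers signed off; return whether it was applied."""
    if len(request.approvers) < required_approvals:
        return False
    definitions[request.metric_name] = request.new_expression
    return True


definitions = {"revenue": "SUM(amount)"}
request = ChangeRequest("revenue", "SUM(amount - refunds)", author="ana")

print(apply_change(definitions, request))  # False: nobody has reviewed it yet
request.approve("ben")
print(apply_change(definitions, request))  # True: the definition is updated
```

The first call returning `False` is exactly the waiting time an analyst absorbs when an ad-hoc question needs a metric that is not yet in the store.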
While you might gain speed through self-serve and by spending less time on discovery for your existing metrics, a metric store can work against that very purpose if the existing metrics do not yield the product and business insight you need for a specific moment and context. Investing heavily in defining metrics, creating a foundational table, and crafting new policies risks producing work that never gets used and an inflexible analysis process that is ossified around existing metrics.
This raises the question of how much you should front-load your metrics.
If you decide to go ahead with a metric store, we suggest you balance the conversion: start with a small cohort of initial metrics and policies, and leave ample room for existing ad-hoc analysis to continue. That way you get the best from the metric store while avoiding operational overhead that slows down ongoing ad-hoc analysis that is too early to be standardized into a metric.
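One way to picture this balance is a lookup that serves the small set of governed metrics from the store, lets everything else run as ad-hoc SQL, and counts ad-hoc usage so you can see which analyses have matured enough to standardize. The sketch below is a hypothetical illustration in Python; `CORE_METRICS`, `resolve`, `promotion_candidates`, and the threshold of three uses are assumptions, not part of any tool discussed here.

```python
from __future__ import annotations

from collections import Counter

# Small initial cohort of governed metrics (illustrative definition).
CORE_METRICS = {
    "weekly_active_users": "SELECT COUNT(DISTINCT user_id) FROM fct_events",
}

# How often each not-yet-standardized metric was computed ad hoc.
adhoc_usage: Counter[str] = Counter()


def resolve(name: str, adhoc_sql: str | None = None) -> str:
    """Return the governed SQL if the metric is in the store, else the ad-hoc SQL."""
    if name in CORE_METRICS:
        return CORE_METRICS[name]
    if adhoc_sql is None:
        raise KeyError(f"{name!r} is not in the store and no ad-hoc SQL was given")
    adhoc_usage[name] += 1  # remember that this analysis keeps coming up
    return adhoc_sql


def promotion_candidates(min_uses: int = 3) -> list[str]:
    """Ad-hoc metrics used often enough to be worth defining in the store."""
    return sorted(name for name, uses in adhoc_usage.items() if uses >= min_uses)


for _ in range(3):
    resolve("checkout_dropoff", "SELECT AVG(dropped) FROM tmp_checkout")
resolve("one_off_experiment", "SELECT COUNT(*) FROM tmp_experiment")

print(promotion_candidates())  # ['checkout_dropoff']
```

Ad-hoc work is never blocked, and the store grows only when usage shows that a definition is worth the review overhead.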
Tools to set up your metric store
Let’s also take a look at a few existing ways to invest in your metric store: Minerva, Transform, and dbt.
Since Minerva’s v1.0, we have consistently heard from our customers around the world that they want Minerva. Now that it has been updated to v2.0, let’s have a high-level look.
- Time to visualization
- As long as your data scientists and analysts have an understanding of your table’s dimensions, Minerva lets you pull out related metrics into charts and graphs in a very short time.
- Minerva comes with a sensor system that allows you to guarantee near-real-time consistency of your data.
- Data freshness
- Minerva enables you to set a separate schedule with Airflow.
- Heavy in-house lift
- While there is source for you to build on, getting Minerva running is a heavy lift. You will have to do a lot of initial analysis in order to refactor your data warehouse so that you can create and use a foundation table.
- Policy requirement
- You will need to develop your own policy so that the new schemas you create do not break the rules that make Minerva work.
- New interface adaptation
- This is a minor point, but you will have to adapt to Minerva’s YAML-based interface.
Summary of Minerva
- If you have the data engineering muscle to flex and the data science operation set up, Minerva can be a powerful tool to support your organization. Just remember, it was Airbnb that created Minerva to serve its own purpose.
Transform is a centralized metrics store that empowers data analysts to deliver accurate, timely, confident, and fast insights. (launch blog)
- Transform allows you to create and load your metric store from your data warehouse.
- Transform has a very user-friendly interface that can be easily understood and adapted.
- Data freshness:
- Transform provides Airflow and Prefect integration.
- Heavy lifting:
- Transform requires a core engineering investment to create a proper foundation table. We were hoping Transform would lessen the data engineering work, but we could not conclude that it does.
- Policy requirement:
- We had difficulty understanding what approval system you would need to operate Transform. A vague policy can create havoc in your well-intended effort. You will have to develop your own approval system or policy to avoid redundancy.
Summary of Transform
- The UI/UX of data tools makes a big difference when it comes to utilization. If you have the resources to invest in refactoring your foundation table and in building an approval process and system, Transform can be a great approach to providing a metric store interface to your teams.
dbt provides a workflow that lets you create data pipelines with SQL. (more about dbt)
- Not so much a heavy lift:
- dbt lets you create a metric store without Data Engineers. Enough said.
- Approval system:
- dbt delegates the approval system to GitHub so that you do not have to create your own.
- Static view and dashboard:
- This is neither dbt’s strength nor what you use dbt for, so you might have to build your own interface.
- Data freshness:
- You will need to use a separate tool for scheduling.
- Policy requirement:
- While dbt delegates the approval system to GitHub, you will still have to build a policy to avoid redundancy.
Summary of dbt
- dbt is a great approach to saving Data Engineering resources and is probably the fastest way to build a metric store. However, it is not a full end-to-end package: it does not give your teams an easy-to-use UI/UX.
Investing in a metric store has promising outcomes: consistency and accuracy of your metrics, reduced redundancy, faster time to analysis and visualization based on pre-defined metrics, and lots more. However, it requires a large technical and resource investment, as well as careful policy and operational planning.
We recommend that you set up your metric store gradually and define rigorous approval and operational policies around key metrics. Creating a metric store shouldn’t prevent you from analyzing data that isn’t already defined as a metric, even if that means some redundancy remains and the existing analysis process continues to run. It is not a 0 to 1 switch, but a gradual build of the store over an extended period of time.
The gradual approach: the data analysis workbench
The gradual approach is to create a workbench environment for your data teams to collaborate seamlessly without having to depend too much on internal policy or front-loaded metric definitions that may never really get used.
At Kaldea, we provide an analysis platform that data teams can use to produce insight through organic collaboration. Performing your day-to-day analysis on Kaldea gives your data teams fast, efficient collaboration and lets you continuously differentiate what needs to be in your metric store from what is not yet ready to be centralized.
Give us a shout at Kaldea.com!