This is the fourth and final article in Kaldea’s culture guide. It covers being the customer.
From Seoul to New Orleans
Did I go on a food trip? Yes, kind of, but not really. Last month, I had the privilege of joining the world’s analytics engineering community at Coalesce 2022 - The Analytics Engineering Conference in New Orleans from October 14 to October 21.
For those of you who are not yet familiar, Coalesce 2022 is an Analytics Engineering Conference hosted by dbt Labs.
Coalesce 2022 had three main parts.
- Coalesce New Orleans
- Coalesce Online + London, Sydney
- Coalesce Online
Coalesce New Orleans was the main event, where you could attend all offline sessions and networking events over 5 days. The London and Sydney events were held offline for 2 days alongside online sessions. I took the plunge and flew out to Coalesce New Orleans because I wanted to feel the actual vibe on site. Was it worth the trip? Yes, certainly. The networking alone was worth it, but there was much more. Coalesce this year had over 100 sessions and workshops, and more than 10,000 people participated online.
Session recaps from Coalesce 2022 New Orleans
Keynote: The end of the road for the modern data stack you know
On the second day of the event, 2022.10.18, Tristan Handy, Founder & CEO of dbt Labs, gave his keynote. It began with the story of how the dbt community started and gained momentum - what started as a small meetup in New York now has 40,000 Slack members, and people from 96 different countries participated in Coalesce 2022.
What makes dbt so special? Tristan Handy answers with this slide.
In short, dbt has “resolved the governance issue with the data knowledge in an incredibly easy way.” That's how I felt when I was using dbt myself. It wasn't just a transformation tool, but a service that made it very easy to manage the SQL knowledge behind each previously unmanaged metric, along with everything the transformation layer needs (e.g. tests, development environments, version management). For example, even for the same metric, the queries used by each analyst and data engineer often differ, and people are often unsure which one is right. dbt solves these problems easily.
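To make the "same metric, different queries" problem concrete, here is a minimal Python sketch of the underlying idea: one shared, version-controlled definition per metric that everyone runs, instead of each person writing their own query. This is not how dbt itself is implemented. The `METRICS` registry, the `events` table, and the `active_users` definition are all hypothetical, and SQLite stands in for a real warehouse.

```python
import sqlite3

# Hypothetical registry: one agreed-upon SQL definition per metric name.
# In practice this would live in version control next to its tests and docs.
METRICS = {
    "active_users": (
        "SELECT COUNT(DISTINCT user_id) FROM events WHERE event_type = 'login'"
    ),
}


def run_metric(conn: sqlite3.Connection, name: str) -> int:
    """Run the single shared definition of a metric and return its value."""
    return conn.execute(METRICS[name]).fetchone()[0]


def demo_events() -> sqlite3.Connection:
    """Build a tiny in-memory table (made-up data) to run the metric against."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id TEXT, event_type TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("u1", "login"), ("u1", "login"), ("u2", "login"), ("u3", "page_view")],
    )
    return conn
```

Calling `run_metric(demo_events(), "active_users")` gives every analyst the same answer, because there is only one definition to run. An analyst who counted rows instead of distinct users would get a different number from the same table, which is exactly the ambiguity described above.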
The next topic was dbt's focus on ecosystems from the very beginning, which led into his talk on the Modern Data Stack.
Showing the list of the modern data stack's ecosystems (above), he introduced the main topic of the keynote - The End of the Road for The Modern Data Stack You Know.
Previously, velocity and governance pulled in opposite directions. You had to choose between slow and governed or fast and ungoverned. Today, the role of the Modern Data Stack is to turn that either-or choice into both: fast, faster, and governed. As a data engineer, I couldn’t agree more; that is the option I want and what I expect from continued developments in the data industry. This is a dilemma most of us data folks have faced over the past few years and, by and large, still face today. Once you choose the fast and ungoverned strategy, the organizational inertia built around it does not allow you to easily come back and pick up the governance piece, and vice versa. That, I think, is why dbt took off.
Why is dbt special? Community!
If you ask why dbt is so special, I'd say it’s because of the community. In fact, alternative data services that solve both speed and governance already exist. However, few services have such a strong and large community as dbt. The size of the dbt community is huge. I was really surprised that its Slack Community has over 40,000 people. This is a lot more than Airflow (about 27,000 people as of 10.29.2022). At Coalesce New Orleans, I was surprised once again that there were so many engineers who were enthusiastic about dbt. How was it possible to create such a community? Tristan pointed out four key points on the dbt community’s success.
- dbt as a product: dbt filled in the missing piece of governance (documentation and testing) without sacrificing speed. With it, dbt gave data engineers a scalable and sustainable choice for building systems.
- Career expansion for data analysts: In a world where an engineering background was key (e.g. Spark), dbt enabled analysts to use MPP data warehouses for transformation with just SQL, and put analysts in the spotlight as champions.
- Synergy from a larger ecosystem: Multiple, easily available integrations with a ton of modern data stack services made dbt a no-brainer. A model you create in dbt is transferable to most other modern data stacks.
- Open source and pricing: With participation from the larger development community, a forever-free version, and a (quite reasonably) priced version, what is there not to love?
dbt now has enough influence over the data community that it can even mint new positions like the analytics engineer. Again, the dbt community felt very special: many members openly share information, publish content, and actively participated in the conference. This felt like a bit more than just technology.
I was privileged to make some new friends in New Orleans, and am still in touch with them over the community Slack channel.
The new keywords in the modern data stack
There is an overwhelming number of choices involved with the Modern Data Stack. Have a look at a16z’s, which is even more complex.
I believe this is partially because data gets set up in the later stages of a company and in different situations. Data-focused teams are recruited after the organization grows to a certain size or maturity, which means companies keep putting a band-aid on many situations until they no longer can.
A few topics stood out to me at this conference.
Reverse ETL means that data ingested into the data warehouse from multiple sources is loaded back out to multiple destinations, in reverse. In the B2B SaaS world this is a common request from your revenue/marketing/sales ops teams: pump product data back to Salesforce, Gainsight, etc. In these cases it is extremely easy to fail on the governance end, because things get complex very quickly. But today, there are lots of good solutions addressing reverse ETL. The illustration above shows how simple reverse ETL becomes with a solution like Hightouch. I have also been in a situation where I had to understand the different API specifications of multiple tools just to transfer a small amount of data, and I wish I had had such services back then. A number of companies in this space stood out to me during the conference.
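For readers who have not built one by hand, the reverse ETL flow described above can be sketched in a few lines of Python: read a modeled table from the warehouse, rename its columns to the destination's field names, and push the records out in batches. This is only an illustration under stated assumptions, not how Hightouch or any other product works. SQLite stands in for the warehouse, and the `account_usage` table, the `FIELD_MAP` names, and the `push` callback (a stand-in for a real CRM API client) are all hypothetical.

```python
import sqlite3
from typing import Callable

# Hypothetical mapping from warehouse column names to destination field names.
FIELD_MAP = {"account_id": "AccountId", "weekly_active_users": "WAU__c"}


def extract(conn: sqlite3.Connection) -> list[dict]:
    """Read the modeled table from the warehouse (SQLite stands in here)."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        "SELECT account_id, weekly_active_users "
        "FROM account_usage ORDER BY account_id"
    )
    return [dict(row) for row in rows]


def to_destination(record: dict) -> dict:
    """Rename warehouse columns to the destination's field names."""
    return {FIELD_MAP[column]: value for column, value in record.items()}


def sync(
    conn: sqlite3.Connection,
    push: Callable[[list[dict]], None],
    batch_size: int = 2,
) -> int:
    """Send all records to the destination in batches; return the record count."""
    records = [to_destination(r) for r in extract(conn)]
    for start in range(0, len(records), batch_size):
        push(records[start:start + batch_size])
    return len(records)


def demo_warehouse() -> sqlite3.Connection:
    """Build a tiny in-memory warehouse table with made-up data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account_usage (account_id TEXT, weekly_active_users INTEGER)"
    )
    conn.executemany(
        "INSERT INTO account_usage VALUES (?, ?)",
        [("a1", 12), ("a2", 7), ("a3", 30)],
    )
    return conn
```

Passing `received.append` (for some list `received`) as `push` is enough to try it locally. The governance pain shows up as soon as there are many such mappings and destinations, each with its own API rules, which is the part dedicated tools take off your hands.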
Data Quality beyond Data Catalog
It seems many teams now commonly build and use services or open source tools for data catalogs. However, when it comes to data quality, the majority seem to have built a pretty simple internal tool that does minimum checks. So the many new tools coming into this space were music to my ears, as they would save me lots of time in providing quality, reliable data when working with my counterparts.
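As a rough picture of what such a "pretty simple internal tool" tends to look like, here is a minimal Python sketch of three minimum checks, loosely modeled on the ideas behind dbt's built-in `not_null`, `unique`, and `accepted_values` tests. The function names and the row-as-dict format are my own assumptions for illustration, not any tool's API.

```python
from collections import Counter


def check_not_null(rows: list[dict], column: str) -> list[int]:
    """Return the indexes of rows where the column is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def check_unique(rows: list[dict], column: str) -> list:
    """Return the non-null values that appear more than once in the column."""
    counts = Counter(
        row[column] for row in rows if row.get(column) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


def check_accepted_values(rows: list[dict], column: str, accepted: set) -> list[int]:
    """Return the indexes of rows whose value is outside the accepted set."""
    return [i for i, row in enumerate(rows) if row.get(column) not in accepted]
```

Each function returns the offending rows or values, so an empty list means the check passed. Checks like these catch the obvious breakages, but they say nothing about freshness, volume, or drift, which is where the newer data quality tools come in.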
Something I wanted to see more: improved time to insight and analytics systems
I was super encouraged by the explosion of tools in some of my pain areas today, such as reverse ETL and data observability. However, from a company-wide view, we still struggle a lot to deliver data on time for each ad-hoc request. There are a million reasons why this is the case, but time to insight, which is a key factor in the flywheel that creates a data culture inside companies, was not talked about.
Speaking to many tech leaders across the board, I felt a strong pull towards the internally built analytics systems at places like Uber, Airbnb, and LinkedIn. However, those systems are extremely difficult to replicate elsewhere because they require a high level of engineering investment to manage.
That is what makes me excited about companies like Kaldea, which are taking a really ambitious and bold approach to providing a unified analytics platform where everything from modeling, discovery, governance, analysis, and visualization is connected. It is certainly not an approach most startups can take, as it has a steep development curve due to the large area it covers, but I am hopeful about the impact unified analytics systems can bring to companies. They can relieve the pressure on data-producing teams and help companies serve data in a more timely fashion for any kind of ad-hoc request, where self-service dashboards aren’t the only answer (we know where that road ends). If you have not, check out what unified analytics platforms can do for you and your company!
Coalesce and the New Orleans Vibe!
The daily evening networking events were simply great. I really enjoyed my time with new friends from the conference. Could not skip the jazz bars!
The New Orleans Saints!
Over the weekend, I went on a swamp tour nearby and saw a small alligator in the wild. The weather was really nice and the nature beautiful.
Thank you for reading this post and I look forward to Coalesce 2023! If you want to visit the sessions online, here is the whole cake!
Once again, thank you for reading and hope to see you next year.