Going from 0-1 in Data Operations
What are the components needed to do Data Operations? Explore the building blocks in "Everything Starts Out Looking Like a Toy" #166
Hi, I’m Greg 👋! I write weekly product essays about system “handshakes,” the expectations for workflows, and the jobs to be done for data.
This week’s toy: an engineer who builds a string art machine. I can’t think of a practical reason for this to exist or how the tech would help someone, and yet I marvel at the ingenuity and creativity of the approach. Yay humans! Edition 166 of this newsletter is here - it’s October 9, 2023.
If you have a comment or are interested in sponsoring, hit reply.
The Big Idea
A short long-form essay about data things
⚙️ Going from 0-1 in Data Operations
Imagine you are starting a new venture and need to describe all the data tasks that need to happen to get you from “nothing” to “something” in Data Operations.
These are the basic building blocks for the work a Data Ops team typically does, and they’re a good framework for organizing the ongoing work and functioning of data in an early-stage company.
Let’s start by stating that Data Operations is not rocket science. It is a structured way of working with data to meet the everyday needs of the business and provide a framework for asking and answering data questions.
Here’s a list of the systems you’ll want to build or identify to go from zero to one in Data Operations.
Eventing and Hooks and Workflow, oh my!
Some of the most important data you want to know about signals a change that needs attention. For example, when a customer signs up for a new account, numerous systems need to be updated, starting with the customer’s status. To do this, you need a system that sends information to a specific back-end URL using an API call, commonly called a webhook. By providing a specific hook for that event, you can trigger other systems in near real-time.
Think of eventing as the part of the system that lets other software know when “something important” takes place. It requires a listener that is ready to receive information, a payload of expected information, and a series of steps in a workflow that get executed when the payload is received. Whether you are running this on a schedule or just in time, a tool like Pipedream helps you respond creatively.
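To make that concrete, here’s a minimal listener sketch in Python using Flask. The route, the payload fields, and the `update_customer_status` helper are hypothetical stand-ins for whatever your systems actually expect.

```python
# A minimal webhook listener sketch (Flask). The route, payload fields,
# and downstream helper are illustrative assumptions, not a fixed schema.
from flask import Flask, request, jsonify

app = Flask(__name__)

def update_customer_status(customer_id: str, status: str) -> None:
    # Placeholder for the workflow steps you'd kick off here:
    # write to the warehouse, notify Slack, update the CRM, etc.
    print(f"customer {customer_id} -> {status}")

@app.route("/hooks/customer-signup", methods=["POST"])
def customer_signup():
    payload = request.get_json(silent=True) or {}
    # Validate the expected payload before acting on it.
    if "customer_id" not in payload:
        return jsonify({"error": "missing customer_id"}), 400
    update_customer_status(payload["customer_id"], "signed_up")
    return jsonify({"ok": True}), 200

if __name__ == "__main__":
    app.run(port=5000)
```

A tool like Pipedream gives you this listener-plus-workflow pattern without hosting the endpoint yourself.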
A place to store that information
Changing this customer data (or inserting a record when they are brand new) implies that you have a place to store information separate from your operational database for your application. Whether you are on Team Database, Team Data Lake, or Team Data Warehouse, you need to store transactional data, rolled-up data, and transformed data to share with other applications in your system or visualize in a reporting layer.
Snowflake is a great option for this, though by no means the only one. You might pick it over BigQuery or Postgres because it scales nicely and combines the concepts of a database and a warehouse. (If you have a lot of data – billions or trillions of rows – you probably want to spend a bit more time on your infrastructure, but this list is intended for the “get started” crowd.)
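As a rough sketch of what “a place to store information” looks like in practice, here’s how you might land a raw signup event in Snowflake with the snowflake-connector-python package. The table, columns, and connection values are assumptions for illustration.

```python
# Sketch: land a raw event in a warehouse table.
# Table/column names and credentials are illustrative assumptions.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account",
    user="your_user",
    password="your_password",
    warehouse="ANALYTICS_WH",
    database="RAW",
    schema="EVENTS",
)

conn.cursor().execute(
    """
    INSERT INTO customer_events (customer_id, event_type, occurred_at)
    VALUES (%s, %s, CURRENT_TIMESTAMP())
    """,
    ("cust_123", "signed_up"),
)
conn.close()
```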
Transforming Data into Models
Operationally speaking, we often talk about “models” to describe the information in the system. A model is the shape we expect data to take for a particular record in a table, including the fields it brings together. We use one or more queries to produce or assemble the fields for the model using systems like dbt or another data pipeline tool.
Whether you use dbt or another solution, the goal is to take the raw material (transactional data, attributes in tables, time-series data) and assemble it into a model that standardizes the representation of information about that thing.
An account model might tell you basic information like the name of a company and its canonical ID value. It might also show you the number of logins in the last 48 hours or the status of that company so that you can make business decisions on that information without having to run multiple other queries.
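dbt models are written in SQL; as a Python sketch of the same idea, here’s a pandas version that assembles a hypothetical account model (name, status, recent logins) from two raw tables. All table shapes and field names are made up for illustration.

```python
# Sketch: assemble an "account" model from raw tables with pandas.
# Table shapes and field names are hypothetical.
import pandas as pd

accounts = pd.DataFrame({
    "account_id": ["a1", "a2"],
    "name": ["Acme", "Globex"],
    "status": ["active", "trial"],
})
logins = pd.DataFrame({
    "account_id": ["a1", "a1", "a2"],
    "logged_in_at": pd.to_datetime(
        ["2023-10-08 09:00", "2023-10-08 17:30", "2023-10-07 12:00"]
    ),
})

# Roll up logins in the last 48 hours per account.
cutoff = pd.Timestamp("2023-10-09") - pd.Timedelta(hours=48)
recent = (
    logins[logins["logged_in_at"] >= cutoff]
    .groupby("account_id")
    .size()
    .rename("logins_last_48h")
    .reset_index()
)

# The model: one standardized row per account.
account_model = accounts.merge(recent, on="account_id", how="left")
account_model["logins_last_48h"] = (
    account_model["logins_last_48h"].fillna(0).astype(int)
)
print(account_model)
```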
Sourcing and Sending Information
What about the raw material we need to populate our data warehouse? It comes from sources: the ETL (extract, transform, and load) process starts with copying data from line-of-business systems like Salesforce and Zendesk.
You’ll also want to send important events and transformed data back to some of these same systems, for example when workflows in your marketing automation or CRM tools depend on changes in operational data.
When customers upgrade their service, they may move into a different marketing or sales segment, so your customer data platform or your CRM needs to receive this broadcast. We commonly call this feature “Reverse ETL” because it takes data from the warehouse and sends it to the systems that need to know that information.
Keep in mind that the reverse ETL process also serves as an eventing loop, sending messages to collaboration systems like Slack or email and kicking off the workflow glue we described earlier.
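A minimal reverse ETL loop might look like the sketch below: read recently changed rows from a modeled warehouse table, then broadcast them to Slack via an incoming webhook. The query, the webhook URL, and the message format are all assumptions.

```python
# Sketch: a tiny reverse ETL step. Read changed records from the
# warehouse and broadcast them to Slack; the same loop could also
# call your CRM or customer data platform APIs.
import requests
import snowflake.connector

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # assumption

conn = snowflake.connector.connect(
    account="your_account", user="your_user", password="your_password",
    warehouse="ANALYTICS_WH", database="ANALYTICS", schema="MODELS",
)
rows = conn.cursor().execute(
    "SELECT account_id, name, status FROM account_model "
    "WHERE updated_at > DATEADD('minute', -15, CURRENT_TIMESTAMP())"
).fetchall()
conn.close()

for account_id, name, status in rows:
    requests.post(SLACK_WEBHOOK_URL, json={
        "text": f"Account {name} ({account_id}) is now {status}",
    })
```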
Asking and answering business questions
Now that you have a modeled set of data in your database and know that it’s getting updated on a schedule and at important events, it’s time to visualize that data to enable other teams in your business.
Start by making a list of key business metrics - these could be:
a customer count
the number of daily sales-qualified leads
this month’s sales numbers
If you’re not sure where to start, here are some examples.
The goal here is to build dashboards in a tool like Sigma to provide daily value, be updated on a schedule, and highlight significant events like a customer addition or a customer churn. If you’re tracking when leads fail to become qualified, then you can analyze those cohorts and find out why.
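Under the hood, those dashboard tiles are usually just queries against your modeled tables. Here’s a sketch of a metric snapshot in Python with Snowflake-flavored SQL; the table and column names are hypothetical.

```python
# Sketch: key-metric queries against modeled warehouse tables.
# Table and column names are hypothetical.
METRIC_QUERIES = {
    "customer_count": (
        "SELECT COUNT(*) FROM account_model WHERE status = 'active'"
    ),
    "daily_sqls": (
        "SELECT COUNT(*) FROM lead_model "
        "WHERE qualified_at >= CURRENT_DATE()"
    ),
    "monthly_sales": (
        "SELECT SUM(amount) FROM orders "
        "WHERE DATE_TRUNC('month', closed_at) = DATE_TRUNC('month', CURRENT_DATE())"
    ),
}

def refresh_metrics(cursor) -> dict:
    # Run each query and return a {metric: value} snapshot that a
    # dashboard (or a scheduled Slack digest) could consume.
    return {
        name: cursor.execute(sql).fetchone()[0]
        for name, sql in METRIC_QUERIES.items()
    }
```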
How do all of these pieces work together?
Before a Data Operations system is in place, you will find some of the data you need in some of your systems. Other systems will be immediately stale, making it hard to enable team members beyond the data team in their own operational systems.
After a Data Operations strategy is in place, imagine this scenario:
A person signs up for a demo and a Slack message is sent to a team for processing
If no action has been taken by the SLA (service level agreement) deadline, the message in the channel is updated (see the sketch after this list)
When that person eventually decides to purchase the software, there is already a calculation comparing them to other purchasers (how long did it take them to buy?)
The dashboard of key metrics is updated instantly (or for other items, on a regular schedule)
When the customer updates their phone number or other information, it’s sent to the operational systems to be updated, according to the information hierarchy you define
It’s now possible to use automated segments or messaging to provide a more personalized experience to the customer based on what happened
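The SLA step in that scenario is a nice example of workflow glue. Here’s a sketch using the official slack_sdk package; the channel, the SLA window, and the “claimed” check are assumptions, and in practice a scheduler or a Pipedream workflow would run the follow-up rather than a sleep.

```python
# Sketch: post a demo-request message, then update it if no one acts
# before the SLA window closes. Channel and SLA values are assumptions.
import time
from slack_sdk import WebClient

client = WebClient(token="xoxb-...")  # assumption: a Slack bot token
SLA_SECONDS = 15 * 60

def notify_demo_request(lead_name: str):
    resp = client.chat_postMessage(
        channel="#sales-leads",
        text=f"New demo request from {lead_name}. Please claim!",
    )
    # Keep the channel ID and message timestamp so we can edit it later.
    return resp["channel"], resp["ts"]

def escalate_if_unclaimed(channel_id: str, ts: str, claimed: bool) -> None:
    if not claimed:
        client.chat_update(
            channel=channel_id,
            ts=ts,
            text="⚠️ Demo request past SLA and still unclaimed!",
        )

channel_id, ts = notify_demo_request("Jane at Acme")
time.sleep(SLA_SECONDS)  # a real system would schedule this check
escalate_if_unclaimed(channel_id, ts, claimed=False)
```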
The beauty of this process is that every operational system now has the potential to get updates on what’s happening to the customer. And that’s the big picture: engaging with customers works much better when there is an updated customer record showing what’s going on. Data Operations helps make that happen.
What’s the takeaway? Building a Data Operations practice involves tools to move information from operational sources through a data warehouse and out to destinations, but the real benefit of this work is to broadcast what’s going on with the customer. By focusing on the customer, we make it easier for teams to respond accurately, effectively, and quickly. And for the business, we’re enabling the ability to pose and answer important questions using data.
Links for Reading and Sharing
These are links that caught my 👀
1/ Make your type better - You can make a few small changes to render type more effectively. The good news? You don’t have to be a UI designer to notice them and take advantage of these tips when making presentations or building slides.
2/ U2 launches a new arena - If you haven’t heard about The Sphere - a 20k seat arena that has internal and external LED screens bigger than anywhere in the world – check out what U2 is doing in their concert to kick off the opening.
3/ PirateML - nomnoml is a tool for writing diagrams in plain text and rendering them automatically. It’s easy to use relative to other solutions like Mermaid Chart or D2 and great for rendering processes.
What to do next
Hit reply if you’ve got links to share, data stories, or want to say hello.
Want to book a discovery call to talk about how we can work together?
The next big thing always starts out being dismissed as a “toy.” - Chris Dixon