Unifying the data stack
"Everything Starts Out Looking Like a Toy" (No. 32)
This week’s toy: a method of writing candy hearts for Valentine’s Day that look decidedly unfamiliar, thanks to machine learning.
An interesting thing to note here is that using GPT-3 for templated writing is usually quite effective. Perhaps we’ll see a Y Combinator-funded, AI-powered greeting card company soon.
Edition No. 32 of this newsletter is here - it’s February 6th, 2021.
The Big Idea
There’s a big problem that we need to solve to make information workers more productive, happier, and more effective. It’s the data stack they use to get their work done. To most people in information work today, that stack looks like Excel or Google Sheets. But most of the work in those individual spreadsheets is fragmented, isolated, and lost.
Unifying the Data Stack
This week a friend shared this article on the modern data stack by Tristan Handy of Fishtown Analytics, the makers of dbt.
What do you need to know about the modern data stack?
If you think about the kinds of things you need to do with data, they fall into a few big buckets:
Ingesting that data. You bring data in from a source to a place like a data warehouse or a spreadsheet where you can make changes to it. This might happen on a regular schedule or it might be a one-time import, depending upon what you’re doing.
Storing your information in a place where many people in your organization can interact with it safely. You’ll often see this called “data warehousing” or a “business intelligence database,” where analysis can be done in a structured way, often combining information from several sources into a fuller view.
Enriching the information. What you get at the beginning of a data process rarely looks like what you have by the end of that process. It can be challenging to build that transformation process in a structured way, so having essentially a continuous delivery tool for data is an important piece of your data stack.
Visualizing the result. This is both an exploratory process and something that results in interactive presentations like dashboards. Typically a higher-level tool like Tableau or Looker is used for this purpose.
Understanding insights from that data, including when things have gone wrong. This is a new area of work around data discoverability and observation, helping you to see how things are changing in your data stack.
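To make those five buckets concrete, here is a minimal sketch of the whole flow in Python, using only the standard library. Everything in it is an assumption for illustration: the sample CSV, the table names, and the function names are hypothetical, an in-memory SQLite database stands in for a real warehouse, and a text bar chart stands in for a tool like Tableau or Looker.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a source system.
RAW_CSV = """order_id,region,amount
1,west,120.50
2,east,80.00
3,west,42.25
"""


def ingest(raw_text):
    """Ingest: parse the raw export into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def store(conn, rows):
    """Store: load the rows into a shared table that others can query."""
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (:order_id, :region, :amount)", rows
    )


def enrich(conn):
    """Enrich: derive a summary table from the raw table."""
    conn.execute(
        "CREATE TABLE revenue_by_region AS "
        "SELECT region, SUM(amount) AS revenue, COUNT(*) AS order_count "
        "FROM orders GROUP BY region"
    )


def visualize(conn):
    """Visualize: render a text bar chart, one '#' per 20 units of revenue."""
    query = "SELECT region, revenue FROM revenue_by_region ORDER BY region"
    return [
        f"{region:<5} {'#' * int(revenue // 20)} {revenue:.2f}"
        for region, revenue in conn.execute(query)
    ]


def observe(conn, expected_min_rows=1):
    """Understand: a basic health check that flags an empty load."""
    (count,) = conn.execute("SELECT COUNT(*) FROM orders").fetchone()
    return {"row_count": count, "healthy": count >= expected_min_rows}


def run_pipeline():
    """Run every stage in order against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    store(conn, ingest(RAW_CSV))
    enrich(conn)
    return visualize(conn), observe(conn)
```

Calling run_pipeline() returns the chart lines and the health check together. In a real stack each of these functions would be a separate product, which is a big part of why the whole thing is hard to follow.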
What needs to change about this stack?
Simply put, the data stack is too hard for normal people to understand. Most of us default to using tools like Excel or Google Sheets for a few simple reasons:
It’s difficult to know what data is available, how it is defined and how often it changes. When you control your own data set, you know exactly what you’re looking at and when it last changed. When you’re using data from other people, you need metadata on what it is, when it changed, and what it’s telling you.
Transforming data into views requires an understanding of the underlying schema and an ability to query effectively. Depending upon the dimensionality of the information and how it is stored, you might not be able to just count the number of rows; you may need to transform the information before it becomes usable.
Each organization has its own set of visualizations and outputs and requires a slightly different template. Finding the commonality between all of the reports that use pipeline information in a sales org might require you to analyze multiple reports rather than changing a centralized definition in a data catalog.
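The point about counting rows is easier to see with a small example. This is a sketch under assumptions: the line-item rows and field names below are hypothetical, and they stand in for any table stored at a finer grain than the question you are asking.

```python
from collections import defaultdict

# Hypothetical line-item rows: one row per product in an order, not per order.
LINE_ITEMS = [
    {"order_id": 101, "sku": "A", "qty": 2, "unit_price": 10.0},
    {"order_id": 101, "sku": "B", "qty": 1, "unit_price": 5.0},
    {"order_id": 102, "sku": "A", "qty": 3, "unit_price": 10.0},
]


def naive_order_count(rows):
    """Counting rows answers 'how many line items?', not 'how many orders?'."""
    return len(rows)


def orders_from_line_items(rows):
    """Roll line items up to one total per order before counting anything."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["order_id"]] += row["qty"] * row["unit_price"]
    return dict(totals)
```

Here naive_order_count(LINE_ITEMS) counts three rows, while orders_from_line_items(LINE_ITEMS) shows there are only two orders. Someone who does not know the grain of the table has no way to tell which number is the right one, which is exactly the schema knowledge most people never get.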
A simpler data stack
What would an improved data stack look like?
Its goals might include:
improving discovery of data - help me understand what data is available for the problems I want to solve
auto mapping of schema - show me the data structures that I’m seeing, and use the language of the business metric I’m exploring in addition to the database structure of the underlying information
templates that multiple organizations can use and that hide unusable data visualizations - use an existing UX paradigm like Google Sheets to show me recommended information in context
live reports that update themselves when the underlying schema changes - bind the reports to the underlying information so that when the data or its definition is corrected, the reports are updated too
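Here is one way a few of those goals could fit together, sketched in Python. It is a toy under stated assumptions, not a description of any existing product: MetricDefinition, DataCatalog, and run_report are hypothetical names, and the "tables" are plain lists of dictionaries. The idea is that a report asks the catalog for a business metric by name, so correcting the definition in one place changes every report that uses it.

```python
from dataclasses import dataclass


@dataclass
class MetricDefinition:
    """A business-facing metric mapped onto the underlying schema."""
    name: str         # the language of the business, e.g. "Pipeline value"
    description: str  # plain-English meaning, used for discovery
    table: str        # underlying table name
    column: str       # underlying column name
    aggregate: str    # "sum" or "count"


class DataCatalog:
    """Central place to discover metrics and look up their definitions."""

    def __init__(self):
        self._metrics = {}

    def register(self, metric):
        # Registering an existing name replaces the earlier definition.
        self._metrics[metric.name] = metric

    def search(self, term):
        """Discovery: return names of metrics that mention the term."""
        term = term.lower()
        return sorted(
            m.name
            for m in self._metrics.values()
            if term in m.name.lower() or term in m.description.lower()
        )

    def get(self, name):
        return self._metrics[name]


def run_report(catalog, metric_name, tables):
    """A 'live' report: it reads the current definition on every run."""
    metric = catalog.get(metric_name)
    values = [row[metric.column] for row in tables[metric.table]]
    if metric.aggregate == "sum":
        return sum(values)
    if metric.aggregate == "count":
        return len(values)
    raise ValueError(f"Unsupported aggregate: {metric.aggregate}")
```

For example, if "Pipeline value" is first registered against an amount column and later re-registered against a weighted_amount column, the same run_report call returns the corrected number without anyone editing the report itself.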
What’s the takeaway? A data stack should be rethought in terms of the people who actually use it - both the visualizers of data and the business users who need to make decisions based on that data. Bringing the structure, data definitions, and information to the place where people are already doing their work will help organizations become more data-centric while allowing more people to participate.
A Thread from This Week
Twitter is an amazing source of long-form writing, and it’s easy to miss the threads people are talking about.
This week’s thread:
A note: if you’re interested in this problem, please let me know what sort of threads you’d like to see.
Links for Reading and Sharing
These are links that caught my eye.
1/ Charge🔋- solid-state batteries are coming, and along with them the promise of cheaper and more abundant mobile power sources. The biggest impact here is going to be in future battery packs for automobiles, e-bikes, and smaller micromobility devices. A change in battery chemistry also means different raw materials for those items, perhaps tipping the rare earth metals supply problem in favor of manufacturers outside of China.
2/ Start with the customer in mind - Ben McCormack’s Journey Mapping article details an excellent heuristic for decoding a customer problem into a root cause analysis. By walking backwards in time to imagine how the experience could have been better for the customer, Ben builds a 5 “whys” exercise to make your product development process better.
3/ What’s next in tech - The Great Unbundling - Ben Evans’ take on how tech will be evolving in the years ahead - is a presentation worth your time to digest. Where is eCommerce (or just commerce) going, and how will the growing emphasis on logistics-as-a-service change the services you and I use today?
On the Reading/Watching List
Zach Tratar’s article “Disinformation: The Game of Belief is Now a War” is a long read that is important for understanding the world we live in now, where almost any fact can be challenged in a post-truth manner. Tratar explains the ways in which our own beliefs can be and are used against us to build an information landscape that serves special interests on all sides. This is an important piece to add to your brain so that you build and strengthen your critical thinking.
The 2014 cartoon miniseries (10 episodes, 12 minutes each) Over the Garden Wall is an unexpected joy. The adventure saga of two brothers trying to find their way has beautiful animation, endearing characters, famous voices, and at least one song that will not stay out of your head after you hear it.
What to do next
I’d love your 100% anonymous feedback - how was this newsletter?
Thank you. If you found this useful, consider sharing with a friend.
The next big thing always starts out being dismissed as a “toy.” - Chris Dixon