What kind of data belongs in a “Minimum Viable Record”?
A Minimum Viable Record has the necessary fields and validation for GTM teams to engage. How do you ingest, clean, and validate these records? Read: "Everything Starts Out Looking Like a Toy" #158
Hi, I’m Greg 👋! I write weekly product essays, including system “handshakes”, the expectations for workflow, and the jobs to be done for data. What is Data Operations? is a post that grew into Data & Ops, a team to help you with product, data, and operations.
This week’s toy: styling data plots to look like pop culture icons such as Star Wars and Barbie. (Why hasn’t this happened in Google Docs and Office 365 yet? Perhaps it’s just licensing.) Edition 158 of this newsletter is here - it’s August 14, 2023.
If you have a comment or are interested in sponsoring, hit reply.
The Big Idea
A short long-form essay about data things
⚙️ What kind of data belongs in a “Minimum Viable Record”?
There’s a lot of pressure on sellers today to hit activity metrics and reach as many of their accounts as possible, especially when it comes to new business. These sellers rely on up-to-date information on accounts and people to know who and when to contact. The system giving them the signal? The CRM. The team that’s responsible for that data? The operations team.
The most important job for a Revenue Operations team is to keep the selling process going by delivering actionable data to the selling team. When a new lead, or an account that hasn’t been approached before, enters the system, you need solid information to follow your selling playbook. Of course, this is easier said than done when the base state of most sales data is … less than perfect.
Yet sellers need the record to be ready to sell. The goal of the ops team is to develop a data factory that takes information from leads, enriching and rearranging the records so they are ready for engagement. One way to think about when these records are ready is to establish a standard for completeness and engagement.
Let’s call that state “the minimum viable record”, borrowing from the minimum viable product concept to name the record quality sellers need to engage.
What is a Minimum Viable Record?
A “minimum viable record” is a record with the minimum necessary populated values and data cleanliness to use that record in the environment. If you’re talking about accounts, you might need a Company Name, website, and country; if you’re talking about people, you might need first name, last name, email, and title. Because different types of objects in your CRM are used differently, the definitions of that MVR are going to change from object to object.
What’s important to remember about the concept is that the records that meet this standard are ready to be used. As in the “Minimum Viable Product” idea, a record may not be “complete” in every field, but every field that is necessary and expected for that kind of object has been completed and checked.
The existence of an MVR implies that some records don’t meet the standard. When records are not ready, they need to be fixed and reintroduced to the queue.
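Here’s a minimal sketch of what that check could look like in Python. The field lists come from the examples above; the lowercase field names, the REQUIRED_FIELDS table, and both function names are my own assumptions, not something your CRM provides.

```python
# Hypothetical required fields per object type, based on the examples above.
REQUIRED_FIELDS = {
    "account": ("name", "website", "country"),
    "contact": ("first_name", "last_name", "email", "title"),
}


def missing_fields(record: dict, object_type: str) -> list:
    """Return the required fields that are absent or blank for this object type."""
    missing = []
    for field in REQUIRED_FIELDS[object_type]:
        value = record.get(field)
        # Treat None and whitespace-only strings as "not populated".
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


def is_minimum_viable(record: dict, object_type: str) -> bool:
    """A record is viable when no required field is missing."""
    return not missing_fields(record, object_type)
```

A record that fails the check comes back with a list of what to fix, which is a handy thing to attach to it when it goes into the remediation queue.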
This sounds great, but how do you know when the record is ready to share with a seller? For this example, let’s take a look at an account record. Although we’re talking about a specific type of account, the concept holds for other kinds of records.
What do you need to know about a company to create an account record?
Company or account records depend upon a relatively small number of fields, so they are a good example of the Minimum Viable Record concept. If you know the name, website, and country for an account, you have enough information to enrich it and learn more. If this were a contact, you would need a bit more information (first name, last name, email, title, and country).
Your sales playbook might not need more information to start prospecting an account. But things would go a lot better if you had slightly more information, including technographic (which technologies they are using), firmographic (what segment does this company belong to), and other enrichment information. You might also want to know if this account is a duplicate, belongs to another seller, or whether it’s in the industry you intended.
Building a data factory that produces records
Here’s where the process comes in. A data factory processes the records to determine whether they are viable. For each type of object, there’s a flow that results in improved output or the record being placed into a queue where it can be improved.
Using the account record as a guide, here’s a blueprint for a process ending up in clean data that’s ready to use.
“The Goldilocks Record”
No record in your system is going to be perfect (sorry, RevOps teams). That being said, aiming for the “just right” record will give us the best result. When we’re thinking about the end state of that record, we want to consider:
Are all the fields that we expect to be populated filled with good information?
Do the values selected in picklists for this record match the definitions we expect for a picklist value?
Is the information in the record internally consistent?
Have we enabled enough tracking information and history to know when and why something changed?
The Goldilocks record will result from a great process – and also from the knowledge that most records in a CRM will drift over time – so building a continuous change process will win over expecting the information to be perfect.
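To make those questions concrete, here is a rough sketch of a checker that returns a list of issues for a single account record. Everything specific in it is an assumption for illustration: the picklist values, the account_type, lifecycle_stage, and last_modified_by field names, and the rule that a “Customer” account should also be in a “Customer” lifecycle stage.

```python
# Hypothetical picklist; in practice this would come from your CRM's field definition.
INDUSTRY_PICKLIST = {"Software", "Manufacturing", "Retail"}


def goldilocks_issues(record: dict) -> list:
    """Return human-readable reasons this record is not yet 'just right'."""
    issues = []

    # Picklist check: the stored value must be one of the allowed values.
    if record.get("industry") not in INDUSTRY_PICKLIST:
        issues.append("industry is not an allowed picklist value")

    # Internal consistency check (assumed rule): a Customer account
    # should not sit in a non-Customer lifecycle stage.
    if record.get("account_type") == "Customer" and record.get("lifecycle_stage") != "Customer":
        issues.append("account_type and lifecycle_stage disagree")

    # Tracking check: we want to know who or what last changed the record.
    if not record.get("last_modified_by"):
        issues.append("no change-tracking information")

    return issues
```

An empty list means the record passes these particular checks. Because records drift, a checker like this is something you would run on a schedule rather than once.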
Ingest: Bringing information into the system
The first step in the process of building a Minimum Viable Record is to bring information into the system.
Whenever you bring in new records, you’ll want to check:
Is this new or existing data? Typically uniqueness for records like an account record is determined by name, domain name, and country, but there are many different organizational definitions for accounts.
Does this record follow the expected format for fields and values? For each field, you want to know if there is a constrained set of values or an open field. Even for open fields, there may be an expected length. For true/false fields, you may need to store the concept of null or “not set” in addition to the expected true and false values, as many systems set a default value that’s hard to distinguish from null.
How does this record compare to a unique key? Along with question 1 above, you’ll need a matching algorithm to compare existing records to new records. One way to do this is to establish a canonical ID for all records. In systems like Salesforce, you’ll get this for free, but consider adding a global ID to help you track these records in other systems like a data warehouse.
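One way to sketch the “new or existing?” question is to normalize name, domain, and country into a match key and only mint a global ID when the key hasn’t been seen. This is a simplified illustration, not a real matching algorithm: the field names, the global_id field, and the in-memory dictionary standing in for your existing records are all assumptions.

```python
import uuid
from urllib.parse import urlparse


def normalize_domain(website: str) -> str:
    """Reduce a website value like 'https://www.acme.com/about' to 'acme.com'."""
    website = website.strip().lower()
    if "://" not in website:
        website = "//" + website  # lets urlparse treat a bare domain as a host
    host = urlparse(website).hostname or ""
    return host[4:] if host.startswith("www.") else host


def match_key(record: dict) -> tuple:
    """Build a comparison key from name, domain, and country."""
    name = " ".join(record["name"].lower().split())  # collapse case and extra spaces
    return (name, normalize_domain(record["website"]), record["country"].strip().upper())


def ingest(incoming: dict, existing_by_key: dict) -> tuple:
    """Return (record, is_new). New records get a hypothetical global_id."""
    key = match_key(incoming)
    if key in existing_by_key:
        return existing_by_key[key], False
    stored = {**incoming, "global_id": str(uuid.uuid4())}
    existing_by_key[key] = stored
    return stored, True
```

Exact-key matching like this will miss near-duplicates such as “Acme Inc” versus “Acme Incorporated”, so treat it as a first pass before any fuzzier comparison.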
Cleaning a record
When we talk about “cleaning” a record, we mean both the process of validating the existing data against the definition of each field and also the internal consistency of the data in that record.
For example, you might find that a company has a regional office in Bellingham. Using a USPS lookup, you find that both Bellingham, WA and Bellingham, MA are valid addresses. Which one is right? The data integrity for addresses is enforced by the ZIP+4 postal code, letting you know which address is right.
Most fields aren’t that easy to validate, so you need to build a rubric to let your team know how to break ties when there is conflicting information. For industry information, you might pick the LinkedIn Industry API to standardize your industry list and also provide an external validation point.
When cleaning a record, you need to consider:
How to handle nulls - are you using a system that has true and false only for boolean fields, or is there a possibility that the field is not yet set?
What happens when you get conflicting data? When the field is set with a value and you receive new information, will you replace it every time? Or do you need to confirm that the new information is valid and from a source that you expect to replace information? (Don’t forget to store a log for the last system that updated a value)
Fields that need to be validated against external lists or definitions - in the case of industry-standard values like the one listed above, do you restrict available values to the correct ones or fix the values later after ingesting?
Fields that need to be validated against each other - the most obvious example of this one is making sure that “a customer is a customer” but there are many more. When fields are updated independently of each other, it’s possible that they will be set to values that don’t make sense.
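For the conflicting-data question, one possible rubric is a trust ranking of sources: a new value only replaces an existing one when it comes from a source ranked at least as high, and every accepted change is logged with the system that made it. The source names, their ranking, and the _sources bookkeeping field below are hypothetical; your own rubric will differ.

```python
from datetime import datetime, timezone

# Hypothetical trust ranking: higher numbers win ties.
SOURCE_RANK = {"manual": 3, "enrichment_vendor": 2, "web_form": 1}


def apply_update(record: dict, field: str, new_value, source: str, change_log: list) -> bool:
    """Apply new_value if the source is trusted enough; return True when applied."""
    sources = record.setdefault("_sources", {})
    current = record.get(field)

    # A populated field is only overwritten by an equal or higher ranked source.
    if current is not None:
        if SOURCE_RANK.get(source, 0) < SOURCE_RANK.get(sources.get(field), 0):
            return False

    # Log the last system that updated the value, along with the old value.
    change_log.append({
        "field": field,
        "old": current,
        "new": new_value,
        "source": source,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    record[field] = new_value
    sources[field] = source
    return True
```

Note that a field holding None is treated as “not yet set” and accepts a value from any source, which is one way to keep null distinct from a real value.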
Enriching a record with external (or internal) data
Once you have a clean record with the minimum fields established, you will want to enrich it with additional data based on the key fields involved. For account records, that’s often the domain name or website of the company, combined with the billing country or headquarters location.
Here are a few things to consider about enrichment to make your life easier:
What’s the unique key? What will you send to the enrichment provider to get back a record or a value?
What other systems are you using to enrich this record? For account records, you could use other providers you trust to combine the domain name, industry key, and other items like a SIC code to get firmographic information.
How do you fix erroneous data? One way to handle mistakes is to store the enrichment responses in a separate field or object - that way, if the information turns out to be bad, you don’t have to go to a log to find the original value. (Your mileage may vary - the system may not support this kind of history and you may need to handle this differently).
Enrichment also costs something - either processing time or money. You’ll want to record the last time a record was enriched to avoid re-processing it too often.
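As a small sketch of those last two ideas, the code below keeps each provider’s response in its own sub-object and stamps the record with the last enrichment time, so you can skip records that were enriched recently. The 90-day window, the field names, and the provider name in the usage are assumptions I picked for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical refresh window; tune this to your cost and freshness needs.
ENRICHMENT_MAX_AGE = timedelta(days=90)


def needs_enrichment(record: dict, now: datetime = None) -> bool:
    """True when the record was never enriched or the last run is too old."""
    now = now or datetime.now(timezone.utc)
    last = record.get("last_enriched_at")
    return last is None or now - last > ENRICHMENT_MAX_AGE


def store_enrichment(record: dict, provider: str, response: dict, now: datetime = None) -> None:
    """Keep the raw response apart from working fields and stamp the record."""
    now = now or datetime.now(timezone.utc)
    # Stored per provider so a bad response can be inspected or discarded later.
    record.setdefault("enrichment", {})[provider] = response
    record["last_enriched_at"] = now
    record["last_enriched_by"] = provider
```

Because the response sits under its own key, the working fields stay untouched until you decide to copy a value across.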
You’ve got a record, it’s been cleaned and enriched, and now you want to know whether it’s in good condition. The strategy of cross-checking the information will help you to know whether a record is ready to share with a seller or needs to be remediated.
Here are some cross-checks that you’ll want to build into reports that you run over the completed records:
Look for missing information - create a report that finds records with missing information - if the report isn’t empty, you’ve got records to fix
Find records that haven’t been enriched on the expected schedule - a report that shows records with old enrichment dates will give you a pick list of records to re-enrich
Group records by the unique key and include groups with more than one record - this is a handy way to find duplicate records, especially if the duplication extends across more than one field
Building yourself a dashboard of reports that should have no records will help you to know on a daily basis whether you have records to fix.
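If your records live somewhere you can query with code, the three cross-checks above could be sketched like this. Each function is a “report that should be empty”. The field names and function names are assumptions, and in a CRM you would more likely build these as saved reports than as Python.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def report_missing(records: list, required: tuple) -> list:
    """Records where any required field is empty or absent."""
    return [r for r in records if any(not r.get(f) for f in required)]


def report_stale(records: list, max_age: timedelta, now: datetime) -> list:
    """Records never enriched, or enriched longer ago than max_age."""
    return [
        r for r in records
        if r.get("last_enriched_at") is None or now - r["last_enriched_at"] > max_age
    ]


def report_duplicates(records: list, key_fields: tuple) -> dict:
    """Group records by key and keep only groups with more than one record."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r.get(f) for f in key_fields)].append(r)
    return {key: group for key, group in groups.items() if len(group) > 1}
```

Passing more than one field in key_fields is what lets the duplicate report catch duplication that extends across several fields.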
Fixing the inevitable bad record
If you’ve got a #data-ops channel in your Slack, you’ll probably hear about a problem that needs to be fixed several times a week. It could be an expected problem (perhaps you almost always miss titles from a personal email because they need to be researched on LinkedIn) or an unexpected one, such as needing to fix the date of an MQL or an SQL to align with actual behavior. In all of these cases, you need to know which record is affected, what to do to fix it, and how to know when it’s fixed.
For each record that you want to fix, you need to know:
Where did the issue occur? When you look at the history of that record, is there a way to know where the bad data was entered?
Is the current record fixable? While you are fixing it, have a field that you can flip to keep records out of the current workflow. This is going to vary by type of record, as you’re not going to remove records that salespeople are working on. But you are going to keep records that aren’t ready out of the way until they are ready.
What happens if a record you are fixing becomes a hand-raiser? Include a way to push records back into the system if they ask for a demo, request pricing, or similar. The record still needs to be fixed, but you’ve got a live lead.
The goal of fixing records? When they return to the view of the average user, they need to be fixed or annotated with a reason letting you know why they weren’t fixed. In your schema, having a field that tracks the most recent enrichment date and system is almost always helpful.
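Here is one way the “field that you can flip” and the hand-raiser override might fit together. It’s a sketch under assumptions: the in_remediation, remediation_reason, and needs_fix fields, the event names, and the function names are all made up for this example.

```python
# Hypothetical events that mean a live lead, even if the record is still broken.
HAND_RAISER_EVENTS = {"demo_request", "pricing_request"}


def quarantine(record: dict, reason: str) -> None:
    """Flip the flag that keeps a record out of the seller workflow."""
    record["in_remediation"] = True
    record["remediation_reason"] = reason


def handle_event(record: dict, event: str) -> bool:
    """Push a quarantined hand-raiser back to sellers; return True if released."""
    if event in HAND_RAISER_EVENTS and record.get("in_remediation"):
        record["in_remediation"] = False
        record["needs_fix"] = True  # still broken, but sellers can act on it now
        return True
    return False


def release(record: dict, fixed: bool, note: str = "") -> None:
    """Return a record to the normal view, fixed or annotated with why not."""
    record["in_remediation"] = False
    record["needs_fix"] = not fixed
    record["remediation_reason"] = "" if fixed else note


def visible_to_sellers(records: list) -> list:
    """The working view excludes anything still in remediation."""
    return [r for r in records if not r.get("in_remediation")]
```

The design choice here is that a hand-raiser leaves quarantine immediately but keeps a needs_fix marker, so the fix doesn’t get forgotten once the lead is live.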
Why do we bother to do this?
People notice bad data. They don’t notice good data until it becomes a problem.
How do we get better at doing it?
Create a standard
Measure adherence to the standard
Validate by showing second-order effects of having good data
Normalize asking your team to find errors and the process of fixing them.
What’s the takeaway? You should expect to have data that needs to be fixed in your environment. Building a basic standard for the expected quality of data will help you to streamline the process to ingest, clean, validate, and deliver accurate and reliable data to your teams.
Links for Reading and Sharing
These are links that caught my 👀
1/ Pixar tech is making other movies better - If you’ve wondered how the digital team at Pixar makes textures work, they are now powering the technology that is making many other movies. The Pixar view of the world (maybe this is the metaverse we’re going to get) is influencing almost every other company that makes content. Now Apple’s joining the party to define a standard for 3D interaction.
2/ How to get to engaged users - The product team at June writes about the problem of “low floor, high ceiling” - or the desire to make it easy for new users to start with a product while still maintaining “expert-level” features for power users. This is a tricky thing. Make it too easy, and experts will think it’s a toy. Make it too hard, and entry-level users won’t try it.
3/ Making Google Sheets behave like SQL - It would be great if Google would support Joins in the QUERY function. Until they do, here’s a clever technique to accomplish the same thing when you want to append data to an existing set of data using a common key.
What to do next
Hit reply if you’ve got links to share, data stories, or want to say hello.
The next big thing always starts out being dismissed as a “toy.” - Chris Dixon