Almost everyone thinks they’re acting rationally. No matter how illogical (or even unhinged) an action may appear to outsiders, there’s almost always an internal logic that is at least understandable to the person making that decision, whether it’s an individual or an organization.
And it’s especially apparent in organizations. How many times has a company you liked or respected at one time made a blunder so mystifying that even you, as a fan, have no idea what could possibly have caused the chain of events that led to it? Yet if you were to ask the decision-makers, the reasoning is so clear they’re baffled as to why everyone is not in total lockstep with them.
There are any number of reasons why something that’s apparent to an outsider might be opaque to an insider, and I won’t even try to go over all of them. Instead, I want to focus on a specific categorical error: the misuse of data to drive decisions and outcomes.
A lot of companies say they are data-driven. Who wouldn’t want to be? The implication is that the careful, judicious analysis of data will yield only perfectly logical outcomes as to a company’s next steps or long-term plan. And it’s true that the use of data to inform your judgment can lead to better outcomes. But it can also lead to bad outcomes, for any number of reasons that we’ll discuss below.
But first, definitions.
Data: Individual, separate facts. These tend to be qualitative – if quantitative, they tend to be reduced to qualitative data for analysis.
Story: Connective framework for linking and explaining data.
Narrative: A well-reasoned story that tries to account for as much of the data and context as possible. It is entirely possible (and, in most cases, probable) that multiple narratives can be drawn from the same set of data. Narratives should have a minimum of assumptions, and all assumptions and caveats should be explicitly stated.
Fairytale: A story that is unsupported by the data, connecting data points that do not relate to one another, or using false data.
I have worked in a number of different industries, all of which pull different kinds of data and analytics to inform different aspects of their business. I cannot think of a single one that avoided writing fairytales, though some were systemically better about it than others. What I’m going to do in this blog is go over a number of the pitfalls that can lead you astray from narrative to fairytale, and how you can overcome them.
I’ll try to use at least one real-world example for each so you can hopefully see how these same types of errors might crop up in your own work.
Why fairytales get written
1. Inventing or inferring explanations for specific data
I used to work in daily newspapers back when that was still thought to be a viable enterprise on the internet. The No. 1 problem (as I’m sure you’ve seen looking at any news site) is the chasing of a trend. A story would come across our analytics dashboard that appeared to be “doing numbers,” so immediately the original writer (and, often, a cabal of editors) would convene to try to figure out why that particular story had gone viral.
Oftentimes the real reason was something as ultimately uncontrollable as “we happened to get in the Google News carousel for that story” or “we got linked from Reddit.” But because our mandate was to get big numbers, we would try to tease out the smallest things: more stories on the same topic, maybe ape the style (single-sentence paragraphs), try to time stories to go out at the same time every day …
It’s very similar to a cargo cult – remote villagers who received supply drops during WWII came to believe that such goods were from a cargo “god,” and that by following the teachings of a cargo “leader” (which typically involved re-enacting the steps that led up to the first drops, or mimicking European styles and activities) the cargo would return in abundance. In reality, the actions of the native peoples had little to no effect on whether more cargo would come.
This commonly happens when you’re asked to explain the reason for a trend or an outcome, a “why” about user behavior. It is nearly impossible to know why a user does something absent them explicitly telling you either through asynchronous feedback or user interviews. Everything else is conjecture.
But we’re often called upon (as noted above) to make decisions based on these unknowable reasons. What to do?
The correct way to handle these types of questions is:

- Be clear that an explanation is a guess.
- Treat that guess as a hypothesis.
- Test that hypothesis (a minimal sketch follows this list).
- Allow for the possibility that it’s wrong or that there is no right answer, period.
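Here’s roughly what that looks like in practice. This is a minimal sketch rather than anything from my newsroom days: suppose our guess is that the single-sentence-paragraph style earns a higher homepage click-through rate. The counts below are made up, and the test is a plain two-proportion z-test.

```python
# A minimal sketch: treat the guess ("the new style gets more clicks") as a
# hypothesis and test it, rather than acting on the guess directly.
# All counts below are hypothetical.
from math import sqrt, erf

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for H0: the two proportions are equal."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical clicks / impressions for each style over the same week.
z, p = two_proportion_z_test(successes_a=480, n_a=12_000,   # new style
                             successes_b=430, n_b=12_500)   # old style
print(f"z = {z:.2f}, p = {p:.3f}")
# A large p-value means the data don't support the guess; either way, the
# guess stays a hypothesis until it survives a test like this one.
```

Even a crude test like this beats the alternative, which is reorganizing your newsroom around a guess.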
2. Load-bearing single data point
I see this all the time in engineering, especially around productivity metrics. There is an eternal debate as to whether you can accurately measure the productivity of a development team; my response to this is, “kinda.” You can measure any number of metrics you want, but those metrics only measure what they measure. Most development teams use story points to gauge roughly how long a given chunk of development will take. Companies like to measure expected vs. actual story points, and then take actions based on those numbers.
Except that the spectrum of actions one can take based on those numbers is unknowably vast, and those numbers in and of themselves don’t mean anything. I worked on a development team where the CTO was reporting velocity up the chain to his superiors as a measure of customer value that was being provided. That CTO also refused to give story point assignments to bug tickets, since that wasn’t “delivering customer value.” I don’t know what definition of customer value you use in your personal life, but to me “having software that works properly” is delivering value.
But because bugs weren’t pointed, they were given lower priority (because we had to meet our velocity numbers). This increased focus on velocity numbers meant that tickets were getting pushed through to production without having gone through thorough testing, because the important thing was to deliver “customer value.” This, as you can imagine, led to more bug tickets that weren’t prioritized, rinse and repeat, until the CTO was let go and the whole initiative was dramatically restructured because our customers, shockingly enough, didn’t feel they were getting enough value in a broken product.
I want to introduce you to two of my favorite “laws” that I use frequently. The first, from psychology, is called Campbell’s Law, after the man who coined it, Donald Campbell. It states:
- The more emphasis you put on a data point for decision-making, the more likely it will wind up being gamed.
We saw this happen in a number of different ways. When story points got so important, suddenly story point estimates started going way up. Though we had a definition of done that included things like code review and QA testing, those things weren’t tracked or considered analytically, so they were de-emphasized when it was perceived that including them would hurt the number. Originally, velocity stood for “number of story points in stories that were fully coded, tested and QA’ed.” By the end, it stood for “the maximum number of points we could reasonably assign to the stories that we rushed through at the end of the week to make velocity go up.”
The logical conclusion of Campbell’s Law is Goodhart’s Law, named after economist Charles Goodhart:
- When a measure becomes a target, it ceases to be a good measure.
Now, I am not saying you should ignore SPACE or DORA metrics. They can provide some insight into how your development / devops team is functioning. But you should not use any of them, collectively or individually, as targets that you need to or should meet. They are quantitative data that should be used in conjunction with other, qualitative, data garnered from talking and listening to your team. If someone’s velocity is down over a number of weeks, don’t go to them demanding it come up. Instead, talk to them and find out what’s going on. Have they noticed? Are they doing something differently?
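If you want the numbers to prompt that conversation for you, a sketch like the one below is about as far as I’d take it. The velocity figures and the threshold are made up; the point is that the output is “go talk to them,” not “hit this number.”

```python
# A minimal sketch of a metric as a signal, not a target: flag a sustained
# drop so you know to start a conversation. All numbers are hypothetical.
from statistics import mean

def sustained_drop(history, window=3, threshold=0.6):
    """True if the average of the last `window` values falls well below the
    average of everything before them (threshold is a fraction of baseline)."""
    if len(history) <= window:
        return False
    baseline = mean(history[:-window])
    recent = mean(history[-window:])
    return baseline > 0 and recent < threshold * baseline

weekly_velocity = [21, 19, 23, 20, 8, 7, 9]  # hypothetical sprint velocities
if sustained_drop(weekly_velocity):
    # The "action" is a conversation, not a demand that the number recover.
    print("Velocity has been down for a few sprints. Time to check in:")
    print("Have they noticed? What changed? What are they spending time on?")
```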
My personal story point numbers tend to be all over the place, because some weeks my IC time is spent powering through my own stories, but then for months at a time I will devote the majority of my time to unblocking others or serving as the coordinator / point person for my team so they can spend their time head-down in the code. If you measured me solely by story points, I would undoubtedly be lacking. But the story points don’t capture all the value I bring to a team.
3. Using data because it’s available
This is probably the number one problem I see in corporate environments. We want to know the answer to x question, we have y data, so we’re going to use y to answer x even if the two are only tangentially related (or, sometimes, not even that closely).
I co-managed the web presence for a large research institution’s college of medicine. On the education side, our number one goal was to increase the quality and number of qualified applicants for our various programs. Except, on the web, it’s kind of hard to draw a direct line between “quality of website” and “quality of applicants.” Sure, if we got lucky someone would actually go through our website to the student application form, and we could see that in the analytics. But much like any major life decision, people made the decision to apply or not after weeks or months of deliberation, visiting the site sporadically. That’s on top of any number of other factors in their lives that might affect their choice.
But you have to have KPIs, else how would you know that your workers aren’t slacking? So the powers that be decided the most salient data point was “number of visitors from the surrounding geographic area,” as measured by the geographic identification in Google Analytics (back when GA was at least pretending to provide useful data).
Now, some useful demographic information for you, the listener, to know is that in the year that mandate started being enforced, 53% of the incoming MD class was in-state. So, at best, our primary metric covered slightly over half of the applicants to our flagship program. That’s to say nothing of the fact that people looking at the website might also just be members of the general public (since the college of medicine was colocated with a major hospital). And even if we were somehow able to discern which visitors were high-value applicants, that still wouldn’t tell us whether the website had anything to do with their decision to apply to the program at all! That’s just not something you can accurately track through analytics.
This is not an uncommon phenomenon. Because they had a given set of quantitative data to work with, that was the data they used to answer all the questions that were vital to the business.
I get it! It’s hard to say “no” or “you can’t” or “that’s impossible” to your boss when you’re asked to give information or justification. But that is the answer sometimes. The way to get around it is to 1) identify the data you’d actually need to answer the question, and 2) devise a method for capturing that data.
I also want to point out that it is vital to collect data with intent. Not intent as in “bias your data toward the outcome you want,” but in the sense that you need to know what questions you’re going to ask of the data in order to be assured you’re collecting the right data. Going back after the fact to interrogate the data with different questions veers dangerously close to p-hacking, where you keep twisting and filtering data until you get some answer to some question, even if it’s not even close to the question you started with.
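To see why that’s dangerous, here’s a toy illustration. Everything in it is random noise, no real data: ask twenty questions you never planned to ask, and odds are decent that at least one comes back “significant” at p < 0.05 purely by chance.

```python
# A toy illustration of post-hoc interrogation: run enough unplanned
# comparisons on pure noise and some will look "significant" anyway.
import random
from math import sqrt, erf

random.seed(42)

def p_value_for_mean_difference(a, b):
    """Crude two-sample z-test p-value (fine for illustration at this n)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / sqrt(va / na + vb / nb)
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Twenty "questions" nobody planned to ask, each tested on random groups.
false_positives = 0
for _ in range(20):
    group_a = [random.gauss(0, 1) for _ in range(200)]
    group_b = [random.gauss(0, 1) for _ in range(200)]
    if p_value_for_mean_difference(group_a, group_b) < 0.05:
        false_positives += 1

print(f"{false_positives} of 20 made-up questions look 'significant' on pure noise")
```

Deciding your questions (and your analysis) before you collect the data is what keeps this from happening to you.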
4. Discounting other possible explanations
I once sat in on a meeting where they were trying to impart to us the importance of caution. They told us the story of Icarus: in Ancient Greece, the great inventor Daedalus was imprisoned in the Labyrinth he had built for the Minotaur. Desperate to escape, he fashioned a set of wings from candle wax and feathers for himself and his son, Icarus. Before leaving, he warned Icarus not to fly too close to the sea (for fear the spray would weigh down the wings and cause them to crash) nor too close to the sun, for the heat would melt the wax and cause them to crash. The pair successfully escaped the Labyrinth and the island, but Icarus, caught up in the exhilaration of flight, soared ever higher … until his wings melted and he came crashing down to the sea and drowned.
We were asked to reflect on the moral of the story. “The importance of swimming lessons!” I cracked. “Or, more generally, the importance of always having a backup plan.” Because, of course, Daedalus was worried that his son would fly too high or too low; rather than prepare for that possibility by teaching him how to swim (or fashioning a boat), Daedalus did the bare minimum and caught the consequences.
Both my explanation and the traditional, “don’t fly too close to the sun” are valid takeaways; this is what I mean when I say that multiple valid narratives can arise from the same set of facts. Were we presenting a report to Daedalus, Inc., on the viability of his new AirWings, I would argue the most useful thing to do would be to present both. Both provide plausible outcomes and actionable information that can be taken away to inform the next stages of the product.
On a more realistic note, I was once asked to do an after-action analysis of a network incursion. In my analysis, I pointed out which IP ranges were generally agreed to belong to a particular South American country (where the targeted company had no legitimate business activity); those access logs seemed to match up with suspicious activity in Florida as well as in a South Asian country.
I did not tie those things together. I did not state that they were definitively working together, or even knew of one another. I laid out possibilities including a coordinated attack by the Florida and South American entities (based on timestamps and accounts used); I also posited it was possible the attack originated in South Asia and they passed the compromised credentials to their counterparts (or even sold them to another group) in South America/Florida. It’s also possible that they were all independent actors either getting lucky or acting on the same tip.
The important thing was to not assume facts I did not (and could not) know, and to make it very clear when I was extrapolating or assuming facts I did not have. One crucial difference between fairytale and narrative is the acknowledgment of doubt. Do not assert things you cannot know, and point out any caveats or assumptions you made in the formulation of your story. This will not only protect your reputation should any of those facts turn out to be wrong; it also makes it easier for others to conceive of additional narratives you might not have, and it signposts what data might be collected to verify the underlying assumptions.
Summary
It can be easy to get sucked into writing a fairytale when you started out writing a narrative. Data can be hard, deadlines can be short and pressure can be immense. Do what you can to make sure you’re collecting good data with intent, asking and answering questions that are actually relevant to that data, and not discounting other explanations just because you finished yours. Through the application of proper data analysis, we can get better at providing good products to our customers and treating employees with respect and compassion while still maintaining productivity. It just requires diligence and a willingness to explore beyond superficial numbers to ensure the data you’re analyzing accurately reflects reality.
The best thing about this talk is it travels really well: People in Australia were just as annoyed by their companies' decision-making as those in the US.
I have previously mentioned that I love NextDNS, but they do not make certain fundamental things, like managing your block and allow lists, very easy. Quite often I'll hit a URL that's blocked that I'd like to see - rather than use their app, I have to load their (completely desktop-oriented) website, navigate to the right tab, then add the URL in.
That's annoying.
Without writing an iOS app all of its own (which sounds like a lot of work), I wanted an easy way to push URLs to the block or allow list. So I wrote an iOS Shortcut that works with a PHP script to send the appropriate messages.
You can find the shortcut here.
The PHP script can be found at this Gist. You'll need to set the token, API key (the API key can be found at the bottom of your NextDNS profile) and profile ID variables in the script. The token is what you'll use to secure your requests from your phone to the server.
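For the curious, here's a rough Python sketch of what the server side does; it is not the PHP from the Gist. The endpoint shape and the X-Api-Key header come from my reading of NextDNS's public API docs, and the variable names are stand-ins for the ones the script has you set, so check the Gist and the docs rather than trusting this verbatim.

```python
# Rough sketch (Python, not the actual PHP script). Endpoint and header names
# are assumptions based on NextDNS's public API docs; verify before relying on them.
import requests

TOKEN = "shared-secret-known-to-the-shortcut"  # placeholder values
API_KEY = "your-nextdns-api-key"
PROFILE_ID = "abc123"

def add_to_list(domain: str, list_name: str, supplied_token: str) -> bool:
    """Push a domain onto the profile's denylist or allowlist."""
    if supplied_token != TOKEN:  # cheap check that the request came from my phone
        return False
    resp = requests.post(
        f"https://api.nextdns.io/profiles/{PROFILE_ID}/{list_name}",
        headers={"X-Api-Key": API_KEY},
        json={"id": domain, "active": True},
        timeout=10,
    )
    return resp.ok

# e.g. add_to_list("annoying-blocked.example", "allowlist", TOKEN)
```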
I thought about offering a generic PHP server that required you to set everything in Shortcuts, but that's inherently insecure for everyone using it, so I decided against it. I think it would be possible to do this all in Shortcuts, but Shortcuts drives me nuts for anything remotely complex, and this does what I need it to.
It really is annoying how hard it is to manage a basic function. NextDNS also doesn't seem to have set their CORS headers properly for OPTIONS requests, which are required for browser-based interactions because of how they require the API token to be sent.
OpenAI announced Sora, a new model for text-to-video, and it's ... fine? I guess? I mean, I know why they announced it - it's legitimately really cool you can type something in and a video vaguely approximating your description in really high resolution shows up.
I just don't think it's really all that useful in real-world contexts.
Don't get me wrong, I appreciate their candor in the "whoopsies" segments, but even in the show-off pieces some of the video ranges from weird to downright bad.
Hands are hard! I get it! But there's also quite literally a "bag lady" (a woman who appears to be carrying at least two gigantic purses), and (especially when the camera moves) the main character floats along the ground without actually walking pretty often.
Are these nitpicky things people aren't going to notice on first glance? Maybe. But remember the outrage around Ugly Sonic? People notice small (or large) discrepancies in their popular entertainment, and the brand suffers for it. To say nothing of advertisers! Imagine trying to market your brand-new (well, "new" in car definitions) car without an accurate model of said car in the ad. Or maybe you really want to buy the latest Danover.
It seems like all the current AI output has a limit of "close-ish" for things, from self-driving to video to photos to even text generation. It all requires human editing, often significant for any work of reasonable size, to pull it out of the uncanny valley.
"But look how far they've gotten in such little time!" they cry. "Just wait!"
But nobody's managed to push past that last 10% in any domain. It always requires a human touch to get it "right."
The fake Land Rover commercial is interesting, but imagine the difficulty of getting it to match your new product (look and name) exactly. You're almost certainly going to have to CGI in at least parts of it afterward, at which point you've lost much of the benefit.
Unfortunately, "close enough" is good enough for a lot of people who are lazy, cheap or don't care about quality. The software example I'd give: there probably aren't a lot of companies willing to pay for software consulting services who are just going to use AI instead, but plenty of the people who message you on LinkedIn offering $200 for a Facebook clone absolutely are going to pay Copilot $20 a month instead.
And yes, there will be those people (especially levels removed from the actual work) who will think they can replace their employees with chatbots, and it might even work for a little bit. But poorly designed systems always have failure points, and once you hit it you're going to wind up having to scrap the whole thing. A building with a bad foundation can't be fixed through patching.
I have a feeling it's the same in other industries. I do think workers will feel the hit, especially on products that are already lower-budget or where people see an opportunity to cut corners. I also think our standards as a society will relax a little bit in a lot of areas, simply because the mean will regress.
But in good news, I think this'll shake out in a few years where people realize AI isn't replacing everything any more than Web3 did, but AI will have more utility as a tool in the toolkit of professionals. It's just gonna take a bit to get there.
The funny thing is a lot of the uncanny stuff makes it look like the model was trained on CGI videos, which might be a corollary to the prophesied problem of AI training on AI outputs. The dalmatian looks and moves CGI af, and the train looks like a bad photoshop insert where they had a video of a train on flat ground and matted over the background with a picture.