Almost everyone thinks they’re acting rationally. No matter how illogical (or even unhinged) an action may appear to outsiders, there’s almost always an internal logic that is at least understandable to the person making that decision, whether it’s an individual or an organization.
And it’s especially apparent in organizations. How many times has a company you liked or respected at one time made a blunder so mystifying that even you, as a fan, have no idea what could possibly have caused the chain of events that led to it? Yet if you were to ask the decision-makers, the reasoning is so clear they’re baffled as to why everyone is not in total lockstep with them.
There are any number of reasons why something that’s apparent to an outsider might be opaque to an insider, and I won’t even try to go over all of them. Instead, I want to focus on a specific categorical error: the misuse of data to drive decisions and outcomes.
A lot of companies say they are data-driven. Who wouldn’t want to be? The implication is that the careful, judicious analysis of data will yield only perfectly logical outcomes as to a company’s next steps or long-term plan. And it’s true that the use of data to inform your judgment can lead to better outcomes. But it can also lead to bad outcomes, for any number of reasons that we’ll discuss below.
But first, definitions.
Data: Individual, separate facts. These tend to be quantitative – if qualitative, they tend to be reduced to quantitative data for analysis.
Story: Connective framework for linking and explaining data.
Narrative: A well-reasoned story that tries to account for as much of the data and context as possible. It is entirely possible (and, in most cases, probable) that multiple narratives can be drawn from the same set of data. Narratives should have a minimum of assumptions, and all assumptions and caveats should be explicitly stated.
Fairytale: A story that is unsupported by the data, connecting data that does not relate to one another or using false data.
I have worked in a number of different industries, all of which pull different kinds of data and analytics to inform different aspects of their business. I cannot think of a single one that avoided writing fairytales, though some were better systemically than others. What I’m going to do in this blog is go over a number of the different pitfalls you can fall into when writing stories that lead you astray from narrative to fairytale, and how you can overcome them.
I’ll try to use at least one real-world example for each so you can hopefully see how these same types of errors might crop up in your own work.
Why fairytales get written
1. Inventing or inferring explanations for specific data
I used to work in daily newspapers back when that was still thought to be a viable enterprise on the internet. The No. 1 problem (as I’m sure you’ve seen looking at any news site) is the chasing of a trend. A story would come across our analytics dashboard that appeared to be “doing numbers,” so immediately the original writer (and, often, a cabal of editors) would convene to try to figure out why that particular story had gone viral.
Oftentimes the real reason was something entirely outside our control, like “we happened to get in the Google News carousel for that story” or “we got linked from Reddit.” But because our mandate was to get big numbers, we would try to tease out the smallest things. More stories on the same topic, maybe ape the style (single-sentence paragraphs), try to time stories to go out at the same time every day …
It’s very similar to a cargo cult – remote villagers who received supply drops during WWII came to believe that such goods came from a cargo “god,” and that by following the teachings of a cargo “leader” (which typically involved re-enacting the steps that led up to the first drops, or mimicking European styles and activities) the cargo would return in abundance. In reality, the actions of the native peoples had little to no effect on whether more cargo would come.
This commonly happens when you’re asked to explain the reason for a trend or an outcome, a “why” about user behavior. It is nearly impossible to know why a user does something absent them explicitly telling you either through asynchronous feedback or user interviews. Everything else is conjecture.
But we’re often called upon (as noted above) to make decisions based on these unknowable reasons. What to do?
The correct way to handle these types of questions is:
- Be clear that an explanation is a guess.
- Treat that guess as a hypothesis.
- Test that hypothesis.
- Allow for the possibility that it’s wrong or that there is no right answer, period.
2. Load-bearing single data point
I see this all the time in engineering, especially around productivity metrics. There is an eternal debate as to whether you can accurately measure the productivity of a development team; my response to this is, “kinda.” You can measure any number of metrics you want, but those metrics only measure what they measure. Most development teams use story points to gauge roughly how long a given chunk of development will take. Companies like to measure expected vs. actual story points, and then take action based on those numbers.
Except that the spectrum of actions one can take based on those numbers is unknowably vast, and those numbers in and of themselves don’t mean anything. I worked on a development team where the CTO was reporting velocity up the chain to his superiors as a measure of customer value that was being provided. That CTO also refused to give story point assignments to bug tickets, since that wasn’t “delivering customer value.” I don’t know what definition of customer value you use in your personal life, but to me “having software that works properly” is delivering value.
But because bugs weren’t pointed, they were given lower priority (because we had to meet our velocity numbers). This increased focus on velocity numbers meant that tickets were getting pushed through to production without having gone through thorough testing, because the important thing was to deliver “customer value.” This, as you can imagine, led to more bug tickets that weren’t prioritized, rinse and repeat, until the CTO was let go and the whole initiative was dramatically restructured because our customers, shockingly enough, didn’t feel they were getting enough value in a broken product.
I want to introduce you to two of my favorite “laws” that I use frequently. The first, from psychology, is called Campbell’s Law, after the man who coined it, Donald Campbell. It states:
- The more emphasis you put on a data point for decision-making, the more likely it will wind up being gamed.
We saw this happen in a number of different ways. When story points got so important, suddenly story point estimates started going way up. Though we had a definition of done that included things like code review and QA testing, those things weren’t tracked or considered analytically, so they were de-emphasized when it was perceived that including them would hurt the number. Originally, velocity stood for “number of story points in stories that were fully coded, tested and QA’ed.” By the end, it stood for “the maximum number of points we could reasonably assign to the stories that we rushed through at the end of the week to make velocity go up.”
The logical conclusion of Campbell’s Law is Goodhart’s Law, named after economist Charles Goodhart:
- When a measure becomes a target, it ceases to be a good measure.
Now, I am not saying you should ignore SPACE or DORA metrics. They can provide some insight into how your development / devops team is functioning. But you should not use any of them, collectively or individually, as targets that you need to or should meet. They are quantitative data that should be used in conjunction with other, qualitative, data garnered from talking and listening to your team. If someone’s velocity is down over a number of weeks, don’t go to them demanding it come up. Instead, talk to them and find out what’s going on. Have they noticed? Are they doing something differently?
My personal story point numbers tend to be all over the place, because some weeks my IC time is spent powering through my own stories, but then for months at a time I will devote the majority of my time to unblocking others or serving as the coordinator / point person for my team so they can spend their time head-down in the code. If you measured me solely by story points, I would undoubtedly be lacking. But the story points don’t capture all the value I bring to a team.
3. Using data because it’s available
This is probably the number one problem I see in corporate environments. We want to know the answer to x question, we have y data, so we’re going to use y to answer x even if the two are only tangentially (or, sometimes not even that closely) related.
I co-managed the web presence for a large research institution’s college of medicine. On the education side, our number one goal was to increase the quality and number of qualified applicants for our various programs. Except, on the web, it’s kind of hard to draw a direct line between “quality of website” and “quality of applicants.” Sure, if we got lucky someone would actually go through our website to the student application form, and we could see that in the analytics. But much like any major life decision, people made the decision to apply or not after weeks or months of deliberation, visiting the site sporadically, with any number of other factors in their lives affecting their choice along the way.
But you have to have KPIs, else how would you know that your workers aren’t slacking? So the powers that be decided the most salient data point was “number of visitors from the surrounding geographic area,” as measured by the geographic identification in Google Analytics (back when GA was at least pretending to provide useful data).
Now, some useful demographic information for you, the reader: in the year that mandate started being enforced, 53% of the incoming MD class was in-state. So, at best, our primary metric covered slightly over half of the applicants to our flagship program. That’s to say nothing of the fact that visitors to the website might just be members of the general public (since the college of medicine was colocated with a major hospital). And even if we were somehow able to discern which visitors were high-value applicants, that still wouldn’t tell us whether the website had anything to do with them applying to the program! That’s just not something you can accurately track through analytics.
This is not an uncommon phenomenon. Because they had a given set of quantitative data to work with, that was the data they used to answer all the questions that were vital to the business.
I get it! It’s hard to say “no” or “you can’t” or “that’s impossible” to your boss when you’re asked to give information or justification. But that is the answer sometimes. The way to get around it is to 1) identify the data you’d actually need to answer the question, and 2) devise a method for capturing that data.
I also want to point out that it is vital to collect data with intent. Not intent as in “bias your data to the outcome you want,” but in the sense that you need to know what questions you’re going to ask of the data in order to be assured you’re collecting the right data. Going back after the fact to interrogate the data with different questions veers dangerously close to p-hacking, where you keep twisting and filtering data until you get some answer to some question, even if it’s not even close to the question you started with.
4. Discounting other possible explanations
I once sat in on a meeting where they were trying to impart to us the importance of caution. They told us the story of Icarus; in Ancient Greece, the great inventor Daedalus was imprisoned in the Labyrinth he had built for the minotaur. Desperate to escape, he fashioned a set of wings from candle wax and feathers for himself and his son, Icarus. Before leaving, he warned Icarus not to fly too close to the sea (for fear the spray would weigh down the wings and cause them to crash) nor too close to the sun, for the heat would melt the wax and cause them to crash. The pair successfully escaped the Labyrinth and the island, but Icarus, caught up in the exhilaration of flight, soared ever higher … until his wings melted and he came crashing down to the sea and drowned.
We were asked to reflect on the moral of the story. “The importance of swimming lessons!” I cracked. “Or, more generally, the importance of always having a backup plan.” Because, of course, Daedalus was worried that his son would fly too high or too low; rather than prepare for that possibility by teaching him how to swim (or fashioning a boat), Daedalus did the bare minimum and caught the consequences.
Both my explanation and the traditional, “don’t fly too close to the sun” are valid takeaways; this is what I mean when I say that multiple valid narratives can arise from the same set of facts. Were we presenting a report to Daedalus, Inc., on the viability of his new AirWings, I would argue the most useful thing to do would be to present both. Both provide plausible outcomes and actionable information that can be taken away to inform the next stages of the product.
On a more realistic note, I was once asked to do an after-action analysis of a network incursion. In my analysis, I pointed out which IP ranges were generally agreed to be from a particular South American country (where there was no legitimate business activity for the targeted company); those access logs seemed to match up with suspicious activity in Florida as well as in a South Asian country.
I did not tie those things together. I did not state that they were definitively working together, or even knew of one another. I laid out possibilities including a coordinated attack by the Florida and South American entities (based on timestamps and accounts used); I also posited it was possible the attack originated in South Asia and they passed the compromised credentials to their counterparts (or even sold them to another group) in South America/Florida. It’s also possible that they were all independent actors either getting lucky or acting on the same tip.
The important thing was to not assume facts I did not (and could not) know, and to make it very clear when I was extrapolating. One crucial difference between fairytale and narrative is the acknowledgment of doubt. Do not assert things you cannot know, and point out any caveats or assumptions you made in the formulation of your story. This will not only protect your reputation should any of those assumptions be wrong; it also makes it easier for others to conceive of additional narratives you might not have, and it signposts what data might be collected to verify the underlying assumptions.
Summary
It can be easy to get sucked into writing a fairytale when you started out writing a narrative. Data can be hard, deadlines can be short and pressure can be immense. Do what you can to make sure you’re collecting good data with intent, asking and answering questions that are actually relevant to that data, and not discounting other explanations just because you finished yours. Through the application of proper data analysis, we can get better at providing good products to our customers and treating employees with respect and compassion while still maintaining productivity. It just requires diligence and a willingness to explore beyond superficial numbers to ensure the data you’re analyzing accurately reflects reality.
The best thing about this talk is it travels really well: People in Australia were just as annoyed by their companies' decision-making as those in the US.
I'm headed back to the Midwest to do some speakerizing again in August 2024.
Beer City Code 24 is in Grand Rapids, MI, on Aug. 2-3. I'm super excited to present a workshop, Improv for Developers, which is where we'll do actual improv training and then talk about how those skills translate to software development. It's 6 hours (!!), but it should be a lot of fun!
I'll also talk about greenfield development: specifically, that it doesn't really exist anymore. There are always preexisting considerations you're going to have to take into account, so I'll give some hard-won tips on sussing them out.
DevUp will be held in St. Louis on Aug. 14-16. I'll be talking about greenfields again, as well as reasons scrum-based development tends to fail, and how we can measure developer productivity.
Hope to see you this summer!
YES, AND you also have to write documentation or no one will know what the hell you were thinking when you wrote it.
When I gave my talk, "That's not real scrum: Measuring and managing productivity for development teams" at MiTechCon 2024 in Pontiac, MI, there were a number of great questions, both in-person and from the app. I collected them here, as a supplement to the accessible version of the talk.
Q: What are best practices on implementing agile concepts for enterprise technology teams that are not app dev (e.g., DevOps, Cloud, DBA, etc.)?
A brief summary: 1) Define your client (often not the software's end-user; could be another internal group), and 2) find the way to release iteratively to provide them value. This often requires overcoming entrenched models of request/delivery — similar to how development tends to be viewed as a "service provider" who gets handed a list of features to develop, I would imagine a lot of teams trying to make that transition are viewed as providers and expected to just do what they're told. Working back the request cycle with the appropriate "client" to figure out how to deliver incremental/iterative value is how you can deliver successfully with agile!
Q: How do I convince a client who wants stuff at a certain time to trust the agile process?
There's no inherent conflict between a fixed-cost SOW and scrum process. The tension that tends to exist in these situations is not the cost structure, but rather what is promised to be delivered and when. Problems ensue when you're delivering a fixed set of requirements by a certain date - you can certainly do that work in a somewhat agile fashion and gain some of the benefits, but you're ultimately setting yourself up to experience tension as you get feedback through iterations that might ultimately diverge from the original requirements.
This is the "change order hell" that often comes with client work — agile is by definition flexible in its results, so if we try to prescribe them ahead of time, we're setting ourselves up for headaches. That's not to say it's not worth doing (the process may be beneficial to the people doing the work if the waterfall outcome is prescribed), but note (to yourself and the client) that a waterfall outcome (fixed set of features at a fixed date) brings with it waterfall risk, even if you do the work in an agile fashion.
It is unfortunately very often difficult, but this is part of the "organizational shift" I spoke about. If the sales team does not sell based on agile output, it's very difficult to perform proper agile development and reap all its benefits.
Q: We're using agile well; how do we dissuade skip-level leadership from demanding waterfall delivery dates while using agile processes?
This is very similar to the previous answer, with the caveat that it's not on you to convince a level of leadership beyond your own manager of anything. You can and should be providing your manager with the information and advice mentioned in the above answer, but ultimately that convincing has to come from the people they manage, not levels removed. Scrum (and agile, generally) requires buy-in up and down the corporate stack.
Q: What are best practices for ownership of the product backlog?
Best practices are contextual! Ownership of the product backlog is such a tricky question.
In general, I think product backlogs tend to have too many items. I am very much a fan of expiring backlog items — if they haven't been worked on in 30 days (two-ish sprints), they go away (system-enforced!) until the problem they address comes up again.
The product owner is accountable for the priority and what's included or removed from the product backlog.
I kind of think teams should have two separate stores of stories. One is the backlog: specific ideas or stories that are going to be worked on (as above) in the next sprint or two, which is the product owner's responsibility. The second is a brainstorming pool — preferably not even in the same system (because you should NOT just be plucking from the pool and plopping onto the backlog). Rather, these are broad ideas or needs we want to capture so we don't lose sight of them; from them, specific problems are identified and stories written. This should be curated by the product owner, but allow for easier/broader access to add to it.
Q: Is it ever recommended to have the Scrum Master also be Product Manager?
(I am assuming for the sake of this question that Product Manager = Product Owner. If I am mistaken, apologies!)
I would generally **not** recommend the product owner and the scrum master be the same person, though I am aware by necessity it sometimes happens. It takes a lot of varied skills to do both of those jobs, and in most cases if it happens successfully it's because there's a separate system in place to compensate in one or both areas (e.g., there's a separate engineering manager who's picking up a lot of what would generally be SM work, or the product owner is in name only because someone else/external is doing the requirements-gathering/customer interaction). Both positions require a TON of work to perform properly - direct customer interaction, focus groups, metrics analysis and stakeholder interaction are just some of a PM's duties, while the SM should be devoted to the dev team to make sure any blocks get cleared and work continues apace.
But even more than time, there's a philosophical divide that would be difficult to resolve in one person. The SM should be looking at things from a perspective of what's possible now, whereas the PM should have a longer-term view of what should be happening soon. Rare is the individual who can hold both of those things in their head with equal weight; usually one is going to be prioritized over the other, to the detriment of the process overall.
Q: What is the best (highest paying) Scrum certification?
If your pay is directly correlated with the specific certification you have, you are very likely working for the company that provides it. Specific certifications may be more favored in certain industries or verticals, but that's generally no more indicative of pay than the difference between any two companies.
More broadly, I view certifications as proof of knowledge that should be useful and transferable regardless of specific situation. Much like Agile, delivering value (and a track record of doing same) is the best route to long-term career success (and hence more money).
Q: Can you use an agile scrum approach without a central staffing resource database?
Yes, with a but! You do not need a formal method of tracking your resourcing, but the scrum master (at the team level) needs to know their resourcing (in terms of how many developers are going to be available to work that sprint) in order to properly plan the sprint. If someone is taking a vacation, you need to either a) pull in fewer stories, b) increase your sprint length, or c) pull in additional resources (if available to you).
Even at the story level, this matters. If you have a backend ticket and your one BE developer is out, you're not gonna want to put that in the sprint. But it doesn't need to be a formal, centralized database. It could be as simple as everyone noting their PTO during sprint planning.
What's always both heartening and a little bit sad to me is how much scrum teams want to produce good products and provide value, and how often it's over-management that holds them back from doing so.
Today I want to talk about data transfer objects, a software pattern you can use to keep your code better structured and metaphorically coherent.
I’ll define those terms a little better, but first I want to start with a conceptual analogy.
It is a simple truth that, no matter whether you focus on the frontend, the backend or the whole stack, everyone hates CSS.
I kid, but also, I don’t.
CSS is probably among the most reviled of technologies we have to use all the time. The syntax and structure of CSS seems almost intentionally designed to make it difficult to translate from concept to “code,” even simple things. Ask anyone who’s tried to center a div.
And there are all sorts of good historical reasons why CSS is the way it is, but most developers find it extremely frustrating to work with. It’s why we have libraries and frameworks like Tailwind. And Bulma. And Bootstrap. And Material. And all the other tools we use that try their hardest to make sure you never have to write actual CSS while still reaping the benefits of a presentation layer separate from content.
And we welcome these tools, because it means you don’t need to understand the vagaries of CSS in order to get what you want. It’s about developer experience, making it easier on developers to translate their ideas into code.
And in the same way we have tools that cajole CSS into giving us what we want, I want to talk about a pattern that allows you to not worry about anything other than your end goal when you’re building out the internals of your application. It’s a tool that can help you stay in the logical flow of your application, making it easier to puzzle through and communicate about the code you’re writing, both to yourself and others. I’m talking about DTOs.
DTOs
So what is a DTO? Very simply, a data transfer object is a pure, structured data object - that is, an object with properties but no methods. The entire point of the DTO is to make sure that you’re only sending or receiving exactly the data you need to accomplish a given function or task - no more, no less. And you can be assured that your data is exactly the right shape, because it adheres to a specific schema.
And as the “transfer” part of the name implies, a DTO is most useful when you’re transferring data between two points. The title refers to one of the more common exchanges, when you’re sending data between front- and back-end nodes, but there are lots of other scenarios where DTOs come in handy.
Sending just the right amount of data between modules within your application, or consuming data from different sources that use different schemas, are just some of those.
I will note there is literature that suggests the person who coined the term, Martin Fowler, believes that you should not have DTOs except when making remote calls. He’s entitled to his opinion (of which he has many), but I like to reuse concepts where appropriate for consistency and maintainability.
The DTO is one of my go-to patterns, and I regularly implement it for both internal and external use. I’m also aware most people already know what pure data objects are. I’m not pretending we’re inventing the wheel here - the value comes in how they’re applied, systematically.
Advantages
- DTOs are a systematic approach to managing how your data flows through and between different parts of your application, as well as to and from external data stores.

- Properly and consistently applied, DTOs can help you maintain what I call metaphorical coherence in your app. This is the idea that the names of objects in your code are the same names exposed on the user-facing side of your application.

  Most often, this comes up when we’re discussing domain language - that is, your subject-matter-specific terms (or jargon, as the case may be).

  I can’t tell you the number of times I’ve had to actively work out whether a class with the name of “post” refers to a blog entry, or the action of publishing an entry, or a location where someone is stationed. Or whether “class” refers to a template for object creation, a group of children, or one’s social credibility. DTOs can help you keep things organized in your head, and establish a common vernacular between engineering and sales and support and even end-users.

  It may not seem like much, but that level of clarity makes talking and reasoning about your application so much easier because you don’t have to jump through mental hoops to understand the specific concept you’re trying to reference.

- DTOs also help increase type clarity. If you’re at a shop that writes Typescript with “any” as the type for everything, you have my sympathies, and also stop it. DTOs might be the tool you can wield to get your project to start using proper typing, because you can define exactly what data’s coming into your application, as well as morph it into whatever shape you need on the other end.

- Finally, DTOs can help you keep your code as modular as possible by narrowing down the data each section needs to work with. By avoiding tight coupling, we can both minimize side effects and better set up the code for potential reuse.
And, as a bonus mix of points two and four, when you integrate with an external source, DTOs can help you maintain your internal metaphors while still taking advantage of code or data external to your system.
To finish off our quick definition of terms, a reminder that PascalCase is where all the words are jammed together with the first letter of each word capitalized; camelCase is the same except the very first letter is lowercase; and snake case is all lowercase letters joined by underscores.
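Side by side, using a property name we'll see again in a moment:

```typescript
// The same name in each convention:
const AvatarImageUrl = '';   // PascalCase: class names in both PHP and Typescript
const avatarImageUrl = '';   // camelCase: Typescript properties
const avatar_image_url = ''; // snake case: this project's PHP properties and database columns
```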
This is important for our first example.
Use-case 1: FE/BE naming conflicts
The first real-world use-case we’ll look at is what was printed on the box when you bought this talk. That is, when your backend and frontend don’t speak the same language, and have different customs they expect the other to adhere to.
Trying to jam them together is about as effective as when an American has trouble ordering food at a restaurant in Paris and compensates by yelling louder.
In this example, we have a PHP backend talking to a Typescript frontend.
I apologize to those who don’t know one or both languages. For what it’s worth, we’ll try to keep the code as simple as possible to follow, with little-to-no language-specific knowledge required. In good news, DTOs are entirely language agnostic, as we’ll see as we go along.
Backend
```php
class User
{
    public function __construct(
        public int $id,
        public string $full_name,
        public string $email_address,
        public string $avatar_url
    ) {}
}
```
Per PSR-12, the coding standard for PHP, class names must be in PascalCase and method names must be in camelCase. However, the guide “intentionally avoids any recommendation” as to styling for property names, instead just asking for “consistency.”
Very useful for a style guide!
As you can see, the project we’re working with uses snake case for its property names, to be consistent with its database structure.
Frontend
```typescript
class User {
  userId: number;
  fullName: string;
  emailAddress: string;
  avatarImageUrl: string;

  load(userId: number) { /* load from DB */ }
  save() { /* persist */ }
}
```
But Typescript convention (for the most part; there’s not really an “official” style guide in the same manner, but your Googles, your Microsofts and your Facebooks tend to agree) says that you should be using camelCase for your variable names.
I realize this may sound nit-picky or like small potatoes to those of us used to working as solo devs or on smaller teams, but as organizations scale up, consistency and parallelism in your code are vital to making sure your code and data have good interoperability, as well as to ensuring devs can be moved around without losing significant chunks of time simply to reteach themselves style.
Now, you can just choose one of those naming schemes to be consistent across the frontend and backend, and outright ignore one of the style standards.
But that choice has costs. Now your project is asking one set of your developers to context-switch specifically for this application. It also makes your code harder to share (unless you adopt this convention-breaking across your extended cinematic code universe). You’ve also probably killed a big rule in your linter, which you now have to customize in all implementations.
OR, we can just use DTOs.
Now, I don’t have a generic preference whether the DTO is implemented on the front- or the back-end — that determination has more to do with your architecture and organizational structure than anything else.
Who owns the contracts in your backend/frontend exchange is probably going to be the biggest determiner - whichever side controls it, the other is probably writing the DTO. Though if you’re consuming an external data source, you’re going to be writing that DTO on the frontend.
Where possible, I prefer to send the cleanest, least amount of data required from my backend, so for our first example we’ll start there. Because we’re writing the DTO in the backend, the data we send needs to conform to the schema the frontend expects - in this instance, Typescript’s camel case.
Backend
```php
class UserDTO
{
    public function __construct(
        public int $userId,
        public string $fullName,
        public string $emailAddress,
        public string $avatarImageUrl
    ) {}
}
```
That was easy, right? We just create a data object that uses the naming conventions we’re adopting for sharing data. But of course, we have to get our User model into the DTO. This brings me to the second aspect of DTOs, the secret sauce - the translators.
Translators
```php
function user_to_user_dto(User $user): UserDTO
{
    return new UserDTO(
        $user->id,
        $user->full_name,
        $user->email_address,
        $user->avatar_url
    );
}
```
Very simply, a translator is the function (and it should be no more than one function per level of DTO) that takes your original, nonstandard data and jams it into the DTO format.
Translators get called (and DTOs are created) at points of ingress and egress. Whether that’s internal or external, the point at which a data exchange is made is when a translator is run and a DTO appears – which side of the exchange is up to your implementation. You may also, as the next example shows, just want to include the translator as part of the DTO.
Using a static create method allows us to keep everything nice and contained, with a single call to the class.
```php
class UserDTO
{
    public function __construct(
        public int $userId,
        public string $fullName,
        public string $emailAddress,
        public string $avatarImageUrl
    ) {}

    public static function from_user(User $user): UserDTO
    {
        return new self(
            $user->id,
            $user->full_name,
            $user->email_address,
            $user->avatar_url
        );
    }
}

$userDto = UserDTO::from_user($user);
```
I should note we’re using extremely simplistic base models in these examples. Often, something as essential as the user model is going to have a number of different methods and properties that should never get exposed to the frontend.
While you could do all of this by customizing the serialization method for your object, I would consider that to be a distinction in implementation rather than strategy.
An additional benefit of going the separate DTO route is you now have an explicitly defined model for what the frontend should expect. Now, your FE/BE contract testing can use the definition rather than exposing or digging out the results of your serialization method.
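To make that concrete, here's a minimal sketch of what such a contract test could look like on the frontend, in a Jest-style harness; the endpoint URL is made up, but the assertion comes straight from the UserDTO definition:

```typescript
import { test, expect } from '@jest/globals';

// Hypothetical contract test: the DTO definition is the contract, so we
// assert the response carries exactly its keys and nothing extra.
test('GET /user/:id conforms to the UserDTO contract', async () => {
  const response = await fetch('https://api.example.test/user/1'); // hypothetical endpoint
  const body = await response.json();
  expect(Object.keys(body).sort()).toEqual(
    ['avatarImageUrl', 'emailAddress', 'fullName', 'userId']
  );
});
```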
So that’s a basic backend DTO - great for when you control the data that’s being exposed to one or potentially multiple clients, using a different data schema.
Please bear with me - I know this probably seems simplistic, but we’re about to get into the really useful stuff. We gotta lay the groundwork first.
Frontend
Let’s back up and talk about another case - when you don’t control the backend. Now, we need to write the DTO on the frontend.
First we have our original frontend user model.
```typescript
class User {
  userId: number;
  fullName: string;
  emailAddress: string;
  avatarImageUrl: string;

  load(userId: number) { /* load from DB */ }
  save() { /* persist */ }
}
```
Here is the data we get from the backend, which I classify as a Response for organizational purposes. This is to differentiate it from a Payload, which is data you send to the API (we’ll get into those later).
```typescript
interface UserResponse {
  id: number;
  full_name: string;
  email_address: string;
  avatar_url: string;
}
```
You’ll note, again, because we don’t control the structure used by the backend, this response uses snake case.
So we need to define our DTO, and then translate from the response.
Translators
You’ll notice the DTO looks basically the same as when we did it on the backend.
```typescript
interface UserDTO {
  userId: number;
  fullName: string;
  emailAddress: string;
  avatarImageUrl: string;
}
```
But it's in the translator that you can see some of the extra utility this pattern offers.
```typescript
const translateUserResponseToUserDTO = (response: UserResponse): UserDTO => ({
  userId: response.id,
  fullName: response.full_name,
  emailAddress: response.email_address,
  avatarImageUrl: response.avatar_url
});
```
When we translate the response, we can change the names of the parameters before they ever enter the frontend system. This allows us to maintain our metaphorical coherence within the application, and shield our frontend developers from old/bad/outdated/legacy code on the backend.
Another nice thing about using DTOs in the frontend, regardless of where they come from, is they provide us with a narrow data object we can use to pass to other areas of the application that don’t need to care about the methods of our user object.
DTOs work great in these cases because they allow you to remove the possibility of other modules causing unintended consequences.
Notice that while the User object has load and save methods, our DTO just has the properties. Any modules we pass our data object to are literally incapable of propagating manipulations they might make, inadvertently or otherwise. Can’t make a save call if the object doesn’t have a save method.
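To make that concrete, here's a small sketch (the consuming function is hypothetical) showing the compiler enforcing that boundary for us:

```typescript
declare const response: UserResponse; // fetched from the API, as above

// A hypothetical display module that only ever sees the DTO
const renderGreeting = (user: UserDTO): string => `Hello, ${user.fullName}!`;

const dto = translateUserResponseToUserDTO(response);
renderGreeting(dto); // fine: it only reads the properties it needs

// dto.save(); // compile error: Property 'save' does not exist on type 'UserDTO'
```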
Use-case 2: Metaphorically incompatible systems
For our second use-case, let’s talk real-world implementation. In this scenario, we want to join up two systems that, metaphorically, do not understand one another.
Magazine publisher

- Has custom backend system (magazines)
- Wants to explore new segment (books)
- Doesn’t want to build a whole new system
I worked with a client, let’s say they’re a magazine publisher. Magazines are a dying art, you understand, so they want to test the waters of publishing books.
But you can’t just build a whole new app and infrastructure for an untested new business model. Their custom backend system was set up to store data for magazines, but they wanted to explore the world of novels. I was asked to help them build out that Minimum Viable Product.
Existing structure
```typescript
interface Author {
  name: string;
  bio: string;
}

interface Article {
  title: string;
  author: Author;
  content: string;
}

interface MagazineIssue {
  title: string;
  issueNo: number;
  month: number;
  year: number;
  articles: Article[];
}
```
This is the structure of the data expected by both the existing front- and back-ends. Because everything’s one word, we don’t even need to worry about incompatible casing.
Naive implementation

This new product requires a complete overhaul of the metaphor.
```typescript
interface Author {
  name: string;
  bio: string;
}

interface Chapter {
  title: string;
  author: Author;
  content: string;
}

interface Book {
  title: string;
  issueNo: number;
  month: number;
  year: number;
  articles: Chapter[];
}
```
But we are necessarily limited by the backend structure as to how we can persist data.
If we just try to use the existing system as-is, but change the name of the interfaces, it’s going to present a huge mental overhead challenge for everyone in the product stack.
As a developer, you have to remember how all of these structures map together. Each chapter needs to have an author, because that’s the only place we have to store that data. Every book needs to have a month, and a number. But no authors - only chapters have authors.
So we could just use the data structures of the backend and remember what everything maps to. But that’s just asking for trouble down the road, especially when it comes time to onboard new developers. Now, instead of them just learning the system they’re working on, they essentially have to learn the old system as well.
Plus, if (as is certainly the goal) the transition is successful, now their frontend is written in the wrong metaphor, because it’s the wrong domain entirely. When the new backend gets written, we’re going to have the exact same problem in the opposite direction.
I do want to take a moment to address what is probably obvious – yes, the correct decision would be to build out a small backend that can handle this, but I trust you’ll all believe me when I say that sometimes decisions get made for reasons other than “what makes the most sense for the application’s health or development team’s morale.”
And while you might think that find-and-replace (or IDE-assisted refactoring) will allow you to skirt this issue, please trust me that you’re going to catch 80-90% of cases and spend twice as much time fixing the rest as it would have taken to write the DTOs in the first place.
Plus, as in this case, your hierarchies don’t always match up properly.
What we ended up building was a DTO-based structure that allowed us to keep metaphorical coherence with books but still use the magazine schema.
Proper implementation
You’ll notice that while our DTO uses the same basic structures (Author, Parts of Work [chapter or article], Work as a Whole [book or magazine]), our hierarchies diverge. Whereas Books have one author, Magazines have none; only Articles do.
The author object is identical from response to DTO.
You’ll also notice we completely ignore properties we don’t care about in our system, like issueNo.
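The interfaces themselves were on a slide I haven't reproduced here, but working backward from the translators below, the anthology-flavored shapes would look something like this (the single-author BookDTO comes up in a moment):

```typescript
interface AuthorDTO {
  name: string;
  bio: string;
}

interface ChapterDTO {
  title: string;
  content: string;
  author: AuthorDTO;
}

interface AnthologyBookDTO {
  title: string;
  chapters: ChapterDTO[];
  authors: AuthorDTO[];
}
```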
How do we do this? Translators!
Translating the response
We pass the MagazineResponse in to the BookDTO translator, which then calls the Chapter and Author DTO translators as necessary.
```typescript
export const translateMagazineResponseToAnthologyBookDTO = (response: MagazineResponse): AnthologyBookDTO => {
  const chapters = (response.articles.length > 0)
    ? response.articles.map((article) => translateArticleResponseToChapterDTO(article))
    : [];
  // Collect the unique chapter authors (note: Set dedupes objects by reference)
  const authors = [
    ...new Set(
      chapters
        .filter((chapter) => chapter.author)
        .map((chapter) => chapter.author)
    )
  ];
  return { title: response.title, chapters, authors };
};

export const translateArticleResponseToChapterDTO = (response: ArticleResponse): ChapterDTO => ({
  title: response.title,
  content: response.content,
  author: response.author
});
```
This is also the first time we’re using one of the really neat features of translators, which is the application of logic. Our first use is really basic, just checking if the Articles response is empty so we don’t try to run our translator against null. This is especially useful if your backend has optional properties, as using logic will be necessary to properly model your data.
But logic can also be used to (wait for it) transform your data when we need to.
Remember, in the magazine metaphor, articles have authors but magazine issues don’t. So when we’re storing book data, we’re going to use their schema by grabbing the author of the first article, if it exists, and assign it as the book’s author. Then, our chapters ignore the author entirely, because it’s not relevant in our domain of fiction books with a single author.
Because the author response is the same as the DTO, we don’t need a translation function. But we do have proper typing so that if either of them changes in the future, it should throw an error and we’ll know we have to go back and add a translation function.
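The single-author translator itself isn't shown above, but reconstructing it from the logic just described (and from the BookDTO shape the payload translator below expects), a minimal sketch might look like:

```typescript
interface BookDTO {
  title: string;
  author: AuthorDTO;
  chapters: ChapterDTO[];
}

const translateMagazineResponseToBookDTO = (response: MagazineResponse): BookDTO => ({
  title: response.title,
  // Magazine issues have no author of their own, so take the first article's
  // (this assumes a stored book always comes back with at least one article)
  author: response.articles[0].author,
  // In our single-author domain, the per-chapter author simply goes unused
  chapters: response.articles.map(translateArticleResponseToChapterDTO),
});
```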
The payload
Of course, this doesn’t do us any good unless we can persist the data to our backend. That’s where our payload translators come in - think of Payloads as DTOs for anything external to the application.
```typescript
interface AuthorPayload {
  name: string;
  bio: string;
}

interface ArticlePayload {
  title: string;
  author: AuthorPayload;
  content: string;
}

interface MagazineIssuePayload {
  title: string;
  issueNo: number;
  month: number;
  year: number;
  articles: ArticlePayload[];
}
```
For simplicity’s sake we’ll assume our payload structure is the same as our response structure. In the real world, you’d likely have some differences, but even if you don’t it’s important to keep them as separate types. No one wants to prematurely optimize, but keeping the response and payload types separate means a change to one of them will throw a type error if they’re no longer parallel, which you might not notice with a single type.
Translating the payload
```typescript
export const translateBookDTOToMagazinePayload = (book: BookDTO): MagazineIssuePayload => ({
  title: book.title,
  articles: (book.chapters.length > 0)
    ? book.chapters.map((chapter) => translateChapterDTOToArticlePayload(chapter, book))
    : [],
  issueNo: 0,
  month: 0,
  year: 0,
});

export const translateChapterDTOToArticlePayload = (chapter: ChapterDTO, book: BookDTO): ArticlePayload => ({
  title: chapter.title,
  author: book.author,
  content: chapter.content
});
```
Our translators can be flexible (because we’re the ones writing them), allowing us to pass objects up and down the stack as needed in order to supply the proper data.
Note that we’re just applying the author to every article, because a) there’s no harm in doing so, and b) the system likely expects there to be an author associated with every article, so we provide one. When we pull it into the frontend, though, we only care about the first article’s author.
We also make sure to fill out the rest of the data structure we don’t care about so the backend accepts our request. There may be actual checks on those numbers, so we might have to use more realistic data, but since we don’t use it in our process, it’s just a question of specific implementation.
So, through the application of ingress and egress translators, we can successfully keep our metaphorical coherence on our frontend while persisting data properly to a backend not configured to the task. All while maintaining type safety. That’s pretty cool.
The single biggest thing I want to impart from this is the flexibility that DTOs offer us.
Use-case 3: Using the smallest amount of data required
When working with legacy systems, I often run into a mismatch between what the frontend expects and what the backend provides; typically, this results in the frontend being flooded with an overabundance of data.
These huge data objects wind up getting passed around and used on the frontend because, for example, that’s what represents the user, even if you only need a few properties for any given use-case.
Or, conversely, we have the tiny amount of data we want to change, but the interface is set up expecting the entirety of the gigantic user object. So we wind up creating a big blob of nonsense data, complete with a bunch of null properties and only the specific ones we need filled in. It’s cumbersome and, worse, has to be maintained so that any changes to the user model get propagated to your garbage ball, even if those changes don’t touch the data points you care about.
One way to eliminate the data blob is to use DTOs to narrowly define which data points a component or class needs in order to function. This is what I call minimizing touchpoints, referring to places in the codebase that need to be modified when the data structure changes.
In this scenario, we’re building a basic app and we want to display an avatar for a user. We need their name, a picture and a color for their frame.
```typescript
const george = {
  id: 303,
  username: 'georgehernandez',
  groups: ['users', 'editor'],
  sites: ['https://site1.com'],
  imageLocation: '/assets/uploads/users/gh-133133.jpg',
  profile: {
    firstName: 'George',
    lastName: 'Hernandez',
    address1: '738 Evergreen Terrace',
    address2: '',
    city: 'Springfield',
    state: 'AX',
    country: 'USA',
    favoriteColor: '#1a325e'
  }
}
```
What we have is their user object, which contains a profile and groups and sites the user is assigned to, in addition to their address and other various info.
Quite obviously, this is a lot more data than we really need - all we care about are three data points.
```typescript
class Avatar {
  private imageUrl: string;
  private hexColor: string;
  private name: string;

  constructor(user: User) {
    this.hexColor = user.profile.favoriteColor;
    this.name = user.profile.firstName + ' ' + user.profile.lastName;
    this.imageUrl = user.imageLocation;
  }
}
```
This Avatar class works, technically speaking, but if I’m creating a fake user (say it’s a dating app and we need to make it look like more people are using it than actually are), I now have to create a bunch of noise to accomplish my goal.
```typescript
const lucy = {
  id: 0,
  username: '',
  groups: [],
  sites: [],
  imageLocation: '/assets/uploads/users/le-319391.jpg',
  profile: {
    firstName: 'Lucy',
    lastName: 'Evans',
    address1: '',
    address2: '',
    city: '',
    state: '',
    country: '',
    favoriteColor: '#027D01'
  }
}
```
Even if I’m calling from a completely separate database and class, in order to instantiate an avatar I still need to provide the stubs for the User class.
Or we can use DTOs.
```typescript
class Avatar {
  private imageUrl: string;
  private hexColor: string;
  private name: string;

  constructor(dto: AvatarDTO) {
    this.hexColor = dto.hexColor;
    this.name = dto.name;
    this.imageUrl = dto.imageUrl;
  }
}

interface AvatarDTO {
  imageUrl: string;
  hexColor: string;
  name: string;
}

const translateUserToAvatarDTO = (user: User): AvatarDTO => ({
  name: [user.profile.firstName, user.profile.lastName].join(' '),
  imageUrl: user.imageLocation,
  hexColor: user.profile.favoriteColor
});
```
By now, the code should look pretty familiar to you. This pattern is really not that difficult once you start to use it - and, I’ll wager, a lot of you are already using it, just not overtly or systematically. The bonus to doing it in a thorough fashion is that refactoring becomes much easier - if the frontend or the backend changes, we have a single point from where the changes emanate, making them much easier to keep track of.
Flexibility
But there’s also flexibility. I got some pushback when implementing the AvatarDTO; after all, there were a bunch of cases already extant where people were passing the user profile, and they didn’t want to go find them. As much as I love clean data, I am a consultant; to assuage them, I modified the code so as to not require extra work (at least, at this juncture).
```typescript
class Avatar {
  private avatarData: AvatarDTO;

  constructor(user: User | null, dto?: AvatarDTO) {
    if (user) {
      this.avatarData = translateUserToAvatarDTO(user);
    } else if (dto) {
      this.avatarData = dto;
    }
  }
}

new Avatar(george);
new Avatar(null, {
  name: 'Lucy Evans',
  imageUrl: '/assets/uploads/users/le-319391.jpg',
  hexColor: '#fc0006'
});
```
Instead of requiring the AvatarDTO, we still accept the user as the default argument, but you can also pass it null. That way I can pass my avatar DTO where I want to use it, but we take care of the conversion for them where the existing user data is passed in.
Use-case 4: Security
The last use-case I want to talk about is security. I assume some to most of you already get where I’m going with this, but DTOs can provide you with a rock-solid way to ensure you’re only sending data you’re intending to.
Somewhat in the news this month is the Spoutible API breach; if you’ve never heard of it, I’m not surprised. Spoutible is a Twitter competitor, notable mostly for its appalling approach to API security.
I do encourage all of you to look this article up on troyhunt.com, as the specifics of what they were exposing are literally unbelievable.
```json
{
  "err_code": 0,
  "status": 200,
  "user": {
    "id": 23333,
    "username": "badwebsite",
    "fname": "Brad",
    "lname": "Website",
    "about": "The collector of bad website security data",
    "email": "fake@account.com",
    "ip_address": "0.0.0.0",
    "verified_phone": "333-331-1233",
    "gender": "X",
    "password": "$2y$10$r1/t9ckASGIXtRDeHPrH/e5bz5YIFabGAVpWYwIYDCsbmpxDZudYG"
  }
}
```
But for the sake of not spoiling all the good parts, I’ll just show you the first horrifying section of data. For authenticated users, the API appeared to be returning the entire user model - mundane stuff like id, username, a short user description, but also the password hash, verified phone number and gender.
Now, I hope it goes without saying that you should never be sending anything related to user passwords, whether plaintext or hash, from the server to the client. It's very apparent that when Spoutible was building its API, they didn't consider what data was being returned for requests, merely that it included whatever the task at hand required. So they were just returning the whole model.
If only they’d used DTOs! I’m not going to dig into the nitty-gritty of what it should have looked like, but I think you can imagine a much more secure response that could have been sent back to the client.
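But purely as a sketch (the field names come from the response above; the type and function names are mine), the fix is the same translator pattern we've been using all along:

```typescript
// The full backend model, including fields that must never leave the server
interface FullUser {
  id: number;
  username: string;
  fname: string;
  lname: string;
  about: string;
  email: string;
  ip_address: string;
  verified_phone: string;
  gender: string;
  password: string; // the bcrypt hash
}

// The DTO defines only safe fields, so the rest have no path to leak through
interface PublicUserDTO {
  id: number;
  username: string;
  fname: string;
  lname: string;
  about: string;
}

const translateFullUserToPublicUserDTO = (user: FullUser): PublicUserDTO => ({
  id: user.id,
  username: user.username,
  fname: user.fname,
  lname: user.lname,
  about: user.about,
});
```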
Summing up
If you get in the practice of building DTOs, it’s much easier to keep control of precisely what data is being sent. DTOs not only help keep things uniform and unsurprising on the frontend, they can also help you avoid nasty backend surprises as well.
To sum up our little chat today: DTOs are a great pattern to make sure you’re maintaining structured data as it passes between endpoints.
Different components only have to worry about exactly the data they need, which helps both decrease unintended consequences and decrease the amount of touchpoints in your code you need to deal with when your data structure changes. This, in turn, will help you maintain modular independence for your own code.
It also allows you to confidently write your frontend code in a metaphorically coherent fashion, making it easier to communicate and reason about.
And, you only need to conform your data structure to the backend’s requirements at the points of ingress and egress, leaving you free to concern your frontend code only with your frontend requirements. You don’t have to be limited by the rigid confines of the backend’s data schema.
Finally, the regular use of DTOs can help put you in the mindset of vigilance in regard to what data you’re passing between services, without needing to worry that you’re exposing sensitive data due to the careless conjoining of model to API controller.
🎵 I got a DTOooooo
Honestly, I thought we were past this as an industry? But my experience at Developer Week 2024 showed me there's still a long way to go to overcoming sexism in tech.
And it came from the source I least expected: the very people who were at the conference trying to convince others to buy their product. People for whom connecting and educating is literally their job.
Time and again, both I (an engineer) and my nonbinary wife (a business analyst, at a different organization) found that the majority of the masculine-presenting folks at the booths on the expo floor were dismissive and disinterested, and usually patronizing.
It was especially ironic given that one of the predominant themes of the conference was developer experience and developer productivity. One of the key tenets of DX is listening to what developers have to say. These orgs failed. Horribly.
My wife even got asked when "your husband" would be stopping by.
I had thought it would go without saying, but female- and androgynous-presenting folk are both decision-makers in their own right as well as people with influence in companies large and small.
To organizations: Continuing to hire people who make sexist (and, frankly, stupid) judgments about who is worth talking to and what you should be conversing with them about is not only insulting, it's bad business. Founders: If you're the ones at your booth, educate yourselves. Fast.
I can tell you there are at least three different vendors who were providing services in areas where we have professional needs who absolutely will not get any consideration, simply because we don't want to deal with people like that. I don't assume the whole organization holds the same opinions as their representative; however, I can tell you for a fact that such views are not disqualifying at that organization, and so I have no interest in dealing with them further.
Rather than call out the shitty orgs, I instead want to call out the orgs whose reps were engaging, knowledgeable and overall pleasant to deal with. Both because those who do it right should be celebrated, and because in an attention economy any mention (even a negative one) is unfortunately an overall net positive.
The guys at Convex were great, answering all my questions about their seemingly very solid and robust Typescript backend platform.

The folks at Umbraco gave great conference talks, plied attendees with cookies and talked with us at length about their platform and how we might use it. Even though I dislike dotNet, we are very interested in looking at them for our next CMS project.

The developer advocates at Couchbase were lovely and engaging, even if I disagree with Couchbase's overall stance on their ability to implement ACID.

The folks at the Incident.io booth were wonderful, and a good template for orgs trying to sell services: They brought along an engineering manager who uses their services, and could speak specifically to how to use them.
I want to give a shout-out to those folks, and to exhort organizations to do better in training those you put out as the voice of your brand. This is not hard. And it only benefits you to do so.
Also, the sheer number of static code analysis companies makes me think there's a consolidation incoming. Not a single one of the three could differentiate their offerings on more than name and price.
OK, so it's not exactly "new" anymore, but this is the accessibility talk I gave at Longhorn PHP in Nov. 2023. And let's be honest, it's still new to 99% of you. My favorite piece of feedback I got was, "I know it's about 'updates,' but you could have provided an overview of the most common accessibility practices." Bruh, it's a 45-minute talk, not a 4-hour workshop.
I'll be hitting the lecture circuit again this year, with three conferences planned for the first half of 2024.
In February, I'll be at Developer Week in Oakland (and online!), talking about Data Transfer Objects.
In March, I'll be in Michigan for the Michigan Technology Conference, speaking about clean code as well as measuring and managing productivity for dev teams.
And in April I'll be in Chicago at php[tek] to talk about laws/regulations for developers and DTOs (again).
Hope to see you there!
Who holds a conference in the upper Midwest in March???
Hey everybody, in case you wanted to see my face in person, I will be speaking at LonghornPHP, which is in Austin from Nov. 2-4. I've got ~~two~~ three things to say there! That's ~~twice~~ thrice as many things as one thing! (I added a last-minute accessibility update).
In case you missed it, I said stuff earlier this year at SparkConf in Chicago!
I said stuff about regulations (HIPAA, FERPA, GDPR, all the good ones) at the beginning of this year. This one is available online, because it was only ever available online:
I am sorry for talking so fast in that one, I definitely tried to cover more than I should have. Oops!
The SparkConf talks are unfortunately not online yet (for reasons), and I'm doubtful they ever will be.