Kait

Beyond “clean” code: Focusing on practical (vs. aesthetic) aspects

I grew up on Clean Code, both the book and the concept. I strove for my code to be “clean,” and it was the standard against which I measured myself.

And I don’t think I was alone! Many of the programmers I’ve gotten to know over the years took a similar trajectory, venerating CC along with Code Complete and Pragmatic Programmer as the books everyone should read.

But along the way, “clean” started to take on a new meaning. It’s not just from the context of code, either; whether in interior design or architecture or print design, “clean” started to arise as a synonym for “minimalism.”

This was brought home to me when I was working with a junior developer a couple years ago. I refactored a component related to one we working on together to enable necessary functionality, and I was showing him the changes. This was a 200-line component, and he skimmed it about 45 seconds before saying “Nice, much cleaner.”

And it bugged me, but I wasn’t sure why. He was correct - it was cleaner, but it felt like that shouldn’t have been something he was accurately able to identify simply by glancing at it. Or at least, if that was the metric he was using, “clean” wasn’t cutting it.

Because the fact of the matter is you can’t judge the quality of code without reading it and understanding what it’s trying to do, especially without considering it in the context of its larger codebase. You can find signifiers (e.g., fewer lines of code, fewer methods in a class), but “terse” is not a direct synonym of “clean.” Sometimes less code is harder to understand or maintain than more code.

I wanted to find an approach, a rubric, that allowed for more specificity. When I get feedback, I much prefer hearing the specific aspects that are being praised or need work on - someone telling me “that code’s clean” or not isn’t particularly actionable.

So now I say code should be Comprehensible, Predictable and Maintainable. I liked those three elements because they’re important on their own, but also each builds on the others. You cannot have predictable and maintainable code unless it’s also comprehensible, for example.

  • Comprehensible - People other than the author, at the time the code is written, can understand both what the code is doing and why.

  • Predictable - If we look at one part of the code (a method, a class, a module), we should be able to infer a number of properties about the rest.

  • Maintainable - Easy to modify and keep up, as code runs forever

Comprehensibility is important because we don’t all share the context - even if you’re the only person who’s ever going to read the code, the you of three weeks from now will have an entirely different set of issues you’re focusing on, and will not bring the same thoughts to bear when reasoning about the code. And, especially in a professional context, rare is the code that’s only ever read by one other person.

Predictability speaks to cohesion and replicability across your codebase. If I have a method load on a model responsible for pulling that object’s information from the database, all the other models should use load when pulling object info from the DB. Even though you could use get or loadFromDb or any number of terms that are still technically comprehensible, the predictability of using the same word to mean the same thing reduces overall cognitive load when reasoning about the application. If I have to keep track of which word means the action I’m trying to take based on which specific model I’m using, that’s a layer of mental overhead that’s doing nothing toward actually increasing the value or functionality of the software.

Maintainability is the sort of an extension of comprehensibility - how easy is it the code to change or fix down the road? Maintainability includes things like the “open to extension, closed to modification” principle from SOLID, but also things like comments (which we’ll get to, specifically, later on). Comprehensibility is focused on the “what” the code is doing, which often requires in-code context and clear naming. Maintainability on the other hand, focuses on the “why” - so that, if I need to modify it later on, I know what the intent of the method/class/variable was, and can adjust accordingly.

The single most important aspect of CPM code is naming things. Naming stuff right is hard. How we name things influences how we reason about them, how we classify them, and how others will perceive them. Because those names eventually evolve to carry meaning on their own, which can be influenced by outside contexts, and that whole messy ball of definition is what the next person is going to be using when they think about the thing.

I do believe most programmers intellectually know the importance of naming things, but it’s never given the proper level of respect and care its importance would suggest. Very rarely do I see code reviews that suggest renaming variables or methods to enhance clarity - basically, the rule is if it’s good enough that the reviewer understands it at that moment, that’s fine. I don’t think it is.

A class called User should contain all the methods related to the User model. This seems like an uncontroversial stance. But you have to consider that model in the context of its overall codebase. If there is (and there should be) also a class called Authorization in that codebase,  there are already inferences we should be able to draw simply from the names of those two things.

We should assume User and Authorization are closely related; I would assume that some method in Authorization is going to be responsible for verifying that the user of the application is a User allowed to access parts of the application. I would assume these classes are fairly tightly coupled in some respects, and it would be difficult to use one without the other, in some respect.

Names provide signposts and architecture hints of the broader application, and the more attuned to them you are (as both a writer and reader of code), the more information will be able to be conveyed simply by paying attention to them.

If naming is the single most important aspect of CPM, the single most important aspect of naming things is consistency. I personally don’t care about most styling arguments (camelCase vs. snake_case, tabs vs. spaces, whatever). If there’s a style guide for your language or framework, my opinion is you should follow it as closely as possible, deviating only if there’s an actual significant benefit to doing so.

Following style conventions has the two advantages: allowing for easier interoperability of code from different sources, and enabling the use of linters and formatters.

Code is easier to share (both out to others and in from others) if they use same naming conventions and styles, because you’re not adding an extra layer of reasoning atop the code. If you have to remember that Library A uses camelCase for methods but Framework B uses snake_case, that’s however large a section of your brain that is focusing on something other than the logic of what the code is doing.

And enabling linters and formatters means there’s a whole section of code maintenance you no longer have to worry about - you can offload that work to the machine. Remember, computers exist to help us solve problems and offload processing. A deterministic set of rules that can be applied consistently is literally the class of problems computers are designed to handle.

Very broadly, my approach to subjective questions is: Be consistent. Anything that doesn’t directly impact comprehensibility is a subjective decision. Make a decision, set your linter or formatter, and never give another thought to it. Again, consistency is the most important aspect of naming.

But a critically under-appreciated aspect of naming is the context of the author. Everyone sort of assumes we all share the same context, in lots of ways. “Because we work on the same team/at the same company, the next developer will know the meaning of the class PayGrimples.” That may be very broadly true, in that they’ve probably heard of PayGrimples, but it doesn’t mean they share the same context.

A pop-culture example of this is pretty easy - think of the greatest spaceship pilot in the universe, one James Tiberius Kirk. Think about all his exploits, all the strange new worlds he’s discovered. Get a good picture of him in your head.

Which one did you pick? Was it The Original Series’ William Shatner? The new movies’ Chris Pine? Or was it Strange New Worlds’ Paul Wesley?

You weren’t wrong in whatever you picked. Any of those is a valid and correct answer. But if we were talking about Kirk in conversation, you likely would have asked to clarify which one I meant. If we hadn’t, we could talk about two entirely different versions of the same concept indefinitely until we hit upon a divergence point when one of us realized.

Code has that same issue, except whoever’s reading it can’t ask for that clarification. And they can only find out they’re thinking about a different version of the concept if they a) read and digest the code in its entirety before working on it, or b) introduce or uncover a bug in the course of changing it. So when we name things, we should strive for the utmost clarity.

⛔ Unclear without context

type User = {
	id: number;
	username: string;
	firstName: string;
	lastName: string;
	isActive: boolean;
}

The above is a very basic user model, most of whose properties are clear enough. Id, username, firstName and lastName are all pretty self-explanatory. But then we get to the boolean isActive.

This could mean any number of things in context. They include, but are not limited to:

  • The user is moving their mouse on the screen right now

  • The user has a logged-in session

  • The user has an active subscription

  • The user has logged in within the last 24 hours

  • The user has performed an authenticated activity in the last 24 hours

  • The user has logged in within the last 60 days

All of those are things we may want to know about the user of any application, depending on what we’re trying to do. Even similar-sounding events with the same time horizon (logged in within the last 24 hours vs. authenticated activity in the last 24 hours) give us different information - I can infer the maximum age of the authentication token in the logged-in case, but without knowing the token exchange process, I cannot make the same inference for authenticated activity.

So why not just provide the meaning with the name?

✅ Clarity without context

type User = {
	id: number;
	username: string;
	firstName: string;
	lastName: string;
	loggedInPrevious24Hours: boolean;
}

Clarity comes through naming things explicitly. Ambiguity is the enemy of clarity, even when you assume the person reading the code should know something.

It’s reasonable to assume that the people reading your code are developers - that is, people familiar with coding concepts. Every other context (industry/domain, organization) is not a safe assumption. Therefore, if you have names or terms that are also used in coding, you should clarify the other meaning. (You should do this generally, as well, but specifically with programming-related terms.)

⛔ Ambiguity kills comprehension

class Class {}
class Post {} 

The word “class” is generally understanding in programming as an object-oriented prototype. Outside of programming, it could refer to a classroom of children; a classification system; a group of children in the same grade (e.g., junior class); or a social hierarchy (e.g., upper-class, lower-class).

Post is even worse, because it can be a verb or a noun even in a programming context. Blogs usually have posts, but you can also post content (or the HTTP verb, POST). Non-tech-wise, we have places to which you can be sent (“I’m being sent to our post in London”), referring to the mail system, or even structural support for fences.

✅ Specificity aids everyone

class Classroom {}
class BlogPost {}

All of this is important because being clear matters more than being concise or clever. After consistency, the most important aspect of naming is being descriptive. The name should describe what the code is doing (vs. why or how) - what a method is doing, or what purpose a variable serves.

For the most part, Classes should be nouns, because they’re describing their domain of influence. Methods should include verbs, because they’re performing actions. Variables should be nouns, reflective of whatever purpose they’re serving.

If you find yourself struggling with the length of your names of methods, variables or classes, that’s not a bad thing. It’s usually a sign you need to consider refactoring (more on this a bit later).

To the point of clarity, be sure to use properly spelled real words and names.

⛔ Abbreviations and shortcuts

class DateUtil {
  static function dateStrFrmo(date: Date): string { ... }
}

Humans have surprisingly good short- and medium-term recall around words and names. Using real words and names makes the concept easier for us to reason about, and easier to keep track of in our heads.

I took the example above from a GitHub code search. I think the original example may have been written by a native German speaker, because if we assume “Frmo” is supposed to be “From,” it’s using the German sentence structure that puts the verb at the end of the sentence. That makes sense! But if someone isn’t familiar with that sentence construction, the name of the method becomes functionally useless.

The misspelling part is important in two respects: one, it can introduce confusion (is it supposed to be “from” or ”form”?). The other is relying on the computer - searches, within the IDE or if you’re GREPing, are looking for specific terms. If it’s spelled wrong, it’s not going to get caught in the search.

✅ Use properly spelled real words and names

class DateUtil {
  static function getStringFromDate(date: Date): string { ... }
}

Here we’ve modified it so we essentially have an English sentence - get the string from the date. I know what’s being passed in (the date), and I know what’s coming out (the string), and I know overall what’s happening (I’m getting the string from the date).

Beyond naming, there is one other “big” rule that gets us to comprehensible, predictable and maintainable code, an old adage: “Keep it simple, sweetheart.” I’m not speaking to system complexity here - your overall architecture should be as complex as needed to do the job. It’s closer to SOLID’s single-responsibility principle, writ large: Every module, every class, every method, every variable should have one job.

To our earlier example of Users and Authorization, users will take care of the users while authorization handles auth. Neither of them should care about the internal workings of the other; Authorization just needs to know it can call User::load to return the user object.

At the method level, this is how we keep our names to a manageable length. You should be able to describe what the method in a very short sentence. If you need more length (or you leave out things it’s doing), it’s probably a sign that the method is trying to do too much.

Smaller methods enable reusability - if the method is only doing a specific thing, we are more likely to be able to use it somewhere else. If the method is doing multiple things, we’d likely need to add a parameter in the other cases where we want to use it, because we don’t want all of those things to happen all the time.

Keeping each method to a single task means we can decompose complex methods into multiple individual methods. This also makes it easier to read the code.

Literally just reading the names of methods allows us to infer what’s going on, divorced of context. For the example below, we would know from the file name this is Typescript, and I’ll give one hint that it’s frontend.

✅ Keep it simple, even for complex actions

function constructor() {
    this.assignElements();
    this.setInterval();
    this.getNewArt();
    this.listenForInstructions();
}

Initializing this class assigns elements and sets an interval (meaning there are actions that happen on a set schedule); then we get new art, and listen for instructions. Without even knowing the name of the class, we can pretty confidently assume this has to do with art, and that art gets changed frequently (hence the interval). But there also appears to be a manual interruption possible, with listen for instructions.

If we were debugging an issue related to keyboard commands, I would first look to listenForInstructions. If there’s an issue with art not showing up, I would check getNewArt.

Each method is narrowly scoped, even if a lot happens. Keeping things simple aids comprehension and predictability, but it’s also vital for maintainability. It makes it much easier to write tests.

We cannot confidently change code without tests. If we’re making modifications to code without tests, we can read, guess and hope that it won’t create any downstream issues, but unless we know exactly what should happen with a given method in all the ways it can be used, we cannot be certain of the impact of any change. A downstream issue is the definition of a regression; determining the output of a method in changing circumstances is the definition of a test. Thus why we test to avoid regressions.

A good unit test is like a science experiment - a hypothesis is proffered, and borne out or disproven through data, accounting for variables. In programming, variables are literally variables.

If we know exactly what our code will do, we have the flexibility to use it in different circumstances. That may sound tautological, but the confidence we know “exactly” what it will do comes through tests, not an internal sense of “I know what that function does." I would argue most bugs arise through either typos or incorrect assumptions. Most of the typo bugs are caught before release. Most of the insidious bugs that take forever to debug are because someone made an assumption along the way that you have to find and correct.

If all functions perform as we expect, integration issues are drastically reduced. Good unit testing reduces the amount of integration and regression testing you have to do. Maintenance overall becomes easier because we have solid bulwarks of functionality, rather than needing to reason through all possible eventualities fresh every time (or worse, assuming).

I’m not a huge believer in code coverage as a benchmark for quality. I think it can be helpful to have a minimal coverage requirement as you’re starting to remind yourself to write tests, but 100% coverage means absolutely nothing on its own. Quality is much more important than quantity when it comes to testing, especially that you’re testing the right things.

Keeping it simple also relates to abstractions. Code is a series of abstractions (unless you’re writing assembly, in which case, vaya con Dios), but I’m referring specifically to abstractions in the codebase that you write. The cardinal sin of object-oriented programming is a simple rule: “Don’t Repeat Yourself.” It’s not … bad advice, but neither is it a simple panacea we could automate away with, say, a linter or a formatter (or, god forbid, AI).

DRY is overemphasized, possibly because it’s such an easy heuristic to apply. “Hey this looks like other code” is easy to see at a glance, and if you just have an automatic reaction of “I shouldn’t repeat myself ever,” you’ll automatically push that logic up to single method that can be used in multiple places.

But deduplication requires an abstraction. In most cases, you’re not performing exactly the same logic in two places, but two minor variations (or the same logic on two different types of objects). Those variations then require you to include a parameter, to account for a slight branch.

Having that abstracted method hinders comprehensibility. Even if it’s easier/faster to read a one-line reference to the abstracted method, the actual logic being performed now happens out-of-sight.

I am much less concerned with duplication of code than I am making sure we find the right abstraction. Thus, I want to propose a different model for thinking about repetition, two rules  (because again, simpler != terse) to replace the one DRY rule: we’ll call it the Concilio Corollary to the DRY rule, or the damp dyad.

  1. Don’t repeat yourself repeating yourself

  2. The wrong abstraction will cost you more than repetition

DRYRY is a little tongue-in-cheek, but essentially don’t worry about trying to find an abstraction until you’ve implemented similar logic at least three times. Twice is a coincidence, three times is a pattern. Once you’ve seen the code being used in three different places, you now have the context to know whether it’s a) actually doing the same work, and b) how to abstract it to work in different scenarios.

If you find yourself adding a parameter that changes the flow of logic in each scenario, it’s probably more correct to abstract only those small parts that are the same, and implement the particular logic in the context it’s being used. That’s how we find the right abstraction.

All of this is important because existing code has inertia. This is relevant whether you’re a more senior developer or just starting out in your career.

Those with less experience tend to be terrified to change existing code, and understandably so. That code already exists, it’s doing a job. Even if you’re going in to fix a bug, presumably that code was working well enough when it was written that no one noticed it. And no one wants to be the one to create an error, so the existing code is treated as something close to sacrosanct.

For more experienced developers, know that when you’re writing code you’re creating the building blocks of the application. You’re setting the patterns that other developers later will try to implement, because it’s “correct” and becomes that codebase’s style. Heck, that’s literally the predictability maxim - we want to it look similar when it does similar things. But that means if you’re writing the wrong abstraction in one place, its impact may not be limited to that single area.

And when a new case arises, the next developer has to decide (without the context of the person who originally wrote it) whether to modify the existing abstraction, or create a new one. But the old one is “correct” (again, in that exists), so it’s safer to just use that one. Or, worst case, use it as a template to create a new abstraction. In either case, a new paradigm is being created that needs to be tested and raises the overhead on maintenance, because now we have a separate logic branch.

Those are the big topics I wanted to hit. The rest of these recommendations are important, but lack an overall theme. The biggest of these I want to discuss is commenting.

Naming should be used so we know the “what” of code, comments should be used so we know the “why.” I am not referring to automated comments here (e.g., explanations for input parameters in the like in JSDoc), but rather qualitative comments. I would argue that, currently, most existing comments I see would be superfluous if proper naming conventions were used.

What I want to see in a comment is why a particular variable is a particular value, when it’s not clear from the existing context.

⛔ Don't explain the what

const SOCIAL_MEDIA_CHARACTER_COUNT = 116;
// shortens title for social media sharing
export const getSocialShareText = (post: BlogPost) => {
	if (post.title.length =< SOCIAL_MEDIA_CHARACTER_COUNT) {
		return post.title;
	} else {
		return post.title.substr(0,SOCIAL_MEDIA_CHARACTER_COUNT);
	}
}

This a pretty typical example of what I see comments used for. We’ve used naming properly (the method gets the social share text, the constant is the character count we use for social media posts)j, so the comment “shortens title for social media sharing” is superfluous.

This method provides the social media content. The piece of information I don’t have about this code that I would like, both for comprehensibility and maintainability, is why the character count is 116.

The answer is that Twitter used to be the social media service with the shortest content length, 140 characters. Except that since we’re developing an app, we’re always including a URL, for which Twitter automatically generates a shortlink that takes up 23 characters (+ 1 for the space between content and link). 140-23-1 = 116.

That context does not exist within the application, and it’s not under our control. So we should include it in a comment, so that if that number changes (or something else becomes popular but has a shorter length limit, or we stop worrying about Twitter entirely), we know both from reading the code what this does, and it puts a signpost with the word “Twitter” in the comment so it can be found if we just do a search.

✅ Explain the "why"

// Twitter has shortest character limit (140); URL shortener is always 23 + space
const SOCIAL_MEDIA_CHARACTER_COUNT = 116;
export const getSocialShareText = (post: BlogPost) => {
	if (post.title.length =< SOCIAL_MEDIA_CHARACTER_COUNT) {
		return post.title + ' ' + post.url;
	} else {
		return post.title.substr(0,SOCIAL_MEDIA_CHARACTER_COUNT) + ' ' + post.url;
	}
}

The other thing to keep in mind about comments is that they’re a dependency just as much as code. If we do update that character count, we also need to update the comment explaining it, otherwise we’ve actively corrupted the context for the next person who has to make a change.

I used to say “never use ternaries,” but I’ve come around a bit. I now believe ternaries should be used only declaratively, with proper formatting.

✅ Use declarative ternaries, with formatting

const title = (postRequest['title']) 
			? postRequest['title'] 
			: '';

const title = postRequest['title'] || '';

Ternaries are short, concise, and difficult to reason about if they’re too complicated. When I say “declarative” ternaries, I mean “the value of a variable is one of two options, dependent upon a single condition.”

If you need to test multiple conditions, or if you have more than one variable changing as a result of a condition or set of conditions, don’t use ternaries. Use regular if-else statements. It’s easier to read and comprehend, and it’s easier to make changes down the road (more likely if already have multiple conditions or states).

And never nest ternaries.

The last bit is around testing, specifically standard library functions. A standard library function is one that comes packaged in with the programming language you’re using - think Math.round() for Javascript, or the above substring method on strings str.substr(0,3).

As a rule, you should not test the functionality of code you have no control over - if Chrome is shipping a bad Math.round(), there isn’t anything you can do about it (plus, if you go down that rabbit hole long enough you’ll eventually have to test that the heat death of the universe hasn’t yet happened). Standard library functions fit that description.

But sometimes you do you want to test a method that only uses standard library functionality - the reason is not that you’re testing that functionality, but rather that you’re arriving at the desired result.

We’ll use the social media text as the example. I will always assume substring is working properly until I get user reports, and even then the most I would do is forward them along. What I want to test for is the length of the string that is returned - does it meet my requirements (under 116)? I’m not testing the functionality, I’m including a flag to myself and future developers that this is the maximum length and, if someone modifies the functionality of the method, it should be flagged.

describe('getSocialMediaText restricts to Twitter length', ()=> {
	it('when title is less than length', () => {
		expect(getSocialMediaText(MockPostShortTitle).length =< 116)
	}),
	it('when the title is more than length', () => {
		expect(getSocialMediaText(MockPostLongTitle).length =< 116)
	})
});

If we were testing functionality, I would call the same constant in my test, because that’s what’s being used internally. But because I’m testing outcomes, I use an independent value. If someone changes the length without changing the test, they’ll get notified. They can at that point change the value used in the test, too, but the test has served its purpose - it notified someone when they violated my “why.”

TL;DR

  • Focus on specific aspects of code quality

    • Comprehensible, Predictable, Maintainable

  • Name stuff properly

    • Clarity over concision and wit

  • Keep things simple

    • One module, one class, one method, one variable: One job

  • Write tests

    • The only way to confidently modify or reuse code is be assured of what it does

  • Remember the damp dyad

    • Don’t repeat yourself repeating yourself

    • The wrong abstraction costs more than repetition

  • Comments should explain "why"

    • Provide context for the next person (let naming focus on “what”)

It is definitely TL, but if I had to W it, you have to R it. Or just come to one of my talks!