Kait

Data regulations for developers

Software requirements are rather straightforward - if we look at the requirements document, we see simple, declarative statements like "Users can log out," or "Users can browse and create topics." And that's when we're lucky enough to get an actual requirements document.

This is not legal advice

None of the following is intended to be legal advice. I am not a lawyer, have not even read all that many John Grisham novels, and am providing this as background for you to use. If you have actual questions, please take them to an actual lawyer. (Or you can try calling John Grisham, but I doubt he'd pick up.)

But there are other requirements in software engineering that aren't as cut-and-dried. Non-functional requirements related to things like maintainability, security, scalability and, most importantly for our purposes, legality.

For the sake of convenience, we're going to use "regulations" and other derivations of the word to mean "all those things that carry the weight of law," be they laws, rules, directives, court orders or what have you.

Hey, why should I care? Isn't this why we have lawyers?

Hopefully your organization has excellent legal representation. Also hopefully, those lawyers are not spending their days watching you code. That's not going to be fun for them or you. You should absolutely use lawyers as a resource when you have questions or aren't sure if something would be covered under a specific law. But you have to know when to ask those questions, and possess enough knowledge to recognize when your application could be running afoul of some rule or another.

It's also worthwhile to your career to know these things! Lots of developers don't, and your ability to spot these issues and explain them will make you seem more knowledgeable, competent and capable than a developer who can't – because you are! This stuff is a skillset just like knowing Django.

While lawyers may be domain experts, they aren't always (especially at smaller organizations), and there are lots of regulations that specifically cover technology/internet-capable software that domain experts likely would not (and should not) be expected to be on top of. Further, if you are armed with foreknowledge, you don't have to wait for legal review after the work has been completed.

Also, you know, users are people, too. Most regulations wind up being bottom-of-the-barrel expectations: that user data is safeguarded, and that organizations are kept from tricking users into doing things they wouldn't have done otherwise. In the same way I would hope my data and self-determination are respected, I also want to do the same for my users.

Regulatory environments

The difference in the regulatory culture between the US and the European Union is vast. I truly cannot stress how different they are, and that's an important thing to know about because it can be easy to become fluent in one and assume the other is largely the same. It's not. Trust me.

United States

The US tends, for the most part, to be a reactionary regulator. Something bad happens, laws or rules (eventually) get written to stop that thing from happening again.

Also, the interpretations of those rules tend to fluctuate more than in the EU, depending on things seemingly as random as which political party is in power (and controlling the executive branch, specifically) or what jurisdiction a lawsuit is filed in. We will not go in-depth into those topics, for they are thorny and leave scars, but it's important to note. The US also tends to give wide latitude to the defense of, "but it's our business model!" The government will not give a full pass on everything, but they tend to phrase things in terms of "making fixes" rather than "don't do that."

Because US regulations tend to be written in response to a specific incident or set of incidents, they tend for the most part to be very narrowly tailored or very broad (e.g., "TikTok is bad, let's give the government the ability to jail you for 20 years for using a VPN!"), leaving little guidance to those of us in the middle. This leaves lots of room for unintended consequences or simply failing to achieve the stated goals. In 2003, Congress passed the CAN-SPAM Act to "protect consumers and businesses from unwanted email." As anyone who ever looks at their spam box can attest, CAN-SPAM's acronym unfortunately seems to have meant "can" as in "grant permission," not "can" as in "get rid of."

European Union

In contrast, the EU tends to issue legislation prescriptively; that is, they identify a general area of concern, and then issue rules about both what you can and cannot do, typically founded in some fundamental right.

The US technically does this too, at a more circumscribed level, but the difference is that in the EU the right is the foundational aspect, meaning it's much more difficult to slip through a loophole.

From a very general perspective, this leads to EU regulations being more restrictive in what you can and can't do, and the EU is far more willing to levy punitive fines against those companies who run afoul of the law.

Global regulations

There are few regulations that apply globally, and usually they come about backwards - in that a standard is created, and then adopted throughout the world.

Accessibility

In both the US and the EU, the general standard for digital accessibility is WCAG 2.1, level AA. If your website or app does not meet most of that standard and you are sued, you will be found to be out of compliance.

In the US, the reason you need to be compliant comes from a variety of places. The federal government (and state governments) need to be compliant because of the Rehabilitation Act of 1973, section 508. Entities that receive federal money (including SNAP and NSF grants) need to be compliant because of the RA of 1973, section 504. All other publicly accessible organizations (companies, etc.) need to have their websites compliant because of the Americans with Disabilities Act and various updates. And all of the above has only arisen through dozens of court cases as they wound their way through the system, often reversing each other or finding different outcomes with essentially the same facts. And even then, penalties for violating the act are quite rare, with the typical cost being a) the cost of litigation, and b) the cost of remediation and compliance (neither of which is small, but they're not punitive, either).

In the EU, they issued the Web Accessibility Directive, which said access to digital information is a right that all persons, including those with disabilities, should have, so public sector websites and apps have to be accessible (and the European Accessibility Act extends similar requirements to the private sector).

See the difference?

WCAG provides that content should be:

  • Perceivable - Your content should be able to be consumed in more than one of the senses. The most common example of this is audio descriptions on videos (because those who can't see the video still should be able to glean the relevant information from it).

  • Operable - Your content should be usable in more than one modality. This most often takes the form of keyboard navigability, as those with issues of fine motor control cannot always handle a mouse dextrously.

  • Understandable - Your content should be comprehensible and predictable. I usually give a design example here, which is that the accessibility standard actually states that your links need to be perceivable, visually, as links. Also, the "visited" state is not just a relic of CSS, it's actually an accessibility issue for people with neurological processing differences who want to be able to tell at a glance what links they've already been to.

  • Robust - Very broadly, this tenet states you should maximize your compliance with accessibility and other web standards, so that current and future technologies can take full advantage of them without requiring modification to existing content.

Anyway, for accessibility, there's a long list of standards you should be meeting. The (subjectively) more important ones most frequently not followed are:

  1. Provide text alternatives for all non-text content: This means alt text for images, audio descriptions for video and explainer text for data/tables/etc. Please also pay attention to the quality – the purpose of the text is to provide a replacement for when the non-text content can't be viewed, so "picture of a hat" is probably not an actual alternative.

  2. Keyboard control/navigation: Your site should be navigable with a keyboard, and all interactions (think slideshows, videos) should be controllable by a keyboard.

  3. Color contrast: Large text (like headers) should have a contrast ratio of at least 3:1 between the foreground and background; smaller text should have a ratio of at least 4.5:1.

  4. Don't rely on color for differentiation: You cannot rely solely on color to differentiate between objects or types of objects. (Think section colors for a newspaper website: You can't just have all your sports links be red, it has to be indicated some other way.)

  5. Resizability: Text should be able to be resized up to 200% without loss of content or functionality.

  6. Images of text: Don't use 'em.

  7. Give the user control: You can autoplay videos or audio if you must, but you also have to give the user the ability to stop or pause it.

There are many more, but these are the low-hanging fruit that lots of applications still can't manage to pick off.
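The contrast requirement in particular is easy to automate. Here's a rough Python sketch of the WCAG 2.1 contrast-ratio math (the function names are mine, not from any standard library):

```python
def relative_luminance(rgb):
    """WCAG 2.1 relative luminance for an (r, g, b) tuple of 0-255 ints."""
    def channel(c):
        c = c / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Contrast ratio between two colors, ranging from 1:1 up to 21:1."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# Black on white is the maximum possible contrast: 21:1.
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # 21.0
```

Drop something like this into a CI check against your design tokens and you'll catch contrast regressions before an auditor (or a user) does.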

PCI DSS

The Payment Card Industry Data Security Standard is a set of standards that govern how you should store credit card data, regulated by credit card companies themselves. Though some individual US states require adherence to the standards (and fine violators appropriately), federal and EU law does not require you to follow these standards (at least, not specifically these standards). However, the credit card companies themselves can step in and issue fines or, more critically, cut off access to their payment networks if they find the breaches egregious enough.

In most cases, organizations offload their payment processing to a third party (e.g., Stripe, Paypal), who is responsible for maintaining compliance with the specification. However, you as the merchant or vendor need to make sure you’re storing the data from those transactions in the manner provided by the payment processor; it’s not uncommon to find places that are storing too much data on their own infrastructure that technically falls under the scope of PCI DSS.

Some of the standards are pretty basic - don’t use default vendor passwords on hardware and software, encrypt your data transmissions. Some are more involved, like restricting physical access to cardholder data, or monitoring and logging access to network resources and data.
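To make the "don't store too much" point concrete, here's a hypothetical sketch of what a PCI-conscious app persists after handing the card off to a processor: the processor's token and the last four digits, never the full PAN or CVV. Field names are invented for illustration.

```python
def record_payment(pan: str, processor_token: str) -> dict:
    """Return the only card data safe to persist on our own infrastructure."""
    return {
        "token": processor_token,  # opaque reference issued by the payment processor
        "last4": pan[-4:],         # allowed for display ("card ending in 4242")
        # Deliberately absent: full PAN, CVV, expiry, magnetic stripe data.
    }

stored = record_payment("4242424242424242", "tok_abc123")
print(stored)  # {'token': 'tok_abc123', 'last4': '4242'}
```

If you find yourself persisting anything the processor didn't explicitly tell you to store, that's the moment to re-read the standard (or call your lawyer).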

EU regulations

GDPR

The EU's General Data Protection Regulation caused a big stir when it was first released, and for good reason. It completely changed the way that companies could process and store user data, and severely restricted what sort of shenanigans companies can get up to.

The GDPR states that individuals have the right to not have their information shared; that individuals should not have to hand over their information in order to access goods or services; and that individuals have further rights to their information even once it's been handed over to another organization.

For those of us on the side of building things, it means a few things are now requirements that used to be more "nice-to-haves."

  • You must get explicit consent to collect data: If you're collecting data on people, you have to explicitly ask for it. You have to specify exactly what information you're collecting, the reason you're collecting it, how long you plan on storing it and what you plan to do with it (this is the reason for the proliferation of all those cookie banners a few years ago). Furthermore, you must give your users the right to say no. You can't just pop up a full-screen non-dismissable modal that doesn't allow them to continue without accepting it.

  • You can only collect data for legitimate purposes: Just because someone's willing to give you data doesn't mean you're allowed to take it. One of the biggest headaches I had around GDPR was when a client wanted to gate some white papers behind an email signup. I patiently explained multiple times that you can't require an email address for a good or service unless the email address is required to provide said good or service. No matter how many times the client insisted that he had seen someone else doing the same thing, I stood firm and refused to build the illegal interaction.

  • Users have the right to ask for the data you have stored, and to have it deleted: Users can ask to see what data you have stored on them, and you're required to provide it (including, again, why you have that data stored). And, unless it's being used for legitimate processing purposes, you have to delete that data if the user requests it (the "right to be forgotten").
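Those requirements translate fairly directly into code. Below is a minimal Python sketch – not a compliance library, just an illustration with invented names – of consent records that capture the what, why and how-long, plus access and erasure on request:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ConsentRecord:
    data_collected: str   # e.g., "email address"
    purpose: str          # why you're collecting it
    retention: str        # how long you plan to keep it
    granted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class UserDataStore:
    def __init__(self):
        self._records: dict[str, list[ConsentRecord]] = {}

    def collect(self, user_id: str, record: ConsentRecord) -> None:
        self._records.setdefault(user_id, []).append(record)

    def subject_access_request(self, user_id: str) -> list[ConsentRecord]:
        """Right of access: return everything held, including the stated purpose."""
        return self._records.get(user_id, [])

    def erase(self, user_id: str) -> None:
        """Right to be forgotten: delete the user's data on request."""
        self._records.pop(user_id, None)

store = UserDataStore()
store.collect("u1", ConsentRecord("email address", "order confirmations", "until account deletion"))
print(len(store.subject_access_request("u1")))  # 1
store.erase("u1")
print(len(store.subject_access_request("u1")))  # 0
```

The point isn't the data structure; it's that if your schema can't answer "what do we hold on this user, and why?" in one query, a subject access request is going to be a very long week.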

And all of this applies to any organization or company that provides a good or service to any person in the EU. Not just paid, either – it explicitly says that you do not have to charge money to be covered under the GDPR. So if your org has an app in the App Store that can be downloaded in Ireland, Italy, France or any other EU country, it and likely a lot more of your company's services will fall under GDPR.

As for enforcement, organizations can be fined up to €20 million, or up to 4% of the annual worldwide turnover of the preceding financial year, whichever is greater. Amazon Europe got docked €746 million for alleged "[manipulation of] customers for commercial means by choosing what advertising and information they receive[d]" based on the processing of personal data. Meta has been fined a quarter of a billion dollars a few different times.
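The fine ceiling is a one-liner worth internalizing; note that the €20 million floor means small companies don't get a proportionally small cap:

```python
def max_gdpr_fine(annual_worldwide_turnover_eur: float) -> float:
    """Upper bound on a GDPR fine: €20M or 4% of turnover, whichever is greater."""
    return max(20_000_000, 0.04 * annual_worldwide_turnover_eur)

print(max_gdpr_fine(100_000_000))     # small firm: the flat €20M cap dominates
print(max_gdpr_fine(10_000_000_000))  # big firm: 4% of turnover dominates (€400M)
```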

But it's not just the big companies. A translation firm got hit with fines of €20K for "excessive video surveillance of employees" (a fine that's practically unthinkable in the US absent cameras in a private area such as the bathroom), and a retailer in Belgium had to pay €10K for forcing users to submit an ID card to create a loyalty account (since that information was not necessary to creating a loyalty account).

Digital Markets Act

The next wave of regulation to hit the tech world was the Digital Markets Act, which is aimed specifically at large corporations that serve a “gatekeeping” function in digital markets in at least three EU countries. Although it is not broadly applicable, it will change the way that several major platforms work with their data.

The directive’s goal is to break up the oversized share that some platforms have in digital sectors like search, e-commerce, travel, media streaming, and more. When a platform controls sufficient traffic in a sector, and facilitates sales between businesses and users, it must comply with new regulations about how data is provisioned and protected.

Specifically, those companies must:

  • Allow third parties to interoperate with their services

  • Allow businesses to access the data generated on the platform

  • Provide advertising partners with the tools and data necessary to independently verify claims

  • Allow business users to promote and conduct business outside of the platform

Additionally, the gatekeepers cannot:

  • Promote internal services and products over third parties

  • Prevent consumers from linking up with businesses off their platforms

  • Prevent users from uninstalling preinstalled software

  • Track end users for the purpose of targeted advertising without users’ consent

If it seems like these are aimed at the Apple App Store and Google Play Store, well, congrats, you cracked the code. The DMA aims to help businesses have a fairer environment in which to operate (and not be completely beholden to the gatekeepers), and allow for smaller companies to innovate without being hampered or outright squashed by established interests.

US regulations

The US regulatory environment is a patchwork of laws and regulations written in response to various incidents, and with little forethought for the regulatory environment as a whole. It’s what allows you as a developer to say, “Well, that depends …” in response to almost any question, to buy yourself time to research the details.

HIPAA

Likely the most well-known US privacy regulation, HIPAA covers almost none of the things that most people commonly think it does. We'll start with the name: Most think it's HIPPA, for Health Information Privacy Protection Act. It actually stands for the Health Insurance Portability and Accountability Act, because most of the law has nothing to do with privacy.

It is very much worth noting that HIPAA only applies to health plans, health care clearinghouses, and those health care providers that transmit health information electronically in connection with certain administrative or financial transactions where health plan claims are submitted electronically. It also applies to contractors and subcontractors of the above.

That means most of the time when people publicly refuse to comment on someone's health status because of HIPAA (like, in a sports context or something), it's nonsense. They're not required to disclose it, but it's almost certainly not HIPAA that's preventing them from doing so.

What is relevant to us as developers is the HIPAA Privacy Rule. The HIPAA privacy rule claims to "give patients more control over their health information, set boundaries on the use of their health records, establish appropriate safeguards for the privacy of their information."

What it does in practice is require you to sign a HIPAA disclosure form for absolutely every medical interaction you have (and note, unlike GDPR, that they do not have to let you say "no"). Organizations are required to keep detailed compliance policies around how your information is stored and accessed. While the latter is undoubtedly a good thing, it does not rise to the level of reverence indicated by the rule's stated goals.

What you as a developer need to know about HIPAA is that you need to have very specific policies (think SOC 2 [official link] [more useful link]) around data access, operate using the principle of least privilege (only allow those who need to see PHI to access it), and maintain specific security policies related to the physical facility where the data is stored.
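As a tiny illustration of least privilege, here's a hypothetical role-to-fields mapping in Python. The roles and PHI fields are made up for the sketch; real grants come out of your compliance policies, not a dict literal.

```python
# Which PHI fields each role may see. Default is nothing.
PHI_ACCESS = {
    "physician": {"name", "diagnosis", "medications"},
    "billing":   {"name", "insurance_id"},
    "marketing": set(),  # least privilege: no PHI at all
}

def can_access(role: str, fields_requested: set[str]) -> bool:
    """Allow a request only if every requested field is within the role's grant."""
    return fields_requested <= PHI_ACCESS.get(role, set())

print(can_access("billing", {"name"}))       # True
print(can_access("billing", {"diagnosis"}))  # False -- billing never sees diagnoses
```

The deny-by-default lookup (`.get(role, set())`) is the important design choice: an unknown role sees nothing, rather than everything.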

HIPAA’s bottom line is that you must keep safe Protected Health Information (PHI), which covers both basic forms of personally identifiable information (PII) such as name, email, address, etc., as well as any health conditions those people might have. This seems like a no-brainer, but it can get tricky when you get to things like disease- or medicine-specific marketing (if you’re sending an email to someone’s personal email address on a non-HIPAA-compliant server about a prostate cancer drug, are you disclosing their illness? Ask your lawyer!).

There are also pretty stringent requirements related to breach notifications (as is largely true of a lot of the compliance regimes as well). These are not things you want to sweep under the rug. It’s true that HIPAA does not see as many enforcement actions around its privacy aspects as some of the other, jazzier regulations. But health organizations also tend to err on the side of caution and use HIPAA-certified hosting and tech stacks, as any medical provider will be sure to complain about to you if you ask them how they enjoy their Electronic Medical Records system.

Section 230 of the Communications Decency Act

Also known as the legal underpinnings of the modern internet, Section 230 provides that "No provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider."

In practice, this means that platforms that publish user-generated content (UGC) will not be treated as the "publisher," in the legal sense, of that content for the purposes of liability for libel, etc. This does not mean they are immune from copyright claims or criminal liability, but it does provide a large measure of leeway in offering UGC to the masses.

It's also important to note the title of the section, "Protection for private blocking and screening of offensive material." That's because Section 230 explicitly allows for moderation of private services without exposing the provider to any liability for failing to do so in some instances. Consider a social media site that bans Nazi content; if that site lets a few bad posts go through, it does not mean they are on the hook for those posts, at least legally speaking. Probably a good idea to fix the errors lest they be found guilty in the court of public opinion, though.

GLBA

The Gramm-Leach-Bliley Act is a sort of privacy protection policy for financial institutions. It doesn’t lay out anything particularly novel or onerous - financial institutions need to provide a written privacy policy (what data is collected, how it’s used, how to opt out), and it provides some guidelines companies need to meet about safeguarding sensitive customer information. The requirement most interesting to me is pretexting protection, which actually enshrines in law that companies need to have policies in place for how to prevent and mitigate social engineering attacks, both of the phishing variety as well as good old-fashioned impersonation.

COPPA

The Children's Online Privacy Protection Rule (COPPA, and yes, it’s infuriating that the acronym doesn’t match the name) is one of the few regulations with teeth, largely because it is hyperfocused on children, an area of lawmaking where overreaction is somewhat common.

COPPA provides for a number of (now) common-sense rules governing digital interactions that companies can have with children under 13 years old. Companies must:

  • Collect information only with explicit parental consent.

  • Draft and post a separate privacy policy covering data about those under 13.

  • Provide a reasonable means for parents to review their children's data.

  • Establish and maintain procedures for protecting that data, including around sharing that data.

  • Limit how long that data is retained.

  • Not ask for more data than is necessary to provide the service in question.
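Here's a sketch of what those rules look like at the collection boundary – age, consent, and data minimization in one gate. The field names and the 13-year cutoff logic are illustrative assumptions, not anything lifted from the rule's text:

```python
from datetime import date

REQUIRED_FIELDS = {"username", "parent_email"}  # assumed minimum the service needs

def may_collect(birthdate: date, parental_consent: bool, fields: set[str], today: date) -> bool:
    # Age computed the calendar way: subtract a year if the birthday hasn't happened yet.
    age = today.year - birthdate.year - ((today.month, today.day) < (birthdate.month, birthdate.day))
    if age >= 13:
        return True                   # COPPA's child-specific rules no longer apply
    if not parental_consent:
        return False                  # explicit parental consent comes first
    return fields <= REQUIRED_FIELDS  # data minimization: don't ask for more than you need

print(may_collect(date(2015, 6, 1), True, {"username"}, date(2024, 1, 1)))  # True
```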

Sound weirdly familiar, like GDPR? Sure does. Wondering why only children in the US are afforded such protections? Us too!

FERPA

The Family Educational Rights and Privacy Act is sort of like HIPAA, but for education. Basically, it states that the parents of a child have a right to the information collected about their child by the school, and to have a say in the release of said information (within reason; they can't squash a subpoena or anything). When the child reaches 18, those rights transfer to the student. Most of FERPA comes down to the same policy generation around retention and access discussed in the section on HIPAA, though the disclosure bit is far more protective (again, because it's dealing with children).

FTC Act

The Federal Trade Commission Act of 1914 is actually the law that created the Federal Trade Commission, and the source of its power. You can think of the FTC as a quasi-consumer protection agency, because it can (and, depending on the political party in the presidency, will) go after companies for what aren't even really violations of law so much as they are deemed "unfair." The FTC Act empowers the commission to prevent unfair competition, as well as protect consumers from unfair/deceptive ads (though in practice, this has been watered down considerably by the courts).

Nevertheless, of late the FTC has been on a roll, specifically targeting digital practices. An excellent recent example was the settlement by Epic Games, makers of Fortnite. The FTC sued over a number of allegations, including violations of COPPA, but it also explicitly called out the company for using dark patterns to trick players into making purchases. The company’s practice of saving any credit cards used (and then making that card available to the kids playing), confusing purchasing prompts and misleading offers were specifically mentioned in the complaint.

CAN-SPAM

Quite possibly the most useless technology law on the books, CAN-SPAM (Controlling the Assault of Non-Solicited Pornography And Marketing Act) clearly put more time into the acronym than the legislation. The important takeaways are that emails need:

  • Accurate subjects

  • To disclose themselves as an ad

  • Unsubscribe links

  • A physical address for the company

And as your spam box will tell you, it solved the problem forever. This does not, however, mean you can ignore its strictures! As a consultant at a company that presumably wishes to stay on the right side of the law, you should still follow its instructions.
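If you want a guardrail in your sending pipeline, those four requirements are simple enough to check mechanically. A hedged sketch, with invented field names standing in for whatever your email templates actually carry:

```python
def can_spam_issues(email: dict) -> list[str]:
    """Return a list of CAN-SPAM problems with an outgoing email (empty = OK)."""
    issues = []
    if not email.get("accurate_subject"):
        issues.append("subject line must not be deceptive")
    if email.get("is_ad") and not email.get("ad_disclosure"):
        issues.append("commercial email must identify itself as an ad")
    if not email.get("unsubscribe_link"):
        issues.append("missing unsubscribe mechanism")
    if not email.get("physical_address"):
        issues.append("missing sender's physical postal address")
    return issues

print(len(can_spam_issues({"accurate_subject": True, "is_ad": True})))  # 3
```

A check like this won't make your marketing emails welcome, but it will keep them legal.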

CCPA and Its Ilk

The California Consumer Privacy Act covers, as its name suggests, California residents in their dealings with businesses that collect their data. Loosely based on the GDPR, CCPA requires that businesses disclose what information they have about you and what they do with it. It covers items such as name, social security number, email address, records of products purchased, internet browsing history, geolocation data, fingerprints, and inferences from other personal information that could create a profile about your preferences and characteristics.

It is not as wide-reaching or thorough as GDPR, but it’s better than the (nonexistent) national privacy law.

The CCPA applies to companies with gross revenues totaling more than $25 million, businesses with information about more than 50K California residents, or businesses who derive at least 50% of their annual revenue from selling California residents’ data. There are similar measures that have already been made law in Connecticut, Virginia, Colorado, and Utah, as well as other states also considering relevant bills.
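That applicability test reads almost like code already; here's a direct translation (thresholds as originally enacted – verify the current numbers, since states amend these):

```python
def ccpa_applies(gross_revenue: float, ca_resident_records: int,
                 pct_revenue_from_selling_data: float) -> bool:
    """Any one of the three prongs is enough to bring a business under CCPA."""
    return (
        gross_revenue > 25_000_000          # gross revenues over $25M
        or ca_resident_records > 50_000     # data on more than 50K CA residents
        or pct_revenue_from_selling_data >= 0.5  # half of revenue from selling data
    )

print(ccpa_applies(5_000_000, 60_000, 0.0))  # True -- the record count alone triggers it
```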

Other state regulations

The joy of the United States’ federalist system is that state laws can be different (and sometimes more stringent!) than federal law, as we see with CCPA. It would behoove you to do a little digging into the state regulations when you’re working with specific areas — e.g., background checks, where the laws differ from state to state, as even though you’re not based there, you may be subject to its jurisdiction.

There are two different approaches companies can take to dealing with state regulations: Either treat everyone under the strictest regulatory approach (e.g., treat every user like they’re from California) or make specific carve-outs based on the state of residence claimed by the user.

It is not uncommon, for example, to have three or four different disclosures or agreements for background checks ready to show a user based on what state they reside in. The specific approach you choose will vary greatly depending on the type of business, the information being collected, and the relevant state laws.
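A hypothetical sketch of how the two approaches combine in practice: state-specific disclosures where you have them, with the strictest regime (here assumed to be California's) as the fallback. The file names are invented.

```python
DISCLOSURES = {
    "CA": "ca_background_check_disclosure.html",
    "NY": "ny_background_check_disclosure.html",
    "WA": "wa_background_check_disclosure.html",
}
STRICTEST_DEFAULT = DISCLOSURES["CA"]  # when in doubt, treat everyone like a Californian

def disclosure_for(state: str) -> str:
    """Pick the disclosure for the user's claimed state of residence."""
    return DISCLOSURES.get(state, STRICTEST_DEFAULT)

print(disclosure_for("NY"))  # ny_background_check_disclosure.html
print(disclosure_for("TX"))  # ca_background_check_disclosure.html
```

Falling back to the strictest document means an unlisted state can never accidentally get a weaker disclosure than the law somewhere requires.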

[Image: the regulations we spoke about, grouped under their headers (e.g., Education has FERPA)]

How to implement

Data compliance is critical, and the punitive aspects of GDPR’s enforcement mean your team must have a solid strategy for compliance.

The most important aspect of dealing with any regulatory issue is first knowing what’s required for your business. Yes, you’re collecting emails, but to what end? If that data is necessary for your business to function, then you have your base-level requirements.

Matching those up against the relevant regulations will provide you with a starting point from which you can begin to develop the processes, procedures and applications that will allow your business to thrive. Don’t rely on “that’s how we’ve always done it” or “we’ve seen other people do x” as a business strategy.

The regulatory environment is constantly shifting, and it’s important both to keep abreast of changes and to always know what data and services are integral to your business’s success. Keeping up with the prevalent standards will aid you not only in not getting sued, but also in assuring your customers that you’re a trustworthy and reliable partner.

How to keep up

It all seems a little daunting, no?

But you eat the proverbial regulatory elephant the same way you do any other large food item: one bite at a time. In the same way you didn’t become an overnight expert in securing your web applications against cross-site scripting attacks or in properly managing your memory overhead, becoming a developer who’s well-versed in regulatory environments is a gradual process.

Now that you know about some of the rules that may apply to you, you know what to keep an eye out for. You know potential areas to research when new projects are pitched or started, and you know where to ask questions. You know to both talk to and listen to your company’s legal team when they start droning on about legalistic terms.