The Problem With APIs

Posted by Chris Turner on Fri, Apr 14, 2023

Creating great APIs is hard. Getting the right level of abstraction and building something that is pleasant for clients to integrate with is a huge challenge. This two-part post delves into the topic in more detail.

But first an anecdote… (courtesy of Russ Miles)

Provider: “Ah, you’re missing the special field, BABEL-256. Without that field this behaves subtly differently…”

Consumer: “Wait, what??”

Provider: “Yeah, you can think of this service as a single front door where, past the door, there are a lot of different rooms, hallways, and functions available. ‘Service DoSummat’ is just the front door. The concoction of fields you pass are where all the magic happens. It’s Postel’s Law, we accept almost anything and that’s a good thing…”

Consumer: “So is there an API description? Something I can generate and update my code to use?”

Provider: “There is, but it’s not very useful. You see the API itself doesn’t describe what it does because, well, it could do anything. Based on the cocktail of fields you pass.”

Consumer: “What if that necessary cocktail changes? How would I know?”

Provider: “Well, I guess it would stop working… maybe.”

Consumer: “So the API doesn’t describe what it does, could do anything, and I can’t tell if/when anything changes other than to suck it and see?”

Provider: “Yeah, but it is following Postel’s Law (Editor: sort of… badly), and it was much easier for us to implement it this way as internally it’s how things are handled anyway. So, you see, you’re just being kind by interacting with us in exactly how we expect.”

Consumer: “Right …” (checks watch, and ponders just how dusty their CV is…)

Introduction

In this post we first look at the key things that make APIs complex and difficult to use and the issues that this can cause. We then explore some of the reasons why APIs are built in this way.

The second post moves on to look at what I believe to be the current best practices for designing high-quality, easy-to-integrate APIs. (If you just want the positive, best-practice advice then feel free to skip this post's gory details and jump straight to part 2.)

The contents of this post focus particularly on APIs for invoking web services over HTTP. However, most of the content also applies to non-web-service APIs, such as those based on queues, events, or even programming language APIs for modules or libraries.

Background

There are plenty of different approaches and standards for building web service APIs, including REST, SOAP, GraphQL, and gRPC.

Unfortunately, even with all these available standards it is still far too often that we encounter APIs that are overly complex and difficult for clients to integrate with. In almost all cases the cause of this is less about the specific mechanics of the protocols or message formats, and much more about the scope of these APIs.

The worst examples tend to be APIs that are overly generic; those where a single API call provides multiple alternative functions; those that aim to be fully future-proof; those that mix multiple concerns; and those that fail to create meaningful domain abstractions.

Let’s start out by looking at how these scope related concerns tend to manifest themselves…

Issues with Problematic APIs

Some of the general properties that these complex and problematic APIs have are:

  • Multiple functions in a single API endpoint:
    • functions that have some similarity (usually in the format of their request and response data) grouped into a single API.
    • selection of function to execute via enumerated type field value or subset of request fields that are supplied.
    • response content varies based on the function that was performed or the type of data requested.
  • Complex multi-operation workflows wrapped behind a single API function:
    • encapsulation of large amounts of processing into a single API call.
    • complex error codes and status structures in addition to the standard HTTP response codes.
    • multiple, complex error handling and recovery paths.
  • Large numbers of unused fields and field values:
    • fields present for all the different functions, with many used only for a small subset of the possible combinations.
    • lots of nullable and optional fields or fields requiring hardcoded default values.
    • future-proof data structures having extra fields that currently serve no purpose.
  • Complex and deeply-nested request and response structures:
    • requests and responses that contain entire data structures, even though only a subset is relevant for each function.
    • multi-version and flag controlled data structures whenever new functionality is added to the domain.
    • cumbersome state-management concerns mixed with business data.
    • exposure of underlying data models from dependent services that the API implementation utilises.
  • Entanglement of multiple concerns across functions:
    • different elements of the API changing at different speeds.
    • complex security requirements embedded within the API rather than being treated as a separate layer.
    • reliability and scalability considerations seeping into how the API is structured and must be called.
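To make the first of these patterns concrete, here is a hedged sketch of a single endpoint multiplexing several functions through a type field plus context-dependent optional fields. Every operation name, field, and error code below is invented for illustration, not taken from any real API:

```python
# Invented sketch: one endpoint, many functions, behaviour selected by a
# type field and by whichever optional fields happen to be present.

def handle_request(payload: dict) -> dict:
    op = payload.get("operationType")  # enum value selects the real function
    if op == "CREATE_TICKET":
        # 'priority' is mandatory here but meaningless for every other op
        if "priority" not in payload:
            return {"status": 400, "errCode": "E-1042"}
        return {"status": 200, "ticketRef": "T-1"}
    if op == "CLOSE_TICKET":
        # now 'ref' is mandatory instead, and 'priority' is silently ignored
        if "ref" not in payload:
            return {"status": 400, "errCode": "E-2077"}
        return {"status": 200}
    if payload.get("legacyMode"):
        # undocumented field combination triggering a third behaviour entirely
        return {"status": 200, "result": "who knows"}
    return {"status": 400, "errCode": "E-0001"}
```

Note how nothing in the function signature, or in any schema derived from it, tells a client which fields matter for which operation; that knowledge lives only in the branching logic.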

Unfortunately, these properties tend to lead to many issues that make these APIs difficult to understand, complex to integrate with, and very fragile in operation. Let's look at some of the most significant…

API understandability

When an API supports multiple different functions that are controlled by combinations of type fields and optional data, it becomes very difficult to understand and reason about what the API will actually do. To understand even the basics about the API’s behaviour it becomes necessary to have a detailed understanding of the data formats, field values, and implementation detail. This increases both the learning curve and effort required to build and maintain mental models of the API.

Additionally, poor and inconsistent naming conventions make the situation even worse. There's no way that an endpoint named /srvc/tkteng, which takes parameters with names like ref or d-1, is going to be easily understood by a client integrating with this API.

In an attempt to make the API more understandable it may be shipped with documentation. However, this often tends to be verbose and confusing, as it tries to describe all the different combinations of functions and fields that the API supports. Usually it fails to do this successfully, and in many cases important information is either missing or incorrect.

Even worse is when the API is provided with no documentation other than a few example payloads. Trying to figure out what combinations of data are required, what all the fields mean, and which ones are optional in each scenario is a horrid client experience!
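As a small, invented contrast (neither endpoint nor any field name here is real), here is the same logical call expressed against a cryptic API and a self-describing one:

```python
# Cryptic: nothing tells the client what 'ref', 'd-1', or 't' mean, so
# integration requires reverse-engineering examples or the implementation.
cryptic_request = {
    "path": "/srvc/tkteng",
    "body": {"ref": "8842", "d-1": "2023-04-14", "t": 3},
}

# Self-describing: the intent and the meaning of each field are obvious
# from the names alone, before any documentation is opened.
clear_request = {
    "path": "/tickets/8842/assign-engineer",
    "body": {"engineerId": "ENG-17", "scheduledDate": "2023-04-14"},
}
```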

Lack of abstraction causing exposure of underlying dependencies

This issue primarily occurs when an API is created for a service that acts as a facade around one or more dependent services, combining them together in some form of aggregate or workflow. While there is nothing wrong with this approach (unless it’s building a large, generic, multi-function service), what often tends to happen is that the service and API fail to create their own bounded context and domain abstraction.

The effect of this is that details of the data models and implementations of those dependent services leak through the service and into its API. The result is that clients of the API are suddenly not coupled just to the data structures and API of this service but also become coupled to the underlying services that it depends on.

The worst examples of this are APIs that just wrap other systems to provide modern calling semantics, such as JSON over HTTP or reactive systems. Where these just translate data formats from the underlying system to the new format, the clients of the API are directly coupled to the underlying system's domain model and functionality.

If the models or behaviour of the dependent services change in any way then this ripples through to all client integrations, often causing unpredictable results and client breakages.
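A hedged sketch of the difference (all record and field names invented): a facade that passes its dependency's raw record straight through, versus one that maps it into the service's own model.

```python
def get_customer_leaky(backend_record: dict) -> dict:
    # Anti-pattern: the legacy record is returned as-is, so clients become
    # coupled to 'CUST_NM', 'ADDR_LN1', and every future backend change.
    return backend_record

def get_customer_abstracted(backend_record: dict) -> dict:
    # Better: the service exposes its own stable model; a backend rename
    # now only touches this one mapping function, not every client.
    return {
        "name": backend_record["CUST_NM"],
        "addressLine1": backend_record["ADDR_LN1"],
    }
```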

Over abstraction and generalisation

An alternative problem can occur if the abstraction process goes too far in the other direction. In this case, rather than not creating an abstraction of the data models and underlying dependencies, the API designer creates a highly abstract and overly general model for the API.

When this happens the result tends to be something that loses all relation to the bounded context and the domain that it represents. This massively complicates understanding of the API. These types of endpoint usually become very finicky to work with, as it becomes difficult to work out which generic field and value combinations are required to get things done.

Long-term maintenance usually becomes very difficult as well. Small changes to one API endpoint tend to ripple through the overly generic underpinnings and either break things unexpectedly or impact other, unrelated endpoints built on the same highly abstract foundations.
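An invented sketch of what over-generalisation can look like in practice: everything becomes an "entity" with free-form attributes, so the API's signature says nothing about what any particular call actually requires.

```python
def create_entity(entity_type: str, attributes: dict) -> dict:
    # The implementation ends up as a registry of special cases anyway,
    # and each new entity type risks rippling through this shared code.
    required = {"ticket": {"title", "priority"}, "customer": {"name"}}
    missing = required.get(entity_type, set()) - attributes.keys()
    if missing:
        return {"ok": False, "missing": sorted(missing)}
    return {"ok": True, "id": entity_type + "-1"}
```

The client must discover by trial and error (or by reading the provider's source) which attribute combinations each entity type demands, which is exactly the finicky experience described above.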

Difficulty in handling change

Large, complex services tend to have associated large and complex data models. If the API for the service is also very generic then it typically passes around the complete data model (or significant portions of it) in its request and response data. When functionality is added or changed, this almost invariably results in associated model changes and knock-on impacts to client integrations.

In a very naive world, the service just updates its data model and API, and all the clients break and have to be updated and re-released. While this may be possible in a system made up of just a couple of services all maintained by the same team, it just isn't practical in larger systems, in those with external client integrations, or where teams work to different schedules.

Some common approaches that are used to resolve this include:

  • Introduction of various API and data model versioning strategies - these introduce complex implementation and testing challenges within the service to support multiple versions and ensure backwards compatibility.
  • Making everything in the data model optional or nullable and using flags and type fields to control which elements or structures are present and populated. However this:
    • significantly complicates the service implementation to deal with knowledge of all the optional combinations.
    • makes testing all the supported combinations a large challenge.
    • is likely to be a significant source of defects.
    • makes client integrations more complex (see next sub-section).
  • Providing client libraries that are released at the same time as the API change - forcing clients to use specific languages and dependencies that they may not want; and often limiting their options around code quality, error handling, recovery and observability.

The other significant challenge in handling change comes about when the API encapsulates a complex workflow. If a client wants to use the service in a slightly different way, the only way this can be achieved is by changing the service implementation to add new workflow variants. This typically requires associated API changes as well, such as additional type field values, new flags, or additional API versions. Changes to an existing working implementation invariably introduce new defects and instability that may impact other clients.
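The "make everything optional" strategy from the list above can be sketched as follows (all fields invented): the model no longer states what any one function needs, so the valid combinations live only in validation code, and in every client's head.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class OrderRequest:
    include_delivery: bool = False           # flag controlling which fields matter
    delivery_address: Optional[str] = None   # required iff include_delivery is set
    delivery_date: Optional[str] = None      # required iff include_delivery is set
    gift_wrap: Optional[bool] = None         # only meaningful from version 2
    version: int = 1

def validate(req: OrderRequest) -> List[str]:
    # Cross-field rules that no type system or schema can express for us.
    errors = []
    if req.include_delivery and not (req.delivery_address and req.delivery_date):
        errors.append("delivery fields required when include_delivery is set")
    if req.version < 2 and req.gift_wrap is not None:
        errors.append("gift_wrap only valid from version 2")
    return errors
```

Every new flag multiplies the combinations that both the service and its clients must implement and test, which is why this strategy tends to be such a rich source of defects.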

Complex client code requirements

As already mentioned, things like inappropriate abstraction of domain models and complex mechanisms for handling change result in making the client’s integration task much more difficult. Ideally we want to make client implementations as easy to complete as possible, but the issues already described tend to lead to the following client integration challenges:

  • Requires complex client code to integrate with the API, usually involving lots of conditional statements based on specific combinations of functions and field values.
  • Lots of optional fields with default values in the API requests and responses require client code to set up and test all of the defaulted fields, even when they aren't needed for the specific function being called.
  • Easy to make mistakes when calling the API by invoking the wrong function due to an incorrect field value or wrong subset of fields. Difficult to spot where things have gone wrong.
  • Complex sets of error codes and failure points. May need quite complex client logic to handle these and figure out how to recover from the failure.
  • Unable to make full use of error handling built into serialisation/deserialisation libraries when some fields or elements are nullable or optional in some functions and mandatory in others, or have different validations depending on the function being used (also applies to service implementation itself).
  • Serialisation/deserialisation of a large data model can be very inefficient, especially for high volume functions that only need to work on a tiny subset of that model.
  • Even small changes to workflows often require additional development by the API provider, which in turn slows down project delivery.

None of these lead to the pleasant client integration experience that we are after.
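A hedged client-side sketch of the first few challenges above (every field and error code is invented): defaults that must be supplied even when irrelevant, plus bespoke error codes that need their own recovery logic on top of the HTTP status.

```python
def build_lookup_request(customer_id: str) -> dict:
    return {
        "operationType": "LOOKUP",
        "customerId": customer_id,
        # None of the following matter for LOOKUP, but the (hypothetical)
        # endpoint rejects requests that omit them:
        "batchMode": False,
        "legacyFormat": "N",
        "regionCode": "00",
    }

def interpret_response(resp: dict) -> str:
    # The HTTP status alone is not enough; a custom code channel sits
    # beside it and the client must know which codes are retryable.
    if resp.get("status") != 200:
        return "transport-error"
    code = resp.get("errCode", "0000")
    if code == "0000":
        return "ok"
    if code in {"1042", "1043"}:   # retryable per the invented docs
        return "retry"
    return "fail"
```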

Why Build These Sorts of APIs?

So, given all of the problems that can occur by building these complex, generic, poorly abstracted APIs, what are the reasons that they get created? Let’s explore what, in my experience, are the three most significant…

Attempting to simplify implementations through reuse

A common flaw in software engineering is the tendency to force reuse. Developers tend to look for commonality between things, such as similar behaviour or data structures, even though the similarities are often fairly superficial. Sometimes this comes from a desire to minimise the amount of code needing to be written; other times from a misunderstanding of the DRY principle; or simply because it's fun and challenging to build some really clever generic, reusable construct.

Whatever the reason, the result is usually a set of common base classes, components, or generic utility functions. The first implementation of functionality on top of these tends to go fairly smoothly. However, when the next function is implemented it soon becomes apparent that the reusable components aren't quite as reusable as was expected, and they start needing sub-classes, additional data structures, flag fields and the like to vary their behaviour.

After a while it’s realised that the base code just isn’t reusable in the way that was intended and it’s becoming risky and expensive to keep changing it. There’s no budget to rip it out and start again. What therefore tends to happen is that the developers move up a level of abstraction and start adding new functionality to the existing higher-level features that have some similarity to the new feature, as that’s a much easier task. The final outcome is that eventually we end up with a very small number of top level API endpoints that cover multiple different features.

A good team might stop and refactor, while a weaker team or one constrained by release schedules or budget will probably just go ahead and release the complex, multi-function, very generic API. What may have started with a very clean API design can quickly degrade into a complex, generic mess just by trying to aim for reuse where none was really justified.

However, this reuse mindset can also manifest in the API design itself. It starts with the assumption that the API clients will also want to build reusable code. This in turn pushes API design teams to group abstractions together, create highly generalised data structures, and control behaviour with flag fields and various request parameters. Soon this yields the unwieldy and complex client integration that we talked about in the last section.

Over my career I’ve implemented integrations to hundreds of APIs, and not once can I say that any benefits of client reuse have outweighed the implementation and maintenance challenges of integrating with a generic or highly complex API designed for that purpose.

Using tools that make building quality APIs difficult

Another common cause is using development tools or frameworks that make building APIs either very challenging or particularly time consuming.

When I first started building web services, back in the late 1990s and early 2000s, the libraries and frameworks that we had available were very limited. Spinning up a new endpoint typically involved steps like:

  • Create a set of verbose Java Beans for the request and response entities.
  • Write a manual parser/deserialiser to unpack the request into a Java Bean.
  • Write some template to generate the response document, substituting in data from a Java Bean.
  • Write a controller class to call the parser, invoke the business logic, and pass the result to the template.
  • Wire everything up across multiple XML configuration files.
  • Build and deploy to a heavyweight J2EE application server.

Given this was all so verbose and time consuming, it became more commonplace to minimise the number of endpoints and make each one more feature rich. Writing a single parser and template for larger documents seemed much easier than writing multiple separate ones, especially when there was a lot of commonality across document contents for similar functions.

From this sort of thinking emerged standards such as SOAP, with its rich XML request and response documents. Enterprise tools also appeared, aiming to simplify and speed up the building of endpoints by letting developers define controller workflows in XML rather than code, or drag-and-drop them via GUI builders.

Fortunately, nowadays we have access to much better tooling and libraries that make it significantly quicker to create and deploy new API endpoints. However, many of the older enterprise tools still exist and are in wide use. Additionally, many of the poor API design practices they engendered are still widely followed even with modern tools and libraries: it's still possible to build complex, poorly abstracted APIs no matter how good the available tools are!

Not sufficiently considering domain abstractions

As already discussed, a significant cause of many of the issues described earlier in this post is the failure to create domain abstractions as part of the API design and service implementation (or creating ones that are far too generic).

One reason for this might be the difficulty involved in creating good domain models and abstractions. I’m a particular fan of Domain Driven Design, as covered in the seminal book of the same name, written by Eric Evans in 2003. However, my experience has been that the concepts from this book have not permeated the profession to the level required for domain modelling to become a ubiquitous skill among software engineers. This in turn leads to domain modelling and abstractions being either skipped completely, or applied in a way that provides few of the benefits of these techniques.

Secondly, most services tend to be built on data and APIs from one or more underlying systems. There's a general assumption that these dependent systems have good quality APIs and their own well-documented domain models. However, this isn't always the case. Rather than spend the time to understand the domains of these dependent systems and map them into a new domain abstraction, it is often much easier to just expose the data structures of those dependent systems through the API. This is especially easy to do for services that act more as gateways or facades and don't need to understand or manipulate these data structures. Often this is the primary cause of tightly coupling clients not just to a service's API but also to the APIs of all its dependencies.
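The alternative can be sketched as a small anti-corruption layer (all record and field names invented): the service builds its own domain object from its dependencies' raw models instead of exposing either of them through its API.

```python
def to_domain_order(billing_row: dict, shipping_doc: dict) -> dict:
    # Combine two dependencies' raw records into the service's own model.
    # Clients see only this shape; backend changes stop at this function.
    return {
        "orderId": billing_row["ORD_NO"],
        "total": billing_row["AMT_GROSS"] / 100,       # pence -> pounds
        "deliveryStatus": shipping_doc["state"].lower(),
    }
```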

The final observation is that building and maintaining good quality domain abstractions can be time consuming and resource intensive. When a team is under-resourced or subject to unrealistic delivery timescales it is often easier to skip this and push the domain understanding and modelling tasks onto all the clients of the API. This makes the client integration task much more complex and challenging, rather than providing a pleasant integration experience. We’ve already explored in much detail all of the issues this rather short-sighted compromise can create.

Summary

In this post we’ve looked as the properties of APIs that generate complexity, overly generic abstractions and difficulty in integration. We’ve explored the main issues that these can cause. Finally, we’ve considered the reasons why these types of API might be created. In Part 2 we will look at what I consider to be the current API best practices for overcoming these problems.


