Software Specifications - and why do they matter?

2023-05-06

Introduction

When I first started working on mswms, I was naturally curious how the slippy map could display tile images from a multitude of open Web Map Services. Gradually I found out that all of these services adhere to specifications published by the OGC. This enables broader collaboration in software communities: people can develop open software which exposes the routes listed in a spec, and on deployment there aren’t many custom changes one has to make before exposing the service to a random user from the public.

Common specifications we use every day

Even before we knew it, in our first DBMS exercises we were using the ANSI standard to write SQL queries (Meanwhile Paritosh, if you’re here, go home man!). So whenever a new database system is developed, if its syntax adheres to this spec, anybody who knows, say, MySQL can get onboarded easily.

That said, every DBMS has some extra features of its own and differs in how it manages storage, processing, etc. So naturally, some parts of the specification are left for the developers to decide; a better example of this is shown in the next section.

Generally, we find that once a storage system or service becomes popular enough, its format becomes a specification. As more and more people start using it, the system can be treated as a black box, and developers from the public can build parsers, clients, and IO tools around the spec.

Some other common specs are YAML’s (https://yaml.org/spec/), JSON’s, and HTTP/2’s.

How much should a specification, well.. specify?

Very recently, I’ve been working on top of the STAC spec, and it’s brilliant how many open tools already exist around it.

There’s stac-fastapi and staccato, and a full list can be found here. And believe me, the spec is very new: the earliest GitHub release was in April 2018. All these tools in the ecosystem came up within 3 years.

Now imagine a world with no specification, which was the case until 2018. There were individual geospatial data catalogs, like GeoAdmin, GeoServer, etc., and there was very little scope for collaboration.

So all was good until we started working on a few public STAC services. The catalog supports pagination, and all the routes are documented in the spec. This is the workflow for when the catalog size is huuge:

Query 10 elements.
The last element in every response has a reference (a link) to query the next 10 elements.
If the catalog runs out of elements, it returns an empty link.

The clients should be aware of this, and this is mentioned in the spec.
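
The workflow above can be sketched as a small client loop. This is a minimal sketch, not the STAC API itself: fake_server, the demo collection and the 25-item catalog are made up for illustration, and the response shape (a "features" list plus a "links" list with a rel of "next") is only loosely modelled on what STAC services return. I also assume that a missing link and an empty link both mean "no more pages". The point is that the client just follows whatever link it is handed and stops when there is none.

```python
from typing import Callable, Dict, Iterator
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory catalog standing in for a real service.
CATALOG = [f"item-{i}" for i in range(1, 26)]


def fake_server(url: str) -> Dict:
    """Pretend GET handler: one page of items plus an optional 'next' link."""
    query = parse_qs(urlparse(url).query)
    limit = int(query.get("limit", ["10"])[0])
    start = int(query.get("start", ["1"])[0])  # 1-based index of the first item
    page = CATALOG[start - 1 : start - 1 + limit]
    links = []
    next_start = start + limit
    if next_start <= len(CATALOG):
        # Only advertise a next page when there is something left to serve.
        links.append({
            "rel": "next",
            "href": f"/collection/demo/items?limit={limit}&start={next_start}",
        })
    return {"features": page, "links": links}


def iter_items(first_url: str, fetch: Callable[[str], Dict]) -> Iterator[str]:
    """Yield every item by following 'next' links until none is left."""
    url = first_url
    while url:  # a missing or empty link ends the loop
        response = fetch(url)
        yield from response["features"]
        # Treat the link as opaque: never parse or rebuild it on the client.
        url = next(
            (link["href"] for link in response["links"] if link["rel"] == "next"),
            "",
        )


items = list(iter_items("/collection/demo/items?limit=10", fake_server))
print(len(items))
```

Passing fetch in as a parameter keeps the loop independent of any HTTP library, so the same iter_items would work with a real request function swapped in for fake_server.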

But how the links are constructed is up to whoever deploys the service, and is left out of the spec. They can take either of the following formats.

/collection/<collection-name>/items?apiToken=tfriynding&limit=10, where apiToken is stored in a db which records that this request should serve the 11th-20th items.
Or one can explicitly specify the indices in the URL instead of API tokens, like this: /collection/<collection-name>/items?limit=10&start=11

This causes an issue if one wishes to query items 100-110 in the very first request, since a client cannot know in advance which link format a given service uses.
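
To make the difference concrete, here is a minimal sketch of how a deployment might build and resolve each kind of link. Everything in it is hypothetical: TOKENS stands in for the db table, and the function names are mine, not anything defined by the spec or by an existing STAC server.

```python
import secrets
from typing import Dict

# Stand-in for the db table that maps an issued token to a start index.
TOKENS: Dict[str, int] = {}


def next_link_with_token(collection: str, next_start: int, limit: int) -> str:
    """Token style: the position lives on the server, the URL only carries a key."""
    token = secrets.token_hex(8)
    TOKENS[token] = next_start
    return f"/collection/{collection}/items?apiToken={token}&limit={limit}"


def next_link_with_offset(collection: str, next_start: int, limit: int) -> str:
    """Offset style: the position is spelled out in the URL itself."""
    return f"/collection/{collection}/items?limit={limit}&start={next_start}"


def resolve_start(params: Dict[str, str]) -> int:
    """Work out which item a request should start from."""
    if "apiToken" in params:
        # Raises KeyError for a token the server never issued.
        return TOKENS[params["apiToken"]]
    return int(params.get("start", "1"))


# With offsets, a client can jump straight to item 100.
print(resolve_start({"limit": "10", "start": "100"}))

# With tokens, the only valid positions are the ones the server handed out.
link = next_link_with_token("demo", 11, 10)
token = link.split("apiToken=")[1].split("&")[0]
print(resolve_start({"apiToken": token, "limit": "10"}))
```

A client written against the offset style would break on a token-style service and vice versa, which is why following the returned link as-is is the only approach that works on both.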

Here is a discussion remotely related to the decision to leave a strict pagination principle out of the spec.

I assume that a lot of developers might want to use the link for auth tokens, or maybe they didn’t want anyone to spam a STAC service with a request that queries 20000 elements at once.

Creating injectable components like stac-extensions, and leaving only the required modules in the core spec, also helps keep things modular.

:)

That’s my time folks, hope we all appreciate specs a bit more after knowing about the efforts people take to write them! I sure do. G’day.