Specification first driven API development: consumer protection use-case

Lehar Oha
Jan 8, 2022
https://op.europa.eu/documents/2448002/2579167/consumer+rights/

Introduction

One day I was approached by a close person … she had a poor consumer experience … e-commerce company did not deliver goods according to the “contract” … 😞

According to the Estonian Consumer Protection and Technical Regulatory Authority, a consumer can file a claim (with the merchant or service provider) if there is a defect in the purchased goods or if the customer is not satisfied with the purchased service. If the parties cannot reach an agreement, the consumer can escalate the issue to the Consumer Disputes Committee, which settles national disputes (the final step before going to court).

Knowing about my interest in data science, she (the stakeholder going forward) asked whether we could develop a software system to help with KYM (Know Your Merchant) due diligence and to better manage the risk of such issues in the future. Additionally, if she still ran into such issues, the system should estimate the chance of a claim succeeding in the Consumer Disputes Committee (meaning a decision in favor of the consumer).

Solution requirements

Since the stakeholder has adequate IT knowledge, we agreed that the solution should be exposed as a service.

More specifically, the stakeholder had the following general requirements:

  • It should be exposed via a web/HTTPS API, hosted on a trustworthy cloud computing platform (like MS Azure)
  • The API should be private, with proper authentication and authorization
  • The API will be consumed by another application (the stakeholder will build her own dashboard etc. on top of it)
  • Payloads should be JSON only
  • Request response time should be < 1 sec

Then the problem-domain-specific requirements for Know Your Merchant (KYM):

  • For company “X”, has a claim been reviewed by the Consumer Disputes Committee (CDC), and how many times was the decision in favor of the consumer:
    - must support search by company name (including fuzzy matching, meaning “AS Lehar” and “Lehar AS” should be treated the same)
    - must optionally support search by corporate number (EE issued)
  • Is company “X” on the blacklist (a company lands there if it has not complied with a CDC decision within 30 days)
  • Should be built on top of the data in the blacklist and the CDC register of claim decisions
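To make the fuzzy matching requirement a bit more concrete, here is a minimal sketch using only the Python standard library. It is not the implementation used in the project: the list of legal-form abbreviations, the 0.85 threshold, the record fields and the sample records are all assumptions made up for illustration.

```python
import re
from difflib import SequenceMatcher

# Assumption: a small, incomplete set of Estonian legal-form abbreviations to ignore.
LEGAL_FORMS = {"as", "oü", "ou", "mtü", "fie", "tü"}


def normalize(name: str) -> str:
    """Lowercase, drop punctuation and legal-form tokens, then sort the remaining tokens."""
    tokens = re.findall(r"\w+", name.lower())
    core = [t for t in tokens if t not in LEGAL_FORMS]
    return " ".join(sorted(core))


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized company names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def kym_summary(query: str, decisions: list[dict], blacklist: set[str],
                threshold: float = 0.85) -> dict:
    """Count CDC decisions for a merchant and check the blacklist (hypothetical record shape)."""
    matches = [d for d in decisions if similarity(query, d["merchant"]) >= threshold]
    return {
        "claims_reviewed": len(matches),
        "in_favor_of_consumer": sum(d["in_favor_of_consumer"] for d in matches),
        "blacklisted": any(similarity(query, b) >= threshold for b in blacklist),
    }


if __name__ == "__main__":
    # Made-up sample records, not real CDC data.
    sample_decisions = [
        {"merchant": "Lehar AS", "in_favor_of_consumer": True},
        {"merchant": "AS Lehar", "in_favor_of_consumer": False},
        {"merchant": "Muu Pood OÜ", "in_favor_of_consumer": True},
    ]
    print(kym_summary("AS Lehar", sample_decisions, blacklist={"Lehar AS"}))
```

Sorting the tokens makes word order irrelevant, and the similarity ratio leaves some room for typos. A real service would probably need a proper search index to stay under the response time limit.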

And finally, the problem-domain-specific requirements for claim success estimation:

  • Build a binary text classifier based on the circumstances section texts of the CDC PDF files (written in the third person) and the final decision (“in favor of the consumer” or “to the detriment of the consumer”)
  • Optionally allow the classifier to take in the merchant’s viewpoint text (the input texts are then pooled together)
  • Use the classifier to estimate potential claim success (from the consumer’s perspective) in a committee meeting and return the success probability
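The classifier requirements above can be sketched with a tiny Naive Bayes model in pure Python. This is only a toy stand-in to show the shape of the problem (text in, probability out, optional merchant text pooled in). The choice of Naive Bayes, the class name and the four training sentences are assumptions for illustration, not the model or data used in the project.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


class TinyNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing; label 1 = in favor of the consumer."""

    def fit(self, texts: list[str], labels: list[int]) -> "TinyNaiveBayes":
        # Assumes both labels (0 and 1) occur at least once in the training data.
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def _log_score(self, tokens: list[str], label: int) -> float:
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)  # class prior
        total_words = sum(self.word_counts[label].values())
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def predict_proba(self, consumer_text: str, merchant_text: str = "") -> float:
        """Return the estimated probability of a decision in favor of the consumer."""
        tokens = tokenize(consumer_text + " " + merchant_text)  # pool both viewpoints
        s1 = self._log_score(tokens, 1)
        s0 = self._log_score(tokens, 0)
        return 1.0 / (1.0 + math.exp(s0 - s1))  # normalize the two class scores


# Made-up training sentences, not real CDC circumstances texts.
TOY_TEXTS = [
    "goods were not delivered and the merchant did not refund the payment",
    "product was defective and the merchant refused repair",
    "consumer damaged the product through improper use",
    "consumer filed the claim after the complaint deadline had passed",
]
TOY_LABELS = [1, 1, 0, 0]

if __name__ == "__main__":
    model = TinyNaiveBayes().fit(TOY_TEXTS, TOY_LABELS)
    print(model.predict_proba("the goods were not delivered and no refund was made"))
```

With real data one would train on the extracted CDC texts and check the quality of the predicted probabilities on held-out decisions before showing them to anyone.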

Specification first driven development 📃

In specification-driven development we define the API first in a standardized specification language, generally as JSON or YAML, before we start coding the implementation in some programming language (starting with the code instead is called the code-first approach, as opposed to API-spec/documentation first). In the API-first approach we take the problem domain (business logic) and create a description of it, which can then be implemented as a software solution.

In our case we’ll use the OpenAPI standard (historically also known as Swagger) as the API specification language.

Starting with the specification first has several benefits:

  • it lets the involved parties think things through and get to know the problem domain better, leaving less room for assumptions that do not hold in reality
  • it keeps all involved parties in sync and aligned (a potential communication booster): everyone follows the same common contract and takes part in its iterative evolution
  • communication and iteration will probably lead to a more robust and easier-to-use API
  • it can save resources, since it can be “costly” to build the code-first way
  • with helper tools it’s possible to generate application code from the specification (which may also help avoid coding errors), and consumers can start testing against an API mock early (less lead time)
  • a stable API interface should give us the option to rewrite the entire backend in some other technology (for the stakeholder it does not matter how it’s done, they just know what they need)
  • since documentation is often the last thing developers do, documentation-driven development helps with that
  • this could even lead to some (cloud-related) cost savings and potentially make it easier to comply with GDPR

The above does not mean that we should be religious about specification-first development; in practice it can go either way, or even be a hybrid of the two.

Initial OpenAPI specification

Now we will focus on describing the routes, parameters, payloads and responses our API will provide.

When writing an OpenAPI specification in VSCode (the IDE we use) it’s good to install the OpenAPI (Swagger) Editor extension, which provides a SwaggerUI preview, linting, static security analysis and much more.

VSCode extension to work with OpenAPI spec.

To start off, press “Ctrl+Shift+P” in VSCode and select “Create new OpenAPI v3.0 file (YAML)”. This creates an initial template for us (note that it’s also possible to write the OpenAPI spec. in JSON, but to us YAML seems more human-friendly).

After that we see the following template:

OpenAPI spec. template in VSCode

We can see that the specification is divided into sections, each composed of specific object(s), to name a few:

  • info: this is about the API metadata, like API description, support contacts etc.
  • servers: this specifies URLs where API will be available.
  • paths: this is the meat of the API, it defines the client facing interface, like what endpoints are provided, what requests it can take in and by what schema to respond back.
  • components: this block holds various schemas for the specification, which can be referenced inside the spec. file like reusable assets.
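To illustrate how these four sections fit together, below is a minimal sketch of an OpenAPI 3.0 document built as a Python dictionary and dumped as JSON (which OpenAPI accepts alongside YAML). It is not our actual “api_spec.yaml”: the “/merchants” path, the parameter names, the “MerchantSummary” schema, the server URL and the API key scheme are all hypothetical.

```python
import json

spec = {
    "openapi": "3.0.3",
    # info: API metadata
    "info": {"title": "Consumer protection API (illustrative)", "version": "0.1.0"},
    # servers: where the API is available (placeholder URL)
    "servers": [{"url": "https://example.com/api/v1"}],
    # paths: the client-facing interface
    "paths": {
        "/merchants": {
            "get": {
                "summary": "Know Your Merchant lookup",
                "parameters": [
                    {"name": "name", "in": "query", "required": True,
                     "schema": {"type": "string"}},
                    {"name": "corporate_number", "in": "query", "required": False,
                     "schema": {"type": "string"}},
                ],
                "responses": {
                    "200": {
                        "description": "Summary of CDC decisions for the merchant",
                        "content": {"application/json": {
                            "schema": {"$ref": "#/components/schemas/MerchantSummary"}}},
                    },
                    "404": {"description": "No matching merchant"},
                },
            }
        }
    },
    # components: reusable schemas and security schemes
    "components": {
        "schemas": {
            "MerchantSummary": {
                "type": "object",
                "required": ["claims_reviewed", "in_favor_of_consumer", "blacklisted"],
                "properties": {
                    "claims_reviewed": {"type": "integer"},
                    "in_favor_of_consumer": {"type": "integer"},
                    "blacklisted": {"type": "boolean"},
                },
            }
        },
        "securitySchemes": {
            "ApiKeyAuth": {"type": "apiKey", "in": "header", "name": "X-API-Key"}
        },
    },
    # every operation requires the API key (the API should be private)
    "security": [{"ApiKeyAuth": []}],
}


def resolve_ref(document: dict, ref: str) -> dict:
    """Follow a local '#/a/b/c' reference inside the same document."""
    node = document
    for part in ref.lstrip("#/").split("/"):
        node = node[part]
    return node


if __name__ == "__main__":
    print(json.dumps(spec, indent=2))
```

The “$ref” in the 200 response points into the components block, which is what makes schemas reusable across endpoints.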

Now let’s look at how we encoded each block for our stakeholder (based on the solution requirements above; in the project repository we have a file called “api_spec.yaml”, link to the full file on GitHub).

Info and servers block

In the picture above, also pay attention to the left-hand panel: here the IDE extension shows the spec. sections and helps with navigation. Also note that the API “description” field supports Markdown (but don’t overuse it).

Paths block (truncated)

As seen above, for each endpoint (including methods like GET, POST etc.) we describe the payload and parameters, but also the responses and their status codes; rather self-explanatory …

Finally “components”:

Components block

From the above YAML specification it’s easy to generate interactive UI documentation (in the IDE: “OpenAPI: show preview using X”), either with Swagger UI or with ReDoc; below we show the output for the latter:

ReDoc interactive documentation

After working on the API description for a while, it becomes clear that writing API specs is not too hard, just rather verbose, and it can be problematic if the stakeholder’s requirements are a bit vague in the first iterations. On the other hand, it’s good to start simple and make incremental, cooperative improvements.

Stakeholder review meeting

Since we cooperate quite closely with the stakeholder, I did not have to chase her for long: I found her in the next room playing with the kids and started presenting the initial work furiously. Soon it turned out that priorities had changed and snow shoveling took precedence for now … After that I got confirmation that the solution proposal looks good, but that some extras would be handy (optional things for the future), to mention some:

  • Include data gathered by the Estonian Tax and Customs Board about the merchant (in the KYM section, like tax debts, turnover etc.)
  • Include the claim resolution from the consumer’s perspective (like compensation, price reduction, replacement of goods etc.)
  • Serve the textual context or topic of merchant-related claims

Service implementation

Now it’s time to code the implementation, and since we have already written the OpenAPI specification for the app, it’s possible to generate code from it. We’ll use Python as our implementation language (Rust was also considered); since it’s an API-first app, the FastAPI framework is a suitable choice here.

To generate the app we’ll use fastapi-code-generator, which under the hood uses datamodel-code-generator to generate Pydantic models (the best tool these days for building data schemas, parsing and validating data in Python land).

After running:

fastapi-codegen --input api_spec.yaml --output app
# or
datamodel-codegen --input api_spec.yaml --output models.py  # if you want just Pydantic data models

we’ll see an “app” folder created in our project structure, containing “models.py” (Pydantic data models for requests and responses) and “main.py”. This is a kind of FastAPI starter kit for us; of course we must still implement the actual solution logic:

Data models
App file (truncated)

Note that we could also quickly provide the stakeholder with a mock server of the service, so she could test it with client-side tools and start building integrations. Additionally, the stakeholder could use openapi-python-client to generate a client library from the common OpenAPI specification.

We will not go into specific implementation details here, but after coding it all up we can use the Schemathesis library (built on the wonderful Hypothesis library) to run property-based API tests against the endpoints:
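As a rough idea of what such a mock could look like, here is a minimal sketch of a canned-response server built only on the Python standard library. In practice one would more likely use a tool that generates the mock from the spec itself; the “/merchants” path and the response fields below are hypothetical and not taken from our real specification.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Hypothetical canned responses keyed by path; field names are illustrative only.
CANNED = {
    "/merchants": {"claims_reviewed": 2, "in_favor_of_consumer": 1, "blacklisted": False},
}


class MockHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        path = self.path.split("?", 1)[0]  # ignore query parameters
        body = CANNED.get(path)
        status = 200 if body is not None else 404
        payload = json.dumps(body if body is not None else {"detail": "not found"}).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the console quiet


def start_mock(port: int = 0) -> HTTPServer:
    """Start the mock on localhost in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), MockHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_mock()
    url = f"http://127.0.0.1:{server.server_address[1]}/merchants?name=Lehar%20AS"
    with urlopen(url) as response:
        print(json.load(response))
    server.shutdown()
    server.server_close()
```

Even a mock this simple lets the consumer side wire up a dashboard against the agreed response shape while the real backend is still being written.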

Schemathesis API testing

We also do not cover cloud deployment details here, but we used an infrastructure-as-code approach with Pulumi (Python SDK).

Conclusion

This concludes our journey for now. Thanks to the combination of software engineering and data science, our stakeholder can now conduct her e-commerce activities with more peace of mind. 👌

Lehar Oha

Data Scientist at Analytics & AI @ Swedbank Group