Micro Services are trendy… and not without reason, but much of what is written about them is highly theoretical. You have probably noticed it as well when you started to design those Micro Services (MS): How do I share code? How do I share data? How do I deal with foreign keys? Should this DB table be part of another MS? How do I deal with cross-MS transactions? etc. In practice, MS come with many challenges, and this article tries to hint at how to deal with them. There is no perfect solution. It is all about weighing each solution’s trade-offs.
Bounded context
The concept of bounded context is crucial to understanding those challenges. MS are a “share-nothing” architecture. It is a domain-partitioned design: MS are divided by business domain. For example, one MS manages a cart (i.e. which products, and in what quantity, each customer has selected). A classic architecture usually follows a technically partitioned design (i.e. divided by technical layers: user interface, business logic, database, etc.).
The main idea behind Micro Services is to isolate a piece of code that has one single purpose and does it very well. The bounded context is everything that belongs to the MS’s domain. In our cart example, it includes the quantity of each product by customer. It excludes product and customer information.
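As a rough illustration of the cart's bounded context, here is a minimal sketch in Python. The class name `Cart` and the identifiers `customer_id` and `product_id` are assumptions made for this example; the point is only that the cart stores references and quantities, never product or customer details.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Cart:
    """State owned by a hypothetical cart MS: identifiers and quantities only."""

    customer_id: str  # a reference to the customer, not a copy of the customer record
    quantities: Dict[str, int] = field(default_factory=dict)  # product_id -> quantity

    def add(self, product_id: str, quantity: int = 1) -> None:
        """Add a quantity of a product; product details stay in the product MS."""
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self.quantities[product_id] = self.quantities.get(product_id, 0) + quantity

    def remove(self, product_id: str) -> None:
        """Drop a product from the cart if it is present."""
        self.quantities.pop(product_id, None)


cart = Cart(customer_id="c-42")
cart.add("p-1", 2)
cart.add("p-1")
cart.add("p-7")
```

Anything such as a product name or a customer address would have to be requested from the MS that owns it.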
Challenge #1: Dealing with code reuse
What do we do with shared code? A simple example: each MS needs to authenticate who is using the service. How do we share that code? Programmers are taught “Don’t Repeat Yourself” (DRY), but how does that fit with the MS “share nothing” rule, where the saying is rather “reuse is abuse”? Maybe Micro Services imply “Please Repeat Yourself!” (PRY). Here are a few ideas:
- Copy the code: why not, if it does not change and has no bugs? The problem is that fixing a bug means fixing every copy, and the code is difficult to extend (e.g. adding classes).
- Shared library (jar, dll, etc.): bugs can be fixed and classes added in one place, but managing versions is hard. Many MS depend on it, which means every MS needs to upgrade the library. And what about a heterogeneous code base (MS can be coded in different languages)? In that case, the library cannot be shared.
- Shared service: on the positive side, it copes well with high code volatility (changes are made without impacting the callers’ code) and with heterogeneous, polyglot code (the MS’s language does not matter). However, versioning is difficult (changes happen at runtime, and downtime can break all the other MS). Performance can suffer from network latency, availability may not be guaranteed, etc.
- Service consolidation: one way to avoid shared code is not to have shared code! Basically, it means merging the MS that need the same code, at the cost of a larger testing scope, a greater deployment risk, less agility…
Rule of thumb: make each shared code unit as small as possible. This trades maintenance (e.g. how many MS need to be updated) for governance (e.g. clarity about which shared code each MS depends on). Smaller units mean more dependencies (e.g. MS1 and MS2 share code A, MS1 and MS3 share code B, etc. vs MS1, MS2 and MS3 sharing one unit ABC), but updating shared code impacts fewer MS (e.g. only MS1 and MS2 need to be updated if A changes, vs MS1, MS2 and MS3 if ABC changes).
Also, who owns the shared code? Ideally it is owned by the primary service domain.
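The trade-off in the rule of thumb can be made concrete with a small sketch. The dependency maps below reproduce the MS1/MS2/MS3 example; the function name `impacted_services` is made up for this illustration.

```python
def impacted_services(dependencies, changed_unit):
    """Return the services that must be updated when one shared unit changes."""
    return sorted(
        service for service, units in dependencies.items() if changed_unit in units
    )


def dependency_count(dependencies):
    """Count the service-to-unit links that have to be tracked (governance cost)."""
    return sum(len(units) for units in dependencies.values())


# Small units: MS1 and MS2 share A, MS1 and MS3 share B.
small_units = {"MS1": {"A", "B"}, "MS2": {"A"}, "MS3": {"B"}}
# One big unit: every MS depends on ABC.
one_big_unit = {"MS1": {"ABC"}, "MS2": {"ABC"}, "MS3": {"ABC"}}

print(impacted_services(small_units, "A"))     # services touched by a change in A
print(impacted_services(one_big_unit, "ABC"))  # services touched by a change in ABC
print(dependency_count(small_units), dependency_count(one_big_unit))
```

With small units there are more links to keep track of, but a change to A only reaches the services that actually use A.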
Challenge #2: Service Granularity
Drivers for MS:
- Code volatility (frequency of change)
- Fault tolerance: if a service crashes, what happens? Does it make sense to split to keep a service up?
- Scalability and throughput: easier to increase resources without changing the code.
- Security
Factors for granularity:
- Service functionality: service scope and function: single-purpose function.
- Database transaction: how to sync data between different MS? Example: a bank account. The transaction must either commit or roll back, never be left half-done.
- Data dependencies: how to split the data? Multiple MS may need data from different tables, and they cannot do SQL joins across services! MS need to talk to each other to get data: performance goes down, latency increases, etc.
- Workflow and choreography: everybody is calling everybody. What happens if one MS goes down? How does it impact the other MS? Does it end up taking a whole bunch of MS down, or leave them not working or timing out? Performance goes down because of latency, and MS security also increases latency. There may also be data inconsistency. If an update requires talking to multiple MS, there is too much communication and the MS are too fine-grained. Minimize the amount of inter-service communication (the Microservices Death Star anti-pattern).
Rule of thumb: get granularity right by iterating: from more to less.
Challenge #3: Sharing Distributed Data
Once we have designed the MS, how do we share the data? Example: a wishlist service and a product service, where the wishlist needs to get information from the product service.
- Call the service: increases latency (100-300 ms with security).
- Replicate the data in the wishlist: possible data inconsistency. Worse!
- Caching: replicating data behind the scenes in the wishlist. It is more fault tolerant if the product MS goes down, and faster. But volume is an issue, and the update rate might be too high, leading to data inconsistency. It may work if the size and the number of updates are low.
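The caching option can be sketched as follows. This is a minimal in-memory cache with a time-to-live, assuming a hypothetical `fetch` callable that stands in for the call to the product MS; the class name `ProductCache` and the 60-second TTL are illustrative choices, not recommendations.

```python
import time


class ProductCache:
    """Cache kept inside a hypothetical wishlist MS for product data."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch      # assumed stand-in for a call to the product MS
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the behaviour can be tested
        self._entries = {}       # product_id -> (stored_at, value)

    def get(self, product_id):
        now = self._clock()
        entry = self._entries.get(product_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]      # fresh hit: no network call
        try:
            value = self._fetch(product_id)
        except ConnectionError:
            if entry is not None:
                return entry[1]  # product MS is down: serve stale data
            raise
        self._entries[product_id] = (now, value)
        return value


calls = []


def fake_fetch(product_id):
    """Pretend product MS that records how often it is called."""
    calls.append(product_id)
    return {"id": product_id, "name": "Sample product"}


now = [0.0]
cache = ProductCache(fake_fetch, ttl_seconds=60.0, clock=lambda: now[0])
first = cache.get("p-1")   # miss: calls the product MS
second = cache.get("p-1")  # hit: served from the cache
now[0] = 61.0
third = cache.get("p-1")   # expired: calls the product MS again
```

The TTL bounds how stale the data can get, which is why this approach only fits data that is small and rarely updated.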
Another solution: creating data domains! It is possible to bend the MS rules by sharing a schema when a few MS (2-6 MS) are closely related. It is a pragmatic approach to the MS theory. Otherwise, we would need to break most foreign keys, views and stored procedures. By creating data domains, we reduce the number of things to break. We are mapping information, not tables.
Challenge #4: Communication Challenges
MS communicate through an API gateway, load balancer or reverse proxy. Stamp coupling (i.e. passing a data structure with more data than needed) is the problem. Example: retrieving 45 attributes when only one is needed. It can quickly increase bandwidth use (e.g. 1 Gb vs 400 kb).
- Field selector: cheap but not effective. The API includes a parameter specifying which part to return. It does not solve stamp coupling since it requires changing the MS contract.
- GraphQL server: a solution used instead of the API gateway. It implements consumer-driven contracts and decouples the contract from what is needed: for example, get the client’s address instead of the whole profile. It reduces traffic between the GraphQL server and the client MS, but the MS responding to the request still sends a lot of data to the GraphQL server (filtering is done in the GraphQL server).
- Direct messaging between the MS solves both stamp coupling and bandwidth. It creates an internal API that does not pass through the API gateway. However, what about the security, auditing, etc. performed by the gateway?
- Value-driven contracts: every contract is simply a map (key/value pairs). A contract’s purpose is to inform the client code of what the service offers. In value-driven contracts, the contract is a map, which implies no structure and no field definition. So the idea is for the client MS to have tests that verify that the producer MS still provides what is needed. If so, the MS can be deployed. Such a test is also called a “pact.” It solves both issues (bandwidth and stamp coupling).
- Replicated or distributed caching: issue with update frequency and volume.
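To illustrate the value-driven contract idea from the list above, here is a hand-rolled sketch of a consumer-side check. It does not use any contract-testing tool; the function `missing_fields` and the field names are assumptions made for this example.

```python
def missing_fields(required, response):
    """Return the keys the consumer needs that are absent or of the wrong type.

    `required` maps a key to the type the consumer expects; extra keys in the
    producer's response are ignored on purpose.
    """
    return sorted(
        key
        for key, expected_type in required.items()
        if not isinstance(response.get(key), expected_type)
    )


# What a hypothetical client MS actually needs from the profile service.
required = {"street": str, "city": str, "zip": str}

# The producer may return more than that without breaking the consumer.
producer_response = {
    "street": "1 Main St",
    "city": "Springfield",
    "zip": "12345",
    "loyalty_points": 120,
}

# A producer change that renames a field the consumer depends on.
broken_response = {"street": "1 Main St", "city": "Springfield", "postal_code": "12345"}

ok_to_deploy = missing_fields(required, producer_response) == []
breaking_change = missing_fields(required, broken_response)
```

Run in the producer's build, a check like this would block a deployment that removes or renames something a consumer still relies on.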
Challenge #5: Orchestration vs Choreography
Orchestration requires a conductor. Choreography does not: the MS communicate with each other directly instead.
Orchestration pattern
- The gateway is the conductor. The multicasting logic is in the conductor. The gateway (ESB) manages client requests, security, id generation, metrics, auditing, service discovery, etc. Deploying is difficult: updating a MS means updating the conductor, which means updating the gateway. This breaks the bounded context.
- The conductor is a MS orchestrator outside of the gateway. It owns the contract. Hot deploy is easy. It abstracts the MS for the other applications/ESB. It is not a single point of failure when combined with clusters. An orchestrator is usually stateless.
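A stateless orchestrator of the second kind might look like the sketch below. The three services (`inventory`, `payment`, `shipping`) and their response fields are hypothetical; each is modelled as a plain callable standing in for a remote call.

```python
class OrderOrchestrator:
    """Conductor that calls other MS in a fixed order and keeps no request state."""

    def __init__(self, services):
        # Hypothetical service clients, keyed by name; each takes and returns a dict.
        self._services = services

    def place_order(self, order):
        stock = self._services["inventory"](order)
        if not stock["available"]:
            return {"status": "rejected", "reason": "out of stock"}
        payment = self._services["payment"](order)
        shipment = self._services["shipping"](order)
        # Everything needed is in the call's local variables, so any instance
        # in a cluster can serve any request.
        return {
            "status": "accepted",
            "payment_id": payment["payment_id"],
            "tracking": shipment["tracking"],
        }


called = []


def fake_service(name, reply):
    """Build a stand-in service that records the call and returns a fixed reply."""
    def call(order):
        called.append(name)
        return reply
    return call


orchestrator = OrderOrchestrator({
    "inventory": fake_service("inventory", {"available": True}),
    "payment": fake_service("payment", {"payment_id": "pay-1"}),
    "shipping": fake_service("shipping", {"tracking": "trk-1"}),
})
result = orchestrator.place_order({"order_id": "o-1"})
```

Because the calling order lives in one place, the other MS do not need to know about each other.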
Aggregation Pattern
MS aggregators are useful when there is a need to look at many data types. For example, we may need to look at 140 MS to see what a customer needs. Instead of requesting 140 MS, we ask the aggregator for all the information. This aggregator has a cache of all the data from the other MS, possibly updated via a queue. It is a band-aid pattern though. Each request has its own aggregator. An aggregator is stateful.
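The aggregator described above can be sketched with an in-process queue standing in for a real message broker. The class `CustomerAggregator`, its method names and the event shape are assumptions made for this illustration.

```python
import queue


class CustomerAggregator:
    """Stateful aggregator: keeps a cached view fed by events from other MS."""

    def __init__(self):
        self._events = queue.Queue()  # stand-in for a real message queue
        self._view = {}               # customer_id -> {source MS name: data}

    def publish(self, source, customer_id, data):
        """Called (conceptually) by another MS when its data changes."""
        self._events.put((source, customer_id, data))

    def drain(self):
        """Apply all pending events to the cache and return how many were applied."""
        applied = 0
        while True:
            try:
                source, customer_id, data = self._events.get_nowait()
            except queue.Empty:
                return applied
            self._view.setdefault(customer_id, {})[source] = data
            applied += 1

    def customer_view(self, customer_id):
        """Answer one request from the cache instead of calling every MS."""
        return dict(self._view.get(customer_id, {}))


aggregator = CustomerAggregator()
aggregator.publish("orders", "c-42", {"open_orders": 2})
aggregator.publish("wishlist", "c-42", {"items": 5})
aggregator.publish("orders", "c-42", {"open_orders": 3})  # later update wins
applied = aggregator.drain()
view = aggregator.customer_view("c-42")
```

The cached view is only as current as the last drained event, which is part of why the pattern is a band-aid.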
Adapter pattern
MS adapters are new endpoints used to communicate with other systems that may not be able to communicate through REST requests. Each adapter is dedicated to a specific system. For example, an adapter converts the REST request into a call to a COBOL system.
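As a sketch of what such an adapter does, the example below turns a REST-style request (a dict) into a fixed-width record and maps the reply back. The record layout (10-character account, 9-digit amount), the "00" success code and the `legacy_call` callable are all invented for this illustration and do not describe any real COBOL system.

```python
def to_fixed_width(request):
    """Convert a REST-style payload into a made-up 19-character legacy record."""
    account = str(request["account_id"])
    amount_cents = int(request["amount_cents"])
    if len(account) > 10 or not 0 <= amount_cents < 10**9:
        raise ValueError("value does not fit the legacy record layout")
    # Account left-justified on 10 characters, amount zero-padded on 9 digits.
    return f"{account:<10}{amount_cents:09d}"


class LegacyAdapter:
    """Endpoint dedicated to one legacy system."""

    def __init__(self, legacy_call):
        self._legacy_call = legacy_call  # hypothetical transport to the legacy system

    def handle(self, request):
        record = to_fixed_width(request)
        reply = self._legacy_call(record)
        # Assumed convention: replies starting with "00" mean success.
        status = "ok" if reply.startswith("00") else "error"
        return {"status": status, "raw": reply}


sent = []


def fake_legacy(record):
    """Pretend legacy system that records what it receives."""
    sent.append(record)
    return "00ACCEPTED"


adapter = LegacyAdapter(fake_legacy)
response = adapter.handle({"account_id": "AC123", "amount_cents": 2500})
```

The other MS keep speaking REST, and only the adapter knows the legacy format.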
Challenge #6: How to maintain data consistency in distributed transactions?
Sagas can be used to compensate for a failed transaction. More information at: https://microservices.io/patterns/data/saga.html
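The compensation idea can be sketched in a few lines. This is a simplified, in-process illustration, assuming each step is a pair of callables (an action and its compensation); a real saga would run across services and handle failures of the compensations themselves. The step names are made up.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action fails, run the compensations of the steps that already
    succeeded, in reverse order, and report failure.
    """
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


log = []


def fail_payment():
    raise RuntimeError("payment declined")


steps = [
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (fail_payment, lambda: log.append("refund payment")),
    (lambda: log.append("create shipment"), lambda: log.append("cancel shipment")),
]
ok = run_saga(steps)
```

Here the payment step fails, so only the stock reservation is undone; the shipment step never runs and the failed payment has nothing to refund.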