Brief Overview of Engineering Design Patterns

In platform engineering, design patterns are crucial for creating scalable, efficient, and maintainable platforms. These patterns help address common challenges in system design, such as service scalability, data consistency, and fault tolerance.

These patterns are not mutually exclusive and are often used in combination to address the complex requirements of modern platform engineering. The key is to understand the specific needs and constraints of your platform to choose the most appropriate patterns.

Here are some common design patterns used in engineering:

Microservices Architecture

Microservices architecture is an approach to developing software systems that has grown in popularity in recent years. It structures an application as a collection of loosely coupled services, in contrast to traditional monolithic architecture, where all components of an application are interwoven and deployed as a single unit.

Key Characteristics of Microservices Architecture

Modularity - Applications are broken down into smaller, manageable pieces (microservices). Each microservice is focused on a specific function or business capability.

Independence - Microservices are developed, deployed, and maintained independently. This allows for faster development cycles and easier deployment and scaling.

Heterogeneity - Different microservices can be written in different programming languages, use different data storage technologies, and be managed by different teams. This offers flexibility in choosing the right tool for the right task.

Distributed Development - Since microservices are loosely coupled, teams can work on different services simultaneously without significant dependency on each other.

Resilience - The failure of one microservice does not necessarily bring down the entire application, making the system more resilient to individual component failures.

Scalability - Individual components can be scaled independently based on demand. This is more efficient than scaling the entire application as in a monolithic architecture.

Communication - Microservices typically communicate with each other through well-defined APIs, often using lightweight protocols such as HTTP/REST or messaging queues.
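
To make the communication style concrete, here is a minimal sketch of one service calling another over HTTP/REST. It runs both sides in a single Python script, with a background thread standing in for a separately deployed service; the /stock/widget endpoint, the port, and the payload are purely illustrative.

```python
# A minimal sketch of two "microservices" communicating over HTTP/REST,
# run here in one script with a thread for demonstration purposes only.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class InventoryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/stock/widget":
            body = json.dumps({"sku": "widget", "available": 42}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):  # silence default request logging
        pass

# The "inventory" service runs independently of its callers.
server = HTTPServer(("127.0.0.1", 8081), InventoryHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The "order" service talks to it only through its well-defined API.
with urllib.request.urlopen("http://127.0.0.1:8081/stock/widget") as resp:
    print(json.load(resp))  # {'sku': 'widget', 'available': 42}

server.shutdown()
```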

API Gateway

Acts as a single entry point for all clients. The API gateway routes requests to the appropriate microservice and aggregates the results. It's beneficial for managing cross-cutting concerns like authentication, logging, and SSL termination.

Request Routing - The API Gateway routes incoming requests to the appropriate microservices based on the request path, method, and other attributes.

Aggregation - It can aggregate responses from multiple microservices and return them as a unified response to the client. This reduces the number of round-trip calls needed between the client and server.

Authentication and Authorization - The gateway often handles authentication and authorization tasks, ensuring that requests are validated before being forwarded to underlying services.

Rate Limiting and Throttling - To protect the backend services from being overwhelmed, the gateway can enforce rate limiting and throttling policies.

Load Balancing - It distributes incoming requests across multiple instances of a microservice, improving the system's overall performance and reliability.

Caching - Responses from backend services can be cached at the gateway to improve response times and reduce the load on backend services.

Logging and Monitoring - The API Gateway can log requests and responses, which is helpful for debugging, monitoring, and auditing purposes.
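
The sketch below pulls a few of these responsibilities together in one place: path-based request routing, a token check, and a sliding-window rate limiter. It is a minimal in-process illustration, not a real gateway product; the route prefixes, the token value, and the limits are all assumed for the example.

```python
# A minimal in-process sketch of API gateway behavior: routing,
# authentication, and rate limiting. All names and limits are illustrative.
import time
from collections import defaultdict, deque

class ApiGateway:
    def __init__(self, rate_limit=5, window_seconds=1.0):
        self.routes = {}                       # path prefix -> handler
        self.rate_limit = rate_limit
        self.window = window_seconds
        self.request_log = defaultdict(deque)  # client_id -> timestamps

    def register(self, prefix, handler):
        self.routes[prefix] = handler

    def handle(self, client_id, token, path):
        # Authentication: reject requests without a (stand-in) valid token.
        if token != "valid-token":
            return 401, "unauthorized"
        # Rate limiting: sliding window of recent request timestamps.
        now = time.monotonic()
        log = self.request_log[client_id]
        while log and now - log[0] > self.window:
            log.popleft()
        if len(log) >= self.rate_limit:
            return 429, "too many requests"
        log.append(now)
        # Request routing: longest matching path prefix wins.
        for prefix in sorted(self.routes, key=len, reverse=True):
            if path.startswith(prefix):
                return 200, self.routes[prefix](path)
        return 404, "no route"

gateway = ApiGateway()
gateway.register("/orders", lambda p: f"orders service handled {p}")
gateway.register("/users", lambda p: f"users service handled {p}")
print(gateway.handle("client-1", "valid-token", "/orders/42"))
```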

Circuit Breaker

This pattern prevents a network or service failure from cascading to other system parts. When certain thresholds for failure are met, the circuit breaker “opens” to stop requests to the failing service, allowing it time to recover.

How the Circuit Breaker Pattern Works

Closed State - Initially, the circuit breaker is in the 'Closed' state, allowing calls to pass through to the underlying service. The circuit breaker monitors for failures during this state.

Open State - If the number of failures exceeds a predefined threshold within a certain period, the circuit breaker trips, transitioning to the 'Open' state. In this state, it blocks all attempts to invoke the service, which helps to prevent further strain on an already failing service and gives it time to recover.

Half-Open State - After a predefined timeout period, the circuit breaker switches to the 'Half-Open' state. In this state, a limited number of test requests can pass through. If these requests are successful, it's assumed that the fault is resolved, and the circuit breaker returns to the 'Closed' state. If these requests fail, it returns to the 'Open' state.
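
The state machine above translates almost directly into code. The following is a minimal sketch of a circuit breaker wrapping an arbitrary callable; the failure threshold and recovery timeout are illustrative values that would be tuned per service.

```python
# A minimal sketch of the three-state circuit breaker described above.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            # After the timeout, let a trial request through (half-open).
            if time.monotonic() - self.opened_at >= self.recovery_timeout:
                self.state = "half-open"
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = time.monotonic()
            raise
        else:
            # Success closes the circuit and resets the failure count.
            self.state = "closed"
            self.failures = 0
            return result
```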

Benefits of the Circuit Breaker Pattern

Resilience - Enhances the system's resilience by preventing a single service failure from escalating into a complete system failure.

Fail-Fast Mechanism - Provides a rapid response to the failure scenario, allowing the system to fail fast and recover instead of making clients wait for a timeout.

Reduced Load on Dependent Services - Prevents overwhelming a failing service with additional requests, giving it time to recover.

Improved System Stability - Protects the overall system stability by isolating faulty services and preventing them from causing cascading failures.

Event-Driven Architecture

In this pattern, events (e.g., user actions or system updates) trigger the services to react. This is particularly effective in systems where data changes frequently and needs to be reflected across different components.

Events - Central to EDA, an event is a significant change in the system's state or a specific occurrence that the system should know about.

Event Producers - These are sources of events. They generate events and send them to the event processing system. Examples include user interfaces, sensors, or other systems.

Event Consumers - They listen for and process events. They react to the events they are interested in, which can trigger a series of business processes or workflows.

Event Channels - This is the medium through which events are delivered from producers to consumers. It could be a message queue, a broker, or a simpler event stream.

Event Processing - It involves handling the event, which could be as simple as logging or as complex as triggering multiple downstream processes.
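
A minimal in-memory event bus shows how these pieces fit together: producers publish named events to a channel, and any number of consumers that subscribed to that name react independently. The event name and payload here are invented for the example.

```python
# A minimal in-memory event bus: producers publish named events,
# consumers subscribe with callbacks. Event names are illustrative.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self.subscribers = defaultdict(list)  # event name -> callbacks

    def subscribe(self, event_name, callback):
        self.subscribers[event_name].append(callback)

    def publish(self, event_name, payload):
        for callback in self.subscribers[event_name]:
            callback(payload)

bus = EventBus()
# Two independent consumers react to the same event.
bus.subscribe("order.created", lambda e: print("billing sees", e))
bus.subscribe("order.created", lambda e: print("shipping sees", e))
# A producer emits the event; it knows nothing about the consumers.
bus.publish("order.created", {"order_id": 42})
```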

Load Balancer

Distributes network or application traffic across multiple servers to ensure no single server bears too much demand. This improves the responsiveness and availability of applications.

Distribution of Requests - The load balancer efficiently distributes incoming traffic and requests across multiple servers or resources.

Health Checks - Regularly checks the health of servers to ensure traffic is only sent to operational and responsive servers.

Types of Load Balancing (sketched in code after the list):

  • Round Robin - Distributes requests sequentially among the servers.
  • Least Connections - Directs traffic to the server with the fewest active connections.
  • IP Hash - Uses the IP address of the client to determine which server receives the request, ensuring a user consistently connects to the same server.
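
Here is a minimal sketch of the three strategies. The server addresses are placeholders, and a production load balancer would track connection counts from real traffic rather than a hand-maintained dictionary.

```python
# A minimal sketch of the three selection strategies listed above.
import hashlib
import itertools

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Round Robin: cycle through the servers in order.
rr = itertools.cycle(servers)
def round_robin():
    return next(rr)

# Least Connections: pick the server with the fewest active connections.
active = {s: 0 for s in servers}
def least_connections():
    return min(active, key=active.get)

# IP Hash: hash the client IP so the same client lands on the same server.
def ip_hash(client_ip):
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print(round_robin(), round_robin())  # 10.0.0.1 10.0.0.2
active.update({"10.0.0.1": 2, "10.0.0.2": 1, "10.0.0.3": 3})
print(least_connections())           # 10.0.0.2
print(ip_hash("203.0.113.7"))        # stable for this client
```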

Scalability - Load balancers can be scaled to manage increased load, either through adding more resources to the load balancer itself or by integrating with auto-scaling groups that dynamically adjust the number of servers.

Redundancy - Often deployed in pairs or clusters to ensure the load balancer itself does not become a single point of failure.

Benefits of Load Balancing

Improved Application Availability and Reliability - By distributing traffic across multiple servers, load balancers enhance the overall availability and reliability of the application.

Application Scalability - Facilitates easy scaling of applications to handle increased traffic.

Performance Optimization - Balances the load to prevent any single server from becoming a bottleneck, optimizing the application's performance.

Server Redundancy - Ensures continuous service availability even if one or more servers go down.

Service Discovery

In a dynamic environment with microservices, service discovery patterns are used to automatically detect network locations for service instances, which could have dynamically assigned addresses due to auto-scaling, failures, or upgrades.

Dynamic Service Registration - Services automatically register themselves with a service discovery mechanism when they start. This registration typically includes the service's name, IP address, port, and potentially other metadata.

Service Lookup - When a service needs to communicate with another service, it queries the service discovery mechanism to find the location of the target service.

Client-Side vs. Server-Side Discovery:

  • Client-Side Discovery - Clients are responsible for determining the network locations of available service instances and load-balancing requests across them.
  • Server-Side Discovery - A router or load balancer performs the service discovery and routes each request to an available service instance.

Health Checking - Regular health checks are performed on services to ensure they function correctly. Unhealthy services are removed or marked as unavailable.
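
The pieces above can be sketched as a small in-memory registry: instances register and send heartbeats, lookups return only instances whose heartbeat is recent, and the caller picks one (client-side discovery). Service names, addresses, and the TTL are illustrative.

```python
# A minimal in-memory service registry with registration, lookup,
# and heartbeat-based health checking. Names and ports are illustrative.
import random
import time

class ServiceRegistry:
    def __init__(self, ttl_seconds=30.0):
        self.ttl = ttl_seconds
        self.instances = {}  # (service, address) -> last heartbeat time

    def register(self, service, address):
        self.instances[(service, address)] = time.monotonic()

    def heartbeat(self, service, address):
        self.instances[(service, address)] = time.monotonic()

    def lookup(self, service):
        now = time.monotonic()
        # Only instances with a recent heartbeat count as healthy.
        healthy = [addr for (name, addr), seen in self.instances.items()
                   if name == service and now - seen < self.ttl]
        if not healthy:
            raise LookupError(f"no healthy instances of {service}")
        # Client-side discovery: the caller picks an instance itself.
        return random.choice(healthy)

registry = ServiceRegistry()
registry.register("orders", "10.0.0.5:8080")
registry.register("orders", "10.0.0.6:8080")
print(registry.lookup("orders"))
```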

Dynamic Management - Automatically manages the network locations of services, which is especially useful in cloud environments where instances may be frequently created and destroyed.

Improved Scalability - Simplifies scaling services up or down, as new instances can be automatically discovered and utilized without manual configuration.

Resilience - Enhances the overall resilience of the system by routing around failed or overwhelmed service instances.

Load Balancing - Often works in conjunction with load balancing, distributing requests effectively across multiple service instances.

Backends for Frontends (BFF)

Image Source: https://raw.githubusercontent.com/abpio/abp-commercial-docs/rel-4.3/en/images/gateway-bff.png

This pattern involves creating separate backend services tailored for different front-end applications, such as mobile apps and web apps. It allows each frontend to have a backend that is optimized for its specific needs.

Client-Specific Backend Services: Each BFF is a unique backend service dedicated to a specific client type.  For example, one BFF for web clients, another for mobile clients, and so on.

Customized APIs: Each BFF exposes an API that is specifically designed for the needs and characteristics of its corresponding client. This allows for optimizing the data and interactions based on what that client type requires.

Isolation of Client Logic: By segregating the backend for each type of frontend, the system isolates the changes and special logic required by each client. This reduces the impact of changes in one client type on others.

Simplification of Client Code: BFFs can handle tasks that would otherwise be duplicated across client types, such as authentication, authorization, and data aggregation, simplifying the client application's code.
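
A minimal sketch of the idea: one shared domain service, and two BFFs that shape its response for their own client type. The product fields are invented for the example.

```python
# A minimal BFF sketch: one domain service, two client-specific backends.
def product_service(product_id):
    # The underlying microservice returns the full domain object.
    return {"id": product_id, "name": "Widget", "price_cents": 1999,
            "description": "A very long description...", "stock": 42}

def web_bff(product_id):
    # Web clients get the full record for a rich product page.
    return product_service(product_id)

def mobile_bff(product_id):
    # Mobile clients get a trimmed payload to save bandwidth.
    p = product_service(product_id)
    return {"id": p["id"], "name": p["name"], "price_cents": p["price_cents"]}

print(web_bff(1))
print(mobile_bff(1))
```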

Caching

Image Source: https://orkhanscience.medium.com/upgrade-performance-via-caching-5-min-read-19fafd56d704

Frequently accessed data is stored temporarily in a cache to speed up data retrieval. This reduces the load on the database or external service and improves the overall performance of the system.

Data Storage - Caching involves storing data in a temporary storage area known as a cache. This data is usually a subset of a more extensive data set that is expensive to fetch or compute.

Cache Hit and Miss - A cache hit occurs when requested data is found in the cache, significantly improving response time. A cache miss occurs when the data is not found in the cache, necessitating a fetch from the slower primary storage.

Data Consistency - Ensuring that cached data remains consistent with the source data. This can be challenging in systems where the underlying data changes frequently.

Eviction Policies - These are rules that determine which data to remove from the cache when it becomes full, such as Least Recently Used (LRU), First In First Out (FIFO), or Least Frequently Used (LFU).

Time-to-Live (TTL) - A mechanism that defines how long data should remain in the cache before being refreshed or discarded.
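
The following sketch combines two of these mechanisms, LRU eviction and per-entry TTL, in a small in-memory cache. The capacity and TTL values are illustrative.

```python
# A minimal LRU cache with per-entry TTL, combining the eviction and
# expiry mechanisms described above.
import time
from collections import OrderedDict

class TTLCache:
    def __init__(self, max_size=128, ttl_seconds=60.0):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self.entries = OrderedDict()  # key -> (value, stored_at)

    def get(self, key):
        if key not in self.entries:
            return None                       # cache miss
        value, stored_at = self.entries[key]
        if time.monotonic() - stored_at > self.ttl:
            del self.entries[key]             # expired: treat as a miss
            return None
        self.entries.move_to_end(key)         # mark as recently used
        return value                          # cache hit

    def put(self, key, value):
        self.entries[key] = (value, time.monotonic())
        self.entries.move_to_end(key)
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # evict least recently used

cache = TTLCache(max_size=2, ttl_seconds=5.0)
cache.put("user:1", {"name": "Ada"})
print(cache.get("user:1"))  # hit
print(cache.get("user:2"))  # miss
```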

Sidecar Pattern

Image Source: https://distributedsystemsmadeeasy.medium.com/sidecar-pattern-architectural-patterns-84645060c1f

The Sidecar pattern is a popular design pattern in the realm of microservices and cloud-native architectures. It addresses the need for managing and supporting the operations of primary service containers by deploying an additional helper container – the "sidecar" – alongside them. This pattern allows for the separation of concerns, as the sidecar container handles aspects like monitoring, logging, configuration, and network traffic control, which are orthogonal to the primary business logic of the main container. It is used primarily in Kubernetes environments.

Companion Container - The sidecar is a secondary container that runs alongside the main container (the primary application) in the same Kubernetes Pod or similar orchestration unit.

Single Responsibility - The sidecar container is typically responsible for a specific aspect, such as logging, monitoring, configuration updates, networking tasks, or security, allowing the main container to focus on application-specific tasks.

Shared Resources - The sidecar and the primary container can share specific resources, like filesystem volumes and network namespaces, enabling them to communicate closely and efficiently.

Isolation - The sidecar pattern keeps the auxiliary functionalities separate from the main application logic. This isolation simplifies the development and maintenance of the main application.
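
Because the sidecar is a deployment-level construct (a second container in the same Pod), any single-process code example is only an analogy. The sketch below imitates the shape of the pattern with a thread as the "sidecar" and a queue standing in for a shared volume: the main application emits logs and stays focused on its own logic, while the sidecar ships them.

```python
# A single-process analogy of the sidecar pattern: in Kubernetes these
# would be two containers in one Pod sharing a volume, not two threads.
import queue
import threading
import time

log_channel = queue.Queue()  # stands in for a shared filesystem volume

def main_application():
    # The primary "container" focuses on business logic and just emits logs.
    for i in range(3):
        log_channel.put(f"handled request {i}")
        time.sleep(0.1)
    log_channel.put(None)  # signal shutdown

def logging_sidecar():
    # The "sidecar" handles the orthogonal concern: shipping logs.
    while (line := log_channel.get()) is not None:
        print(f"[sidecar] forwarding to log backend: {line}")

side = threading.Thread(target=logging_sidecar)
side.start()
main_application()
side.join()
```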

Observer Pattern

Image Source: https://steemit.com/design-patterns/@slawas/design-patterns-observer

This pattern is used for implementing distributed event-handling systems. Observers subscribe to a subject and get notified when the subject issues an event, which is a common scenario in microservices.

Subject - This is the core of the pattern, often referred to as the observable. It maintains a list of its dependents, called observers, and notifies them of any state changes.

Observers - These entities wish to be notified about changes or events in the subject. They register with the subject and are updated automatically when something of interest occurs.

Registration Mechanism - Observers subscribe to the subject to receive updates. This creates a dynamic relationship between the subject and its observers.

Notification - When a relevant event occurs or the subject’s state changes, it sends a notification to all its observers.

Loose Coupling - The subject and observers are loosely coupled. The subject knows nothing about an observer other than that it implements a particular interface.
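
A minimal sketch of these roles: the subject keeps a list of observers and calls update() on each when an event occurs, knowing nothing else about them. The observer classes here are invented for the example.

```python
# A minimal subject/observer sketch matching the roles described above.
class Subject:
    def __init__(self):
        self._observers = []

    def register(self, observer):
        self._observers.append(observer)

    def notify(self, event):
        # The subject knows only that observers expose an update() method.
        for observer in self._observers:
            observer.update(event)

class EmailAlerts:
    def update(self, event):
        print(f"email alert: {event}")

class MetricsCollector:
    def update(self, event):
        print(f"metric recorded: {event}")

deployments = Subject()
deployments.register(EmailAlerts())
deployments.register(MetricsCollector())
deployments.notify("service v2 deployed")
```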

Command Query Responsibility Segregation (CQRS)

Image Source: https://henriquesd.medium.com/the-command-and-query-responsibility-segregation-cqrs-pattern-16cb7704c809

Command Query Responsibility Segregation (CQRS) is an architectural pattern in software engineering that separates the operations that modify data (commands) from the operations that read data (queries). This separation allows each side to be optimized independently, improving performance, scalability, and maintainability.

Commands - These are operations that modify the system's state or the data. Examples include creating, updating, or deleting data. Commands are executed against a command model or a write model.

Queries - These operations retrieve data from the system but do not change its state. Queries are executed against a query model or a read model.

Separate Models for Read and Write - In CQRS, the data models for reading and writing information are separated. This means that the model you read from can be optimized for read operations, and the model you write to can be optimized for write operations.

Event Sourcing - Often, CQRS is used in conjunction with Event Sourcing, where changes to the system's state are stored as a sequence of events. This allows the system to reconstruct past states easily and provides a natural way to separate reads and writes.
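
A minimal sketch of CQRS with event sourcing: the command side validates input and appends events to a log, and the read side projects a query-optimized view from that log. The account/deposit domain is invented for the example.

```python
# A minimal CQRS sketch with event sourcing: commands append events,
# queries read from a projection rebuilt from the event log.
events = []  # the append-only event store

def handle_deposit(account_id, amount):
    # Command side: validate, then record what happened as an event.
    if amount <= 0:
        raise ValueError("deposit must be positive")
    events.append({"type": "deposited", "account": account_id, "amount": amount})

def project_balances():
    # Read side: rebuild a query-optimized view from the event history.
    balances = {}
    for event in events:
        if event["type"] == "deposited":
            balances[event["account"]] = balances.get(event["account"], 0) + event["amount"]
    return balances

handle_deposit("acct-1", 100)
handle_deposit("acct-1", 50)
print(project_balances())  # {'acct-1': 150}
```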

Database Sharding

Image Source: https://hazelcast.com/glossary/sharding/

Database sharding involves splitting a database into smaller, faster, more easily managed parts called shards. Each shard is a separate database, and collectively, the shards make up a single logical database.

Shard - A shard is a horizontal partition in a database. Each shard holds a portion of the data and operates independently from other shards.

Sharding Key - This is a specific attribute or set of attributes used to determine how data is distributed across the shards. Choosing a sharding key is crucial for achieving a balanced data distribution.

Horizontal Partitioning - Sharding is a form of horizontal partitioning where rows of a database table are divided across multiple databases. This is different from vertical partitioning, where different table columns are separated.

Data Distribution - The distribution of data can be done in several ways, such as range-based sharding, hash-based sharding, or list-based sharding, each with its own advantages and use cases.
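
Hash-based sharding is the easiest variant to sketch: hash the sharding key and take it modulo the number of shards, so the same key always lands on the same shard. The shard names below are placeholders; note that a fixed modulo makes adding shards expensive to rebalance, which is why real systems often use consistent hashing instead.

```python
# A minimal hash-based sharding sketch: the sharding key (user_id) is
# hashed to pick one of N shards. Shard names are placeholders.
import hashlib

shards = ["users_shard_0", "users_shard_1", "users_shard_2", "users_shard_3"]

def shard_for(user_id):
    # A stable hash keeps the same key on the same shard across calls.
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return shards[int(digest, 16) % len(shards)]

print(shard_for(1001))  # always routes user 1001 to the same shard
print(shard_for(1002))
```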

Saga Pattern

Image Source: https://learn.temporal.io/tutorials/php/booking_saga/

The Saga Pattern is a design pattern used to manage transactions and maintain consistency across multiple microservices in distributed systems. In traditional monolithic applications, a single database transaction can handle the consistency of data changes. However, in a microservices architecture, where each service has its own database, achieving consistency across these services becomes a challenge. The Saga Pattern addresses this challenge by breaking down a transaction into a series of local transactions, each executed within its own microservice.

Local Transactions - A saga consists of multiple local transactions, where each transaction is scoped to a single service and its database.

Compensation Transactions - For each local transaction, there is a corresponding compensation transaction. If a step in the saga fails, compensation transactions are executed to undo the changes made by the preceding local transactions.

Choreography vs. Orchestration (an orchestration sketch follows the list):

  • Choreography - Each service in the saga knows about the next service to call. There is no central coordination, and services communicate with each other using events.
  • Orchestration - A central orchestrator (which can be a service itself) is responsible for directing the saga's execution, telling the participant services to execute transactions and handle compensations.
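
A minimal orchestration sketch: the orchestrator runs each local transaction in order and, if one fails, executes the compensations for the steps that already completed, in reverse order. The step names are invented for the example.

```python
# A minimal orchestrated saga: run each local transaction; on failure,
# run the compensations of the completed steps in reverse order.
def run_saga(steps):
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception as exc:
        print(f"saga failed ({exc}); compensating")
        for compensate in reversed(completed):
            compensate()
        raise

def reserve_inventory(): print("inventory reserved")
def release_inventory(): print("inventory released")
def charge_payment():    raise RuntimeError("card declined")
def refund_payment():    print("payment refunded")

try:
    run_saga([
        (reserve_inventory, release_inventory),
        (charge_payment, refund_payment),
    ])
except RuntimeError:
    pass  # the order was rolled back via compensation
```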

Proxy Structural Design Pattern

Image Source: https://en.wikipedia.org/wiki/Proxy_pattern

The Proxy Pattern is a structural design pattern in software engineering, where a class represents the functionality of another class. This pattern involves creating a proxy object that serves as an interface to something else. The proxy could interface with anything: a network connection, a large object in memory, a file, or some other resource that is expensive or impossible to duplicate. In essence, a proxy is a wrapper or agent object that the client is calling to access the real serving object behind the scenes.

Proxy Class - This class represents the functionality of another class. It maintains a reference to an object of the actual class and can control access to it.

Real Subject - This is the original object that the proxy represents.

Client - The client interacts with the Proxy object, believing it to be the Real Subject.

Types of Proxies (a virtual-proxy sketch follows the list):

  • Virtual Proxy: Delays the creation and initialization of expensive objects until needed.
  • Remote Proxy: Provides a local representation for an object that resides in a different address space (e.g., on a remote server).
  • Protective Proxy: Controls access to a sensitive master object, adding a security layer.
  • Smart Proxy: Adds additional behavior (like access control, logging, or locking) when an object is accessed.
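
A minimal sketch of the first type, a virtual proxy that defers the construction of an expensive object until it is first used. The class names are illustrative.

```python
# A minimal virtual proxy: same interface as the real subject, but the
# expensive object is created only on first use.
class LargeReport:
    def __init__(self):
        print("expensive: loading gigabytes of data...")
        self.data = "report contents"

    def render(self):
        return self.data

class LargeReportProxy:
    def __init__(self):
        self._real = None  # real subject not created yet

    def render(self):
        # Create the real subject lazily, on first access.
        if self._real is None:
            self._real = LargeReport()
        return self._real.render()

report = LargeReportProxy()  # cheap: nothing loaded yet
print(report.render())       # triggers the real construction
print(report.render())       # reuses the existing object
```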