Tuesday, 16 June 2026

Chapter 1: SONiC Fundamentals

Introduction

SONiC (Software for Open Networking in the Cloud) is a Linux-based open-source network operating system that was originally developed at Microsoft and is now maintained by a broader open-source community. Its core idea is that the same network operating system can run on switch platforms from multiple hardware vendors. This reduces vendor lock-in and provides a more consistent operational model across different environments.

SONiC can also be viewed as an abstraction layer between network operators and the underlying switch hardware. Instead of learning and managing several vendor-specific operating systems, operators can use a common software architecture and management model across different switch platforms. This simplifies network operations, automation, monitoring, and telemetry collection. It can also reduce operational errors caused by configuration differences between platforms and make it easier to onboard new engineers.

Organizations can choose the hardware platform that best meets their technical, operational, and business requirements without being tied to a single software ecosystem. Some vendors provide commercially supported SONiC distributions together with professional support services, while others support community-based deployments or customer-tailored implementations. The appropriate model depends on the organization's operational requirements and support expectations.

From an architectural perspective, SONiC is a modular and container-based system. Major functional components, such as routing, switching, platform management, and monitoring, run in dedicated containers. Configuration, operational state, and system events are exchanged primarily through Redis databases and associated publisher-subscriber mechanisms. Because many system functions communicate through these databases, SONiC is often described as a database-oriented network operating system.

This architecture separates common network software functions from platform-specific implementation details. Control-plane applications generate information that is stored in the application databases and later used to program forwarding behavior in the underlying hardware. The result is a flexible software architecture that can support multiple hardware platforms while maintaining a consistent operational model.

Although SONiC provides a common software architecture, the same software image cannot run on every switch platform without platform-specific support. During startup, SONiC must identify the platform on which it is running, discover available hardware components, and learn their capabilities. This information is typically provided through platform-specific EEPROM data, platform drivers, and hardware management interfaces.

Many platforms use the I²C bus for functions such as hardware inventory collection, transceiver access, thermal monitoring, and power management. Depending on the platform design, some hardware management functions may also involve a Baseboard Management Controller (BMC). The implementation details vary between vendors, but the overall goal remains the same: providing SONiC with accurate information about the hardware resources available on the switch.


SONiC Microservice Architecture Overview

SONiC is a modular network operating system in which the main switch functions are divided into separate Docker containers. These functions include routing, neighbor discovery, link aggregation, monitoring, platform management, database services, switch state orchestration, and synchronization with the hardware abstraction layer. The containers run in Linux user space and together form SONiC's microservice-based architecture.

Inter-container communication is largely based on a centralized Redis database service, which runs in its own database container. Containers publish their service-specific information to Redis databases and subscribe to the changes they need. In this role, Redis acts as SONiC's internal distribution point for state information, configuration, and events. The database container therefore provides both state storage and event distribution to the other containers. Redis is not merely a message queue here; it is also a central part of SONiC's data model.
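The publish-and-subscribe idea described above can be illustrated with a small in-memory sketch. It is not SONiC code and does not talk to Redis: the MiniDb class, the table name, and the field names are hypothetical stand-ins chosen only to show how one container can write an entry while another is notified of the change.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# (operation, key, fields) as delivered to a subscriber
Event = Tuple[str, str, Dict[str, str]]
Handler = Callable[[str, str, Dict[str, str]], None]


class MiniDb:
    """Toy stand-in for one logical database: stores entries and notifies subscribers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: Dict[str, Dict[str, str]] = {}
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, table: str, handler: Handler) -> None:
        """Register a callback for every change made to one table."""
        self._subscribers[table].append(handler)

    def set(self, table: str, key: str, fields: Dict[str, str]) -> None:
        """Store an entry, then tell the table's subscribers about it."""
        self.entries[f"{table}:{key}"] = dict(fields)
        for handler in self._subscribers[table]:
            handler("SET", key, dict(fields))

    def delete(self, table: str, key: str) -> None:
        """Remove an entry, then tell the table's subscribers about it."""
        self.entries.pop(f"{table}:{key}", None)
        for handler in self._subscribers[table]:
            handler("DEL", key, {})


def demo() -> List[Event]:
    """A producer writes neighbor data; a consumer only sees the table it subscribed to."""
    appl_db = MiniDb("APPL_DB")
    seen: List[Event] = []
    # Hypothetical consumer interested in LLDP neighbor entries only.
    appl_db.subscribe("LLDP_ENTRY_TABLE", lambda op, key, fields: seen.append((op, key, fields)))
    # Hypothetical producer publishing a neighbor (field name is illustrative).
    appl_db.set("LLDP_ENTRY_TABLE", "Ethernet0", {"remote_name": "leaf-2"})
    # A write to another table is stored but not delivered to this consumer.
    appl_db.set("ROUTE_TABLE", "10.0.0.0/24", {"nexthop": "192.0.2.1"})
    appl_db.delete("LLDP_ENTRY_TABLE", "Ethernet0")
    return seen


if __name__ == "__main__":
    for event in demo():
        print(event)
```

The point of the sketch is the decoupling: the producer never calls the consumer directly, and a consumer receives only the changes for the tables it subscribed to.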

Figure 1-1 summarizes this modular structure and shows how the main containers relate to Linux user space, Linux kernel space, and the underlying hardware platform. The Docker containers running in user space can be roughly grouped according to the components they communicate with and the role they play in the overall SONiC architecture.

One group consists of application-facing containers such as SNMP, LLDP, teamd, and BGP. These containers communicate either with external systems or with neighboring network devices. The SNMP container handles SNMP queries and responses exchanged with external management systems. The LLDP container is responsible for discovering neighboring devices and exchanging capability information. The teamd container manages link aggregation and logical port channels, while the BGP container exchanges routing information with its routing peers. These containers publish the information they generate to Redis databases, where other SONiC components can read it. 

The upper part of the figure also shows the dhcp-relay container, whose role is to relay DHCP messages when the DHCP server is located in a different subnet from the client device.

Another group consists of infrastructure-oriented containers such as pmon, swss, and syncd. These containers are more closely related to Linux kernel-space drivers, platform-specific hardware management, and switch ASIC programming. The pmon container gathers information about components such as fans, power supplies, LEDs, and optical transceivers, while swss and syncd take part in translating higher-level control information into hardware state that can be programmed into the ASIC.

Among these containers, swss (Switch State Service) has a particularly central role. It acts as an intermediate layer between application logic and the state that is eventually programmed into the hardware. For example, the BGP container may publish routes to APPL_DB. The orchagent component in the swss container reads this information, converts it into a form suitable for ASIC programming, and publishes the result to ASIC_DB.

In the final stage, the syncd container reads the information published to ASIC_DB and passes it through the SAI interface to the vendor's ASIC SDK. The vendor's ASIC SDK, together with the platform-specific ASIC driver, programs the required state into the physical switch ASIC. In this way, a route learned from a BGP neighbor passes through several SONiC components and database layers before it is finally programmed into the hardware forwarding table.

It is also worth noting that not all SONiC components are containerized. Some configuration-related tools, such as the SONiC CLI and configuration generation logic, run directly on the Linux host system.

The key point is that SONiC does not rely on one large monolithic network process. Instead, it separates major functions into containers and connects them through a shared database model built on Redis. This separation makes the system easier to extend, monitor, restart, and adapt to different switch platforms.

Figure 1-1: SONiC Microservice Architecture Overview


SONiC Database-Oriented Design


The Redis server runs inside the Database container. Figure 1-2 shows a single-instance Redis model used in SONiC, where a single Redis server hosts several logical databases for different purposes. Redis provides a lightweight in-memory data store with fast access and simple inter-process communication, making it well suited for SONiC's modular container-based architecture.

APPL_DB stores application-level state, such as route entries published by the BGP container. ASIC_DB stores SAI-oriented objects that represent the state to be programmed into the switch ASIC.

Although APPL_DB and ASIC_DB may contain information related to the same network function, such as routing, they do not communicate directly with each other. Instead, the swss container provides the orchestration logic that connects application-level state to hardware-oriented state. The central component in this process is orchagent, which monitors relevant APPL_DB tables, processes updates, and writes the corresponding SAI-level objects to ASIC_DB.

A simplified route update flow works as follows. When the BGP container learns, updates, or withdraws a route, its synchronization logic publishes the route update to APPL_DB. orchagent receives the update from APPL_DB, translates the route information into ASIC-programming intent, and writes the result to ASIC_DB. In this role, orchagent acts both as a consumer of application-level updates and as a producer of hardware-oriented database entries.

After the route-related objects have been written to ASIC_DB, syncd consumes the ASIC_DB updates and passes them through the SAI interface to the vendor ASIC SDK. The vendor ASIC SDK, together with the platform-specific driver stack, then programs the required forwarding state into the physical switch ASIC. This example illustrates the producer-consumer pattern used throughout SONiC's database-oriented architecture. 
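The route update flow above can be sketched as a chain of producers and consumers. The following is a simplified in-memory model, not SONiC's implementation: the Table class, the function names, and the attribute names are hypothetical, and real ASIC_DB entries are SAI objects with considerably more structure than a prefix and a next hop.

```python
from collections import deque
from typing import Deque, Dict, Iterator, Optional, Tuple

Update = Tuple[str, str, Dict[str, str]]  # (operation, key, fields)


class Table:
    """Toy database table that keeps current state plus a queue of unconsumed updates."""

    def __init__(self) -> None:
        self.data: Dict[str, Dict[str, str]] = {}
        self.pending: Deque[Update] = deque()

    def publish(self, op: str, key: str, fields: Optional[Dict[str, str]] = None) -> None:
        """Producer side: apply the change and queue a notification for consumers."""
        fields = dict(fields or {})
        if op == "SET":
            self.data[key] = fields
        else:
            self.data.pop(key, None)
        self.pending.append((op, key, fields))

    def consume(self) -> Iterator[Update]:
        """Consumer side: drain queued updates in the order they were published."""
        while self.pending:
            yield self.pending.popleft()


def orchagent_step(appl_routes: Table, asic_routes: Table) -> None:
    """Consume application-level routes and produce hardware-oriented entries."""
    for op, prefix, fields in appl_routes.consume():
        if op == "SET":
            # Illustrative translation only; attribute names are assumptions.
            asic_routes.publish("SET", prefix, {"next_hop": fields["nexthop"],
                                                "packet_action": "forward"})
        else:
            asic_routes.publish("DEL", prefix)


def syncd_step(asic_routes: Table, hardware_fib: Dict[str, str]) -> None:
    """Consume hardware-oriented entries and 'program' a dict standing in for the ASIC."""
    for op, prefix, attrs in asic_routes.consume():
        if op == "SET":
            hardware_fib[prefix] = attrs["next_hop"]
        else:
            hardware_fib.pop(prefix, None)


def run_demo() -> Tuple[Dict[str, str], Dict[str, str]]:
    """Learn a route, push it to 'hardware', then withdraw it again."""
    appl, asic, fib = Table(), Table(), {}
    appl.publish("SET", "10.1.0.0/24", {"nexthop": "192.0.2.1", "ifname": "Ethernet0"})
    orchagent_step(appl, asic)
    syncd_step(asic, fib)
    after_learn = dict(fib)
    appl.publish("DEL", "10.1.0.0/24")
    orchagent_step(appl, asic)
    syncd_step(asic, fib)
    return after_learn, dict(fib)


if __name__ == "__main__":
    print(run_demo())
```

Each stage only reads from the table before it and writes to the table after it, which is why a withdrawal propagates through the same path as the original update.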

In practice, database-oriented means that SONiC services exchange much of their configuration, operational state, and event information through Redis databases rather than through direct service-to-service calls.

Other key SONiC databases include CONFIG_DB, STATE_DB, and COUNTERS_DB. CONFIG_DB stores the switch configuration. During startup, it is populated from JSON-formatted configuration data, while subsequent updates may originate from SONiC management interfaces. STATE_DB stores operational state reported by SONiC processes, while COUNTERS_DB stores counters and telemetry-related statistics. These databases are discussed in more detail in later sections.

The relationship between these logical databases in the single-instance Redis model is illustrated in Figure 1-2.

Figure 1-2: Single-Instance Redis Database Model in SONiC
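As an illustration of how JSON-formatted configuration data might map onto CONFIG_DB entries, the sketch below flattens a nested JSON document into table-and-key entries with string fields. The flatten_config function is hypothetical, the "|" separator and the sample tables are assumptions made for the example, and the actual loading tools may behave differently.

```python
import json
from typing import Dict


def flatten_config(config_json: str, separator: str = "|") -> Dict[str, Dict[str, str]]:
    """Turn {"TABLE": {"key": {field: value}}} into {"TABLE|key": {field: "value"}}."""
    config = json.loads(config_json)
    entries: Dict[str, Dict[str, str]] = {}
    for table, rows in config.items():
        for key, fields in rows.items():
            # Values are stored as strings, mirroring how a Redis hash holds field values.
            entries[f"{table}{separator}{key}"] = {
                name: str(value) for name, value in fields.items()
            }
    return entries


# Illustrative configuration fragment; table and field names are assumptions.
SAMPLE_CONFIG = """
{
  "PORT": {
    "Ethernet0": {"admin_status": "up", "mtu": 9100}
  },
  "VLAN": {
    "Vlan100": {"vlanid": 100}
  }
}
"""

if __name__ == "__main__":
    for db_key, db_fields in flatten_config(SAMPLE_CONFIG).items():
        print(db_key, db_fields)
```

Flattening the configuration this way means each configured object becomes one independently addressable entry, which is what lets a service watch only the tables it cares about.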

In the single-instance model, different containers publish and consume data through logical databases that share the same Redis process, CPU resources, and UNIX socket. Because a Redis server executes commands on a single main thread, heavy load in one database, such as a large burst of route updates, can delay operations on the others. To reduce this risk in larger or more demanding deployments, SONiC also supports a multi-instance Redis architecture.

In the multi-instance model, databases can be grouped by workload characteristics such as read/write frequency and CPU utilization. Figure 1-3 illustrates one possible grouping: APPL_DB and ASIC_DB share a High-Churn Processing Instance, CONFIG_DB and STATE_DB share a Management and Slow-State Instance, and COUNTERS_DB has its own High-Frequency Telemetry Instance. Each Redis instance exposes its own UNIX socket, and deployments can apply CPU-affinity policies when needed. As a result, high-frequency telemetry collection or large route-update bursts are less likely to interfere with slower management and state operations.


Figure 1-3: Multi-Instance Redis Database Model in SONiC
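The grouping shown in Figure 1-3 can be expressed as a simple lookup from logical database to Redis instance. The sketch below is only a model of that idea: the instance names and socket paths are hypothetical, and a real deployment defines its own mapping in its database configuration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RedisInstance:
    """One Redis server process with its own UNIX socket (path is hypothetical)."""
    name: str
    unix_socket_path: str


# Hypothetical instances, named after the workload groups in Figure 1-3.
INSTANCES: Dict[str, RedisInstance] = {
    "high_churn": RedisInstance("high_churn", "/var/run/redis/high_churn.sock"),
    "management": RedisInstance("management", "/var/run/redis/management.sock"),
    "telemetry": RedisInstance("telemetry", "/var/run/redis/telemetry.sock"),
}

# Logical database -> instance, following the grouping described in the text.
DATABASE_TO_INSTANCE: Dict[str, str] = {
    "APPL_DB": "high_churn",
    "ASIC_DB": "high_churn",
    "CONFIG_DB": "management",
    "STATE_DB": "management",
    "COUNTERS_DB": "telemetry",
}


def socket_for(db_name: str) -> str:
    """Return the UNIX socket a client would use to reach a logical database."""
    return INSTANCES[DATABASE_TO_INSTANCE[db_name]].unix_socket_path


def shares_instance(db_a: str, db_b: str) -> bool:
    """True when two logical databases compete for the same Redis process."""
    return DATABASE_TO_INSTANCE[db_a] == DATABASE_TO_INSTANCE[db_b]


if __name__ == "__main__":
    for name in DATABASE_TO_INSTANCE:
        print(f"{name:12s} -> {socket_for(name)}")
    print("APPL_DB and COUNTERS_DB share an instance:",
          shares_instance("APPL_DB", "COUNTERS_DB"))
```

With this kind of mapping, a counter-polling client and a route-programming client connect to different sockets, so a burst on one instance does not queue behind commands on the other.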

SONiC's architecture is built around modular services, shared databases, and a hardware abstraction layer that separates common network functions from platform-specific implementation details. Understanding these fundamentals makes it easier to follow later topics such as installation, startup, hardware discovery, configuration handling, and operational troubleshooting.


