Tuesday, 23 June 2026

Chapter 2: Installing SONiC NOS

 

ONIE-Based SONiC Installation

Many switch vendors have added SONiC NOS support to at least part of their switch portfolio. Depending on the vendor and switch model, customers may be able to order a switch with a vendor-customized SONiC version that is supported at the same level as the vendor's own network operating system. Some vendors also allow customers to run the community-based SONiC distribution.

The support model for Community SONiC depends on the vendor. Some hardware vendors provide full support, while others provide no support at all. Compared with vendor-specific SONiC distributions, Community SONiC provides greater flexibility because it can be customized, rebuilt, and adapted to customer requirements. However, running a Community SONiC deployment without vendor support or in-house expertise is generally not a recommended operating model.

Community SONiC is typically installed by using ONIE (Open Network Install Environment) [1], a small open-source installation environment that provides a standardized method for installing network operating systems on supported switches. Figure 2-1 illustrates a conceptual ONIE-based Community SONiC installation process.

If the switch is delivered with a vendor-specific SONiC distribution already installed, it may boot directly into that operating system without requiring a separate ONIE installation workflow. For Community SONiC deployments, the SONiC installation image is downloaded from a trusted source such as a vendor repository or the Community SONiC project. Depending on the installation environment, the image may be stored on a USB flash drive, an HTTP server, or a TFTP server. In this example, the image is stored on an HTTP server.

During the DHCP process, the switch receives network parameters and may also receive information that assists installer discovery, such as an installer URL or TFTP-related parameters. In the HTTP-based path shown in Figure 2-1, ONIE downloads the installer image from the HTTP server and launches it. The installer then performs the SONiC installation, after which the switch reboots into the newly installed SONiC operating system.


Figure 2-1: Conceptual ONIE Installation Workflow – ZTP.

Next, let us examine the automatic SONiC installation process in more detail. Example 2-1 shows a simple dnsmasq-based DHCP configuration that is used by the ONIE installation workflow. The service is bound to the interface facing the target switches, and the DHCP pool assigns addresses from 192.168.1.50 to 192.168.1.150 with a 12-hour lease time. The same configuration also provides the default gateway and DNS server address for the management subnet.

For this example, the most important entry is DHCP option 114. It provides ONIE with an explicit installer URL that points to the SONiC ONIE installer image on the HTTP server. The example uses the platform-derived filename onie-installer-x86_64-kvm_x86_64-r0, which matches the platform identifier shown later in the ONIE discovery output. The same filename is also used in DHCP option 67 as the TFTP bootfile name. The configuration also includes DHCP option 66, which identifies the TFTP server. In this example, both the HTTP and TFTP services are located on the same server.

When DHCP option 114 is present, ONIE typically attempts to use the provided installer URL directly. If a valid installer image is not available through that URL, ONIE can continue with its normal discovery mechanisms, including TFTP-based discovery.

# File: /etc/dnsmasq.conf
# Bind explicitly to the interface facing your target switches (e.g., eth0 for Management)
interface=eth0
bind-interfaces
# DHCP IP Range: Pool starts at .50 and ends at .150 with a 12-hour lease
dhcp-range=192.168.1.50,192.168.1.150,255.255.255.0,12h
# Gateway and DNS Options
dhcp-option=option:router,192.168.1.1
dhcp-option=option:dns-server,1.1.1.1
# ONIE Provisioning URL (Matches Option 114 "default-url" requested by ONIE)
dhcp-option=114,"http://192.168.1.10/onie-installer-x86_64-kvm_x86_64-r0"
# Alternative TFTP discovery information for ONIE (Options 66 and 67)
dhcp-option=option:tftp-server,192.168.1.10
dhcp-option=option:bootfile-name,"onie-installer-x86_64-kvm_x86_64-r0"
# Logging for deep troubleshooting
log-dhcp

Example 2-1: DHCP Server Example Configuration.
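Before the DHCP service is started, it can be useful to check that the installer URL in option 114 and the TFTP parameters in options 66 and 67 point to the same server and the same file. The following Python sketch does this for the directives shown in Example 2-1. The function names parse_dnsmasq and check_onie_options are illustrative, and the parser only understands the simple directive forms used in this example, not the full dnsmasq syntax.

```python
import ipaddress
from urllib.parse import urlparse

# Directives copied from Example 2-1 (comment lines removed).
DNSMASQ_CONF = """\
interface=eth0
bind-interfaces
dhcp-range=192.168.1.50,192.168.1.150,255.255.255.0,12h
dhcp-option=option:router,192.168.1.1
dhcp-option=option:dns-server,1.1.1.1
dhcp-option=114,"http://192.168.1.10/onie-installer-x86_64-kvm_x86_64-r0"
dhcp-option=option:tftp-server,192.168.1.10
dhcp-option=option:bootfile-name,"onie-installer-x86_64-kvm_x86_64-r0"
log-dhcp
"""


def parse_dnsmasq(text):
    """Return (dhcp_range_fields, options) for simple key=value directives."""
    dhcp_range, options = None, {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and flag-style directives such as log-dhcp.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key == "dhcp-range":
            dhcp_range = value.split(",")
        elif key == "dhcp-option":
            name, opt_value = value.split(",", 1)
            options[name] = opt_value.strip('"')
    return dhcp_range, options


def check_onie_options(text):
    """Compare the option 114 URL with the TFTP server and bootfile name."""
    dhcp_range, options = parse_dnsmasq(text)
    start, end = (ipaddress.ip_address(a) for a in dhcp_range[:2])
    url = urlparse(options["114"])
    filename = url.path.lstrip("/")
    return {
        "pool_size": int(end) - int(start) + 1,
        "url_host": url.hostname,
        "url_file": filename,
        "same_server": url.hostname == options["option:tftp-server"],
        "same_filename": filename == options["option:bootfile-name"],
    }


result = check_onie_options(DNSMASQ_CONF)
print(result)
```

For the configuration in Example 2-1, both same_server and same_filename are true, and the pool contains 101 addresses (.50 through .150 inclusive).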

When the SONiC switch is powered on, it starts the platform bootloader. In this example, the bootloader is GRUB (Grand Unified Bootloader). The GRUB menu contains the installed network operating system entries and the ONIE entry. Selecting the *ONIE menu entry does not start the SONiC installer directly. It opens the ONIE menu, where the operator selects the ONIE boot mode used for installation, rescue, uninstall, update, or embed operations.

                             GNU GRUB  version 2.02                           
+----------------------------------------------------------------------------+
|VendorX-default-nos-v1.0                                                    |
|VendorX-hardened-SONiC-v1.0                                                 |
| *ONIE                                                                      |
|                                                                            |
| <empty lines snipped>                                                      |
|                                                                            |
|                                                                            |
+----------------------------------------------------------------------------+
      Use the ^ and v keys to select which entry is highlighted.
      Press enter to boot the selected OS, `e' to edit the commands
      before booting or `c' for a command-line.
   The highlighted entry will be executed automatically in 2s.

Example 2-2: ONIE Installer Phase 1 – Select “*ONIE”

After the ONIE entry has been selected, the ONIE menu is displayed. The automatic SONiC installation process is started by selecting the “*ONIE: Install OS” option, as shown in Example 2-3. This mode starts ONIE with the discovery service enabled, allowing ONIE to search for a network operating system installer automatically. Before starting the process, it is good practice to verify that the device has enough storage space for the SONiC installation. If the existing network operating system must be removed, this can be done by selecting the ONIE: Uninstall OS option.


                           GNU GRUB  version 2.02                              
 +----------------------------------------------------------------------------+
 |*ONIE: Install OS                                                           |
 | ONIE: Rescue                                                               |
 | ONIE: Uninstall OS                                                         |
 | ONIE: Update ONIE                                                          |
 | ONIE: Embed ONIE                                                           |
 |                                                                            |
 | <empty lines snipped>                                                      |
 |                                                                            |
 |                                                                            |
 +----------------------------------------------------------------------------+
      Use the ^ and v keys to select which entry is highlighted.
      <Rest of the text removed for brevity>

Example 2-3: ONIE Installer Phase 2 – Select “ONIE: Install OS”.

Example 2-4 shows the ONIE-side view after installation mode has been selected. Its purpose is not to show a complete, successful installation but to show what ONIE does once the process starts: it initializes the platform, discovers the available network interfaces, attempts DHCP on interfaces with link, and then begins service discovery.

In this output, ONIE receives an IPv4 address on eth0, but it does not retrieve a valid installer image. Instead, it continues into TFTP-based service discovery and tries several candidate installer filenames. After the discovery attempt fails, ONIE waits for 20 seconds and starts the discovery cycle again. This behavior is important to understand because ONIE discovery is a continuous loop that repeats until a valid installer is found and executed.
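The repeating behavior can be sketched as a simple loop. The Python model below is a conceptual illustration only and is not ONIE's actual implementation. The function discovery_loop and the fetch and sleep callables are hypothetical names; a fake fetch function and a fake sleep function replace real network access and real waiting so that the loop can be run instantly.

```python
import time


def discovery_loop(candidates, fetch, sleep=time.sleep, wait_seconds=20,
                   max_cycles=None):
    """Try every candidate once per cycle; return (url, cycle) on success."""
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        cycle += 1
        for url in candidates:
            if fetch(url):
                return url, cycle
        # Nothing found in this cycle: wait, then start over.
        sleep(wait_seconds)
    return None, cycle


# --- Simulation: the image is published while the loop is already running ---
published = set()
sleeps = []


def fake_fetch(url):
    """Stand-in for a download attempt; succeeds only for published URLs."""
    return url in published


def fake_sleep(seconds):
    """Record the wait instead of sleeping; publish the image on wait two."""
    sleeps.append(seconds)
    if len(sleeps) == 2:
        published.add("tftp://onie-server/onie-installer")


candidates = [
    "tftp://onie-server/onie-installer-x86_64-kvm_x86_64-r0",
    "tftp://onie-server/onie-installer",
]
found, cycles = discovery_loop(candidates, fake_fetch, sleep=fake_sleep)
print(found, cycles)
```

In this simulation the first two cycles fail, each followed by a 20-second wait, and the third cycle finds the generic onie-installer file. The practical consequence is that an operator can fix the server side while the switch keeps retrying, without restarting ONIE.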

Note: Some messages shown in the example originate from the virtual lab platform used to generate the output and are not part of a normal hardware installation.

# Step 1: ONIE enters OS install mode and prints platform information.
ONIE: OS Install Mode ...
Platform  : x86_64-kvm_x86_64-r0
Version   : master-201811170418
Build Date: 2018-11-17T04:18+00:00
# Step 2: ONIE mounts kernel filesystems and initializes the platform environment.
Info: Mounting kernel filesystems... done.
ERROR: Getting ONIE boot device timeout
Info: BIOS mode: legacy
Running demonstration platform init pre_arch routines...
Running demonstration platform init post_arch routines...
grub-editenv: error: cannot open `/mnt/onie-boot/grub/grubenv.new': No such file or directory.
Info: Making NOS install boot mode persistent.
Installing for i386-pc platform.
grub-install: error: failed to get canonical path of `rootfs'.
ERROR: grub-install failed on:
network_driver: Running demonstration pre_init routines...
network_driver: Running ASIC/SDK init routines...
network_driver: Running demonstration post_init routines...
# Step 3: ONIE initializes network interfaces and learns their MAC addresses.
Info: Using eth0 MAC address: 0c:9a:32:79:00:00
Info: Using eth1 MAC address: 0c:9a:32:79:00:01
Info: Using eth2 MAC address: 0c:9a:32:79:00:02
Info: Using eth3 MAC address: 0c:9a:32:79:00:03
Info: Using eth4 MAC address: 0c:9a:32:79:00:04
Info: Using eth5 MAC address: 0c:9a:32:79:00:05
Info: Using eth6 MAC address: 0c:9a:32:79:00:06
Info: Using eth7 MAC address: 0c:9a:32:79:00:07
Info: Using eth8 MAC address: 0c:9a:32:79:00:08
Info: Using eth9 MAC address: 0c:9a:32:79:00:09
# Step 4: eth0 has link, so ONIE starts DHCPv4 and receives an IPv4 address.
Info: eth0:  Checking link... up.
Info: Trying DHCPv4 on interface: eth0
ONIE: Using DHCPv4 addr: eth0: 192.168.1.148 / 255.255.255.0
# Step 5: ONIE checks the remaining interfaces. Interfaces without DHCP either use link-local addressing or are skipped if the link is down.
Info: eth1:  Checking link... up.
Info: Trying DHCPv4 on interface: eth1
Warning: Unable to configure interface using DHCPv4: eth1
ONIE: Using link-local IPv4 addr: eth1: 169.254.186.223/16
Info: eth2:  Checking link... up.
Info: Trying DHCPv4 on interface: eth2
<Additional interface checks omitted for brevity.>
# Step 6: ONIE starts basic services and starts its discovery logic for locating an installer image.
Starting: klogd... done.
Starting: dropbear ssh daemon... done.
Starting: telnetd... done.
discover: installer mode detected.  Running installer.
Starting: discover... done.
Please press Enter to activate this console. Info: eth0:  Checking link... up.
Info: Trying DHCPv4 on interface: eth0
ONIE: Using DHCPv4 addr: eth0: 192.168.1.148 / 255.255.255.0
Info: eth1:  Checking link... up.
Info: Trying DHCPv4 on interface: eth1
Warning: Unable to configure interface using DHCPv4: eth1
ONIE: Using link-local IPv4 addr: eth1: 169.254.152.102/16
Info: eth2:  Checking link... up.
<Info related to eth2-8 is removed for brevity>
Info: Trying DHCPv4 on interface: eth9
Warning: Unable to configure interface using DHCPv4: eth9
ONIE: Using link-local IPv4 addr: eth9: 169.254.28.73/16
# Step 7: ONIE starts service discovery and tries TFTP-based candidate installer paths. The platform-specific names are derived from the platform identifier printed earlier, such as x86_64-kvm_x86_64-r0. In this lab example, no installer image is available on either the HTTP or TFTP server, so ONIE eventually repeats the discovery cycle.
ONIE: Starting ONIE Service Discovery
Info: Attempting tftp://onie-server/0c-9a-32-79-00-00/onie-installer-x86_64-kvm_x86_64-r0 ...
Info: Attempting tftp://onie-server/onie-installer-x86_64-kvm_x86_64-r0 ...
Info: Attempting tftp://onie-server/onie-installer-x86_64-kvm_x86_64-r0.bin ...
Info: Attempting tftp://onie-server/onie-installer-x86_64-kvm_x86_64 ...
Info: Attempting tftp://onie-server/onie-installer-x86_64-kvm_x86_64.bin ...
Info: Attempting tftp://onie-server/onie-installer-kvm_x86_64 ...
Info: Attempting tftp://onie-server/onie-installer-kvm_x86_64.bin ...
Info: Attempting tftp://onie-server/onie-installer-x86_64-qemu ...
Info: Attempting tftp://onie-server/onie-installer-x86_64-qemu.bin ...
Info: Attempting tftp://onie-server/onie-installer-x86_64 ...
Info: Attempting tftp://onie-server/onie-installer-x86_64.bin ...
Info: Attempting tftp://onie-server/onie-installer ...
Info: Attempting tftp://onie-server/onie-installer.bin ...
Info: Sleeping for 20 seconds
4..3..2..1..
# Step 8: No installer was found, so ONIE starts a new discovery cycle.
Info: eth0:  Checking link... up.

Example 2-4: ONIE Installer Phase 3 – Discovery Loop.

Platform information is important because ONIE uses it during automatic discovery to build installer filenames that match the target hardware. In this example, the platform identifier x86_64-kvm_x86_64-r0 appears in candidate names such as onie-installer-x86_64-kvm_x86_64-r0. ONIE follows a waterfall-style search. It first tries the most specific names, including MAC- and platform-based paths, and then falls back to more generic names such as onie-installer and onie-installer.bin. This helps ONIE find a suitable platform-specific installer while still allowing generic fallback options.
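The search order visible in Example 2-4 can be reproduced with a short Python sketch. It only mirrors the order of the TFTP attempts seen in that output and is not taken from the ONIE source code. The parameter names arch, machine, revision, and silicon are assumptions chosen for illustration. In particular, the text does not explain where the qemu component in onie-installer-x86_64-qemu comes from, so it is passed in as a separate value here.

```python
def candidate_installer_names(arch, machine, revision, silicon, mac=None):
    """Return installer names from most specific to most generic."""
    platform = f"{arch}-{machine}-{revision}"
    names = []
    if mac:
        # MAC-specific directory, written with dashes as in Example 2-4.
        mac_dir = mac.lower().replace(":", "-")
        names.append(f"{mac_dir}/onie-installer-{platform}")
    stems = [
        f"onie-installer-{platform}",
        f"onie-installer-{arch}-{machine}",
        f"onie-installer-{machine}",
        f"onie-installer-{arch}-{silicon}",
        f"onie-installer-{arch}",
        "onie-installer",
    ]
    for stem in stems:
        # Each name is tried as-is and with a .bin suffix.
        names.extend([stem, stem + ".bin"])
    return names


names = candidate_installer_names(
    arch="x86_64",
    machine="kvm_x86_64",
    revision="r0",
    silicon="qemu",
    mac="0c:9a:32:79:00:00",
)
for name in names:
    print(name)
```

With the values from the lab platform, the function returns thirteen names in the same order as the paths in Example 2-4, without the tftp://onie-server/ prefix. Placing an image under any of these names on the server is enough for discovery to find it.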

Example 2-5 shows the corresponding DHCP server-side events from dnsmasq. When reading the output, focus on four things: the ONIE vendor and user class information, the DHCP Discover/Offer/Request/Acknowledgement exchange, the address assigned to the ONIE client, and the TFTP-related bootfile and server information returned by the DHCP server. The detailed relationship between this output and the ONIE-side output in Example 2-4 is discussed after the example.

# Step A: dnsmasq is started in foreground mode with the configuration from Example 2-1.
root@sonic:~# sudo dnsmasq -d -C /etc/dnsmasq.conf
dnsmasq: started, version 2.85 cachesize 150
dnsmasq: compile time options: IPv6 GNU-getopt DBus no-UBus i18n IDN2 DHCP DHCPv6 no-Lua TFTP conntrack ipset auth cryptohash DNSSEC loop-detect inotify dumpfile
dnsmasq-dhcp: DHCP, IP range 192.168.1.50 -- 192.168.1.150, lease time 12h
dnsmasq-dhcp: DHCP, sockets bound exclusively to interface eth0
dnsmasq: reading /etc/resolv.conf
dnsmasq: using nameserver 192.168.1.10#53
dnsmasq: read /etc/hosts - 2 addresses
# Step B: ONIE sends DHCP messages from eth0 and identifies itself with ONIE vendor and user class information.
dnsmasq-dhcp: 1021583124 available DHCP range: 192.168.1.50 -- 192.168.1.150
dnsmasq-dhcp: 1021583124 vendor class: onie_vendor:x86_64-kvm_x86_64-r0
dnsmasq-dhcp: 1021583124 user class: onie_dhcp_user_class
# Step C: DHCP Discover and Offer correspond to ONIE trying DHCPv4 on eth0 in Example 2-4.
dnsmasq-dhcp: 1021583124 DHCPDISCOVER(eth0) 0c:9a:32:79:00:00
dnsmasq-dhcp: 1021583124 tags: eth0
dnsmasq-dhcp: 1021583124 DHCPOFFER(eth0) 192.168.1.148 0c:9a:32:79:00:00
dnsmasq-dhcp: 1021583124 requested options: 1:netmask, 3:router, 6:dns-server, 7:log-server,
dnsmasq-dhcp: 1021583124 requested options: 12:hostname, 15:domain-name, 28:broadcast,
dnsmasq-dhcp: 1021583124 requested options: 42:ntp-server, 119:domain-search
dnsmasq-dhcp: 1021583124 bootfile name: onie-installer-x86_64-kvm_x86_64-r0
dnsmasq-dhcp: 1021583124 server name: 192.168.1.10
dnsmasq-dhcp: 1021583124 next server: 192.168.1.10
dnsmasq-dhcp: 1021583124 sent size:  1 option: 53 message-type  2
dnsmasq-dhcp: 1021583124 sent size:  4 option: 54 server-identifier  192.168.1.10
dnsmasq-dhcp: 1021583124 sent size:  4 option: 51 lease-time  12h
dnsmasq-dhcp: 1021583124 sent size:  4 option: 58 T1  6h
dnsmasq-dhcp: 1021583124 sent size:  4 option: 59 T2  10h30m
dnsmasq-dhcp: 1021583124 sent size:  4 option:  1 netmask  255.255.255.0
dnsmasq-dhcp: 1021583124 sent size:  4 option: 28 broadcast  192.168.1.255
dnsmasq-dhcp: 1021583124 sent size:  4 option:  6 dns-server  1.1.1.1
dnsmasq-dhcp: 1021583124 sent size:  4 option:  3 router  192.168.1.1
dnsmasq-dhcp: 1021583124 available DHCP range: 192.168.1.50 -- 192.168.1.150
dnsmasq-dhcp: 1021583124 vendor class: onie_vendor:x86_64-kvm_x86_64-r0
dnsmasq-dhcp: 1021583124 user class: onie_dhcp_user_class
# Step D: DHCP Request and ACK complete the address assignment for 192.168.1.148.
dnsmasq-dhcp: 1021583124 DHCPREQUEST(eth0) 192.168.1.148 0c:9a:32:79:00:00
dnsmasq-dhcp: 1021583124 tags: eth0
dnsmasq-dhcp: 1021583124 DHCPACK(eth0) 192.168.1.148 0c:9a:32:79:00:00
dnsmasq-dhcp: 1021583124 requested options: 1:netmask, 3:router, 6:dns-server, 7:log-server,
dnsmasq-dhcp: 1021583124 requested options: 12:hostname, 15:domain-name, 28:broadcast,
dnsmasq-dhcp: 1021583124 requested options: 42:ntp-server, 119:domain-search
dnsmasq-dhcp: 1021583124 bootfile name: onie-installer-x86_64-kvm_x86_64-r0
dnsmasq-dhcp: 1021583124 server name: 192.168.1.10
dnsmasq-dhcp: 1021583124 next server: 192.168.1.10
dnsmasq-dhcp: 1021583124 sent size:  1 option: 53 message-type  5
dnsmasq-dhcp: 1021583124 sent size:  4 option: 54 server-identifier  192.168.1.10
dnsmasq-dhcp: 1021583124 sent size:  4 option: 51 lease-time  12h
dnsmasq-dhcp: 1021583124 sent size:  4 option: 58 T1  6h
dnsmasq-dhcp: 1021583124 sent size:  4 option: 59 T2  10h30m
dnsmasq-dhcp: 1021583124 sent size:  4 option:  1 netmask  255.255.255.0
dnsmasq-dhcp: 1021583124 sent size:  4 option: 28 broadcast  192.168.1.255
dnsmasq-dhcp: 1021583124 sent size:  4 option:  6 dns-server  1.1.1.1
dnsmasq-dhcp: 1021583124 sent size:  4 option:  3 router  192.168.1.1

Example 2-5: DHCP Events.

Example 2-4 and Example 2-5 show the same installation attempt from two different viewpoints. Example 2-4 is the ONIE console output from the switch, while Example 2-5 is the DHCP server output from dnsmasq. When ONIE prints the platform value x86_64-kvm_x86_64-r0, the same value appears in the DHCP log inside the vendor class string onie_vendor:x86_64-kvm_x86_64-r0. This allows the DHCP server to recognize the requesting device as an ONIE client and, in more advanced deployments, to return platform-specific installer information.
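The idea of returning platform-specific installer information can be sketched in a few lines of Python. The sketch assumes that the vendor class has the form onie_vendor:<platform>, as seen in Example 2-5. The INSTALLERS table and both function names are hypothetical and would be maintained by the operator; this is not a dnsmasq feature or API.

```python
ONIE_VENDOR_PREFIX = "onie_vendor:"

# Hypothetical operator-maintained table: platform identifier -> installer URL.
INSTALLERS = {
    "x86_64-kvm_x86_64-r0":
        "http://192.168.1.10/onie-installer-x86_64-kvm_x86_64-r0",
}


def platform_from_vendor_class(vendor_class):
    """Return the platform identifier, or None for non-ONIE clients."""
    if not vendor_class.startswith(ONIE_VENDOR_PREFIX):
        return None
    return vendor_class[len(ONIE_VENDOR_PREFIX):]


def installer_url_for(vendor_class, default=None):
    """Pick an installer URL for an ONIE client based on its platform."""
    platform = platform_from_vendor_class(vendor_class)
    if platform is None:
        return None  # not an ONIE client, so no installer is offered
    return INSTALLERS.get(platform, default)


print(installer_url_for("onie_vendor:x86_64-kvm_x86_64-r0"))
print(installer_url_for("some-other-client"))
```

A real DHCP server would apply the same matching logic in its own configuration language. The sketch only shows the decision that is made possible by the vendor class.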

The eth0 MAC address also matches between the two examples. In Example 2-4, ONIE reports eth0 with the MAC address 0c:9a:32:79:00:00. In Example 2-5, the DHCP server logs the DHCPDISCOVER and DHCPREQUEST messages it receives from that MAC address and the DHCPOFFER and DHCPACK messages it sends in reply. The address offered and acknowledged by the DHCP server is 192.168.1.148, which is the same address that ONIE shows as assigned to eth0.
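When many switches are installed at the same time, this comparison is easier to automate than to do by eye. The following Python sketch correlates a few lines copied from Example 2-4 and Example 2-5. The regular expressions are written for the exact line formats shown in those examples and are assumptions about the log layout, not a general-purpose parser.

```python
import re

# Lines copied from Example 2-4 (ONIE console).
ONIE_LINES = [
    "Info: Using eth0 MAC address: 0c:9a:32:79:00:00",
    "ONIE: Using DHCPv4 addr: eth0: 192.168.1.148 / 255.255.255.0",
]
# Lines copied from Example 2-5 (dnsmasq log).
DNSMASQ_LINES = [
    "dnsmasq-dhcp: 1021583124 DHCPDISCOVER(eth0) 0c:9a:32:79:00:00",
    "dnsmasq-dhcp: 1021583124 DHCPOFFER(eth0) 192.168.1.148 0c:9a:32:79:00:00",
    "dnsmasq-dhcp: 1021583124 DHCPREQUEST(eth0) 192.168.1.148 0c:9a:32:79:00:00",
    "dnsmasq-dhcp: 1021583124 DHCPACK(eth0) 192.168.1.148 0c:9a:32:79:00:00",
]

MAC_RE = re.compile(r"Using (\w+) MAC address: ([0-9a-f:]{17})")
ADDR_RE = re.compile(r"Using DHCPv4 addr: (\w+): (\d+\.\d+\.\d+\.\d+)")
DHCP_RE = re.compile(
    r"(DHCP[A-Z]+)\((\w+)\) (?:(\d+\.\d+\.\d+\.\d+) )?([0-9a-f:]{17})"
)


def correlate(onie_lines, dnsmasq_lines, interface="eth0"):
    """Match the ONIE-side MAC and address against the DHCP server log."""
    mac = ip = None
    for line in onie_lines:
        m = MAC_RE.search(line)
        if m and m.group(1) == interface:
            mac = m.group(2)
        m = ADDR_RE.search(line)
        if m and m.group(1) == interface:
            ip = m.group(2)
    # Collect (message type, address) pairs logged for the same MAC.
    exchange = []
    for line in dnsmasq_lines:
        m = DHCP_RE.search(line)
        if m and m.group(4) == mac:
            exchange.append((m.group(1), m.group(3)))
    acked = [addr for msg, addr in exchange if msg == "DHCPACK"]
    return {
        "mac": mac,
        "onie_ip": ip,
        "messages": [msg for msg, _ in exchange],
        "match": bool(acked) and acked[-1] == ip,
    }


report = correlate(ONIE_LINES, DNSMASQ_LINES)
print(report)
```

For the sample lines, the report lists all four DHCP messages for the eth0 MAC address, and match is true because the acknowledged address equals the address that ONIE configured.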

 

Manual Installation

In a manual installation workflow, the SONiC ONIE installer image is first downloaded from a trusted source and placed on storage reachable by the switch, such as an HTTP server or local media. The switch is then booted into ONIE: Rescue mode. Rescue mode provides an interactive ONIE shell and does not rely on the automatic discovery loop used by ONIE Install Mode. From the ONIE shell, the operator manually configures network connectivity and starts the installation by running the following commands.

The first command stops any active ONIE discovery processes. The management interface is then configured manually, a default route is added, and the SONiC installer is downloaded and executed by using the onie-nos-install command.

ONIE:/# onie-stop
ONIE:/# ifconfig eth0 192.168.1.148 netmask 255.255.255.0
ONIE:/# ip route add default via 192.168.1.1
ONIE:/# onie-nos-install http://192.168.1.10/onie-installer-x86_64-kvm_x86_64-r0 

Example 2-6: Manual ONIE Installation Commands.
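Because the addressing is typed by hand in this workflow, a quick sanity check of the values can prevent a failed download. The Python sketch below uses the standard ipaddress module to check the values from Example 2-6. The function name check_manual_addressing is illustrative, and the check is run on a workstation, not in the ONIE shell.

```python
import ipaddress


def check_manual_addressing(address, netmask, gateway, installer_host):
    """Check that the gateway and installer server fit the chosen subnet."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    gw = ipaddress.ip_address(gateway)
    server = ipaddress.ip_address(installer_host)
    return {
        "network": str(iface.network),
        # The default gateway must be directly reachable on the subnet.
        "gateway_on_link": gw in iface.network,
        # If the server is on-link, the download does not cross the gateway.
        "server_on_link": server in iface.network,
        "needs_default_route": server not in iface.network,
    }


result = check_manual_addressing(
    address="192.168.1.148",
    netmask="255.255.255.0",
    gateway="192.168.1.1",
    installer_host="192.168.1.10",
)
print(result)
```

With the values from Example 2-6, the gateway and the HTTP server are both inside 192.168.1.0/24. In this lab the default route is therefore only needed if the installer is fetched from a server outside the management subnet.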

 


Tuesday, 16 June 2026

Chapter 1: SONiC Fundamentals

Introduction

SONiC (Software for Open Networking in the Cloud) is a Linux-based open-source network operating system that was originally developed at Microsoft and is now maintained by a broader open-source community. Its core idea is that the same network operating system can run on switch platforms from multiple hardware vendors. This reduces vendor lock-in and provides a more consistent operational model across different environments.

SONiC can also be viewed as an abstraction layer between network operators and the underlying switch hardware. Instead of learning and managing several vendor-specific operating systems, operators can use a common software architecture and management model across different switch platforms. This simplifies network operations, automation, monitoring, and telemetry collection. It can also reduce operational errors caused by configuration differences between platforms and make it easier to onboard new engineers.

Organizations can choose the hardware platform that best meets their technical, operational, and business requirements without being tied to a single software ecosystem. Some vendors provide commercially supported SONiC distributions together with professional support services, while others support community-based deployments or customer-tailored implementations. The appropriate model depends on the organization's operational requirements and support expectations.

From an architectural perspective, SONiC is a modular and container-based system. Major functional components, such as routing, switching, platform management, and monitoring, run in dedicated containers. Configuration, operational state, and system events are exchanged primarily through Redis databases and associated publisher-subscriber mechanisms. Because many system functions communicate through these databases, SONiC is often described as a database-oriented network operating system.

This architecture separates common network software functions from platform-specific implementation details. Control-plane applications generate information that is stored in the application databases and later used to program forwarding behavior in the underlying hardware. The result is a flexible software architecture that can support multiple hardware platforms while maintaining a consistent operational model.

Although SONiC provides a common software architecture, the same software image cannot run on every switch platform without platform-specific support. During startup, SONiC must identify the platform on which it is running, discover available hardware components, and learn their capabilities. This information is typically provided through platform-specific EEPROM data, platform drivers, and hardware management interfaces.

Many platforms use the I²C bus for functions such as hardware inventory collection, transceiver access, thermal monitoring, and power management. Depending on the platform design, some hardware management functions may also involve a Baseboard Management Controller (BMC). The implementation details vary between vendors, but the overall goal remains the same: providing SONiC with accurate information about the hardware resources available on the switch.


SONiC Microservice Architecture Overview

SONiC is a modular network operating system in which the main switch functions are divided into separate Docker containers. These functions include routing, neighbor discovery, link aggregation, monitoring, platform management, database services, switch state orchestration, and synchronization with the hardware abstraction layer. The containers run in Linux user space and together form SONiC's microservice-based architecture.

Inter-container communication is largely based on a centralized Redis database service, which runs in its own database container. Containers publish their service-specific information to Redis databases and subscribe to the changes they need. In this role, Redis acts as SONiC's internal distribution point for state information, configuration, and events. The database container therefore provides both state storage and event distribution to the other containers. Redis is not merely a message queue in this design; it is also a central part of SONiC's data model.

Figure 1-1 summarizes this modular structure and shows how the main containers relate to Linux user space, Linux kernel space, and the underlying hardware platform. The Docker containers running in user space can be roughly grouped according to the components they communicate with and the role they play in the overall SONiC architecture.

One group consists of application-facing containers such as SNMP, LLDP, teamd, and BGP. These containers communicate either with external systems or with neighboring network devices. The SNMP container handles SNMP queries and responses exchanged with external management systems. The LLDP container is responsible for discovering neighboring devices and exchanging capability information. The teamd container manages link aggregation and logical port channels, while the BGP container exchanges routing information with its routing peers. These containers publish the information they generate to Redis databases, where other SONiC components can read it. 

The upper part of the figure also shows the dhcp-relay container, whose role is to relay DHCP messages when the DHCP server is located in a different subnet from the client device.

Another group consists of infrastructure-oriented containers such as pmon, swss, and syncd. These containers are more closely related to Linux kernel-space drivers, platform-specific hardware management, and switch ASIC programming. Through these containers, SONiC gathers information about components such as fans, power supplies, LEDs, optical transceivers, and other hardware elements. At the same time, they participate in the process in which higher-level control information is eventually translated into hardware state that can be programmed into the ASIC.

Among these containers, swss (Switch State Service) has a particularly central role. It acts as an intermediate layer between application logic and the state that is eventually programmed into the hardware. For example, the BGP container may publish routes to the APPL_DB database. The orchagent component in the swss container reads this information, converts it into a form suitable for ASIC programming, and publishes the result to the ASIC_DB database.

In the final stage, the syncd container reads the information published to ASIC_DB and passes it through the SAI interface to the vendor's ASIC SDK. The vendor's ASIC SDK, together with the platform-specific ASIC driver, programs the required state into the physical switch ASIC. In this way, a route learned from a BGP neighbor passes through several SONiC components and database layers before it is finally programmed into the hardware forwarding table.

It is also worth noting that not all SONiC components are containerized. Some configuration-related tools, such as the SONiC CLI and configuration generation logic, run directly on the Linux host system.

The key point is that SONiC does not rely on one large monolithic network process. Instead, it separates major functions into containers and connects them through a shared database model built on Redis. This separation makes the system easier to extend, monitor, restart, and adapt to different switch platforms.



Figure 1-1: SONiC Microservice Architecture Overview.

SONiC Database-Oriented Design


The Redis server runs inside the Database container. Figure 1-2 shows a single-instance Redis model used in SONiC, where a single Redis server hosts several logical databases for different purposes. Redis provides a lightweight in-memory data store with fast access and simple inter-process communication, making it well suited for SONiC's modular container-based architecture.

APPL_DB stores application-level state, such as route entries published by the BGP container. ASIC_DB stores SAI-oriented objects that represent the state to be programmed into the switch ASIC.

Although APPL_DB and ASIC_DB may contain information related to the same network function, such as routing, they do not communicate directly with each other. Instead, the swss container provides the orchestration logic that connects application-level state to hardware-oriented state. The central component in this process is orchagent, which monitors relevant APPL_DB tables, processes updates, and writes the corresponding SAI-level objects to ASIC_DB.

A simplified route update flow works as follows. When the BGP container learns, updates, or withdraws a route, its synchronization logic publishes the route update to APPL_DB. orchagent receives the update from APPL_DB, translates the route information into ASIC-programming intent, and writes the result to ASIC_DB. In this role, orchagent acts both as a consumer of application-level updates and as a producer of hardware-oriented database entries.

After the route-related objects have been written to ASIC_DB, syncd consumes the ASIC_DB updates and passes them through the SAI interface to the vendor ASIC SDK. The vendor ASIC SDK, together with the platform-specific driver stack, then programs the required forwarding state into the physical switch ASIC. This example illustrates the producer-consumer pattern used throughout SONiC's database-oriented architecture. 
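The producer-consumer chain described above can be modeled with a small in-memory Python sketch. It is a toy model only: ToyDb stands in for a Redis logical database, a Python dictionary stands in for the ASIC forwarding table, and the ROUTE_ENTRY key format is a simplified placeholder rather than a real SAI object key. The field names nexthop and ifname are used for illustration.

```python
from collections import deque


class ToyDb:
    """In-memory stand-in for one logical database with a change queue."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.changes = deque()

    def set(self, key, fields):
        self.data[key] = dict(fields)
        self.changes.append(("SET", key, dict(fields)))

    def delete(self, key):
        self.data.pop(key, None)
        self.changes.append(("DEL", key, {}))


def orchagent_step(appl_db, asic_db):
    """Consume APPL_DB changes and produce simplified ASIC_DB entries."""
    while appl_db.changes:
        op, key, fields = appl_db.changes.popleft()
        prefix = key.split(":", 1)[1]
        asic_key = f"ROUTE_ENTRY:{prefix}"  # simplified, not a real SAI key
        if op == "SET":
            asic_db.set(asic_key, {"next_hop": fields["nexthop"]})
        else:
            asic_db.delete(asic_key)


def syncd_step(asic_db, hardware_table):
    """Consume ASIC_DB changes and apply them to the stand-in ASIC table."""
    while asic_db.changes:
        op, key, fields = asic_db.changes.popleft()
        if op == "SET":
            hardware_table[key] = fields["next_hop"]
        else:
            hardware_table.pop(key, None)


appl_db, asic_db, hardware = ToyDb("APPL_DB"), ToyDb("ASIC_DB"), {}

# The BGP side publishes a route; the two consumers move it toward hardware.
appl_db.set("ROUTE_TABLE:10.1.1.0/30",
            {"nexthop": "192.168.1.1", "ifname": "Ethernet0"})
orchagent_step(appl_db, asic_db)
syncd_step(asic_db, hardware)
print(hardware)

# A withdrawal follows the same path and removes the entry again.
appl_db.delete("ROUTE_TABLE:10.1.1.0/30")
orchagent_step(appl_db, asic_db)
syncd_step(asic_db, hardware)
print(hardware)
```

Note that neither step function calls the other. Each one only reads from one database and writes to the next, which is the essence of the database-oriented design.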

In practice, database-oriented means that SONiC services exchange much of their configuration, operational state, and event information through Redis databases rather than through direct service-to-service calls.

Other key SONiC databases include CONFIG_DB, STATE_DB, and COUNTERS_DB. CONFIG_DB stores the switch configuration. During startup, it is populated from JSON-formatted configuration data, while subsequent updates may originate from SONiC management interfaces. STATE_DB stores operational state reported by SONiC processes, while COUNTERS_DB stores counters and telemetry-related statistics. These databases are discussed in more detail in later sections.

The relationship between these logical databases in the single-instance Redis model is illustrated in Figure 1-2.

Figure 1-2: Single-Instance Redis Database Model in SONiC.

In the single-instance model, different containers publish and consume data through logical databases that share the same Redis process, CPU resources, and UNIX socket. Under heavy load, such as during large route update bursts, this shared model can create contention and delay database operations for other functions. To reduce this risk in larger or more demanding deployments, SONiC also supports a multi-instance Redis architecture.

In the multi-instance model, databases can be organized according to workload characteristics such as read/write frequency and CPU utilization. Figure 1-3 illustrates one possible grouping of Redis instances based on workload characteristics. APPL_DB and ASIC_DB are shown in a High-Churn Processing Instance, CONFIG_DB and STATE_DB in a Management and Slow-State Instance, and COUNTERS_DB in a High-Frequency Telemetry Instance. This separation allows each Redis instance to expose its own UNIX socket and also allows deployments to apply CPU-affinity policies when needed. As a result, high-frequency telemetry collection or large route-update bursts are less likely to interfere with slower management and state operations.
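The grouping can be expressed as a simple lookup structure. The Python sketch below follows the grouping described for Figure 1-3. The instance names and UNIX socket paths are hypothetical placeholders chosen for this illustration and are not taken from a real SONiC configuration file.

```python
# Hypothetical instances; names and socket paths are placeholders.
INSTANCES = {
    "high_churn": {"unix_socket_path": "/var/run/redis/high_churn.sock"},
    "management": {"unix_socket_path": "/var/run/redis/management.sock"},
    "telemetry": {"unix_socket_path": "/var/run/redis/telemetry.sock"},
}

# Database-to-instance grouping as described for Figure 1-3.
DATABASES = {
    "APPL_DB": "high_churn",
    "ASIC_DB": "high_churn",
    "CONFIG_DB": "management",
    "STATE_DB": "management",
    "COUNTERS_DB": "telemetry",
}


def socket_for(db_name):
    """Return the UNIX socket a client would use for the given database."""
    return INSTANCES[DATABASES[db_name]]["unix_socket_path"]


def databases_sharing_instance(db_name):
    """Return all databases served by the same Redis instance."""
    instance = DATABASES[db_name]
    return sorted(db for db, inst in DATABASES.items() if inst == instance)


print(socket_for("COUNTERS_DB"))
print(databases_sharing_instance("APPL_DB"))
```

In this layout, a burst of route updates affects only the databases that share an instance with APPL_DB, while CONFIG_DB, STATE_DB, and COUNTERS_DB are served by other Redis processes.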


Figure 1-3: Multi-Instance Redis Database Model in SONiC.

SONiC's architecture is built around modular services, shared databases, and a hardware abstraction layer that separates common network functions from platform-specific implementation details. Understanding these fundamentals makes it easier to follow later topics such as installation, startup, hardware discovery, configuration handling, and operational troubleshooting.



Monday, 25 May 2026

SONiC Part III: SONiC Introduction

SONiC is a vendor-neutral, Linux-based network operating system (NOS) that uses a database-driven architecture. Its software components run in multiple containers and exchange information through Redis. In SONiC, several named databases are defined for different functions, and these databases are mapped to Redis logical database IDs. Through this design, configuration data, application state, operational state, and ASIC-related state move between software layers by means of specialized processes.

Different hardware vendors may add their own platform integrations, transceiver support, monitoring utilities, or management workflows. However, the core SONiC architecture remains the same. This is one of the main reasons why SONiC knowledge, troubleshooting methods, and automation practices are transferable across different hardware platforms.

Vendor neutrality does not mean that every SONiC-based implementation behaves exactly the same in every operational detail. It means that different implementations follow the same architectural model. To organize information clearly, SONiC defines several named databases, each of which is mapped to a Redis logical database ID:

·       CONFIG_DB (Redis DB 4): Stores the user’s intended configuration.

·       APPL_DB (Redis DB 0): Stores application-level objects that are ready for processing by lower software layers.

·       STATE_DB (Redis DB 6): Stores operational state information about system components.

·       ASIC_DB (Redis DB 1): Stores objects in a form used by the SONiC and SAI pipeline for hardware programming.
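
The name-to-ID mapping in the list above can be expressed as a simple lookup. The Python sketch below covers only the four databases listed here; the helper names are illustrative. It also builds a redis-cli argument list, because redis-cli selects a logical database with the -n option.

```python
# SONiC database names and their Redis logical database IDs,
# limited to the four databases listed in the text.
SONIC_DB_IDS = {
    "APPL_DB": 0,
    "ASIC_DB": 1,
    "CONFIG_DB": 4,
    "STATE_DB": 6,
}


def redis_db_id(name: str) -> int:
    """Return the Redis logical database ID for a SONiC database name."""
    try:
        return SONIC_DB_IDS[name.upper()]
    except KeyError:
        raise ValueError(f"unknown SONiC database: {name}") from None


def redis_cli_args(name: str) -> list[str]:
    """Build a redis-cli argument list that selects the database with -n."""
    return ["redis-cli", "-n", str(redis_db_id(name))]
```

For example, redis_cli_args("CONFIG_DB") returns the arguments for redis-cli -n 4.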

Figure 1-01 shows the relationship between Redis logical databases and SONiC databases from a routing-oriented point of view. A standard Redis instance commonly provides sixteen logical databases by default, and SONiC uses a defined subset of them for its core functions.

As shown in the routing example in Figure 1-01, APPL_DB contains a native Redis Set called ROUTE_TABLE_KEY_SET. This set tracks route-related keys, but it does not store route attributes such as next-hop or metric values. The actual routes are stored as separate Redis keys that follow SONiC’s table-and-key naming convention, where the table name and the object identifier are joined with a colon. Examples include ROUTE_TABLE:192.168.1.1/32 for a host route and ROUTE_TABLE:10.1.1.0/30 for a network route. Route attributes are stored as field-value pairs in a Redis Hash, which SONiC uses to represent structured objects in its databases.
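
The key layout described above can be sketched with plain Python data structures standing in for Redis. This is a simplified model, not SONiC code: the field names nexthop and ifname and their values are illustrative, and storing only the object identifier in the key set is an assumption.

```python
# In-memory stand-in for APPL_DB: one set that tracks route keys,
# plus one dictionary (playing the role of a Redis Hash) per route.
appl_db = {"ROUTE_TABLE_KEY_SET": set()}


def route_key(prefix: str, table: str = "ROUTE_TABLE") -> str:
    """Join the table name and the object identifier with a colon."""
    return f"{table}:{prefix}"


def split_key(key: str) -> tuple[str, str]:
    """Split on the first colon only, so IPv6 prefixes stay intact."""
    table, _, identifier = key.partition(":")
    return table, identifier


def add_route(db: dict, prefix: str, **attributes: str) -> str:
    """Store route attributes as field-value pairs and track the key."""
    key = route_key(prefix)
    db[key] = dict(attributes)              # the hash holds the attributes
    db["ROUTE_TABLE_KEY_SET"].add(prefix)   # the set holds no attributes
    return key


# Illustrative values only.
add_route(appl_db, "192.168.1.1/32", nexthop="10.1.1.2", ifname="Ethernet0")
add_route(appl_db, "10.1.1.0/30", nexthop="0.0.0.0", ifname="Ethernet0")
```

Note that the set only answers the question "which routes exist?", while the attributes must be read from the per-route hash.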

The following chapters build on this foundation. First, we examine what happens when an interface changes from down to up and receives an IP address configuration. Next, we trace the internal processes that begin when a BGP session is established and the system starts handling BGP UPDATE messages. By moving from interface bring-up to control-plane route learning, you will see how configuration data and protocol state pass through the software layers until the resulting forwarding information is programmed into the switch hardware.

Figure 1-01 also includes config_db.json to indicate that persistent configuration is stored outside Redis and is loaded into CONFIG_DB during startup, while the detailed workflow is covered in later sections.
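
To give a rough idea of what loading config_db.json into CONFIG_DB involves, the Python sketch below flattens a nested JSON document into table-and-key entries. It is not the actual SONiC loader. The sample content is illustrative, and the use of the pipe character as the CONFIG_DB key separator is an assumption based on common SONiC convention rather than something stated in this text.

```python
import json

# Illustrative config_db.json content: table -> key -> field-value pairs.
SAMPLE_CONFIG = """
{
  "PORT": {
    "Ethernet0": {"admin_status": "up", "mtu": "9100"}
  }
}
"""


def flatten_config(config_json: str, separator: str = "|") -> dict:
    """Turn nested JSON into flat 'TABLE<separator>key' entries.

    Each resulting value is a dictionary of field-value pairs, which is
    how a Redis Hash would hold the object's attributes.
    """
    config = json.loads(config_json)
    flat = {}
    for table, entries in config.items():
        for key, fields in entries.items():
            flat[f"{table}{separator}{key}"] = dict(fields)
    return flat
```

With the sample above, the single resulting entry is keyed PORT|Ethernet0 and carries the admin_status and mtu fields.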


Figure 1-01: Relationship Between Redis Logical Databases and SONiC Databases.

Monday, 4 May 2026

SONiC Part II: Deploy a SONiC Switch Clos Topology

 

Introduction

 

This chapter explains how to create and deploy a simple SONiC-based Clos topology in WSL using Containerlab. First, we open VS Code from WSL to create and edit a topology definition file. Next, we build the topology by defining nodes (SONiC switches and Linux hosts) and the links between them. Before deploying the lab, we verify the wiring with Containerlab’s built-in topology graph. Finally, we deploy the topology and validate access to the nodes using both a Linux shell and vtysh, the FRRouting CLI that SONiC uses for its routing stack.

Phase 1: Integrate VS Code with WSL




There are a couple of ways to use VS Code with WSL. In this lab, we launch VS Code from the WSL terminal with the command code . (the trailing dot refers to the current directory). The first time you run this command, VS Code installs the VS Code Server components inside WSL and then opens a VS Code window connected to the Linux environment. After the installation completes, running code . from any directory opens that folder directly in VS Code.

nwkt@Toni:~$ code .

Updating VS Code Server to version 034f571df509819cc10b0c8129f66ef77a542f0e

Removing previous installation...

Installing VS Code Server for Linux x64 (034f571df509819cc10b0c8129f66ef77a542f0e)

Downloading: 100%

Unpacking: 100%

Unpacked 3505 files and folders to /home/nwkt/.vscode-server/bin/034f571df509819cc10b0c8129f66ef77a542f0e.

Looking for compatibility check script at /home/nwkt/.vscode-server/bin/034f571df509819cc10b0c8129f66ef77a542f0e/bin/helpers/check-requirements.sh

Running compatibility check script

Compatibility check successful (0)

nwkt@Toni:~$

Example 2-3: Open VS Code from WSL (install VS Code Server on first run).

 

Phase 2: Create Topology File

It is a good practice to create a consistent folder structure for your lab projects. Example 2-1 shows a simple directory layout using the tree command. If tree is not installed, you can add it with sudo apt install tree.

 

nwkt@Toni:~$ tree

.

├── clos-lab

│   ├── host-config

│   └── switch-config

└── snap

 

5 directories, 0 files

 

Example 2-1: Project folder structure.

After creating the folder structure, run code . from the clos-lab directory to open VS Code in the correct working folder. In VS Code, create a new file and name it lab-1.clab.yml (or another name ending in .clab.yml). Because VS Code was opened from the correct folder, the file is saved directly under clos-lab.


Figure 2-1: VS Code: open a new file.

Next, use the Ctrl+K, M keyboard shortcut to open the language mode selection drop-down menu and select YAML.


Figure 2-2: VS Code: select language mode.

 

A Containerlab topology file defines the nodes to start (and their container images) and how those nodes are connected with links. The file begins with a lab name, for example name: nwkt-01. Containerlab uses this value as part of the container naming convention. For example, the node spine-1 is created as clab-nwkt-01-spine-1.
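
The naming convention can be captured in a one-line helper, which is handy when a script needs to build container names for docker commands. This Python sketch assumes the default clab prefix; the function name is illustrative.

```python
def container_name(lab: str, node: str, prefix: str = "clab") -> str:
    """Build the container name from the prefix, lab name, and node name."""
    return f"{prefix}-{lab}-{node}"


# Names for the switches used in this chapter.
names = [container_name("nwkt-01", node) for node in ("spine-1", "leaf-1", "leaf-2")]
```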

Under the topology: key, the nodes: section defines each node. In this chapter we use kind: sonic-vs with image: docker-sonic-vs:latest for the SONiC switches, and kind: linux with image: alpine:latest for the hosts. A node’s kind tells Containerlab how to boot the node and what features it supports. It also affects how interface names are interpreted for link endpoints.

When using kind: sonic-vs, Containerlab connects the container’s management interface to its management network on eth0. Data-plane interfaces start at eth1 and map to SONiC front-panel ports. For example, in a sonic-vs container eth1 maps to Ethernet0 and eth2 maps to Ethernet4. This is why the links in Example 2-2 use Linux-style names such as spine-1:eth1 and leaf-1:eth1.
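
The two mappings mentioned above (eth1 to Ethernet0 and eth2 to Ethernet4) suggest a simple rule: subtract one from the Linux interface index and multiply by four. The Python sketch below encodes that rule. Extending it beyond eth2 is an assumption extrapolated from those two data points, so verify the result on your own image.

```python
def sonic_port(linux_if: str, step: int = 4) -> str:
    """Map a Linux data-plane interface name to a SONiC front-panel port.

    eth0 is the management interface and has no front-panel equivalent.
    The step of 4 is inferred from eth1 -> Ethernet0 and eth2 -> Ethernet4.
    """
    if not linux_if.startswith("eth"):
        raise ValueError(f"not a Linux-style interface name: {linux_if}")
    index = int(linux_if[3:])
    if index < 1:
        raise ValueError("eth0 is the management interface")
    return f"Ethernet{(index - 1) * step}"
```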

The links: section describes how nodes are wired together. Each link has two endpoints. For example, endpoints: ["spine-1:eth1", "leaf-1:eth1"] creates a point-to-point link between spine-1 and leaf-1 using their first data-plane interfaces.

 

name: nwkt-01

topology:

  nodes:

    spine-1:

      kind: sonic-vs

      image: docker-sonic-vs:latest

    leaf-1:

      kind: sonic-vs

      image: docker-sonic-vs:latest

    leaf-2:

      kind: sonic-vs

      image: docker-sonic-vs:latest

    host-1:

      kind: linux

      image: alpine:latest

    host-2:

      kind: linux

      image: alpine:latest

 

  links:

    # Connections for Leaf-1

    - endpoints: ["spine-1:eth1", "leaf-1:eth1"]

    - endpoints: ["leaf-1:eth2", "host-1:eth1"]

    # Connections for Leaf-2

    - endpoints: ["spine-1:eth2", "leaf-2:eth1"]

    - endpoints: ["leaf-2:eth2", "host-2:eth1"]

Example 2-2: Containerlab topology file: lab-1.clab.yml.
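
Before looking at the graphical view, a few basic wiring rules can also be checked programmatically. The Python sketch below mirrors the topology from Example 2-2 as a dictionary (so that no YAML library is needed) and reports endpoints that reference an undefined node or that are used more than once. It is an illustrative helper, not part of Containerlab.

```python
# The topology from Example 2-2, written as a Python dictionary.
TOPOLOGY = {
    "name": "nwkt-01",
    "topology": {
        "nodes": {
            "spine-1": {"kind": "sonic-vs", "image": "docker-sonic-vs:latest"},
            "leaf-1": {"kind": "sonic-vs", "image": "docker-sonic-vs:latest"},
            "leaf-2": {"kind": "sonic-vs", "image": "docker-sonic-vs:latest"},
            "host-1": {"kind": "linux", "image": "alpine:latest"},
            "host-2": {"kind": "linux", "image": "alpine:latest"},
        },
        "links": [
            {"endpoints": ["spine-1:eth1", "leaf-1:eth1"]},
            {"endpoints": ["leaf-1:eth2", "host-1:eth1"]},
            {"endpoints": ["spine-1:eth2", "leaf-2:eth1"]},
            {"endpoints": ["leaf-2:eth2", "host-2:eth1"]},
        ],
    },
}


def check_links(topology: dict) -> list[str]:
    """Return a list of wiring problems; an empty list means none were found."""
    nodes = topology["topology"]["nodes"]
    seen = set()
    problems = []
    for link in topology["topology"]["links"]:
        endpoints = link["endpoints"]
        if len(endpoints) != 2:
            problems.append(f"{endpoints}: a link needs exactly two endpoints")
            continue
        for endpoint in endpoints:
            node, _, _interface = endpoint.partition(":")
            if node not in nodes:
                problems.append(f"{endpoint}: unknown node '{node}'")
            if endpoint in seen:
                problems.append(f"{endpoint}: endpoint used more than once")
            seen.add(endpoint)
    return problems
```

For the topology in Example 2-2, check_links(TOPOLOGY) returns an empty list.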


Containerlab topology files typically use the .clab.yml or .clab.yaml extension. When you run containerlab deploy without specifying a topology file, Containerlab looks for a single .clab.yml or .clab.yaml file in the current directory. If multiple matching files exist, use -t to select the desired file (for example, containerlab deploy -t lab-1.clab.yml). Using the .yml extension is common, but .yaml works as well.
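
The file-selection behaviour described above can be mimicked in a few lines of Python. This sketch only reproduces the rule as stated in the text (exactly one matching file is accepted); it is not Containerlab's own implementation, and the function name is illustrative.

```python
from pathlib import Path


def find_topology_file(directory: str = ".") -> Path:
    """Return the single *.clab.yml or *.clab.yaml file in the directory."""
    root = Path(directory)
    matches = sorted(
        path
        for pattern in ("*.clab.yml", "*.clab.yaml")
        for path in root.glob(pattern)
    )
    if not matches:
        raise FileNotFoundError(f"no topology file found in {root}")
    if len(matches) > 1:
        found = ", ".join(path.name for path in matches)
        raise ValueError(f"multiple topology files ({found}); select one with -t")
    return matches[0]
```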

Create the topology file as shown in Example 2-2. VS Code provides indentation guides and syntax highlighting for YAML, which makes the file easier to read and helps you avoid indentation errors. Save the file in the clos-lab folder.


Figure 2-3: VS Code YAML editing with indentation and syntax highlighting.

 

 

nwkt@Toni:~$ tree

.

├── clos-lab

│   ├── host-config

│   ├── lab-1.clab.yml

│   └── switch-config

└── snap

 

5 directories, 1 file

 

Example 2-4: Folder and file structure.

 

Phase 3: Verify Wiring



Before deploying the topology, it is a good idea to verify that the wiring is correct. Containerlab includes a built-in visualization tool that generates a graphical representation of the topology. The command sudo containerlab graph -t lab-1.clab.yml starts a small local web server (by default on port 50080) and prints one or more URLs that you can open in a browser. This is a useful sanity check before deployment. For example, you can confirm that spine-1 is connected to the correct interface on leaf-1.

 

nwkt@Toni:~/clos-lab$ sudo containerlab graph -t lab-1.clab.yml

13:57:22 INFO Parsing & checking topology file=lab-1.clab.yml

13:57:22 INFO Serving topology graph

  addresses=

  │   http://10.255.255.254:50080

  │   http://172.25.109.88:50080

  │   http://172.17.0.1:50080

  │   http://172.20.20.1:50080

  │   http://[3fff:172:20:20::1]:50080

Example 2-5: Generate a graphical topology view.

Figure 2-4: Graphical topology view (URL http://172.25.109.88:50080).

 

Phase 4: Deploy Topology File



After saving lab-1.clab.yml, deploy the lab with sudo containerlab deploy (or explicitly specify the file with -t lab-1.clab.yml). Containerlab parses the topology file, creates a lab directory (clab-<lab-name>), starts the containers, and connects them with the defined links. In the summary table, the Name column shows the full container names (used with docker commands), and the IPv4/6 Address column shows the management IP addresses assigned on the Containerlab management network.

 

nwkt@Toni:~/clos-lab$ sudo containerlab deploy

11:54:41 INFO Containerlab started version=0.74.3

11:54:41 INFO Parsing & checking topology file=lab-1.clab.yml

11:54:41 INFO Creating lab directory path=/home/nwkt/clos-lab/clab-nwkt-01

11:54:41 INFO Creating container name=host-1

11:54:41 INFO Creating container name=host-2

11:54:41 INFO Creating container name=leaf-1

11:54:41 INFO Creating container name=leaf-2

11:54:41 INFO Creating container name=spine-1

11:54:42 INFO Created link: spine-1:eth1 ▪┄┄ leaf-1:eth1

11:54:42 INFO Created link: leaf-1:eth2 ▪┄┄ host-1:eth1

11:54:43 INFO Created link: spine-1:eth2 ▪┄┄ leaf-2:eth1

11:54:43 INFO Created link: leaf-2:eth2 ▪┄┄ host-2:eth1

11:54:43 INFO Adding host entries path=/etc/hosts

11:54:43 INFO Adding SSH config for nodes path=/etc/ssh/ssh_config.d/clab-nwkt-01.conf

11:54:43 INFO containerlab version

  🎉=

  │ A newer containerlab version (0.75.0) is available!

  │ Release notes: https://containerlab.dev/rn/0.75/

  │ Run 'clab version upgrade' or see https://containerlab.dev/install/ for other installation options.

──────────────────────────────────────────────────────────────────────────

│         Name         │       Kind/Image       │  State  │   IPv4/6 Address  │

──────────────────────────────────────────────────────────────────────────

│ clab-nwkt-01-host-1  │ linux                  │ running │ 172.20.20.2       │

│                      │ alpine:latest          │         │ 3fff:172:20:20::2 │

──────────────────────────────────────────────────────────────────────────

│ clab-nwkt-01-host-2  │ linux                  │ running │ 172.20.20.6       │

│                      │ alpine:latest          │         │ 3fff:172:20:20::6 │

──────────────────────────────────────────────────────────────────────────

│ clab-nwkt-01-leaf-1  │ sonic-vs               │ running │ 172.20.20.5       │

│                      │ docker-sonic-vs:latest │         │ 3fff:172:20:20::5 │

──────────────────────────────────────────────────────────────────────────

│ clab-nwkt-01-leaf-2  │ sonic-vs               │ running │ 172.20.20.4       │

│                      │ docker-sonic-vs:latest │         │ 3fff:172:20:20::4 │

──────────────────────────────────────────────────────────────────────────

│ clab-nwkt-01-spine-1 │ sonic-vs               │ running │ 172.20.20.3       │

│                      │ docker-sonic-vs:latest │         │ 3fff:172:20:20::3 │

──────────────────────────────────────────────────────────────────────────

nwkt@Toni:~/clos-lab$

Example 2-6: Topology deployment output.


After deploying the topology, you can use tree to review the lab directory and related files created during the deployment.


nwkt@Toni:~$ tree

.

├── clos-lab

│   ├── clab-nwkt-01

│   │   ├── ansible-inventory.yml

│   │   ├── authorized_keys

│   │   ├── leaf-1

│   │   ├── leaf-2

│   │   ├── nornir-simple-inventory.yml

│   │   ├── spine-1

│   │   └── topology-data.json

│   ├── host-config

│   ├── lab-1.clab.yml

│   └── switch-config

└── snap

 

9 directories, 5 files

Example 2-7: Updated folder structure after deployment.


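Example 2-8 shows how to verify the status of the containers using docker ps. The --format option prints a readable table with the container ID, name, and status. Because the command includes the -a flag, stopped containers are listed as well, which is why an unrelated, exited container (adoring_brattain) appears in the output.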


nwkt@Toni:~$ docker ps -a --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"

CONTAINER ID   NAMES                  STATUS

c67cbb5fe8e8   clab-nwkt-01-host-1    Up 36 minutes

1696b2865f8e   clab-nwkt-01-host-2    Up 36 minutes

b7517c417137   clab-nwkt-01-leaf-2    Up 36 minutes

810267f0cf2b   clab-nwkt-01-leaf-1    Up 36 minutes

60c37f941005   clab-nwkt-01-spine-1   Up 36 minutes

0c01df3ef211   adoring_brattain       Exited (0) 6 days ago

Example 2-8: List containers and verify status.
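
The same check can be automated. The Python sketch below parses tab-separated name and status lines, such as those produced by docker ps -a with a format string of {{.Names}}\t{{.Status}} (without the table keyword, so no header line is printed), and reports lab containers that are not in the Up state. The function names are illustrative.

```python
def parse_status(output: str) -> dict[str, str]:
    """Parse 'name<TAB>status' lines into a name -> status dictionary."""
    statuses = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        statuses[name.strip()] = status.strip()
    return statuses


def lab_nodes_down(statuses: dict[str, str], lab: str = "nwkt-01") -> list[str]:
    """Return lab containers whose status does not start with 'Up'."""
    prefix = f"clab-{lab}-"
    return sorted(
        name
        for name, status in statuses.items()
        if name.startswith(prefix) and not status.startswith("Up")
    )
```

Containers that do not belong to the lab, such as the exited adoring_brattain container in Example 2-8, are ignored because their names do not start with the lab prefix.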

 

Phase 5: Test Connection – Log In to Nodes



As a final step, verify that you can access the nodes. To open a Linux shell inside a node container, run docker exec -it clab-nwkt-01-leaf-1 bash. From the shell, start vtysh, the FRRouting CLI that SONiC uses for routing configuration. You can also start vtysh directly with docker exec -it clab-nwkt-01-leaf-1 vtysh.

 

nwkt@Toni:~$ docker exec -it clab-nwkt-01-leaf-1 bash

root@leaf-1:/#

root@leaf-1:/#

root@leaf-1:/# vtysh

 

Hello, this is FRRouting (version 10.0.1).

Copyright 1996-2005 Kunihiro Ishiguro, et al.

   <snipped for brevity>

leaf-1#

leaf-1#

leaf-1# sh run

Building configuration...

 

Current configuration:

!

frr version 10.0.1

frr defaults traditional

hostname leaf-1

domainname localdomain

no ipv6 forwarding

no zebra nexthop kernel enable

fpm address 127.0.0.1

no fpm use-next-hop-groups

service integrated-vtysh-config

!

ip nht resolve-via-default

!

ipv6 nht resolve-via-default

!

end

leaf-1#

Example 2-9: Log in to a node and open the FRRouting CLI (vtysh).
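
For scripted checks, the interactive login can be replaced by a one-shot command, because vtysh accepts commands through its -c option. The Python sketch below only builds the argument list; the function name is illustrative, and the -it flags are omitted because no interactive terminal is needed.

```python
def vtysh_command(container: str, *commands: str) -> list[str]:
    """Build a docker exec argument list that runs vtysh commands once."""
    argv = ["docker", "exec", container, "vtysh"]
    for command in commands:
        argv += ["-c", command]
    return argv


# Possible usage (requires the lab to be running):
#   import subprocess
#   result = subprocess.run(
#       vtysh_command("clab-nwkt-01-leaf-1", "show running-config"),
#       capture_output=True, text=True, check=True,
#   )
#   print(result.stdout)
```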