The Network Times: Ultra Ethernet: Discovery

Updated 8-October 2025

Creating the fi_info Structure

Before the application can discover what communication services are available, it first needs a way to describe what it is looking for. This description is built using a structure called fi_info. The fi_info structure acts like a container that holds the application’s initial requirements, such as desired endpoint type or capabilities.

The first step is to reserve space for this structure in the system’s main memory. The fi_allocinfo() helper function does this for the application. When called, fi_allocinfo() allocates a new fi_info structure, which this book refers to as the pre-fi_info. This structure will later be passed to the libfabric core for matching against available providers.

At this stage, most of the fields inside the pre-fi_info structure are left at their default values. The application typically sets only the most relevant parameters that express what it needs, such as the desired endpoint type or provider name, and leaves the rest for the provider to fill in later.

In addition to the main fi_info structure, the helper function also allocates memory for a set of sub-structures. These describe different parts of the communication stack, including the fabric, domain, and endpoints. The fixed sub-structures include:

fi_fabric_attr: Describes the fabric-level properties such as the provider’s name and fabric identifier.
fi_domain_attr: Defines attributes related to the domain, including threading and data progress behavior.
fi_ep_attr: Specifies endpoint-related characteristics such as endpoint type.
fi_tx_attr and fi_rx_attr: Contain parameters for transmit and receive operations, such as message ordering and completion behavior.
fi_nic: Represents the properties and capabilities of the UET network interface.

In Figure 4-1, these are labeled as fixed sub-structures because their layout and meaning are always the same. They consist of predefined fields and expected value types, which makes them consistent across different applications. Like the main fi_info structure, they usually remain at their default values until the provider fills them in. The information stored in these sub-structures is used later, when the application begins creating actual fabric objects such as domains and endpoints.

In addition to the fixed parts, the fi_info structure can contain generic sub-structures such as src_addr. Unlike the fixed ones, the layout of the generic sub-structure src_addr depends on the chosen addr_format. For example, when using Ultra Ethernet Transport, the address field points to a structure describing a UET endpoint address, which includes bits for Version, Flags, Fabric Endpoint Capabilities, PIDonFEP, Fabric Address, Start Resource Index, Num Resource Indices, and Initiator ID. This compact representation carries both addressing and capability information, while the generic pointer lets the same fi_info definition be reused across different transport technologies and addressing schemes. Note that in Figure 4-2 the returned src_addr is only partially filled because the complete address information is not available until the endpoint is created.

In Figure 4-1, the application defines its communication requirements in the fi_info structure by setting the caps (capabilities) field. This field describes the types of operations the application intends to perform through the fabric interface. For example, values such as FI_MSG, FI_RMA, FI_WRITE, FI_REMOTE_WRITE, FI_COLLECTIVE, FI_ATOMIC, and FI_HMEM request message-based communication, remote memory access (including local and remote write permissions), collective operations, atomic operations, and heterogeneous memory such as GPU device memory.

When the fi_getinfo() call is issued, the provider compares these requested capabilities against what the underlying hardware and driver can support. Only compatible providers return a matching fi_info structure.

In this example, the application also sets the addr_format field to FI_ADDR_UET, indicating that Ultra Ethernet Transport endpoint addressing is used. This format includes hardware-specific addressing details beyond a simple IP address.

The current Ultra Ethernet Transport specification v1.0 does not define or support the FI_COLLECTIVE capability. Therefore, the UET provider does not return this flag, and collective operations are not offloaded or accelerated by the UET NIC.
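The comparison between requested and supported capabilities can be pictured as a bitmask test. The following is a minimal sketch, not libfabric code: the CAP_* names and their bit values are hypothetical stand-ins for the real FI_* flags, and uet_like_caps() is an invented helper that models a provider offering everything except collectives. How a real provider reacts to a missing bit is provider policy; the sketch only computes the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative capability bits. These are NOT libfabric's real FI_* values. */
#define CAP_MSG          (UINT64_C(1) << 0)
#define CAP_RMA          (UINT64_C(1) << 1)
#define CAP_WRITE        (UINT64_C(1) << 2)
#define CAP_REMOTE_WRITE (UINT64_C(1) << 3)
#define CAP_COLLECTIVE   (UINT64_C(1) << 4)
#define CAP_ATOMIC       (UINT64_C(1) << 5)
#define CAP_HMEM         (UINT64_C(1) << 6)

/* Bits the application asked for that the provider cannot offer. */
static uint64_t caps_missing(uint64_t requested, uint64_t supported)
{
    return requested & ~supported;
}

/* True when every requested bit is also a supported bit. */
static bool caps_match(uint64_t requested, uint64_t supported)
{
    return caps_missing(requested, supported) == 0;
}

/* Hypothetical UET-like provider: all listed capabilities except collectives. */
static uint64_t uet_like_caps(void)
{
    return CAP_MSG | CAP_RMA | CAP_WRITE | CAP_REMOTE_WRITE |
           CAP_ATOMIC | CAP_HMEM;
}

/* Usage: a request that includes collectives leaves exactly that bit unmet. */
static void caps_demo(void)
{
    uint64_t wanted = CAP_MSG | CAP_RMA | CAP_COLLECTIVE;
    assert(!caps_match(wanted, uet_like_caps()));
    assert(caps_missing(wanted, uet_like_caps()) == CAP_COLLECTIVE);
}
```

Reporting the missing bits separately from the yes/no answer makes it easy to see which requirement ruled a provider out.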

After fi_allocinfo() has allocated memory for both the fixed and generic sub-structures, it automatically links them together by inserting pointers into the main fi_info structure. The application can then easily access each attribute through fi_info without manually handling the memory layout.
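The allocate-and-link step described above can be sketched with simplified stand-in types. Everything prefixed sk_ is hypothetical and exists only for illustration: the real fi_info and its attribute structures have many more fields, and the sketch leaves out fi_nic and the contents of src_addr. It shows the pattern only: zero-initialized memory for each part, with pointers stored in the main structure.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the fixed sub-structures. */
struct sk_fabric_attr { const char *prov_name; };
struct sk_domain_attr { int threading; };
struct sk_ep_attr     { int type; };
struct sk_tx_attr     { unsigned long long caps; };
struct sk_rx_attr     { unsigned long long caps; };

/* Simplified stand-in for the main structure. */
struct sk_info {
    struct sk_info        *next;
    unsigned long long     caps;
    int                    addr_format;
    void                  *src_addr;    /* generic: layout depends on addr_format */
    struct sk_tx_attr     *tx_attr;
    struct sk_rx_attr     *rx_attr;
    struct sk_ep_attr     *ep_attr;
    struct sk_domain_attr *domain_attr;
    struct sk_fabric_attr *fabric_attr;
};

/* Release one structure and the sub-structures it owns. free(NULL) is a no-op. */
static void sk_freeinfo(struct sk_info *info)
{
    if (!info)
        return;
    free(info->tx_attr);
    free(info->rx_attr);
    free(info->ep_attr);
    free(info->domain_attr);
    free(info->fabric_attr);
    free(info);
}

/* Allocate a zeroed structure plus its sub-structures and link them by pointer. */
static struct sk_info *sk_allocinfo(void)
{
    struct sk_info *info = calloc(1, sizeof(*info));
    if (!info)
        return NULL;
    info->tx_attr     = calloc(1, sizeof(*info->tx_attr));
    info->rx_attr     = calloc(1, sizeof(*info->rx_attr));
    info->ep_attr     = calloc(1, sizeof(*info->ep_attr));
    info->domain_attr = calloc(1, sizeof(*info->domain_attr));
    info->fabric_attr = calloc(1, sizeof(*info->fabric_attr));
    if (!info->tx_attr || !info->rx_attr || !info->ep_attr ||
        !info->domain_attr || !info->fabric_attr) {
        sk_freeinfo(info);
        return NULL;
    }
    return info;
}

/* Usage: set only the fields that matter and leave the rest at zero. */
static void sk_alloc_demo(void)
{
    struct sk_info *hints = sk_allocinfo();
    assert(hints != NULL);
    hints->caps = 0x3;          /* placeholder capability bits */
    hints->ep_attr->type = 1;   /* placeholder endpoint type */
    sk_freeinfo(hints);
}
```

Because every sub-structure is reachable through a pointer in the main structure, the caller never needs to know how the pieces are laid out in memory.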

Once the structure is prepared, the next step is to request matching provider information using the fi_getinfo() API call, which is described in detail in the following section.

Figure 4-1: Discovery: Allocate Memory and Create Structures – fi_allocinfo.

Requesting Provider Services with fi_getinfo()

After creating a pre-fi_info structure, the application calls the fi_getinfo() API to discover which services and transport features are available on the node’s NIC(s). This function takes a pointer to the pre-fi_info structure, which contains hints describing the application’s requirements, such as desired capabilities and address format.

When the discovery request reaches the libfabric core, the library identifies and loads an appropriate provider from the available options, which may include providers for TCP, verbs (InfiniBand), or sockets (TCP/UDP). For Ultra Ethernet Transport, the core selects the UET provider. The core then invokes the provider’s entry points, including the .getinfo callback, which is responsible for returning the provider’s supported capabilities. Internally, this callback is a function pointer that refers to the provider’s uet_getinfo() routine.
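The dispatch through a provider's .getinfo entry point can be sketched as a table of function pointers. This is an illustration under stated assumptions, not the libfabric implementation: the sk_ types, the provider table, and the return values (a made-up count of matching NIC entries) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hints: only an optional provider name. */
struct sk_hints { const char *prov_name; };

/* Hypothetical provider descriptor: a name plus a getinfo entry point. */
struct sk_provider {
    const char *name;
    int (*getinfo)(const struct sk_hints *hints);
};

/* Placeholder callbacks; each returns an invented number of NIC entries. */
static int sk_tcp_getinfo(const struct sk_hints *hints) { (void)hints; return 1; }
static int sk_uet_getinfo(const struct sk_hints *hints) { (void)hints; return 2; }

static const struct sk_provider sk_providers[] = {
    { "tcp", sk_tcp_getinfo },
    { "uet", sk_uet_getinfo },
};

/* Core-side dispatch: call getinfo on every provider the hints allow
 * and return the total number of entries reported. */
static int sk_core_getinfo(const struct sk_hints *hints)
{
    int total = 0;
    size_t n = sizeof(sk_providers) / sizeof(sk_providers[0]);

    assert(hints != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct sk_provider *p = &sk_providers[i];
        /* A provider name in the hints restricts the search to that provider. */
        if (hints->prov_name && strcmp(hints->prov_name, p->name) != 0)
            continue;
        total += p->getinfo(hints);
    }
    return total;
}
```

With this table, hints naming "uet" reach only sk_uet_getinfo(), while hints with no provider name reach both callbacks. The core never needs to know what happens inside a callback, which is what lets one API front many transports.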

Inside the UET provider, the uet_getinfo() routine queries the NIC driver or kernel interface to determine what capabilities each NIC can safely and efficiently support. The provider does not access the hardware directly. For multi-GPU AI workloads, the focus is on push-based remote memory access operations:

FI_MSG: Used for standard message-based communication.
FI_RMA: Enables direct remote memory access, forming the foundation for high-performance gradient or parameter transfers between GPUs.
FI_WRITE: Allows a GPU to push local gradient updates directly into another GPU’s memory.
FI_REMOTE_WRITE: Signals that remote GPUs can write directly into this GPU’s memory, supporting push-based collective operations.
FI_COLLECTIVE: Indicates support for collective operations like AllReduce, though the current UET specification does not implement this capability.
FI_ATOMIC: Allows atomic operations on remote memory.
FI_HMEM: Marks support for heterogeneous memory, such as GPU device memory, as the source or target of data transfers.

Low-level hardware metrics, such as link speed or MTU, are not returned at this phase; the focus is on semantic capabilities that the application can rely on.

The provider allocates new fi_info structures in CPU memory, one for each NIC that satisfies the hints provided by the application. Each structure also describes the other services that the NIC supports.

After the provider has created these structures, libfabric returns them to the application as a linked list. The next pointer links all available fi_info structures, allowing the application to iterate over the discovered NICs. Each fi_info entry contains both top-level fields, such as caps and addr_format, and several attribute sub-structures: fi_fabric_attr, fi_domain_attr, fi_ep_attr, fi_tx_attr, and fi_rx_attr.
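The linked list returned to the application can be modeled as follows. This is a minimal sketch with a hypothetical sk_info type that keeps only a next pointer, a NIC index, and a capability word; the helper names are invented for illustration and do not exist in libfabric.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for one returned entry. */
struct sk_info {
    struct sk_info    *next;
    int                nic_index;
    unsigned long long caps;
};

/* Build a list with one entry per NIC, preserving NIC order.
 * On allocation failure the entries built so far are returned. */
static struct sk_info *sk_build_list(int nic_count, unsigned long long caps)
{
    struct sk_info *head = NULL;
    struct sk_info **tail = &head;

    assert(nic_count >= 0);
    for (int i = 0; i < nic_count; i++) {
        struct sk_info *entry = calloc(1, sizeof(*entry));
        if (!entry)
            break;
        entry->nic_index = i;
        entry->caps = caps;
        *tail = entry;          /* append at the end of the list */
        tail = &entry->next;
    }
    return head;
}

/* Walk the list through the next pointers and count the entries. */
static int sk_count(const struct sk_info *head)
{
    int n = 0;
    for (const struct sk_info *cur = head; cur != NULL; cur = cur->next)
        n++;
    return n;
}

/* Release every entry; the next pointer is saved before each free. */
static void sk_free_list(struct sk_info *head)
{
    while (head) {
        struct sk_info *next = head->next;
        free(head);
        head = next;
    }
}
```

The application-side loop in sk_count() is the same pattern used to inspect each discovered NIC: start at the head and follow next until it is NULL.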

Even if the application provides no hints, the provider fills in these attribute groups with its default or supported values. This ensures that every fi_info structure returned by fi_getinfo() contains a complete description of the provider’s capabilities and configuration options. During object creation, the libfabric core passes these attributes to the provider, which uses them to map the requested fabric, domain, and endpoint objects to the appropriate NIC and transport configuration.

The application can then select the most appropriate entry and request the creation of the Fabric object. Creating the Fabric first establishes the context in which all subsequent domains, sub-resources, and endpoints are organized and managed. Once all pieces of the AI Fabric “jigsaw puzzle” abstraction have been created and initialized, the application can release the memory for all fi_info structures in the linked list by calling fi_freeinfo().
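Selecting an entry from the list can be as simple as taking the first one whose capabilities cover what the application needs. The sketch below assumes a hypothetical sk_info type and placeholder capability bits; a real application would likely also weigh other fields, which this example ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for one discovered entry. */
struct sk_info {
    struct sk_info    *next;
    unsigned long long caps;
    const char        *domain_name;
};

/* Return the first entry whose caps include every required bit, or NULL. */
static const struct sk_info *sk_select(const struct sk_info *head,
                                       unsigned long long required)
{
    for (const struct sk_info *cur = head; cur != NULL; cur = cur->next) {
        if ((cur->caps & required) == required)
            return cur;
    }
    return NULL;
}

/* Usage: the second entry is chosen because the first lacks bits 0x6. */
static void sk_select_demo(void)
{
    struct sk_info nic1 = { NULL,  0x7, "nic1" };
    struct sk_info nic0 = { &nic1, 0x1, "nic0" };
    const struct sk_info *pick = sk_select(&nic0, 0x6);
    assert(pick == &nic1);
}
```

The selected entry is then the one handed to the object-creation calls, and the whole list can be released once those objects exist.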

Figure 4-2: Discovery: Discover Provider Capabilities – fi_getinfo.

Note: Fields and values of fi_info and its sub-structures are explained in upcoming chapters.

Next, Fabric Object...

Tuesday, 7 October 2025
