Introduction
Ultra Ethernet uses the libfabric communication framework to connect AI frameworks to network endpoints that, ultimately, exchange data with each other across GPUs. Libfabric provides a high-performance, low-latency API that hides the details of the underlying transport, so AI frameworks do not need to manage endpoints, buffers, or the address tables that map communication paths. This makes applications more portable across different fabrics while still providing access to advanced features such as zero-copy transfers and RDMA, which are essential for large-scale AI workloads.
During system initialization, libfabric coordinates with the appropriate provider—such as the UET provider—to query the network hardware and organize communication around three main objects: the Fabric, the Domain, and the Endpoint. Each object manages specific sub-objects and resources. For example, a Domain handles memory registration and hardware resources, while an Endpoint is associated with completion queues, transmit/receive buffers, and transport metadata. Ultra Ethernet maps these objects directly to the network hardware, ensuring that when GPUs begin exchanging training data, the communication paths are already aligned for low-latency, high-bandwidth transfers.
Once initialization is complete, AI frameworks issue standard libfabric calls to send and receive data. Ultra Ethernet ensures that this data flows efficiently across GPUs and servers. By separating initialization from runtime communication, this approach reduces overhead, minimizes bottlenecks, and enables scalable training of modern AI models.
Ultra Ethernet Inter-GPU Capability Discovery Flow
Phase 1: Application Requests Transport Capabilities
The process begins when an AI application specifies its needs for Ultra Ethernet RMA transport between GPUs, including the type of communication, completion method, and other preferences. The application calls the libfabric function fi_getinfo, providing these hints. Libfabric receives the request and prepares to identify network interfaces and capabilities that match the application’s requirements. This information will eventually be returned in a fi_info structure, which describes the available interfaces and their features.
Phase 2: Libfabric Core Loads the Provider
The libfabric core determines which provider can fulfill the request. In this case, it selects the UET provider for Ultra Ethernet RMA transport. Libfabric can support multiple providers for different network types, such as TCP, InfiniBand (verbs), or other specialized fabrics. The core loads the chosen provider and calls its getinfo function through the registration structure, which references the provider's main entry points: network information retrieval (getinfo) and fabric creation, through which domain and endpoint creation are later reached. This allows libfabric to interact with the provider without knowing its internal implementation.
Phase 3-4: Provider Queries the NIC
Inside the provider, the registration structure directs libfabric to the correct function implementation (getinfo). This function queries each network interface on the host. The hardware driver responds with detailed information for each interface, including MTU, link speed, address formats, memory registration support, and supported transport modes like RMA or messaging. At this stage, the provider has a complete picture of the hardware, but the information has not yet been organized into fi_info structures.
Phase 5: Provider Fills fi_info Structures and Libfabric Filters Results
The provider fills the fi_info structures with the discovered NIC capabilities. The list is then returned to the libfabric core, which applies the application’s original hints to filter the results. Only the interfaces and transport options that match the requested criteria are presented to the application, providing a pre-filtered set of network options ready for use.
Phase 6: Application Receives Filtered Capabilities
The filtered fi_info structures are returned to the application. The AI framework can now create fabrics, domains, and endpoints, confident that the selected NICs are ready for efficient inter-GPU communication. By separating initialization from runtime communication, Ultra Ethernet and libfabric ensure that resources are aligned with the hardware, minimizing overhead and enabling predictable, high-bandwidth transfers.
The flow—from the application request, through libfabric core, the UET provider, the NIC and hardware driver, and back to the application—establishes a clear separation of responsibilities: the application defines what it needs, libfabric coordinates the providers, the provider interacts with the hardware, and the application receives a pre-filtered set of options ready for low-latency, high-bandwidth inter-GPU communication.
Figure 4-1: Initialization stage – Discovery of Provider capabilities.
Simplified Coding Examples (Optional)
This book does not aim to teach programming, but simplified code examples are provided here to support readers who learn best by studying practical snippets. These examples map directly to the six phases of initialization and show how libfabric and the UET provider work together under the hood.
Example 1 – Application fi_info Request with Hints
Why: The application needs to describe what kind of transport it wants (e.g., RMA for GPU communication) without worrying about low-level NIC details.
What: It prepares a fi_info structure with hints, such as transport type and completion method, then calls fi_getinfo() to ask libfabric which providers and NICs can match these requirements.
How: The application sets fields in hints, then hands them to fi_getinfo(). Libfabric uses this information to begin provider discovery.
#include <rdma/fabric.h>
#include <rdma/fi_domain.h>

struct fi_info *hints, *info;
hints = fi_allocinfo();              // returns NULL on allocation failure
// Request a reliable datagram endpoint for inter-GPU RMA
hints->ep_attr->type = FI_EP_RDM;
// Request RMA and messaging capabilities
hints->caps = FI_RMA | FI_MSG;
// FI_CONTEXT is a mode bit: the application agrees to pass a
// struct fi_context with each operation (it is not a completion setting)
hints->mode = FI_CONTEXT;
// Memory registration requirements
hints->domain_attr->mr_mode = FI_MR_LOCAL | FI_MR_VIRT_ADDR;
// Query libfabric to get matching providers and NICs
int ret = fi_getinfo(FI_VERSION(1, 18), NULL, NULL, 0, hints, &info);
fi_freeinfo(hints);                  // hints can be released after the query
Example 2 – UET Provider Registration with getinfo
Why: Providers tell libfabric what operations they support. Registration connects the generic libfabric core to the UET-specific implementation.
What: The fi_provider structure is filled with function pointers. Among them, uet_getinfo is the callback used when libfabric queries the UET provider for NIC capabilities.
How: Libfabric calls these functions through the registration, so it doesn’t need to know the provider’s internal code.
#include <rdma/providers/fi_prov.h>

// Simplified registration; the uet_* symbols are illustrative names.
// The real struct fi_provider exposes getinfo, fabric, and cleanup;
// domain and endpoint creation are reached later through the fabric
// object returned by the fabric() call.
struct fi_provider uet_prov = {
    .name       = "uet",
    .version    = FI_VERSION(1, 0),    // provider's own version
    .fi_version = FI_VERSION(1, 18),   // libfabric API version targeted
    .getinfo    = uet_getinfo,         // capability discovery (Example 3)
    .fabric     = uet_fabric,          // fabric object creation
    .cleanup    = uet_cleanup,         // provider teardown
};
Example 3 – Provider Builds fi_info Structures
Why: After querying the NIC driver, the provider must describe all capabilities in a format libfabric understands. This includes link speed, MTU, memory registration, and supported transport modes. These details allow libfabric to determine which providers can satisfy the application’s requirements.
What: The provider allocates and fills an fi_info structure with the capabilities of each discovered NIC. This structure represents the full picture of the hardware, independent of what the application specifically requested.
How: The uet_getinfo() function queries each NIC, populates fi_info fields, and returns them to libfabric for further filtering.
#include <string.h>
#include <rdma/fabric.h>
#include <rdma/fi_domain.h>
#include <rdma/fi_errno.h>

int uet_getinfo(uint32_t version, const char *node, const char *service,
                uint64_t flags, const struct fi_info *hints,
                struct fi_info **info)
{
    struct fi_info *fi;

    // A real implementation would honor version, node, service,
    // flags, and hints; this sketch reports one fixed interface.
    fi = fi_allocinfo();
    if (!fi)
        return -FI_ENOMEM;

    // Provider and endpoint information
    fi->fabric_attr->prov_name = strdup("uet");
    fi->fabric_attr->name = strdup("UltraEthernetFabric");
    fi->ep_attr->type = FI_EP_RDM;
    fi->caps = FI_RMA | FI_MSG;
    fi->domain_attr->mr_mode = FI_MR_LOCAL | FI_MR_VIRT_ADDR;

    // Addressing format (a top-level fi_info field), IPv4 example
    fi->addr_format = FI_SOCKADDR_IN;

    // Link-level attributes such as MTU and speed belong in
    // fi->nic->link_attr (struct fi_link_attr) in a real provider.

    *info = fi;
    return 0;
}
Note: The provider describes all capabilities, even those not requested, so libfabric can filter later. The address format shown is illustrative; link-level attributes such as MTU and speed are carried in the fi_info's NIC sub-structure (fi->nic->link_attr) rather than in the domain or fabric attributes.
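The query step itself (Phases 3-4) can also be sketched. The snippet below is not UET provider code; it only illustrates, using standard Linux interfaces (getifaddrs() and the SIOCGIFMTU ioctl), the kind of per-interface question a provider asks the host when collecting link attributes such as the MTU:
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <ifaddrs.h>

int main(void)
{
    struct ifaddrs *ifas, *ifa;
    int s = socket(AF_INET, SOCK_DGRAM, 0);   // helper socket for ioctls

    if (s < 0)
        return 1;
    if (getifaddrs(&ifas) < 0) {
        close(s);
        return 1;
    }

    for (ifa = ifas; ifa; ifa = ifa->ifa_next) {
        struct ifreq ifr = {0};

        if (!ifa->ifa_addr || ifa->ifa_addr->sa_family != AF_INET)
            continue;                          // IPv4 interfaces only
        strncpy(ifr.ifr_name, ifa->ifa_name, IFNAMSIZ - 1);
        if (ioctl(s, SIOCGIFMTU, &ifr) == 0)   // ask the driver for the MTU
            printf("%s: MTU %d\n", ifa->ifa_name, ifr.ifr_mtu);
    }

    freeifaddrs(ifas);
    close(s);
    return 0;
}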
Example 4 – Filtered fi_info Returned to Application
Why: The application does not need all hardware details — only the NICs and transport modes that match its original hints. Filtering ensures it sees a concise, relevant set of options.
What: Libfabric applies the application’s criteria to the provider’s fi_info structures and returns only the matching entries.
How: The application receives the filtered list and can use it to initialize fabrics, domains, and endpoints for communication.
struct fi_info *p;
for (p = info; p; p = p->next) {
    // Display key information relevant to the application
    printf("Provider: %s, Fabric: %s, Endpoint type: %d, Caps: 0x%llx\n",
           p->fabric_attr->prov_name,
           p->fabric_attr->name,
           p->ep_attr->type,
           (unsigned long long)p->caps);
}
fi_freeinfo(info);   // release the list once it is no longer needed
Note: Only the relevant subset of NIC capabilities is shown to the application. The application can now create fabrics, domains, and endpoints, confident that the hardware matches its requirements. Filtering reduces complexity and ensures predictable, high-performance inter-GPU transfers.
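Example 5 – Libfabric Core Filters Provider Results (Illustrative)
Why: Phase 5 depends on the libfabric core comparing each provider-supplied fi_info entry against the application's original hints.
What: A minimal sketch of the kind of matching the core performs. The real libfabric filtering checks many more attributes; matches_hints is a hypothetical helper written for this book, not a libfabric function.
How: Every capability bit requested in hints->caps must be present in the entry, and a requested endpoint type must match.
// Hypothetical helper sketching the hint matching of Phase 5;
// the real libfabric core is considerably more thorough.
static int matches_hints(const struct fi_info *fi,
                         const struct fi_info *hints)
{
    if (!hints)
        return 1;                      // no hints: everything matches
    if (hints->caps && (fi->caps & hints->caps) != hints->caps)
        return 0;                      // a requested capability is missing
    if (hints->ep_attr && hints->ep_attr->type != FI_EP_UNSPEC &&
        hints->ep_attr->type != fi->ep_attr->type)
        return 0;                      // endpoint type mismatch
    return 1;
}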
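Example 6 – Application Creates Fabric, Domain, and Endpoint
Why: Phase 6 ends with the application turning a filtered fi_info entry into usable communication objects.
What: The standard libfabric creation calls, applied to a matching entry before the list is released with fi_freeinfo(). Error handling is shortened to keep the sketch readable.
How: Each call consumes the object produced by the previous one, mirroring the Fabric, Domain, and Endpoint hierarchy described in the Introduction.
#include <rdma/fabric.h>
#include <rdma/fi_domain.h>
#include <rdma/fi_endpoint.h>

struct fid_fabric *fabric;
struct fid_domain *domain;
struct fid_ep *ep;
int ret;

// Fabric object from the selected entry's fabric attributes
ret = fi_fabric(info->fabric_attr, &fabric, NULL);
// Domain: memory registration and hardware resources
if (!ret)
    ret = fi_domain(fabric, info, &domain, NULL);
// Endpoint: transmit/receive object bound to this domain
if (!ret)
    ret = fi_endpoint(domain, info, &ep, NULL);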