Creating the fi_info Structure
Before the application can discover what communication services are available, it first needs a way to describe what it is looking for. This description is built using a structure called fi_info. The fi_info structure acts like a container that holds the application’s initial requirements, such as desired endpoint type or capabilities.
The first step is to reserve memory for this structure in the system’s main memory. The fi_allocinfo() helper function does this for the application. When called, fi_allocinfo() allocates space for a new fi_info structure, which this book refers to as the pre-fi_info, that will later be passed to the libfabric core for matching against available providers.
At this stage, most of the fields inside the pre-fi_info structure are left at their default values. The application typically sets only the most relevant parameters that express what it needs, such as the desired endpoint type or provider name, and leaves the rest for the provider to fill in later.
In addition to the main fi_info structure, the helper function also allocates memory for a set of sub-structures. These describe different parts of the communication stack, including the fabric, domain, and endpoints. The fixed sub-structures include:
- fi_fabric_attr: Describes the fabric-level properties such as the provider’s name and fabric identifier.
- fi_domain_attr: Defines attributes related to the domain, including threading and data progress behavior.
- fi_ep_attr: Specifies endpoint-related characteristics such as endpoint type.
- fi_tx_attr and fi_rx_attr: Contain parameters for transmit and receive operations, such as message ordering and completion behavior.
- fi_nic: Represents the properties and capabilities of the UET network interface.
In figure 4-1, these are labeled as fixed sub-structures because their layout and meaning are always the same. They consist of predefined fields and expected value types, which makes them consistent across different applications. Like the main fi_info structure, they usually remain at their default values until the provider fills them in. The information stored in these sub-structures will later be leveraged when the application begins creating actual fabric objects, such as domains and endpoints.
In addition to the fixed parts, the fi_info structure can contain generic sub-structures such as src_addr. Unlike the fixed ones, the generic sub-structure src_addr depend on the chosen addr_format. For example, when using Ultra Ethernet Transport, the address field points to a structure describing a UET endpoint address, which includes bits for Version, Flags, Fabric Endpoint Capabilities, PIDonFEB, Fabric Address, Start Resource Index, Num Resource Indices, and Initiator ID. This compact representation carries both addressing and capability information, allowing the same structure definition to be reused across different transport technologies and addressing schemes. Note that in figure 4-2 the returned src_addr is only partially filled because the complete address information is not available until the endpoint is created.
In Figure 4-1, the application defines its communication requirements in the fi_info structure by setting the caps (capabilities) field. This field describes the types of operations the application intends to perform through the fabric interface. For example, values such as FI_MSG, FI_RMA, FI_WRITE, FI_REMOTE_WRITE, FI_COLLECTIVE, FI_ATOMIC, and FI_HMEM specify support for message-based communication, remote memory access, atomic operations, and host memory extensions.
When the fi_getinfo() call is issued, the provider compares these requested capabilities against what the underlying hardware and driver can support. Only compatible providers return a matching fi_info structure.
In this example, the application also sets the addr_format field to FI_ADDR_UET, indicating that Ultra Ethernet Transport endpoint addressing is used. This format includes hardware-specific addressing details beyond a simple IP address.
The current Ultra Ethernet Transport specification v1.0 does not define or support the FI_COLLECTIVE capability. Therefore, the UET provider does not return this flag, and collective operations are not offloaded or accelerated by the UET NIC.
After fi_allocinfo() has allocated memory for both the fixed and generic sub-structures, it automatically links them together by inserting pointers into the main fi_info structure. The application can then easily access each attribute through fi_info without manually handling the memory layout.
Once the structure is prepared, the next step is to request matching provider information using fi_getinfo() API call, which will be described in detail in the following section.
Requesting Provider Services with fi_getinfo()
- FI_MSG: Used for standard message-based communication.
- FI_RMA: Enables direct remote memory access, forming the foundation for high-performance gradient or parameter transfers between GPUs.
- FI_WRITE: Allows a GPU to push local gradient updates directly into another GPU’s memory.
- FI_REMOTE_WRITE: Signals that remote GPUs can write directly into this GPU’s memory, supporting push-based collective operations.
- FI_COLLECTIVE: Indicates support for collective operations like AllReduce, though the current UET specification does not implement this capability.
- FI_ATOMIC: Allows atomic operations on remote memory.
- FI_HMEM: Marks support for host memory or GPU memory extensions.
No comments:
Post a Comment