Sunday, 28 September 2025

Ultra Ethernet: Domain Creation Process in Libfabric

Creating a domain object is the step where the application establishes a logical context for a NIC within a fabric, enabling endpoints, completion queues, and memory regions to be created and managed consistently.

Phase 1: Application (Discovery & choice — selecting a domain snapshot)

During discovery, the provider populated one or more fi_info entries, each a snapshot describing one possible NIC/port/transport combination. Each fi_info contains nested attribute structures for fabric, domain, and endpoint: fi_fabric_attr, fi_domain_attr, and fi_ep_attr. The fi_domain_attr substructure captures the domain-level template the provider reported during discovery (memory registration modes, MR key sizes, counts and limits, capability and mode bitmasks, CQ/CTX limits, authentication key sizes, etc.).

Once the application has decided which NIC/port it wants to use, it selects a single fi_info entry whose fi_domain_attr matches its needs. The chosen fi_info becomes the authoritative configuration for domain creation, containing both the application’s requested settings and the provider-reported capabilities. At this point, the application moves from fabric initialization to domain creation.
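The selection step can be pictured as a walk over a linked list of discovery snapshots. The following is a minimal sketch under stated assumptions: the sel_info and sel_domain_attr types and the sel_pick_info function are hypothetical, simplified stand-ins written for this illustration, not the libfabric structures, and the matching rule (enough CQs and endpoints) is only one possible policy.

```c
#include <stddef.h>

/* Simplified stand-ins for illustration only; NOT the libfabric definitions. */
struct sel_domain_attr {
    size_t mr_key_size;   /* bytes per memory-registration key */
    size_t cq_cnt;        /* maximum completion queues */
    size_t ep_cnt;        /* maximum endpoints */
};

struct sel_info {
    struct sel_info *next;              /* next discovery snapshot, or NULL */
    const char *prov_name;              /* provider that reported the entry */
    struct sel_domain_attr domain_attr; /* domain-level template */
};

/* Return the first snapshot whose domain limits cover the application's
 * needs, or NULL if no entry qualifies. */
const struct sel_info *sel_pick_info(const struct sel_info *list,
                                     size_t need_cqs, size_t need_eps)
{
    for (const struct sel_info *cur = list; cur != NULL; cur = cur->next) {
        if (cur->domain_attr.cq_cnt >= need_cqs &&
            cur->domain_attr.ep_cnt >= need_eps)
            return cur;
    }
    return NULL;
}
```

An application would pass the head of its discovery list together with its own requirements and then hand the returned entry to domain creation.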

To create the domain, the application calls the fi_domain function:


API Call → Create Domain object

    Within Fabric ID: 0xF1DFA01

    Using fi_info structure: 0xCAFE43E

    On success: returns fid_domain handle


Phase 2: Libfabric core (dispatch & validation)

The application calls the domain creation API:

int fi_domain(struct fid_fabric *fabric, struct fi_info *info,
              struct fid_domain **domain, void *context);

What the core does, at a high level:

  • Validate arguments: Ensure fabric is a live fid_fabric handle and info is non-NULL.
  • Sanity-check provider/fabric match: The core checks that the fi_info the application supplied corresponds to the same provider (and, indirectly, the same NIC/port) represented by fabric. This is the first piece of the “glue”: the fid_fabric (published earlier) contains the provider identity and fabric name, and fi_info also contains provider/fabric identifiers from discovery. The core rejects the call with an error if the two do not match, which prevents cross-provider or cross-fabric mixes.
  • Forward the call to the provider: The core hands the fi_info (including its fi_domain_attr) and the fabric handle to the provider’s domain creation entry point. The core itself remains lightweight: it performs validation and routing only and does not modify attributes or allocate hardware resources. The provider performs the heavy lifting of mapping attributes onto hardware.
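The three steps above (validate, match, forward) can be sketched as a thin dispatch function. This is an illustrative model only: every core_* type, error code, and function below is hypothetical and written for this example, and it does not reproduce how the libfabric source is actually organized. Name strings are assumed to be non-NULL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified types for this sketch; not libfabric internals. */
struct core_fabric;
struct core_info {
    const char *prov_name;     /* provider identity recorded at discovery */
    const char *fabric_name;   /* fabric identity recorded at discovery */
};
struct core_domain { int placeholder; };

/* Assumed shape of a provider's domain-creation entry point. */
typedef int (*core_domain_fn)(struct core_fabric *fabric,
                              const struct core_info *info,
                              struct core_domain **domain, void *context);

struct core_fabric {
    const char *prov_name;
    const char *fabric_name;
    core_domain_fn open_domain;
};

enum { CORE_OK = 0, CORE_EINVAL = -1, CORE_EMISMATCH = -2 };

int core_open_domain(struct core_fabric *fabric, const struct core_info *info,
                     struct core_domain **domain, void *context)
{
    /* 1. Validate arguments. */
    if (fabric == NULL || info == NULL || domain == NULL ||
        fabric->open_domain == NULL)
        return CORE_EINVAL;

    /* 2. The identifiers from discovery must match the fabric object. */
    if (strcmp(fabric->prov_name, info->prov_name) != 0 ||
        strcmp(fabric->fabric_name, info->fabric_name) != 0)
        return CORE_EMISMATCH;

    /* 3. Forward the unmodified request to the provider. */
    return fabric->open_domain(fabric, info, domain, context);
}

/* Stub provider used only to exercise the sketch: returns a static domain. */
int core_stub_open(struct core_fabric *fabric, const struct core_info *info,
                   struct core_domain **domain, void *context)
{
    static struct core_domain the_domain;
    (void)fabric; (void)info; (void)context;
    *domain = &the_domain;
    return CORE_OK;
}
```

Note that the dispatch function never touches the attributes or allocates anything itself, which mirrors the division of labor described above.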


Phase 3: UET provider (mapping to NIC / resource allocation)

The provider receives the fabric handle (so it knows which NIC/port and which provider instance to use) and the fi_info/fi_domain_attr descriptor. The provider:

  • Interprets the domain attributes and verifies they are feasible given the NIC hardware, driver state and current configuration, for example the requested MR key size, number of CQs/CTXs, per-endpoint limits, and capability bitmask.
  • Allocates driver / NIC resources or driver contexts that correspond to a domain: memory-registration state, structures for completion queues, context objects for send/recv, and any other provider-private handles.
  • Fails early on a mismatch (the NIC was removed, the driver does not support a requested capability, or the requested limits exceed available resources).

Because the fi_info came from discovery for that NIC port, the provider immediately knows the physical mapping. The created domain is the provider’s logical handle to NIC resources (memory registration tables, per-device queues, etc.) or to the NIC/port context the provider manages. The exact mapping to hardware structures may vary by provider implementation, but typically it corresponds one-to-one or one-to-few with real NIC ports.
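The provider-side logic of checking feasibility first and allocating only afterwards can be sketched as follows. All prov_* names, the limit fields, and the error codes are assumptions made for this illustration. A real UET provider would query its driver and allocate hardware contexts rather than plain heap memory.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical limits a provider might read from its NIC/driver. */
struct prov_nic_limits {
    int    present;            /* 0 if the NIC has been removed */
    size_t max_mr_key_size;
    size_t max_cq;
    size_t max_ep;
};

/* Domain attributes requested through the chosen info entry. */
struct prov_request {
    size_t mr_key_size;
    size_t cq_cnt;
    size_t ep_cnt;
};

/* Provider-private domain state (illustrative). */
struct prov_domain {
    struct prov_request granted;
    void **cq_slots;           /* one slot per permitted completion queue */
};

enum { PROV_OK = 0, PROV_ENODEV = -1, PROV_ELIMIT = -2, PROV_ENOMEM = -3 };

int prov_domain_create(const struct prov_nic_limits *nic,
                       const struct prov_request *req,
                       struct prov_domain **out)
{
    /* Fail early, before allocating anything. */
    if (!nic->present)
        return PROV_ENODEV;
    if (req->mr_key_size > nic->max_mr_key_size ||
        req->cq_cnt > nic->max_cq || req->ep_cnt > nic->max_ep)
        return PROV_ELIMIT;

    struct prov_domain *dom = calloc(1, sizeof(*dom));
    if (dom == NULL)
        return PROV_ENOMEM;

    /* Allocate at least one slot so a zero-CQ request still succeeds. */
    dom->cq_slots = calloc(req->cq_cnt ? req->cq_cnt : 1,
                           sizeof(*dom->cq_slots));
    if (dom->cq_slots == NULL) {
        free(dom);
        return PROV_ENOMEM;
    }
    dom->granted = *req;
    *out = dom;
    return PROV_OK;
}

void prov_domain_destroy(struct prov_domain *dom)
{
    if (dom != NULL) {
        free(dom->cq_slots);
        free(dom);
    }
}
```

On any failure the output pointer is left untouched, so the caller never sees a half-built domain.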


Phase 4: Libfabric core (fid_domain publication & hierarchy)

On successful provider creation, the provider returns a provider-private handle (pointer) to the domain state. The libfabric core then:

  • Wraps the provider handle into an application-visible fid_domain object.
  • Links the fid_domain to its parent fid_fabric (the fabric FID is stored as the domain’s parent/owner). This is the second piece of the “glue”: the created fid_domain explicitly references the fid_fabric that it belongs to, so the core can route future child-creation calls (endpoints, MRs, CQs) for this domain back to the same provider/fabric.
  • Copies or records the domain-level attributes (caps, mode, limits) into fields of fid_domain so they can be queried, validated on child creation, and used for lifetime/ref-counting.
  • Increments ref_count on the fid_fabric to prevent fabric destruction while the domain exists.


After this step the application holds a fid_domain handle and can proceed to create endpoints, register memory, create completion queues, etc., all of which the provider maps into the NIC/driver context that the domain represents.
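The parent link and reference counting described in this phase can be modeled in a few lines. This is a sketch under assumptions: the pub_* types and functions are hypothetical, the domain’s initial ref_count of 1 follows the illustrative example in this post, and returning a “busy” error when a parent still has children is one plausible policy rather than a statement about any particular implementation.

```c
#include <stddef.h>

/* Hypothetical, simplified objects for this sketch. */
struct pub_fabric {
    int ref_count;               /* number of live child domains */
};

struct pub_domain {
    struct pub_fabric *parent;   /* the "glue" back to the owning fabric */
    void *provider_data;         /* provider-private domain state */
    void *context;               /* application-supplied pointer */
    int ref_count;               /* 1 = the domain itself, no children yet */
};

enum { PUB_OK = 0, PUB_EBUSY = -1 };

/* Publish a domain: link it to its fabric and pin the fabric. */
void pub_domain_publish(struct pub_domain *dom, struct pub_fabric *fabric,
                        void *provider_data, void *context)
{
    dom->parent = fabric;
    dom->provider_data = provider_data;
    dom->context = context;
    dom->ref_count = 1;
    fabric->ref_count++;
}

/* Close a domain: refuse while children exist, otherwise unpin the fabric. */
int pub_domain_close(struct pub_domain *dom)
{
    if (dom->ref_count > 1)
        return PUB_EBUSY;
    dom->parent->ref_count--;
    dom->parent = NULL;
    dom->ref_count = 0;
    return PUB_OK;
}

/* A fabric can only be closed once no domain references it. */
int pub_fabric_close(const struct pub_fabric *fabric)
{
    return fabric->ref_count > 0 ? PUB_EBUSY : PUB_OK;
}
```

With this model, attempting to close the fabric while a domain is still open fails, and succeeds once the domain has been closed.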

Example fid_domain (illustrative)


fid_domain {
    fid_type        : FI_DOMAIN
    fid             : 0xF1DD01
    parent_fid      : 0xF1DFA01
    provider        : "libfabric-uet"
    caps            : 0x00000011
    mode            : 0x00000004
    mr_key_size     : 8
    cq_cnt          : 4
    ep_cnt          : 128
    provider_data   : <pointer to provider domain struct>
    ref_count       : 1
    context         : <app-provided void *>
}

This stored state allows the core and provider to validate, route and implement subsequent calls that reference the domain.


Explanation of fields:

Identity

  • fid_type: FI_DOMAIN: The fabric identifier (FID) type. Identifies the object as a domain. The type field allows the application (and libfabric itself) to distinguish between different object types (fabric, domain, endpoint, etc.).
  • fid: 0xF1DD01: Unique handle for this domain instance. The application uses this handle in API calls that act on the domain.
  • parent_fid: 0xF1DFA01: Reference to the parent fabric object. This links the domain to the fabric where it belongs, ensuring resources remain associated with the correct fabric.

Provider Info

  • provider: "libfabric-uet": Name of the provider managing the domain. The application can confirm which provider implementation it is using, which is important when multiple providers are installed.
  • caps: 0x00000011: Bitmask of capabilities (for example, FI_MSG | FI_RMA). Defines which communication operations the domain supports so the application can use only valid features.
  • mode: 0x00000004: Mode requirements (for example, FI_CONTEXT). Tells the application about specific restrictions or rules the provider requires it to follow when using the domain.
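A capability bitmask such as caps is normally tested with a mask-and-compare. The sketch below uses made-up bit values chosen only so that two of them add up to the illustrative 0x11 above. They are not the real libfabric FI_MSG or FI_RMA constants, and cap_demo_supported is a hypothetical helper.

```c
#include <stdint.h>

/* Illustrative bit values matching the example caps 0x11 in this post;
 * they are NOT the real libfabric capability constants. */
#define CAP_DEMO_MSG     UINT64_C(0x001)
#define CAP_DEMO_RMA     UINT64_C(0x010)
#define CAP_DEMO_ATOMIC  UINT64_C(0x100)

/* Return 1 if every required capability bit is set in domain_caps, else 0. */
int cap_demo_supported(uint64_t domain_caps, uint64_t required)
{
    return (domain_caps & required) == required;
}
```

Checking the whole required mask at once, rather than bit by bit, means a request is accepted only when all of its capabilities are available.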

Resource Limits

  • mr_key_size: 8: Size in bytes of memory registration keys. The application uses this value when registering memory regions so it provides keys of the correct length.
  • cq_cnt: 4: Maximum number of completion queues supported. Guides the application when designing its event handling because it cannot create more than this limit.
  • ep_cnt: 128: Maximum number of endpoints supported. Tells the application how many communication endpoints it can create within this domain.

Internal States

  • provider_data: <pointer to provider domain struct>: Provider-specific internal pointer. Not used directly by the application but allows the provider to maintain its own internal state.
  • ref_count : 1: Current reference count for the domain. Tracks how many objects depend on this domain and ensures proper cleanup when the domain is released.
  • context : <app-provided void *>: Application-supplied pointer. Lets the application attach custom data, such as state or identifiers, which the provider will return in callbacks.


Figure 4-4: Objects Creation Process – Domain for NIC(s).
