Update March 6, 2020: This post will soon be superseded by a new version.
Foreword
This article explains the similarities between a LISP/VXLAN based Campus Fabric
and AWS Virtual Private Cloud (VPC) from the perspective of Intra-Subnet
Control-Plane and Data-Plane operation. The AWS VPC solution details are not
publicly available, and the information included in this article is based on
the author's own study of publicly available AWS VPC documentation.
There are two main reasons for writing this document:
First, Cisco SDA is an on-prem LAN model while AWS VPC is an off-prem DC solution. I wanted to point out that these two solutions, even though used for very different purposes, use the same kind of Control-Plane operation and Data-Plane encapsulation, and both are managed via a GUI. This is my answer to the never-ending discussion about whether there are DC networks, Campus networks and so on, or whether there are just networks.
Second, my own curiosity to understand the operation of AWS VPC.
I usually start by introducing the example environment and then explain the configuration, moving on to the Control-Plane operation and then to the Data-Plane operation. This time, however, I take a different approach. This article first introduces the example environment, but then discusses the Data-Plane operation before the Control-Plane operation. This way it is easier to understand what information is needed and how that information is gathered.
Campus Fabric: Host Connectivity
Host-A and Host-B are both attached to the same subnet 172.16.10.0/24 (VLAN 10).
Host-A, connected to Edge Sw-1 in building A, has an IP address 172.16.10.10/24
and Host-B, connected to Edge Sw-2 in building B, has an IP address
172.16.10.20/24. The Service Instance-Id for the network 172.16.10.0/24 is
1000. Edge Sw-1 has an RLOC IP address 192.168.10.1 and Edge Sw-2 has an RLOC
IP address 192.168.10.2.
Figure 1-1: Campus Fabric - Host Connectivity.
AWS VPC: Host Connectivity
EC2 Instance A (EC2-A) and EC2 Instance B (EC2-B) are both attached to the same
subnet 172.16.10.0/24 (Availability Zone 1000). EC2-A, launched in Host-A, has
an IP address 172.16.10.10/24 attached to its Elastic Network Interface 1
(ENI-1) and EC2-B, launched in Host-B, has an IP address 172.16.10.20/24
attached to its Elastic Network Interface 2 (ENI-2). The Service Instance-Id
for the network 172.16.10.0/24 is 1000. There is a software router in both
physical hosts, Router-1 in Host-A and Router-2 in Host-B. Host-A has an IP
address 192.168.10.1 and Host-B has an IP address 192.168.10.2. Note that the
AWS VPC public documentation does not say whether there is an additional L2
switch between the ENI and the Software Router. However, that detail is not
important from this article's perspective.
Figure 1-2: AWS VPC - EC2 Instance Connectivity.
Campus Fabric: The Underlay Network and the Mapping System
Edge Switch-1 and Edge Switch-2 are connected to an IPv4-only Underlay Network.
In addition, there is a Mapping System (MS) that is responsible for (A) storing
Endpoint Identifier (EID) to Routing Locator (RLOC) mapping information
(EID-to-RLOC) and (B) publishing the information when requested by edge
switches. The basic IP addressing scheme is shown in figure 1-3.
Figure 1-3: Campus Fabric – The underlay Network and the Mapping System.
AWS VPC: The Underlay Network and the Mapping Service
Software Router-1 and Software Router-2 are connected to the Underlay Network.
In addition, there is a Mapping Service (MS) that is responsible for publishing
the EC2 Instance location information when queried by Software Routers (the
queries are made on behalf of other EC2 Instances that need the information).
The basic IP addressing scheme is shown in figure 1-4.
Figure 1-4: AWS VPC - The underlay Network and the Mapping Service.
As this section illustrates, the basic building blocks in both LISP-based
Campus Fabric and AWS VPC are essentially the same regardless of the naming
conventions and the use of either virtual or physical hardware. The next
section explains the Data-Plane operation, focusing on encapsulation.
Campus Fabric: Data-Plane Tunnel Encapsulation
Figure 1-5 illustrates the situation where Host-A in building A wants to
communicate with Host-B in building B. To keep things simple, the MAC/IP
address resolution process as well as the EID-to-RLOC Registration/Request
processes are left out of the figure. These processes are explained later in
the Control-Plane section.
When Edge Sw-1 receives the ICMP-Request message originated by Host-A with the
destination IP address of Host-B, it looks up the destination MAC address in
the MAC-Address table. Because the lookup result is negative, it checks whether
the destination MAC address is installed in the LISP Mapping-Cache. As said
earlier, the EID-to-RLOC Registration/Request processes are already done, so
there is a mapping entry in the LISP Mapping-Cache of Edge Sw-1. Based on the
information found in the LISP Mapping-Cache, Edge Sw-1 wraps the original ICMP
packet inside tunnel headers. In the outer Ethernet header, the source MAC
address is its own MAC address and the destination MAC address is the MAC
address of the next-hop router (Router-3 on the Underlay Network). The outer
source IP address is its own RLOC IP address and the destination IP address is
the RLOC IP address of Edge Sw-2. The transport layer protocol is UDP with a
randomly selected source port and the destination port 4789 (reserved for
VXLAN). The VN-Id, which describes the tenant, is set to 1000. Though not
explained earlier, Host-A is identified by a Scalable Group Tag (SGT, value 77
in this example), and the access policy is based on the SGT instead of the
Host-A IP address. As a side note, the “Do not Learn” bit is also set, which
means that the remote switch should not learn the source MAC address of the
sender from the received packet.
The core routers in the Underlay Network forward the VXLAN encapsulated
ICMP-Request message based on the destination IP address of the outer IP
header. When Edge Sw-2 receives the encapsulated ICMP message, it notices that
the outer destination IP address is its own RLOC IP address, so it processes
the tunnel headers. Based on the UDP destination port 4789, it knows that the
next header is the VXLAN header. From the VXLAN header VN-Id value 1000, it
knows that this packet belongs to the tenant attached to VN-Id 1000. It
verifies that the SGT value 77 is allowed to communicate with Host-B, and then
it switches the original ICMP message to Host-B.
Figure 1-5: Campus Fabric – Data-Plane VXLAN Tunnel Encapsulation.
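The VXLAN header fields discussed above (VN-Id 1000, SGT 77 and the “Do not Learn” bit) can be sketched in Python. This is only a minimal sketch: it assumes the 8-byte header layout of the VXLAN Group Policy Option as I understand it, so treat the flag bit positions as an assumption. The helper names build_vxlan_gpo_header and parse_vxlan_gpo_header are made up for this illustration.

```python
import struct

VXLAN_UDP_PORT = 4789  # UDP destination port reserved for VXLAN

# Flag bits. Positions are an assumption based on the Group Policy Option layout.
FLAG_G = 0x80  # first flag byte: Group Policy ID (SGT) is present
FLAG_I = 0x08  # first flag byte: VN-Id (VNI) is valid
FLAG_D = 0x40  # second flag byte: "Do not Learn"


def build_vxlan_gpo_header(vni, sgt, dont_learn=True):
    """Return an 8-byte VXLAN header carrying the VN-Id and the SGT."""
    if not 0 <= vni < 2**24:
        raise ValueError("VN-Id must fit in 24 bits")
    if not 0 <= sgt < 2**16:
        raise ValueError("SGT must fit in 16 bits")
    flags1 = FLAG_G | FLAG_I
    flags2 = FLAG_D if dont_learn else 0
    # The last 32 bits hold the 24-bit VN-Id followed by 8 reserved bits.
    return struct.pack("!BBHI", flags1, flags2, sgt, vni << 8)


def parse_vxlan_gpo_header(header):
    """Decode the fields the receiving edge switch needs."""
    flags1, flags2, sgt, vni_word = struct.unpack("!BBHI", header[:8])
    return {
        "vni": vni_word >> 8,
        "sgt": sgt if flags1 & FLAG_G else None,
        "dont_learn": bool(flags2 & FLAG_D),
    }


header = build_vxlan_gpo_header(vni=1000, sgt=77)
print(header.hex())
print(parse_vxlan_gpo_header(header))
```

The receiving side only needs the parsed dictionary: the VN-Id selects the tenant and the SGT is the input to the access policy check.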
AWS VPC: Data-Plane Tunnel Encapsulation
Disclaimer! This section, including the text and figure 1-6, is not based on
any AWS VPC public documentation. It is 100% based on the author’s own research
and assumptions. Do not take it as fact.
Figure 1-6 illustrates the situation where EC2-A running in Host-A wants to
communicate with EC2-B launched in Host-B. Again, the MAC/IP address resolution
is left out. AWS VPC public documentation includes only high-level information
about the VPC tunneling encapsulation. However, there naturally is an outer
Ethernet header as well as an outer IP header. The outer source IP address is
the address of the sending physical host, while the outer destination IP
address is the address of the physical host where the target EC2-B instance is
running. The public documentation does not say whether TCP or UDP is used as
the transport protocol. Then there is a VPC header, whose structure and
contents are not documented in public AWS documentation. However, for the
receiving physical server Host-B to be able to forward the ICMP-Request to
EC2-B, it has to know (A) the VPC identifier used in this VPC, (B) the Elastic
Network Interface (ENI) Identifier attached to EC2-B and (C) the Security Group
value. Otherwise, it does not know to which VPC and to which ENI the message
should be forwarded, or whether the connection is allowed by the security
policy. There might also be the same kind of “Do Not Learn” information as in
VXLAN encapsulation, used to prevent address learning from received tunneled
data packets, as well as some reserved bits for future use. This, however, is
the author's own assumption.
Figure 1-6: AWS VPC – Data-Plane VPC Tunnel Encapsulation.
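The forwarding decision that the receiving host has to make from items (A), (B) and (C) can be sketched as follows. Everything here is hypothetical: the class names AssumedVpcHeader and ReceivingHost, the field names and the identifier values such as "vpc-1000" are invented for the illustration and do not describe any documented AWS format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssumedVpcHeader:
    """Hypothetical fields; the real VPC header is not publicly documented."""
    vpc_id: str          # (A) which VPC the packet belongs to
    eni_id: str          # (B) which ENI should receive the packet
    security_group: str  # (C) Security Group value of the sender


class ReceivingHost:
    """Toy model of the software router in the receiving physical host."""

    def __init__(self):
        self.local_enis = {}      # (vpc_id, eni_id) -> instance name
        self.allowed_groups = {}  # (vpc_id, eni_id) -> allowed sender groups

    def attach(self, vpc_id, eni_id, instance, allowed_groups):
        key = (vpc_id, eni_id)
        self.local_enis[key] = instance
        self.allowed_groups[key] = set(allowed_groups)

    def deliver(self, header):
        """Return the target instance name, or None if the packet is dropped."""
        key = (header.vpc_id, header.eni_id)
        instance = self.local_enis.get(key)
        if instance is None:
            return None  # unknown VPC or ENI on this host
        if header.security_group not in self.allowed_groups[key]:
            return None  # connection not allowed by the security policy
        return instance


host_b = ReceivingHost()
host_b.attach("vpc-1000", "eni-2", "EC2-B", allowed_groups={"sg-77"})
print(host_b.deliver(AssumedVpcHeader("vpc-1000", "eni-2", "sg-77")))
print(host_b.deliver(AssumedVpcHeader("vpc-1000", "eni-2", "sg-99")))
```

Without any one of the three fields, the lookup key or the policy check in deliver could not be built, which is the reasoning behind the assumption above.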
This section discussed VXLAN and AWS VPC tunnel encapsulation. The next section
describes how this information is (A) published to the Mapping System and (B)
obtained by the Edge Switches in the Campus Fabric solution and by the Software
Routers in the physical hosts.
Campus Fabric: LISP EID-to-RLOC Registration Process
Figure 1-7 illustrates the situation where Host-B boots up and sends a GARP
message (L2 Broadcast) to verify the uniqueness of its IP address and to inform
the network of its existence. Edge Sw-2 learns the MAC address of Host-B from
the ingress GARP message. This triggers the LISP EID-to-RLOC Registration
process. Edge Sw-2 sends a Map-Register message to the Mapping System. The
message contains a Mapping-Record (MR) that describes the host IP address and
the Instance-Id, which together form the Endpoint Identifier (EID). The MR also
includes additional information such as the period of validity of this
particular mapping. The message also carries a Locator-Record (LR) that
describes the Routing Locator (RLOC) IP address of Edge Sw-2, which is the
advertising last-hop router. There is also information related to security and
redundancy, among other things.
The Mapping System has two functional components from the LISP perspective. The
Map-Server is the component that is responsible for saving a received LISP
EID-to-RLOC Map-Register into the Mapping-Database after a successful validity
check. Figure 1-7 also shows the EID-to-RLOC mapping entry in the Mapping
System's Mapping-Database. It shows that the IP address 172.16.10.20/32 is
registered by 192.168.10.2 (Edge Sw-2). Note that the example uses a CSR1000v,
which supports only the IP service; the Ethernet service is not supported.
Note! The actual registration process is explained in detail in my book “LISP
Control-Plane in Campus Fabric” (Amazon and Leanpub).
Figure 1-7: Campus Fabric - LISP EID-to-RLOC Registration Process.
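The Map-Server side of the registration can be sketched as a small Python class. This is a toy model, not the LISP message format: the class and method names are made up, the shared-key comparison merely stands in for the validity check, and the 86400-second lifetime is an assumed value.

```python
import time
from dataclasses import dataclass


@dataclass
class MappingEntry:
    rloc: str          # RLOC of the registering edge switch
    expires_at: float  # absolute time when the mapping stops being valid


class MapServer:
    """Toy Map-Server storing EID-to-RLOC mappings per Instance-Id."""

    def __init__(self, shared_key):
        self.shared_key = shared_key
        self.database = {}  # (instance_id, eid) -> MappingEntry

    def map_register(self, instance_id, eid, rloc, ttl_seconds, key, now=None):
        # Stand-in for the validity check done before the entry is saved.
        if key != self.shared_key:
            return False
        now = time.time() if now is None else now
        self.database[(instance_id, eid)] = MappingEntry(rloc, now + ttl_seconds)
        return True


map_server = MapServer(shared_key="example-key")
accepted = map_server.map_register(
    instance_id=1000,
    eid="172.16.10.20/32",
    rloc="192.168.10.2",   # Edge Sw-2
    ttl_seconds=86400,     # assumed lifetime
    key="example-key",
)
print(accepted, map_server.database)
```

Keying the database on the Instance-Id together with the IP address mirrors the way the two values together form the EID.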
AWS VPC: EC2 Instance Registration Process
Disclaimer! This section, including the text and figure 1-8, is not based on
any AWS VPC public documentation. It is 100% based on the author’s own research
and assumptions on how the EC2 Instance registration process might work. Do not
take it as fact. The Mapping Service is totally transparent to AWS customers
and there is no public documentation available at the time of writing.
The AWS VPC Data-Plane section describes the basic information used in the AWS
VPC tunneling. Based on the information needed to build a VPC tunnel
encapsulated packet, it can be assumed that this information also has to be
registered to the Mapping Service by the software router component running on
a physical host, and published to other hosts by the Mapping Service itself.
The registration can be assumed to include the MAC and IP address bound to the
Elastic Network Interface (ENI) that in turn is attached to EC2-B. In addition,
there has to be information about the VPC Identifier and the IP address of the
physical host where the EC2-B instance is running. It can also be assumed that
there is some security information related to message validity, the lifetime of
the mapping and so on. Figure 1-8 illustrates the registration message and the
Mapping Service database entry in the same format that was used in the LISP
EID-to-RLOC Map-Register section.
Figure 1-8: AWS VPC – EC2 Instance Registration Process.
Campus Fabric: LISP EID-to-RLOC Map-Request Process
Figure 1-9 explains the LISP EID-to-RLOC Map-Request process. Host-A wants to
communicate with Host-B for the first time, so Host-A sends an ARP-Request
message (L2 Broadcast) to resolve the IP/MAC address mapping of Host-B. When
Edge Sw-1 receives the message, it answers by sending an ARP-Reply using its
own distributed Anycast-MAC address. The first ingress ARP-Request also
triggers a LISP EID-to-RLOC Request process because Edge Sw-1 does not know how
to forward data to destination 172.16.10.20. Edge Sw-1 sends a Map-Request
message to the Mapping System, asking which last-hop router leads to the host
with the IP address 172.16.10.20. To be more specific, the message is sent to
the Map-Resolver (MR) component of the Mapping System. The Mapping System makes
a Mapping-Database lookup, and since Edge Sw-2 has published the information,
the lookup is a hit. The Map-Resolver then sends a Map-Reply message to Edge
Sw-1 as an answer to the Map-Request. Edge Sw-1 stores the EID-to-RLOC mapping
information in its LISP Mapping-Cache. When Host-A sends the next data packet,
Edge Sw-1 forwards it based on the information found in the Mapping-Cache. This
is done by encapsulating the original message inside VXLAN tunnel headers where
the destination IP address is the RLOC address of Edge Sw-2 and the VN-Id is
1000.
Figure 1-9: Campus Fabric - LISP EID-to-RLOC Request Process.
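The request-then-cache behaviour described above can be sketched in Python. It is a simplified model under my own assumptions: the MapResolver and EdgeSwitch classes and their methods are invented for the illustration, and the Map-Request and Map-Reply messages are reduced to a function call and its return value.

```python
class MapResolver:
    """Toy Map-Resolver answering from a static Mapping-Database."""

    def __init__(self, database):
        self.database = database  # (instance_id, eid) -> rloc
        self.requests_seen = 0

    def map_request(self, instance_id, eid):
        self.requests_seen += 1
        # None models a negative reply (no registration for this EID).
        return self.database.get((instance_id, eid))


class EdgeSwitch:
    """Toy edge switch with a local LISP Mapping-Cache."""

    def __init__(self, name, resolver):
        self.name = name
        self.resolver = resolver
        self.map_cache = {}

    def resolve_rloc(self, instance_id, eid):
        key = (instance_id, eid)
        if key not in self.map_cache:
            # Cache miss: ask the Mapping System and store the answer.
            rloc = self.resolver.map_request(instance_id, eid)
            if rloc is None:
                return None
            self.map_cache[key] = rloc
        return self.map_cache[key]


resolver = MapResolver({(1000, "172.16.10.20"): "192.168.10.2"})
edge_sw1 = EdgeSwitch("Edge Sw-1", resolver)

first = edge_sw1.resolve_rloc(1000, "172.16.10.20")   # triggers a Map-Request
second = edge_sw1.resolve_rloc(1000, "172.16.10.20")  # served from the cache
print(first, second, resolver.requests_seen)
```

Only the first lookup reaches the resolver. The second one is answered locally, which is why the mapping information ends up only on the switches that actually need it.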
AWS VPC: Instance-to-Server Map-Request Process
Figure 1-10 illustrates the assumed Instance-to-Location Map-Request process.
The whole Map-Request process follows the same principles as the LISP
Map-Request process described in the previous section. Software Router-1
answers the ARP-Request message and then requests the Instance-to-Location
mapping information from the Mapping Service (MS). The Mapping Service knows
the requested mapping information because it has received a registration from
Software Router-2. The MS replies to Software Router-1 with a message that
describes the location (physical server IP address) and the attached Elastic
Network Interface Identifier (ENI-Id). Software Router-1 stores the mapping
information in its local cache database.
This process makes it possible to deliver the Instance-to-Location mapping
only to those locations where the information is needed.
Figure 1-10: AWS VPC – EC2 Instance to Server Request Process.
Conclusion
From the Data-Plane and Control-Plane perspective, the intra-subnet switching
processes in the LISP based Campus Fabric and AWS VPC solutions look alike.
Both solutions use some kind of centralized storage for location information
that is built from information received from last-hop routers. Both solutions
also publish the mapping information only when asked, and the requester stores
the information locally in a mapping cache to avoid unnecessary recurring
request processes. In addition, the Data-Plane in both solutions uses tunnel
encapsulation that carries tenant (VNI/VPC) information within the
encapsulation headers. Broadcast messages are also terminated at the first-hop
routers/switches.
References
[RFC 6830] D. Farinacci et al., “The Locator/ID Separation Protocol (LISP)”, RFC 6830, January 2013.
[RFC 6833] V. Fuller and D. Farinacci, “Locator/ID Separation Protocol (LISP) Map-Server Interface”, RFC 6833, January 2013.
[LISP Control Plane] D. Farinacci et al., “The Locator/ID Separation Protocol (LISP) Control Plane”, draft-ietf-lisp-rfc6833bis-25, June 16, 2019.
What Is Amazon VPC?
https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html
AWS re:Invent 2015 | (NET403) Another Day, Another Billion Packets
https://www.youtube.com/watch?v=3qln2u1Vr2E
AWS re:Invent 2017: Another Day, Another Billion Flows (NET405)
https://www.youtube.com/watch?v=8gc2DgBqo9U
Toni Pasanen, “LISP Control-Plane in Campus Fabric: A Practical Guide to Understand the Operation of Campus Fabric”, 17 February 2020, ISBN-13: 979-8615059186.