Anycast: Networking Introduction

There are many ways to meet the requirement for a highly available system. I’d like to talk about one of the methods we use to distribute workload geographically: Anycast.

The first place to start is with an overview of packet networking, particularly IP networking. Unfortunately, we have to start there because a lot of people have no real idea how packet networking works. For systems administrators, it’s what you plug the cable into; for programmers, it’s an API you call that lets you stream data across the world (and nothing more).

Of course, that ignores the details of how IP networking actually works. The devil is in the details, but knowing them lets you take advantage of nifty hacks. Just as knowing that a 32-bit unsigned integer can be used as 32 booleans lets you do clever things with it, knowing how packet networking works opens up all sorts of tricks.

Russian dolls

Packet networking works by co-operation: you have a file (or whatever) that you want to send from your computer, A, to another computer, B, via the Internet. Computer A (your computer) splits the file into multiple chunks called packets and puts B’s IP address into each packet. Then it puts each packet into another addressable wrapper called a frame (frames are addressed with MAC addresses).
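To make the Russian dolls concrete, here is a minimal Python sketch (standard library only) of computer A’s side: it chops some data into chunks, wraps each chunk in a bare IPv4 header addressed to B, then wraps that packet in an Ethernet frame addressed to the next hop’s MAC. Assumptions: B’s address (203.0.113.10, a documentation range) and A’s MAC are made up, protocol number 253 (reserved for experimentation) stands in for the TCP or UDP header a real stack would add, and the network card, not this code, would append the frame’s checksum.

```python
import ipaddress
import struct

MTU = 1500          # largest IP packet a standard Ethernet frame can carry
IP_HEADER_LEN = 20  # minimal IPv4 header, no options


def checksum16(data: bytes) -> int:
    """Internet checksum (RFC 1071): one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ip_packet(src: str, dst: str, payload: bytes, ident: int = 0, ttl: int = 64) -> bytes:
    """Inner doll: prepend a bare IPv4 header carrying A's and B's IP addresses."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                          # version 4, header length 5 x 32-bit words
        0,                             # DSCP/ECN
        IP_HEADER_LEN + len(payload),  # total length
        ident,                         # identification
        0,                             # flags / fragment offset
        ttl,                           # time to live
        253,                           # protocol 253: experimental (assumption)
        0,                             # checksum placeholder, filled in below
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )
    csum = struct.pack("!H", checksum16(header))
    return header[:10] + csum + header[12:] + payload


def mac_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", ""))


def ethernet_frame(dst_mac: str, src_mac: str, packet: bytes) -> bytes:
    """Outer doll: Ethernet II header (EtherType 0x0800 = IPv4); the NIC adds the FCS."""
    return mac_bytes(dst_mac) + mac_bytes(src_mac) + struct.pack("!H", 0x0800) + packet


def frames_for(data: bytes, src_ip: str, dst_ip: str, src_mac: str, next_hop_mac: str) -> list:
    """Split data into packet-sized chunks and wrap each one: packet first, then frame."""
    chunk = MTU - IP_HEADER_LEN
    frames = []
    for n, start in enumerate(range(0, len(data), chunk)):
        pkt = ip_packet(src_ip, dst_ip, data[start:start + chunk], ident=n)
        frames.append(ethernet_frame(next_hop_mac, src_mac, pkt))
    return frames


# Usage: 3000 bytes from A (192.168.22.8) to a hypothetical B at 203.0.113.10,
# framed for the gateway's MAC from the ARP table; A's MAC is made up.
frames = frames_for(b"x" * 3000, "192.168.22.8", "203.0.113.10",
                    "02:00:00:00:00:08", "00:00:0C:07:AC:DD")
print([len(f) for f in frames])  # [1514, 1514, 74]
```

Note that the frame’s destination is the gateway’s MAC (taken from the ARP table later in this post) while the packet’s destination is B’s IP address: each doll carries its own address.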

[Some commenters have correctly noted that this paragraph is wrong on modern networks: the host consults its routing table first and, for a destination outside its own subnet, only ARPs for the gateway. I’ll do a dedicated post on this behavior once I have the time and gear to test it.] Here’s where it gets interesting. If your computer doesn’t already know how to map B’s address to a MAC address, it will send out a special type of frame called an ARP query, which basically asks everyone on the local area network, “Who has this IP address?” If your computer doesn’t get a response, it’ll look at its local routing table to find out where to send packets for which it doesn’t have a MAC (frame) address:

[root@computerA ~]# /sbin/ip route
192.168.22.0/26 dev eth0  proto kernel  scope link  src 192.168.22.8
169.254.0.0/16 dev eth0  scope link
default via 192.168.22.1 dev eth0

In this case, it’ll use the default route unless it has a better (that is, more specific) route to B. So it needs to put the default router’s MAC address into the destination field of the outer, lower-layer frame before it sends it out. It finds that MAC address in its ARP cache:

[root@computerA ~]# /sbin/arp -en
Address                  HWtype  HWaddress           Flags Mask            Iface
192.168.22.2             ether   00:23:AB:AF:80:00   C                     eth0
192.168.22.1             ether   00:00:0C:07:AC:DD   C                     eth0

In this case, it addresses the frame to the MAC (HW) address of the gateway (192.168.22.1 / 00:00:0C:07:AC:DD) and sends it on its merry way. The frame leaves your computer with the packet inside and arrives at the gateway.
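The order the correction note and the commenters describe (routing table first, then ARP only for the next hop) can be sketched in Python using computer A’s routing table and ARP cache from the output above. This is a simplified model: the real kernel also weighs route metrics, route types, and policy rules, and 203.0.113.10 is a made-up address standing in for B.

```python
import ipaddress

# Computer A's routing table, transcribed from the `ip route` output above:
# (destination prefix, gateway or None if directly connected, device)
ROUTES = [
    (ipaddress.ip_network("192.168.22.0/26"), None, "eth0"),
    (ipaddress.ip_network("169.254.0.0/16"), None, "eth0"),
    (ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_address("192.168.22.1"), "eth0"),
]

# Computer A's ARP cache, transcribed from the `arp -en` output above.
ARP_CACHE = {
    ipaddress.ip_address("192.168.22.2"): "00:23:AB:AF:80:00",
    ipaddress.ip_address("192.168.22.1"): "00:00:0C:07:AC:DD",
}


def lookup_route(dst):
    """Longest-prefix match: of all routes containing dst, the most specific wins."""
    matches = [route for route in ROUTES if dst in route[0]]
    return max(matches, key=lambda route: route[0].prefixlen)  # default route always matches


def next_hop(dst_text):
    """Return (address to ARP for, its cached MAC or None, outgoing device)."""
    dst = ipaddress.ip_address(dst_text)
    _, gateway, dev = lookup_route(dst)
    # On-link destination: resolve the host itself. Off-link: resolve only the gateway.
    hop = dst if gateway is None else gateway
    # None means no cache entry, so the stack would send an ARP query for `hop`.
    return str(hop), ARP_CACHE.get(hop), dev


for target in ["203.0.113.10", "192.168.22.2", "192.168.22.30", "192.168.22.100"]:
    print(target, "->", next_hop(target))
```

B (203.0.113.10) resolves to the gateway 192.168.22.1 and its cached MAC; 192.168.22.2 is on-link and cached; 192.168.22.30 is on-link but not cached, so the stack would ARP for it; and 192.168.22.100 lies outside the /26 (which only covers .0 through .63), so it goes via the gateway even though it looks local.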

Now, the default route’s gateway address is an address on one of a router’s interfaces. A router is a computer with two or more network interfaces that knows how to take packets in on one interface and send them out another. The routing table on my home wifi router would have four entries (this isn’t my actual home router, just what its table would look like):

[root@router ~]# /sbin/ip route
192.168.22.0/26 dev eth0  proto kernel  scope link  src 192.168.22.1
192.168.1.0/24 dev eth1  proto kernel  scope link  src 192.168.1.254
169.254.0.0/16 dev eth0  scope link
default via 192.168.1.1 dev eth1

As you can see, it knows about the same subnet as computer A, and it also has its own default route for packets it receives but doesn’t otherwise know what to do with. At this point, the router takes the packet it received on eth0, replaces the frame it arrived in with a new one, and sends it out eth1 to the next router (in this case, the cable modem).
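The “replace the frame” step is the heart of routing, so here is a Python sketch of the home router handling A’s packet: it looks up the destination in the routing table shown above, decrements the TTL, and builds a new frame from its eth1 MAC to the next hop’s MAC, while the IP addresses inside stay the same. Assumptions: the eth1 and upstream MAC addresses are hypothetical, 203.0.113.10 is a made-up address for B, the IP header checksum is not modeled, and NAT (which a real home router would also apply to the source address) is ignored.

```python
import ipaddress
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    ttl: int
    payload: bytes


@dataclass(frozen=True)
class Frame:
    dst_mac: str
    src_mac: str
    packet: Packet


# The home router's table, transcribed from the `ip route` output above:
# (destination prefix, gateway or None if directly connected, device)
ROUTES = [
    (ipaddress.ip_network("192.168.22.0/26"), None, "eth0"),
    (ipaddress.ip_network("192.168.1.0/24"), None, "eth1"),
    (ipaddress.ip_network("169.254.0.0/16"), None, "eth0"),
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.1.1", "eth1"),
]

# eth0's MAC comes from computer A's ARP table; the rest are HYPOTHETICAL values.
IFACE_MAC = {"eth0": "00:00:0C:07:AC:DD", "eth1": "02:00:00:00:01:FE"}
ARP_CACHE = {"192.168.1.1": "02:00:00:00:01:01"}


def forward(frame):
    """Unwrap the packet, pick the egress route, and wrap it in a brand-new frame."""
    pkt = frame.packet
    if pkt.ttl <= 1:
        return None  # TTL would hit zero: a real router drops it and sends ICMP Time Exceeded
    dst = ipaddress.ip_address(pkt.dst)
    _, gateway, dev = max((r for r in ROUTES if dst in r[0]), key=lambda r: r[0].prefixlen)
    hop = gateway or pkt.dst
    # Assumes the next hop is already in the ARP cache (otherwise the router would ARP for it).
    new_frame = Frame(dst_mac=ARP_CACHE[hop], src_mac=IFACE_MAC[dev],
                      packet=replace(pkt, ttl=pkt.ttl - 1))  # IP addresses untouched
    return dev, new_frame


incoming = Frame("00:00:0C:07:AC:DD", "02:00:00:00:00:08",
                 Packet("192.168.22.8", "203.0.113.10", 64, b"hello"))
print(forward(incoming))
```

Every router along the way repeats this: the frame is rebuilt at each hop, while (NAT aside) the packet keeps the same source and destination IP addresses from A to B.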

Eventually, the packet will be handed from one router to another until it reaches a router that has lots of interfaces. Usually this is the box sitting at the other end of your cable or DSL line. It has thousands of routes in its routing table, costs hundreds of thousands of dollars, and has custom-designed hardware for taking packets in one network port and sending them out another.

All these routers co-operate to build a knowledge base of what the network looks like. The view each router has is often simple: a single upstream network interface with an upstream gateway, and a downstream LAN for its locally connected hosts. Upstream of your wifi router, you’ve got the ISP’s interior routing, where your ISP’s routers talk to each other to figure out how to send your packet to computer B.

"I Love Loopholes"

If your packet needs to leave your ISP (let’s say you’re on cable and your friend with computer B is on DSL), it goes into an even bigger router that has a view of some or all of the Internet and has calculated the easiest way (that is, the path through the fewest companies) to send your packet to computer B.
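The “fewest companies” rule corresponds roughly to BGP’s AS-path length comparison, where each company’s network is an autonomous system (AS). The Python sketch below models only that one rule; real BGP best-path selection checks other attributes (such as local preference) first and has further tie-breakers. The prefix, AS numbers, and neighbor names are hypothetical, drawn from documentation-only ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    learned_from: str         # which neighboring network advertised the route
    as_path: tuple[int, ...]  # the networks ("companies") the packet would cross


def best_route(candidates):
    """Prefer the path through the fewest networks; on a tie, keep the first one listed."""
    return min(candidates, key=lambda route: len(route.as_path))


# HYPOTHETICAL: three ways to reach computer B's DSL provider, using
# documentation-only prefix and AS numbers (203.0.113.0/24, AS64496-AS64511).
candidates = [
    Route("203.0.113.0/24", "transit-a", (64497, 64498, 64499)),
    Route("203.0.113.0/24", "dsl-peer", (64499,)),
    Route("203.0.113.0/24", "transit-b", (64500, 64499)),
]
print(best_route(candidates).learned_from)  # dsl-peer
```

Each big router makes this choice independently from the advertisements it has heard, which is exactly why every router ends up with its own view of the network.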

Here’s the upshot of all this: each individual router has its own view of the network. It either has this information statically configured by the network administrator, or it calculates it from information it receives from other routers. As it turns out, because of how routers co-operate to decide where to send a packet next, there is a tremendous loophole that can be used for geographically distributed services.

Continued in:

  1. Anycast: Networking Introduction
  2. Anycast: The Loophole
  3. Anycast: The Interface
  4. Anycast: Handling Routes
  5. Anycast: DGRAM vs. STREAM
  6. Anycast: IP-SLA HOWTO

12 thoughts on “Anycast: Networking Introduction”

  1. There is a tremendous loophole and then… and then… the post ends! =(

  2. Nice introduction, but you should be more careful of the basics. The IP stack of course looks up the route *before* ARPing for the next hop; otherwise you’d have to wait for the ARP to time out before establishing any connections not on the local net, which would make the web unbearable (not to mention the fun that someone could have by setting up their system to respond to, say, ARPs for http://www.google.com’s IP).

  3. Paragraph 5 is incorrect with respect to ARP. The routing table will be consulted before sending out ARP packets. It does not wait for an ARP timeout before choosing the default route.

    The routing table tells whether it is direct (hence using ARP if it is Ethernet) or “via” another router (ARP is used to find the router’s MAC address).

  4. @camh: Has that changed? For the first packet after the ARP expiry, RHEL 5 (2.6.18) *does* send an ARP query for IPs not in the routing table before sending to the gateway.

  5. It will use ARP to find the gateway. It will not use ARP to try to find the destination host, time out, and then decide to use the gateway.

    Using the gateway is a routing decision and does not involve ARP.

    It’s been a long time since I’ve used a kernel as old as 2.6.18, but I doubt this behavior has changed. It makes no sense to send an ARP request for which you already know you will not get a response.

  6. @camh: You don’t actually know that you’re not going to get a response. There’s nothing preventing multiple IP subnets from existing on the same layer 2 domain, so it does actually make sense to send that ARP.

    Either way, you’re just guessing based on how you think it should work, and I’ve actually seen it work the way I’m claiming.

    Tell you what, fire up tshark, clear your arp cache, and watch what it actually does.

  7. If multiple IP subnets share a layer two network, the traffic should “hairpin” through the default route. My CentOS 5 box on a shared L2 network does not install an ARP entry for any address not covered by its own network+netmask. A traceroute to a host on the same L2 network but a different subnet confirms that it does in fact hop through the router.

  8. James, I’m not sure what you mean by “when you have an empty cache.” If I have an empty ARP cache, then of course if I try to connect to any system off the local net, the stack will need to ARP for the next hop, ie the router (usually the default gateway). But there is absolutely not an ARP for the remote destination.

    You could try it yourself — do something like “tcpdump ether broadcast” while you browse to some random website that you haven’t been to for a long time and that won’t be in your ARP cache. You’ll observe two things: first, there is no ARP for the web site’s address on the wire (although as noted before there might be an ARP for the next hop as looked up in the route table); and second, there couldn’t possibly be an ARP that times out, because the web site starts loading before an ARP could time out.

    Your comment about multiple IP nets on the same L2 is not relevant; for that to work any system using multiple nets would have to have routes to all the networks.

  9. Hmmm. I just tested this again on a DHCP-IP’d host on a 1Gbps link, and it does only ARP the gateway.

    I’ll try to reproduce my original experience, but I do recall having to hardcode the MAC of a destination host into the ARP table to prevent a query for it in at least one precise scenario.

    Now I just need to find a pair of 10Mbps hubs and a couple of contiguous hours… 😉
