The case for Alternate Routing by: Randall Stewart 1/18/2004 This document will try to explain what alternate routing is in respect to SCTP, or for that matter, any transport protocol. So, first lets discuss what TCP or SCTP does when it attempts to send to an address. Well of course first and foremost it must develop a structure that will hold the address. In SCTP we call this a "net" structure. TCP holds this same information in its TCB (within the in_pcb) but of course there is only one destination. Along with where (i.e. the IP address) the peer is, we also have a pointer to a cached route. This cached route is kept so that you only need to lookup the route, with a call to rtalloc(), once. This can both save a lot of overhead and provide useful information to the transport (inside the route structure are things like the MTU etc). This all sounds fine so far, we cache a route in a structure that holds information about the peer, so why do we need something called an alternate route? And just what is it? Well, if both peers are singly homed, we probably do not need to even worry about alternate routes. After all we only have one default route out on both sides and one address. But lets expand here beyond the constraints placed on the routing system by thinking singly homed and TCP and look at a multi-homed scenario. For illustration purposes I will look at only one side of the association as multi-homed. With a little thought on your part you will see that this example will apply to even multi-homed endpoints on each side. So lets draw a scenario, based on my home (which is multi-homed) and a peer/co-worker of mine. So my machine is multi-homed and his is not. I have had some difficulty with inter-net providers and so to assure myself always having service (and test SCTP) I went and got two DSL's. Now what does a machine on my network look like: Host-X +---------+ | | | +----IP-1 <-----> DSL-Modem<----ATM to ISP-1 | | | +----IP-2 <----->PPPoE to Modem<-----ATM to ISP-2 | | +---------+ And so I have available two default routes route add default ISP-1-GW or route add default IPS-2-GW Now of course in standard FreeBSD (or NetBSD or OpenBSD for that matter) you can NOT have two default, you can in win-XP or Linux so I guess this is just one of the areas BSD is behind. But there is a catch (at least with linux last time I checked). If you have two defaults in linux, you only use the first one. Now I guess if your ethernet card died, and it went down, the OS would remove the route with the card failure (maybe) and then start using the other route. But you know what, in all the time I have had my home office I have never yet had a card fail (not that it cannot happen, I guess I have been lucky). On the other hand, off and on I have had all sorts of problems with my DSL. And you know what, it has usually been on the ATM side or a routing issue in the ISP that caused my packets to go on the floor. But there is hope for the BSD's. Itojun put in a neat little thing for routing in KAME called RADIX_MPATH. The idea is that when you put the BSD box as a router you might want multiple routes to one location and you may want to load share across these routes. I, myself, had made similar modifications to the radix tables that Itojun did, but I backed these out when I found out (from Itojun) that they existed. Note, yes this does mean that if you want to use the alternate route feature in my patch you must define RADIX_MPATH in your kernel configuration. So we have seen my side, let take a look at my peer and co-workers side. He does not have two network connections, he instead uses one path out via his cable modem. So he has HOST-Y +---------+ | | | +----IP-3 <-----> Cable-Modem<----out to ISP-3 | | +---------+ So he, somewhere has a route like route add default ISP-3-GW Now lets assume I have RADIX_MPATH turned on in my machine (since I do) and have entered my routes in. My friend and I have a text typing program that we run over SCTP and setup between our machines. So in the SCTP book speak we have: association = {[IP-1, IP-2 : 2222] [IP-3 : 4444]} Now just like with linux, if you define RADIX_MPATH in BSD you get the first route always. Assuming I put the routes in as shown above all of my traffic is headed out via ISP-1-GW. Everything is hunky dory though, we have only one destination and our packets are labeled with an IP header that as a source:IP-1 and destination IP-3. All is well, until my ISP-1 goes through one of its frequent problems and all of a sudden either my ATM goes down or some other disaster happens in their network. So does my machine know this? Nope not at all, it sees carrier just fine on the ether-net attached to the IP-1/ISP-1 link. So without any other help guess what happens? Our SCTP association fails and that is the end of our conversation. But why was IP-2/ISP-2 not used? Because our routing table, even though the information was there, provides the transport no way to update or change to an alternate route. The transport will quickly realize (long ahead of the routing system) that something is possibly wrong. But it can do nothing with this information. So this is where I decided to do something. I would expand the routing system to not only include rtalloc() and rtalloc1() but to also include rtalloc_alternate() The design goal of this new function was to provide a mechanism wherein the transport could give the routing system a destination and the current route to that destination and indicate that things were not right, and it would like to have another one if possible. Does this mean that the transport would have ANY idea that there was an alternate route or if such a thing exists, NO of course not. It is just a hook. This new function would exist in the routing system and if it could not find an alternate route at the same level in the routing tree as the one that the caller was using, it would return the same route again. Thus a transport would do something like: if(net->time-out-count > some_threshold) { new_route = rtalloc_alternate(net->dest_addr, net->route); if(new_route && ( new_route != net->route)) { /* we have an alternate, use it */ RT_FREE(net->route); net->route = new_route; net->src_addr_selected = 0; } } This would allow us to update our cached route with some other method of getting from here to there note that the transport really does not care where or how it just wants its packets to stand a chance of getting through. Now go back to my above described scenario and think about what would happen on say the second or third timeout (whenever we hit that some_threshold value). We would make a call down to this new routing code and out would pop the parallel route. Note that this route has to be at the same level of the tree, i.e. the code cannot return a less specific match or a more specific match. The code basically had to find the exact level of the routing tree that the existing route was cloned from and see if there are any alternates at that level, if so it could clone one of these and hand it back. Great idea right? But wait, thats a transport geek mucking around with the routing code, we can't allow that. So in order to get this in so you all could play with it I made it sctp specific i.e.: held within the SCTP code instead of where it belongs in the routing code in route.c So that is how the alternate route code came to be a configurable item i.e. option SCTP_ALTERNATE_ROUTE Take some time and think about the above scenarios. It gets even worse when you start having two multi-homed endpoints communicating with each other. But I will leave that as an exercise for you. So why are we taking this code out of the standard KAME SCTP release and making it into a patch that can be applied from the sctp.org web site? Well now it is time for us to attempt to get SCTP back into the main BSD releases. In order to get SCTP into the major releases of BSD this code needs to disappear. But I think that the refusal of the BSD community to accept the routing changes are invalid and I still want you to be able to have a full functioned SCTP that is error resilient. Maybe someday the BSD community will wake up and find out that they need this feature, when they are ready we will gladly supply these patches :-D Hey, even TCP could get some of the benefits of multi-homing if this code were brought into BSD. It could also do the alternate routing to recover from local ISP problems in a multi-homed environment. I guess that the BSD folks just have not thought about this in detail and need some more time to grasp some of the multi-homing concepts. Hmm I bet the linux community will get it first :-0 So for now, if you want to take advantage of multi-homing to the fullest, download the patch and untar it. In it you will find two files: alternate_route_patch apply_alt_route_patch.sh These must be in the same directory together and you must "cd" into that directory. Then do: ./apply_alt_route_patch.sh You will be prompted for the directory to where kame is. I would enter /usr/rrs/kame And the patch shell will add on kame/sys/netinet/ To the target of where it needs to go. To validate that you know the correct directory to give the patch you should be able to do ls /usr/rrs/kame/kame/sys/netinet/sctputil.c |-----+-----|------------+----| ^ ^ | | part you part that enter as patch supplies path. to patch program. Add the patch, and then rebuild your kernel. I think you will enjoy it :-> Contact me at mailto:randall@stewart.chicago.il.us if you have any questions. Best regards R