I was not completely right when I said that my peer does not receive any
messages during this 60-second interval.
I ran the test again and noticed that the peer DOES receive messages, BUT at a
very slow rate (just as you suspected, only retransmissions are sent
over the alternative interface).
So the transmission of each message must fail first, before LKSCTP tries the
alternative interface.
This causes significant delays in data transmission during this 60-second
interval (RTO.MAX). I wonder if this could be optimized?
Thanks,
Anatoly
-----Original Message-----
From: Randall Stewart [mailto:rrs@cisco.com]
Sent: Friday, February 11, 2005 1:08 PM
To: Anatoly Khusid
Cc: sctp-impl@external.cisco.com
Subject: Re: SCTP failover
Anatoly:
Shridhar can probably shed more light on this than I .. but
I have a couple of ideas.
In theory the sctp stack SHOULD send retransmissions to the
alternate address.. I would think this is NOT happening.. Even
if it does happen for some time you will still have some
delays in messages... Think of it this way
--msg-1--->
--msg---->
--msg-n-->
T.O (1sec)
--resend msg1-n to alternate
--msg-n+1-->
--msg-n+..->
--msg-n+m-->
T.O (2sec)
--resend msg(n-m) to alternate
--msg+m-1-->
etc..
until you get
TO's
1sec
2sec
4sec
8sec
16sec
32sec
--- or about 63 seconds after failure the primary will be declared dead
and new transmissions will go to the alternate (which is now the primary).

In between the break and the 6 timeouts, messages will be delayed
anywhere from 1 - 32 seconds .. but they should still get through. The fact
that you don't see any messages for 60 or so seconds indicates to me that
maybe the retransmit to alternate is not working in lk-sctp, or maybe
there is an option to turn it on??

In any event the only way to keep the network failover time down
is to set RTO.Max to a lower value.. that would make things faster...

To have a 1 second failover I would imagine that Ulticom's stack
is setting both RTO.Min and RTO.Max to a lower value... aka
that adds up to a total of 1 second.. I.e. something like
50ms RTO.Min and 400ms RTO.Max

Now, as discussed on the tsv, when you do this you need to make
sure the receiver is also cranking down its delayed sack timer to
be smaller than RTO.Min .. otherwise you are going to get T3-Timeouts
on normal sack delay when only one TSN is sent and there is nothing
else to send..

Hope that helps..

R

Anatoly Khusid wrote:
>
> Hello,
>
> I am using Linux SCTP implementation (LKSCTP) SLES9 (2.6.5-7.111.19-smp)
> distribution.
> I have a client application that is sending data to a server on a remote
> machine. The machines are connected over two private LANs.
> When I disconnect a primary interface, I expect SCTP to start using an
> alternative LAN as soon as possible. Well, it takes about one minute for
> LKSCTP to detect that LAN is down, before it starts to transmit data
> messages on another LAN. I don’t have any data messages lost, but I have a
> 1-minute delay during which time no data is received by a server, after
> about one minute, the data transmission resumes through the alternative
> interface.
> I am curious why it takes so long to detect a LAN failure? I am using all
> the defaults for SCTP provisioning. (In fact I used getsockopt() to verify
> that the defaults match SCTP specs).
> Based on this section in SCTP RFC I would expect the switchover to be in a
> matter of seconds at the most. I am using Ulticom’s SCTP implementation and
> the switchover only takes about 1 second. Could anyone please shed some
> light on this?
>
> Section 6.4 of SCTP RFC 2960:
> Furthermore, when its peer is multi-homed, an endpoint SHOULD try to
> retransmit a chunk to an active destination transport address that is
> different from the last destination address to which the DATA chunk was
> sent.
>
> Thanks,
>
> Anatoly Khusid
> Ulticom Inc.
> Senior Software Engineer
>

--
Randall Stewart
ITD
803-345-0369 <or> 815-342-5222

Received on Fri Feb 11 15:55:56 2005
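
The timeout arithmetic in Randall's reply (1 + 2 + 4 + 8 + 16 + 32 seconds) can be reproduced with a small sketch. It is a simplified model, not LKSCTP code, and it rests on these assumptions: the T3-rtx timer starts at RTO.Min (reasonable when the measured LAN round-trip time is far below RTO.Min), the RTO doubles on every expiry and is capped at RTO.Max, and the path is declared dead once the error count exceeds Path.Max.Retrans (5 in RFC 2960, hence 6 timeouts). The function names are hypothetical and exist only for this illustration.

```python
def backoff_schedule(rto_start_ms, rto_max_ms, path_max_retrans=5):
    """Successive T3-rtx timeout values (ms) on a path that stopped answering.

    Assumption: the RTO doubles after each expiry and never exceeds RTO.Max.
    The path is marked inactive after path_max_retrans + 1 expiries.
    """
    timeouts = []
    rto = min(rto_start_ms, rto_max_ms)
    for _ in range(path_max_retrans + 1):
        timeouts.append(rto)
        rto = min(rto * 2, rto_max_ms)
    return timeouts


def failover_time_ms(rto_start_ms, rto_max_ms, path_max_retrans=5):
    """Total time (ms) from the break until the primary is declared dead."""
    return sum(backoff_schedule(rto_start_ms, rto_max_ms, path_max_retrans))


def sack_delay_is_safe(sack_delay_ms, rto_min_ms):
    """True if the receiver's delayed-SACK timer fires before RTO.Min expires.

    If this is False, a lone TSN can hit a T3 timeout on a healthy path
    simply because the SACK was delayed longer than the sender waits.
    """
    return sack_delay_ms < rto_min_ms


if __name__ == "__main__":
    configs = (
        ("RFC 2960 defaults", 1000, 60000),  # RTO.Min 1 s, RTO.Max 60 s
        ("tuned example", 50, 400),          # values suggested in the reply
    )
    for label, rto_min, rto_max in configs:
        schedule = backoff_schedule(rto_min, rto_max)
        total = failover_time_ms(rto_min, rto_max)
        print(f"{label}: timeouts {schedule} ms, failover after {total} ms")
    # 200 ms is the delayed-SACK value recommended by RFC 2960.
    print("200 ms SACK delay safe with 50 ms RTO.Min?",
          sack_delay_is_safe(200, 50))
```

With the defaults this model gives the 63 seconds quoted above. With 50 ms RTO.Min and 400 ms RTO.Max it gives about 1.5 seconds, the same order as the 1 second failover observed with the other stack. The last check shows why the delayed SACK timer has to be lowered together with RTO.Min: a 200 ms SACK delay is longer than a 50 ms RTO.Min.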