1) Two levels of failure.
a) When you hit a retransmit limit (set to, say, 1) you
switch to the alternate.
b) When you hit Max retransmit you mark the destination
as down.
c) If the HB sent to the place where the T-O happened is successful,
then you clear your count and the primary stays where it
was. Note that, IMO, you should send a HB right after an RTO to
the place where the RTO happened.
2) Always send T-O retransmissions to the alternate, but not FRs,
since Armando's work shows that is not a good idea. Besides, you
don't switch off due to one loss; after all, you still have an
ACK clock or you would not be FR'ing.
3) CMT, which is what Jana is working on, may be an even better
method, assuming we can find shared bottlenecks.
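
A rough sketch of points 1 and 2, to make the two thresholds concrete. All names here (Path, Association, switch_limit, max_retrans) are made up for illustration and do not come from any real stack. It assumes the switch happens when the error count reaches the lower limit, and that a destination is marked down when the count exceeds the upper limit (the "exceeds Path.Max.Retrans" reading of RFC 2960, section 8.2).

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    error_count: int = 0   # consecutive timeouts not yet cleared
    active: bool = True    # False once the destination is marked down


class Association:
    """Hypothetical two-path association with two failure levels."""

    def __init__(self, primary, alternate, switch_limit=1, max_retrans=5):
        self.primary = primary
        self.alternate = alternate
        self.switch_limit = switch_limit   # level (a): switch threshold
        self.max_retrans = max_retrans     # level (b): mark-down threshold
        self.current = primary             # where new data is sent

    def _other(self, path):
        return self.alternate if path is self.primary else self.primary

    def on_timeout(self, path):
        """Handle a T-O on `path`.

        Returns (retransmit_dest, heartbeat_dest): the retransmission
        goes to the other path when it is usable, and a HB is sent
        right away to the path where the timeout happened.
        """
        path.error_count += 1
        if path.error_count > self.max_retrans:
            path.active = False
        other = self._other(path)
        if (path is self.current and other.active
                and path.error_count >= self.switch_limit):
            self.current = other
        retransmit_dest = other if other.active else path
        return retransmit_dest, path

    def on_fast_retransmit(self, path):
        """A FR stays on the same path and does not touch the counters."""
        return path

    def on_heartbeat_ack(self, path):
        """A successful HB clears the count; the primary resumes its role."""
        path.error_count = 0
        path.active = True
        if path is self.primary:
            self.current = self.primary
```

With switch_limit=1, a single timeout on the primary moves new data to the alternate, and one successful HB on the primary moves it back without the primary ever having been marked down.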
R
Anatoly Khusid wrote:
>>If you need faster failover, you could change the
>>Path.Max.Retrans setting to limit the number of consecutive timeouts
>>that will trigger the failover.
>
>
> If you modify SCTP provisioning to be something other than the proposed
> defaults, I thought this might cause significant performance impacts?
>
> -----Original Message-----
> From: Ryan W Bickhart [mailto:bickhart@cis.udel.edu]
> Sent: Friday, February 11, 2005 3:23 PM
> To: David Lehmann
> Cc: sctp-impl@external.cisco.com
> Subject: Re: SCTP failover
>
>
> I'm not sure switching over to the alternate after seeing a single FR is
> the best approach for all cases though. Imagine a situation where the
> alternate is significantly slower or less desirable than the primary. A
> random FR on the primary may not necessarily be grounds for abandoning
> it right away. If you need faster failover, you could change the
> Path.Max.Retrans setting to limit the number of consecutive timeouts
> that will trigger the failover. Rather than 1 + 2 + 4 + ... + 32 = 63
> seconds, you could set PMR to 0 or 1 and fail over immediately or in
> about 1 second if desired. I think the suggested change to the wording
> in 6.4 is actually already covered by the ability to set the PMR.
>
> ---Ryan
>
>
> On Fri, 2005-02-11 at 14:33 -0500, David Lehmann wrote:
>
>>David Lehmann wrote:
>>
>>>Randall Stewart wrote:
>>>
>>>
>>>>In any event the only way to keep the network failover time
>>>>down is to set RTO.Max to a lower value.. that would make things
>>>>faster... To have a 1 second failover I would imagine that
>>>>Ulticom's stack is setting both RTO.Min and RTO.Max to a
>>>>lower value... aka that adds up to a total of 1 second..
>>>>I.e. something like 50ms RTO.Min and 400ms RTO.Max
>>>
>>>
>>>Nope. When the error count is greater than 0 on the
>>>primary destination, Ulticom uses an alternate destination,
>>>until the error is cleared on the primary destination.
>>>So, if the user is using the default values, the user
>>>will only see a blip for one second.
>>>(...assuming an RTO of 1 second)
>>
>>I need to correct my former post. Assuming a steady flow
>>of data chunks, the switch-over with Ulticom's stack will
>>actually be in a fraction of a second since the FR will
>>notice the problem long before the RTO.
>>
>>
>>>IMHO, waiting around for 1 minute while data trickles through
>>>the association is not acceptable and does not give the user
>>>a sense of transparent network redundancy.
>>>
>>>IMHO, the language in section 6.4 should be changed from:
>>> By default, an endpoint SHOULD always transmit to the primary path,
>>> unless the SCTP user explicitly specifies the destination transport
>>> address (and possibly source transport address) to use.
>>>to something like:
>>> By default, an endpoint SHOULD always transmit to the primary path,
>>> unless the error count on the destination is greater than zero or the
>>> SCTP user explicitly specifies the destination transport address (and
>>> possibly source transport address) to use.
>>>
>>
>>
>
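
For reference, the 63-second figure quoted above can be reproduced with a few lines of arithmetic. This is only a sketch under stated assumptions: the RTO starts at 1 second, doubles after every timeout up to RTO.Max, and failover happens once the number of consecutive timeouts exceeds Path.Max.Retrans. The function name and parameters are made up for illustration.

```python
def failover_time(rto_start, rto_max, path_max_retrans):
    """Seconds spent in consecutive timeouts before the path fails over.

    Assumes failover on the timeout that pushes the error count past
    path_max_retrans, i.e. after path_max_retrans + 1 timeouts.
    """
    total = 0
    rto = rto_start
    for _ in range(path_max_retrans + 1):
        total += rto                  # wait out this timeout
        rto = min(rto * 2, rto_max)   # exponential backoff, capped
    return total


print(failover_time(1, 60, 5))  # 1 + 2 + 4 + 8 + 16 + 32
print(failover_time(1, 60, 0))
print(failover_time(1, 60, 1))
```

Under these assumptions the default Path.Max.Retrans of 5 gives 63 seconds, 0 gives 1 second and 1 gives 3 seconds. A stack that fails over on reaching the threshold rather than exceeding it would come out one timeout earlier in each case.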
-- Randall Stewart ITD 803-345-0369 <or> 815-342-5222
Received on Fri Feb 11 16:07:52 2005