[RADIATOR] RADSEC, failure algorithm, eduroaming and long reply times

Thu Apr 5 08:07:04 CDT 2012

Hi Heikki,

Thanks for your fast reply! The Radiator team is great!

On 05.04.2012 14:00, Heikki Vatiainen wrote:
> On 04/03/2012 07:45 PM, Karl Gaissmaier wrote:
>
> Hello Charly,
>
>> I've a problem with AuthBy RADSEC and the failure algorithm.
>>
>> In the eduroam confederation it's nearly impossible to find proper
>> values for NoreplyTimeout, MaxFailedRequests, ... for Access-Requests.
>>
>> It sometimes takes many seconds (up to 60s or more) to get a reply
>> for an Access-Request.
>>
>> It would be much better to use Status-Server requests to determine
>> whether the next server is dead, rather than a timeout for
>> Access-Requests proxied further down the chain.
>
> So Status-Server would be used instead of NoReplyTimeout and
> MaxFailedRequests? If Status-Server gets no response, then the server
> would be probed infrequently to see when it gets back. Meanwhile the
> requests would be forwarded to the secondary server?

Please see my config chunk:

<AuthBy RADSEC>

     Host                        radius1.dfn.de
     Host                        radius2.dfn.de

     FailureBackoffTime          2
     MaxFailedRequests           1
     NoreplyTimeout              45

     UseTLS
     # TLS specific cfg follows

</AuthBy>

I've increased NoreplyTimeout to a large value, since the reply time
for an Accept or Reject can be neither estimated nor calculated across
all the RADIUS server hops involved in eduroam.

If only one organization has a misbehaving RADIUS server with long
delays, the failure algorithm used by Radiator stops all eduroam
proxying from my organization to my upstream NREN RADIUS servers,
even if they are responding properly.

The mistake is that NoreplyTimeout is used to determine whether the
next RADIUS server is down. A reply timeout says something about the
health of the last RADIUS server for this realm, but the failure
algorithm kills the running connection to my next-hop NREN server.

That's fatal.

This failure algorithm is only useful if the next-hop RADIUS
server is the last one in the proxy chain.
It is totally useless and harmful in the eduroam RADIUS chain.
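
To make the effect concrete, here is a toy Python model of a
timeout-based failure algorithm as I understand it. It is only a sketch,
not Radiator's actual code: the class and function names are made up,
and the next-hop server is assumed to be perfectly healthy the whole
time.

```python
# Toy model of a timeout-based failure algorithm, NOT Radiator's code.
# All names (NextHop, forward, ...) are hypothetical.

class NextHop:
    def __init__(self, name, max_failed_requests=1, failure_backoff_time=2):
        self.name = name
        self.max_failed_requests = max_failed_requests
        self.failure_backoff_time = failure_backoff_time
        self.failed = 0
        self.dead_until = 0.0  # time until which the hop counts as dead

    def is_usable(self, now):
        return now >= self.dead_until

    def record_reply(self):
        self.failed = 0

    def record_timeout(self, now):
        self.failed += 1
        if self.failed >= self.max_failed_requests:
            self.dead_until = now + self.failure_backoff_time
            self.failed = 0


def forward(hop, home_server_delay, noreply_timeout, now):
    """Proxy one Access-Request through `hop`.

    `home_server_delay` is how long the realm's home server, several
    hops further on, takes to answer. The next hop itself is healthy.
    """
    if not hop.is_usable(now):
        return "skipped: next hop marked dead"
    if home_server_delay > noreply_timeout:
        # The timeout is charged to the next hop, not to the slow server.
        hop.record_timeout(now + noreply_timeout)
        return "timeout"
    hop.record_reply()
    return "reply"


nren = NextHop("radius1.dfn.de")
# A request for a realm with a slow home server (60s) times out ...
first = forward(nren, home_server_delay=60, noreply_timeout=45, now=0)
# ... and a request for an unrelated, fast realm right after is refused.
second = forward(nren, home_server_delay=0.1, noreply_timeout=45, now=46)
print(first, "|", second)
```

The second request never reaches the NREN server, although that server
did nothing wrong.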

> Would the current behaviour of returning nothing (IGNORE) to the
> previous server still be fine?

Hm, I don't follow you.

My suggestion would be an additional parameter such as PollServerStatus:

<AuthBy RADSEC>

     Host                        radius1.dfn.de
     Host                        radius2.dfn.de

     PollServerStatus            on             # use Status-Server requests
     Pollfrequency               5              # every 5s
     FailureBackoffTime          2

     ...
</AuthBy>
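
The logic I have in mind could be sketched roughly like this in Python.
This is an illustration of the proposal only, not an existing Radiator
feature: `probe` stands in for sending a Status-Server request to the
next hop and reporting whether a reply came back, and all names are
hypothetical.

```python
# Sketch of the proposed polling idea, not an existing Radiator feature.
# All names (PolledHost, pick_host, probe) are hypothetical.

class PolledHost:
    def __init__(self, name, probe):
        self.name = name
        self.probe = probe   # callable: name -> True if a status reply arrived
        self.alive = True

    def poll(self):
        # Only the next hop's own answer decides its health; slow replies
        # to proxied Access-Requests play no role here.
        self.alive = bool(self.probe(self.name))


def pick_host(hosts):
    """Return the first host whose last status poll succeeded, else None."""
    for host in hosts:
        if host.alive:
            return host
    return None


# Simulated reachability of the two next-hop servers.
reachable = {"radius1.dfn.de": False, "radius2.dfn.de": True}
hosts = [PolledHost(name, lambda n: reachable[n]) for name in reachable]

for h in hosts:   # a real server would repeat this every Pollfrequency seconds
    h.poll()

chosen = pick_host(hosts)
print(chosen.name)
```

With this approach a host is skipped only when it fails to answer its
own status poll, and it is used again as soon as a later poll succeeds.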

>
> Another alternative would be to synthesise an Access-Reject to the
> previous server.

I don't understand what you mean by 'previous server'.
Maybe I'm wrong, but the current failure algorithm is broken for
a RADIUS chain and only useful between peers.

>
> Related to this, what is the current view of using Dead Realm Marking in
> eduroam?

Maybe it's useful for NREN RADIUS servers, but not for universities.
And by the way, it tackles the problem from the wrong end.
We need an algorithm that checks the status of the next-hop
RADIUS server directly, not indirectly via a reply timeout caused
by a totally different server further down the proxy chain.

Maybe I can't explain the problem well because of my bad English, but
please work with me on solving it, since it's a real problem. As you
can see from the list history, I don't complain often, and when I do,
it's normally a real-world problem.

Best Regards
	Charly
-- 
Karl Gaissmaier
Kommunikations und Informationszentrum kiz
der Universität Ulm
Abteilung Infrastruktur
SG Netzwerk und Telekommunikation
89069 Ulm
Tel.: 49(0)731/50-22499 Fax : 49(0)731/50-1222499