[RADIATOR] Infinite retries in AuthByLOADBALANCE

Hugh Irvine hugh at open.com.au
Thu Nov 5 21:37:26 UTC 2020


Hi Frank -

I know Heikki has been looking at related issues, so I’ll let him follow up with you.

cheers

Hugh


> On 6 Nov 2020, at 02:43, Frank Danielson <FDanielson at csky.com> wrote:
> 
> Hi Hugh-
> 
> I’m running an older version, 4.7, but I did look at the code for AuthByLOADBALANCE and it does not seem to have changed in the latest version. If there have been other changes to the retry behavior in AuthByRADIUS, we’ll schedule the update and see what happens. From a very cursory look at the code, the underlying logic seems to be the same: AuthByRADIUS depends on chooseHost() to return no host available, and as long as chooseHost() supplies one, the request will keep retrying. The one exception is that if all target hosts have been marked as down, the AuthByLOADBALANCE chooseHost() logs "ProxyAlgorithm LOADBALANCE Could not find a working host to proxy to" and the request stops retrying.
> 
> Regards-
> 
> Frank Danielson | S.V.P. Engineering
> * fdanielson at csky.com
> 
>> On Nov 4, 2020, at 4:46 PM, Hugh Irvine <hugh at open.com.au> wrote:
>> 
>> 
>> Hi Frank -
>> 
>> What version of Radiator are you running currently?
>> 
>> Hugh
>> 
>> 
>>> On 5 Nov 2020, at 04:10, Frank Danielson <FDanielson at csky.com> wrote:
>>> 
>>> Good Day All-
>>> 
>>> We’ve been running AuthByLOADBALANCE for some time now and have noticed that if a message does not get a response from the downstream hosts, it will be retried infinitely. This not only keeps the message around forever, but each failed attempt also increases the failure counts for the target hosts, which makes them more likely to be marked unavailable and causes delivery problems for other requests.
>>> 
>>> For example, a malformed request may be sent by an upstream client and handled by AuthByLOADBALANCE, where the target hosts simply do not respond to the proxied request because they don’t like it. The request will be retried on the current host Retries times by handle_timeout(), after which it is handed off to failed(), which tracks MaxFailedRequests for the host and marks it unavailable if applicable. failed() then hands the request to forward(), which calls chooseHost() to find the next available host. The stock chooseHost() in AuthByRADIUS tracks whether the request has reached the end of the host list, but chooseHost() in AuthByLOADBALANCE will always return a host if one is available, and it could even be the same host as the last try if MaxFailedRequests has not been reached for that host. The end result is that the request is retried forever and keeps incrementing the failure count for downstream hosts, causing them to be marked unavailable.
>>> 
>>> After some looking at the code I think I could override failed() to track the number of unique hosts to which a request has been forwarded with something like 
>>> 
>>> $fp->{retryHosts}->{$host}++
>>> 
>>> and then add a couple of checks in chooseHost() that are similar to the original one:
>>> 
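>>> if (scalar(keys %{$fp->{retryHosts}}) < scalar(@{$self->{Hosts}}))
>>> {
>>>     foreach my $host (@{$self->{Hosts}})
>>>     {
>>>         # Skip hosts this request has already been forwarded to
>>>         next if $fp->{retryHosts}->{$host};
>>>         # ... existing host selection logic ...
>>>     }
>>> }
>>> 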
>>> The end result would be that the request is tried Retries times on each host in the list, moving on to the next best candidate chosen by the volume algorithm, until all hosts have been tried, at which point the request fails. That may not be the optimal behavior, but it beats trying forever.
>>> 
>>> Before doing that and bearing the burden of maintaining a custom AuthBy, I figured I’d send it to the list and see if someone else has already solved this problem or if Open Systems would be willing to revisit the AuthByLOADBALANCE logic. Perhaps the interpretation of Retries could change to mean the total number of times a request is retried, instead of a per-host number, so that a request has a finite lifetime? In that case chooseHost() could be called for each retry in handle_timeout() to increase the chances of success.
>>> 
>>> Regards-
>>> 
>>> Frank Danielson | S.V.P. Engineering
>>> * fdanielson at csky.com
>>> 
>>> _______________________________________________
>>> radiator mailing list
>>> radiator at lists.open.com.au
>>> https://lists.open.com.au/mailman/listinfo/radiator
>> 
>> 
>> --
>> 
>> Hugh Irvine
>> hugh at open.com.au
>> 
>> Radiator: the most portable, flexible and configurable RADIUS server 
>> anywhere. SQL, proxy, DBM, files, LDAP, NIS+, password, NT, Emerald, 
>> Platypus, Freeside, TACACS+, PAM, external, Active Directory, EAP, TLS, 
>> TTLS, PEAP, TNC, WiMAX, RSA, Vasco, Yubikey, MOTP, HOTP, TOTP,
>> DIAMETER, SIM, etc. 
>> Full source on Unix, Linux, Windows, macOS, Solaris, VMS, NetWare etc.
>> 
> 
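
The proposal quoted above (remember which hosts a request has already been forwarded to, and let the chooser skip them) can be illustrated with a small standalone sketch. This is not Radiator code: it does not load Radiator, and choose_host(), forward_unanswered() and the hash layouts are hypothetical stand-ins chosen only to show that the loop terminates once every available host has been tried.

```perl
use strict;
use warnings;

# Standalone simulation. The data layout and both subs are hypothetical
# stand-ins for illustration, not Radiator's actual implementation.

# Pick the least-loaded host that is up and that this request has not
# already been forwarded to. Returns undef once every such host was tried.
sub choose_host {
    my ($hosts, $request) = @_;
    my @candidates =
        sort { $a->{load} <=> $b->{load} }
        grep { !$_->{down} && !$request->{retryHosts}{ $_->{name} } } @$hosts;
    return $candidates[0];
}

# Forward a request that no host ever answers, recording each host tried.
sub forward_unanswered {
    my ($hosts, $request) = @_;
    my @tried;
    while (defined(my $host = choose_host($hosts, $request))) {
        $request->{retryHosts}{ $host->{name} }++;   # per-request bookkeeping
        $host->{failures}++;                         # per-host failure count
        push @tried, $host->{name};
    }
    return @tried;
}

my @hosts = (
    { name => 'radius1', load => 3, down => 0, failures => 0 },
    { name => 'radius2', load => 1, down => 0, failures => 0 },
    { name => 'radius3', load => 2, down => 1, failures => 0 },
);
my @tried = forward_unanswered(\@hosts, { retryHosts => {} });
print "tried: @tried\n";   # prints "tried: radius2 radius1"
```

Each host's failure count rises by at most one per request in this model, so a single unanswerable request can no longer push a host toward its failure limit on its own.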

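The alternative raised at the end of the quoted message, treating Retries as a total budget per request and choosing a host afresh for every attempt, might look roughly like the following. Again this is only a standalone sketch under stated assumptions: pick_least_loaded() and send_with_budget() are hypothetical names, and the $answers callback stands in for whether a downstream host replies.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a load-balancing chooser: the least-loaded
# host that is not marked down, or undef if there is none.
sub pick_least_loaded {
    my ($hosts) = @_;
    my ($best) = sort { $a->{load} <=> $b->{load} } grep { !$_->{down} } @$hosts;
    return $best;
}

# Send a request at most 1 + $retries times in total, choosing the host
# again before every attempt. Returns (host name or undef, attempts made).
sub send_with_budget {
    my ($hosts, $retries, $answers) = @_;
    my $attempts = 0;
    while ($attempts <= $retries) {
        my $host = pick_least_loaded($hosts) or last;   # no host available
        $attempts++;
        $host->{load}++;
        return ($host->{name}, $attempts) if $answers->($host);
    }
    return (undef, $attempts);
}

my @hosts = (
    { name => 'radius1', load => 0, down => 0 },
    { name => 'radius2', load => 0, down => 0 },
);
# A request that nobody answers is dropped after 1 + 3 attempts.
my ($name, $attempts) = send_with_budget(\@hosts, 3, sub { 0 });
print "gave up after $attempts attempts\n";   # prints "gave up after 4 attempts"
```

The trade-off in this variant is that the request lifetime is bounded regardless of how many hosts are configured, at the cost of possibly not trying every host when the list is longer than the budget.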

--

Hugh Irvine
hugh at open.com.au