[RADIATOR] radiator Timeout handling

Fri Apr 15 16:25:58 CDT 2011

On 04/07/2011 01:38 AM, David Zych wrote:

Hello David,

it's about time I got back to this :)

> The problem is that even with SQLRetries 1 and (let's say) Timeout 3,
> *subsequent* SQL-bound requests will still stall Radiator for 3 seconds
> each, so instead of being able to process many requests per second I can
> handle only one every 3 seconds, in perpetuity, even if the DB stays in
> this unresponsive state for many minutes or hours.

I agree. This is what happens currently if connect is successful but the
actual operation times out.

> The key to the backoff solution is that at some point Radiator
> temporarily gives up on sending queries to the DB at all, thereby
> enabling it to once again respond to other RADIUS requests in a timely
> manner.

Just to check: are you thinking about backing off because of timeout(s)
only? If the error is something else, then there would not be a backoff,
correct?

Just today I noticed an LDAP event where the connection had been idle
for a long while with HoldServerConnection enabled, and the next write
gave an error because a load balancer had decided the connection had
timed out. Although this was LDAP, the same can happen with long-lived
SQL connections too.

> "right now each instance of SqlDb (e.g. for a particular AuthBySQL) has
> a single $self->{backoff_until} timer that applies collectively to all
> DBSources listed in the configuration clause of that instance".

Agree on this too.

> In keeping with the above, let me clarify my proposal:
> "1. within an instance of SqlDb, each configured DBSource gets its own
> individual backoff_until timer..."

I'd also say that should fix the timeout problem.
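
To make the per-DBSource idea concrete, here is a rough sketch in
Python. It is not Radiator code (Radiator is Perl), and the class and
method names are hypothetical; it only illustrates keeping one
backoff_until value per DBSource instead of one per SqlDb instance.

```python
import time


class BackoffTracker:
    """Keeps a separate backoff_until timestamp for each DBSource.

    Hypothetical helper for illustration only, not part of Radiator.
    """

    def __init__(self, failure_backoff_time, clock=time.monotonic):
        self.failure_backoff_time = failure_backoff_time
        self.clock = clock
        # DBSource string -> time before which that source is skipped
        self.backoff_until = {}

    def is_available(self, dbsource):
        until = self.backoff_until.get(dbsource)
        return until is None or self.clock() >= until

    def mark_failed(self, dbsource):
        # Only the failing source is backed off; the others stay usable.
        self.backoff_until[dbsource] = self.clock() + self.failure_backoff_time

    def first_available(self, dbsources):
        # Return the first source that is not backing off, or None.
        for source in dbsources:
            if self.is_available(source):
                return source
        return None
```

With this shape, a failure on one DBSource leaves the remaining sources
eligible, and a request can be rejected or passed on immediately when
first_available() returns None.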

>>> 2. individual statement timeouts, such as the one in SqlDb::do(), could
>>> also set the backoff_until timer for the individual DBSource currently
>>> in use.  If this is judged not to be desirable in the general case, it
>>> could be controlled by a separate configuration parameter
>>> ("TimeoutBackoffTime", perhaps?).

Reusing the existing FailureBackoffTime for this looks good to me.
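
As a sketch of what point 2 could look like, the Python below lets a
statement timeout start the backoff for the DBSource in use. It is an
illustration only, not Radiator code: do_with_backoff, StatementTimeout
and the state dictionary are hypothetical names, and it assumes that
only timeouts (not other errors) trigger the backoff.

```python
import time


class StatementTimeout(Exception):
    """Raised by the caller-supplied execute() when a statement times out."""


def do_with_backoff(state, dbsource, execute, failure_backoff_time,
                    now=time.monotonic):
    """Run execute() unless dbsource is currently backing off.

    state maps a DBSource string to its backoff_until time.
    Returns the result of execute(), or None if the statement was
    skipped or timed out.
    """
    until = state.get(dbsource)
    if until is not None and now() < until:
        # Still backing off: return at once without touching the DB.
        return None
    try:
        return execute()
    except StatementTimeout:
        # Assumption: a timeout starts the backoff; other exceptions
        # propagate to the caller unchanged.
        state[dbsource] = now() + failure_backoff_time
        return None
```

The point of the early return is that only the first request pays the
Timeout; later requests inside the backoff window fail fast instead of
each stalling for the full Timeout.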

>>> I'm half tempted to try to implement this myself, but I'm not confident
>>> that I fully understand all the potential repercussions for other parts
>>> of Radiator, and I know I'm not in a good position to test it thoroughly.

> P.S. If I were to attempt a patch for this, would you have the ability
> to easily test it to make sure it behaves well against various DB
> backends and doesn't break anything else?  :)  I don't think the code
> changes would be very difficult; it's the testing that worries me.

We could test this against some combinations. Do you think you could
somehow recreate the original "connect works - operation times out"
problem you had?

It's of course possible to force this problem within the code, but it
would not be the real deal. It should be quite close though.
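
For what forcing it in code might look like, here is a small Python
sketch of a stand-in connection whose connect succeeds at once but
whose statements stall. Everything in it (StallingConnection, connect,
execute_with_timeout) is hypothetical test scaffolding, not Radiator or
DBI code, and a sleep only approximates a genuinely unresponsive DB.

```python
import concurrent.futures
import time


class StallingConnection:
    """Fake connection: execute() blocks for stall_seconds, then succeeds."""

    def __init__(self, stall_seconds):
        self.stall_seconds = stall_seconds

    def execute(self, statement):
        time.sleep(self.stall_seconds)
        return "ok"


def connect(stall_seconds):
    # Connecting always succeeds immediately; only statements stall.
    return StallingConnection(stall_seconds)


def execute_with_timeout(conn, statement, timeout):
    """Return the statement result, or None if it exceeds timeout seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(conn.execute, statement)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return None
    finally:
        # Do not wait for the stalled worker; it finishes in the background.
        pool.shutdown(wait=False)
```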

If you do attempt creating a patch, we would be interested in taking a
look at the patched version and seeing how well it integrates with the
rest of the code.

Thanks!

-- 
Heikki Vatiainen <hvn at open.com.au>