[RADIATOR] radiator Timeout handling

David Zych dmrz at illinois.edu
Wed Apr 6 17:38:26 CDT 2011


On 4/6/2011 4:26 PM, Heikki Vatiainen wrote:
> On 04/06/2011 10:22 PM, David Zych wrote:
>> I just ran into this same problem; my DB got into a state where
>> DBI->connect was working fine but actual INSERTs were timing out, and
>> the non-observance of FailureBackoffTime in this situation resulted in
>> both of my RADIUS servers being effectively stalled for 10 minutes (one
>> INSERT Timeout at a time) until the DB issue was resolved.
> 
> Patches for 4.7 have this:
> 
> 2010-09-21 SqlDb.pm
>     Added SQLRetries parameter to all SQL type clauses. When executing a
> query, Radiator will try up to SQLRetries attempts to execute the query,
> retrying if certain types of SQL error are seen. Defaults to 2.
> Requested by Michael.
> 
> Would this be helpful? You can also control the timeout with a
> parameter, see ref.pdf "5.29.4 Timeout", unless your platform is Win32.
> 
> So if you set retries to 1 and Timeout to a small number of seconds,
> would this be a solution? SQLRetries works with Win32 too.

The problem is that even with SQLRetries 1 and (let's say) Timeout 3,
*subsequent* SQL-bound requests will still stall Radiator for 3 seconds
each, so instead of being able to process many requests per second I can
handle only one every 3 seconds, in perpetuity, even if the DB stays in
this unresponsive state for many minutes or hours.

The key to the backoff solution is that at some point Radiator
temporarily gives up on sending queries to the DB at all, thereby
enabling it to once again respond to other RADIUS requests in a timely
manner.
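
To put rough numbers on the difference, here is a minimal sketch in Python (not Radiator code, which is Perl). It assumes a single serial worker, a DB that never answers, and requests arriving at a fixed interval. The function name and all of its parameters are hypothetical and exist only for this illustration.

```python
def total_stall_seconds(num_requests, arrival_interval, timeout, backoff_time=None):
    """Seconds a serial worker spends blocked on an unresponsive DB.

    Assumed model (not Radiator's actual behaviour): every attempted query
    blocks for `timeout` seconds. If `backoff_time` is given, a timeout stops
    further DB attempts until the backoff window has passed, and requests
    handled inside that window skip the DB at no cost.
    """
    clock = 0.0          # current time as seen by the worker
    stalled = 0          # total time spent waiting on the DB
    backoff_until = 0.0  # no DB attempts before this time
    for i in range(num_requests):
        # The worker cannot start a request before it arrives.
        clock = max(clock, i * arrival_interval)
        if backoff_time is not None and clock < backoff_until:
            continue  # in backoff: answer without touching the DB
        clock += timeout
        stalled += timeout
        if backoff_time is not None:
            backoff_until = clock + backoff_time
    return stalled


# 100 requests at 10 per second, Timeout 3, DB unresponsive throughout.
print(total_stall_seconds(100, 0.1, 3))                   # every request stalls
print(total_stall_seconds(100, 0.1, 3, backoff_time=60))  # only the first stalls
```

Under these assumptions the worker without backoff spends 3 seconds on every request, while the worker with backoff pays the timeout once per backoff window and handles the remaining requests immediately.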

>> I would like to second Michael's request for a way to alter this behavior.
>>
>> It appears that right now SqlDb.pm has a single $self->{backoff_until}
>> timer that applies collectively to all configured DBSources (i.e. it is
>> set only when all DBSources fail DBI->connect in sequence, and when set
>> it causes none of them to be tried again in reconnect() until the set
>> time).  Would it perhaps make more sense that:
> 
> A couple of notes about what is shared.
> 
> SqlDb.pm uses $self->{backoff_until} but $self varies. In other words,
> if you have a LogSQL, an AuthLogSQL and an AuthBySQL they all inherit
> from SqlDb.pm, have different $self and for that reason different
> $self->{backoff_until} too.
> 
> What is shared are the prepared statement handles and database handles.
> If you have the same "$dbsource;$dbusername;$dbauth" for e.g. LogSQL and
> AuthBySQL then you are using the same handle for both.

That makes sense and is good to know; I suppose what I should have said was:

"right now each instance of SqlDb (e.g. for a particular AuthBySQL) has
a single $self->{backoff_until} timer that applies collectively to all
DBSources listed in the configuration clause of that instance".

(note: in my particular case the AuthBySQL only has one DBSource so this
distinction doesn't matter, I'm just trying to propose a solution that
works in the general case)

>> 1. each configured DBSource gets its own individual backoff_until timer
>> that is set when that DBSource fails DBI->connect, and when set causes
>> that DBSource to be skipped in reconnect() until the set time.

In keeping with the above, let me clarify my proposal:
"1. within an instance of SqlDb, each configured DBSource gets its own
individual backoff_until timer..."
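
As a rough illustration of what I mean, here is a sketch in Python rather than Perl. It is not a patch against SqlDb.pm; the class, its method names, and the `try_connect` callback are all hypothetical stand-ins for one SqlDb instance and its DBI->connect attempts.

```python
import time


class PerSourceBackoff:
    """Keeps an independent backoff deadline for each configured DB source.

    Hypothetical helper for illustration only; it models one SqlDb instance
    with several DBSources and is not part of Radiator.
    """

    def __init__(self, sources, backoff_time, clock=time.monotonic):
        self.sources = list(sources)      # configured order is preserved
        self.backoff_time = backoff_time  # seconds to skip a failed source
        self.clock = clock                # injectable so it can be tested
        self.backoff_until = {s: 0.0 for s in self.sources}

    def mark_failed(self, source):
        """Put one source into backoff without affecting the others."""
        self.backoff_until[source] = self.clock() + self.backoff_time

    def usable_sources(self):
        """Sources whose backoff deadline has passed, in configured order."""
        now = self.clock()
        return [s for s in self.sources if now >= self.backoff_until[s]]

    def connect(self, try_connect):
        """Try each usable source in order; return the first that connects.

        `try_connect(source)` is an assumed callback returning True on
        success. Returns None if every usable source fails.
        """
        for source in self.usable_sources():
            if try_connect(source):
                return source
            self.mark_failed(source)
        return None
```

With this shape, a failing first source is skipped on later reconnects while the second source keeps being used, and the first is retried once its own deadline passes. A statement timeout could call the same `mark_failed` for the source currently in use.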

>> 2. individual statement timeouts, such as the one in SqlDb::do(), could
>> also set the backoff_until timer for the individual DBSource currently
>> in use.  If this is judged not to be desirable in the general case, it
>> could be controlled by a separate configuration parameter
>> ("TimeoutBackoffTime", perhaps?).
>>
>> I'm half tempted to try to implement this myself, but I'm not confident
>> that I fully understand all the potential repercussions for other parts
>> of Radiator, and I know I'm not in a good position to test it thoroughly.
> 
> We'll take a look at your comments in more detail. If you plan to
> implement the changes, please let us know of your results.

Thanks, Heikki.

David

P.S. If I were to attempt a patch for this, would you have the ability
to easily test it to make sure it behaves well against various DB
backends and doesn't break anything else?  :)  I don't think the code
changes would be very difficult; it's the testing that worries me.

