[RADIATOR] failover SqlDB destinations
Michael
ringo at vianet.ca
Wed Aug 17 19:50:55 CDT 2011
On 11-08-09 04:54 AM, Heikki Vatiainen wrote:
> On 08/05/2011 10:19 AM, Heikki Vatiainen wrote:
>
>>> Has anyone found any solutions/patches for the sql timeout failover
>>> issue with radiator? When radiator executes an sql statement on an
>>> sql server that times out not on connecting, but the statement
>>> itself, radiator disconnects and reconnects to the same sql server to
>>> try again. It never seems to failover to the next sql destination.
>>
>> Thanks for the problem description and the example code in your other
>> message. I will get back to this once I get the comments from the
>> development team.
>
> Can you provide a patch for this? That would make sure we have your
> version of the fix correctly understood.
I'm still testing and monitoring it too. So far, it just alternates between the first two SQL sources, although I have four. I wanted to keep the first SQL source preferred. My patch may not be the desired solution, but here it is:
--- Radiator-4.8+patches/Radius/SqlDb.pm 2011-04-27 17:21:51.000000000 -0400
+++ Radiator-4.8+patches+custom/Radius/SqlDb.pm 2011-08-11 09:10:50.000000000 -0400
@@ -121,6 +121,16 @@ sub initialize
     $self->{SQLRetries} = 2;
     $self->{FailureBackoffTime} = 600; # Seconds
     $self->{DateFormat} = '%b %e, %Y %H:%M'; # eg 'Sep 3, 1995 13:37'
+    $self->{DBCur} = '-'; # keep track of the current (or if disconnected, previous) source.
+
+    $self->set("ConnectionHook",
+        'sub {
+            my $self = shift;
+            my $dbsource = ( split(/;/,$self->{dbname}) )[0];
+
+            # If an sql connection occurs, log it so we can see it. Could use this to find out what sql server was in use, when a failure occurs.
+            $self->log($main::LOG_WARNING, "SQL connected to DBSource: ($dbsource) [$self->{Identifier}]");
+        }');
     $self->set("ConnectionAttemptFailedHook",
         'sub {
@@ -170,6 +180,12 @@ sub reconnect
         $dbsource = &Radius::Util::format_special($dbsource, undef, $self);
         $dbusername = &Radius::Util::format_special($dbusername, undef, $self);
         $dbauth = &Radius::Util::format_special($dbauth, undef, $self);
+
+        # since reconnect always starts from the 1st DBSource, never reconnect to the 1st DBSource if the current/previous sql server (DBCur) matches.
+        # this should prevent always retrying the same server if an SQL timeout occurs, but the connection to the failing server succeeds.
+        next if $self->{DBCur} eq $dbsource;
+        $self->{DBCur} = $dbsource;
+
         $self->{dbname} = "$dbsource;$dbusername;$dbauth";
         return 1
             if $Radius::SqlDb::handles{$self->{dbname}};
> There is also the question of possible problems with backwards
> compatibility. Currently Radiator does not advance to the next server if
> there's a timeout with the query. This change would extend the timeout
> behaviour from connections to queries too.
>
> Does anyone see problems with this? Should it be made optional? Comments
> would be appreciated.
>
>> Can you tell why the problem occurred? Was the DB server having IO
>> problems? I'm just curious to know how this happens and how frequent the
>> problem might be.
Yes, I think it was I/O problems. Not always, but often at 6:25 am (Debian Lenny), when the daily cron job runs. The timeout itself is not a Radiator problem; it's an OS/system/SQL problem. The only thing I was asking is whether Radiator should respond differently to an SQL timeout error. It happens 2-3 times a day, but some days not at all.
>
> Michael, do you have any comments on this?
>
>>> So, having multiple sql sources seems to be irrelevant with the issue
>>> of statement time outs.
>>
>> That is currently true.
>
Yes, having multiple SQL sources makes no difference for SQL timeout issues. Radiator will just keep reconnecting to the first SQL source.
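
For anyone wanting to see why the patch only alternates between the first two sources, here is a minimal standalone sketch of the "skip the previous source" rule. It is not Radiator code: the source names and the pick_source helper are made up for illustration, and it assumes every connection attempt succeeds, which is the query-timeout case described in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for four configured DBSource entries.
my @sources = ('dbi:mysql:radius:db1', 'dbi:mysql:radius:db2',
               'dbi:mysql:radius:db3', 'dbi:mysql:radius:db4');

# Mimics the rule added to reconnect: walk the sources in order, skip the
# one used last (DBCur), and take the first remaining one.
# Assumption: connecting always succeeds; only the later query times out.
sub pick_source {
    my ($state) = @_;
    foreach my $dbsource (@sources) {
        next if $state->{DBCur} eq $dbsource;
        $state->{DBCur} = $dbsource;
        return $dbsource;
    }
    return;    # reached when no source other than DBCur is configured
}

my $state  = { DBCur => '-' };
my @picked = map { pick_source($state) } 1 .. 5;
print join("\n", @picked), "\n";
```

Because the loop always restarts from the first source and skips only the most recent one, the third and fourth sources are never reached while connections keep succeeding.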