[RADIATOR] failover SqlDB destinations

Michael ringo at vianet.ca
Wed Aug 17 19:50:55 CDT 2011



On 11-08-09 04:54 AM, Heikki Vatiainen wrote:
> On 08/05/2011 10:19 AM, Heikki Vatiainen wrote:
>
>>> Has anyone found any solutions/patches for the sql timeout failover
>>> issue with radiator?  When radiator executes an sql statement on an
>>> sql server that times out not on connecting, but the statement
>>> itself, radiator disconnects and reconnects to the same sql server to
>>> try again.  It never seems to failover to the next sql destination.
>>
>> Thanks for the problem description and the example code in your other
>> message. I will get back to this once I get the comments from the
>> development team.
>
> Can you provide a patch for this? That would make sure we have your
> version of the fix correctly understood.

I'm still testing/monitoring it too. So far, it just alternates between the first 2 SQL sources. I have 4. I wanted to keep the 1st SQL source preferred. My patch may not be the desired solution, but here it is:


--- Radiator-4.8+patches/Radius/SqlDb.pm	2011-04-27 17:21:51.000000000 -0400
+++ Radiator-4.8+patches+custom/Radius/SqlDb.pm	2011-08-11 09:10:50.000000000 -0400
@@ -121,6 +121,16 @@ sub initialize
      $self->{SQLRetries} = 2;
      $self->{FailureBackoffTime} = 600; # Seconds
      $self->{DateFormat} = '%b %e, %Y %H:%M'; # eg 'Sep 3, 1995 13:37'
+    $self->{DBCur}      = '-'; # keep track of the current (or if disconnected, previous) source.
+
+    $self->set("ConnectionHook",
+               'sub {
+                      my $self = shift;
+                      my $dbsource = ( split(/;/,$self->{dbname}) )[0];
+
+	              # If an sql connection occurs, log it so we can see it. Could use this to find out what sql server was in use, when a failure occurs.
+ 	              $self->log($main::LOG_WARNING, "SQL connected to DBSource: ($dbsource) [$self->{Identifier}]");
+                     }');
  
      $self->set("ConnectionAttemptFailedHook",
                 'sub {
@@ -170,6 +180,12 @@ sub reconnect
  	    $dbsource = &Radius::Util::format_special($dbsource, undef, $self);
  	    $dbusername = &Radius::Util::format_special($dbusername, undef, $self);
  	    $dbauth = &Radius::Util::format_special($dbauth, undef, $self);
+
+            # since reconnect always starts from the 1st DBSource, never reconnect to the 1st DBSource if the current/previous sql server (DBCur) matches.
+            # this should prevent always retrying the same server if an SQL timeout occurs, but the connection to the failing server succeeds.
+            next if $self->{DBCur} eq $dbsource;
+            $self->{DBCur} = $dbsource;
+
  	    $self->{dbname} = "$dbsource;$dbusername;$dbauth";
  	    return 1
  		if $Radius::SqlDb::handles{$self->{dbname}};
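
The alternation between the first two sources described above can be reproduced with a small standalone sketch. This is not Radiator code: pick_source, $state and the db1..db4 names are hypothetical, and the model assumes every connection attempt succeeds and that each call stands for one reconnect triggered by a query timeout.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the patched reconnect() loop: walk the DBSource
# list from the top and skip the entry that matches the last-used source.
sub pick_source {
    my ($state, @sources) = @_;
    for my $dbsource (@sources) {
        next if $state->{DBCur} eq $dbsource;    # same test as in the patch
        $state->{DBCur} = $dbsource;             # remember the chosen source
        return $dbsource;                        # assume the connect succeeds
    }
    return;    # every source was skipped
}

my $state   = { DBCur => '-' };                  # same initial value as the patch
my @sources = qw(db1 db2 db3 db4);               # made-up source names

# Five consecutive reconnects, each caused by a query timeout.
my @picked = map { pick_source($state, @sources) } 1 .. 5;
print join(' ', @picked), "\n";                  # db1 db2 db1 db2 db1
```

Because the loop restarts at the first DBSource on every reconnect and skips only the entry equal to DBCur, the third and fourth sources are never selected in this simplified model.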


> There is also the question of possible problems with backwards
> compatibility. Currently Radiator does not advance to the next server if
> there's a timeout with the query. This change would extend the timeout
> behaviour from connections to queries too.
>
> Does anyone see problems with this? Should it be made optional? Comments
> would be appreciated.
>
>> Can you tell why the problem occurred? Was the DB server having IO
>> problems? I'm just curious to know how this happens and how frequent the
>> problem might be.

Yes, I think it was I/O problems. Not always, but often at 6:25am (Debian Lenny), when the daily cron runs. The timeout issue is not a Radiator problem. It's an OS/system/SQL problem. The only thing I was asking about is whether Radiator should respond differently to an SQL timeout error. It happens 2-3 times a day, but sometimes 0.



>
> Michael, do you have any comments on this?
>
>>> So, having multiple sql sources seems to be irrelevant with the issue
>>> of statement time outs.
>>
>> That is currently true.
>
Yes, multiple SQL sources are irrelevant for SQL timeout issues. Radiator will just keep reconnecting to the first SQL source.


