[RADIATOR] radiator Timeout handling

Michael ringo at vianet.ca
Mon Sep 20 09:07:20 CDT 2010


ok, thanks.  Can I suggest an option to disable this behavior? In my 
case, I would prefer radiator to only allow one timeout, when a timeout 
occurs, respect the FailureBackoffTime. If it doesn't, radiator creates 
a very undesirable situation when it continues to try for every packet, 
the sql server that timed out. It basically bottlenecks my whole radius 
system since all radiator servers connect to the same accounting mysql 
server, and all nas's eventually mark each radius server "RADIUS_DEAD" 
and then all authentication seems to stop.

Mike



On 10-09-16 08:27 PM, Hugh Irvine wrote:
> Hello Michael -
>
> The behaviour you observe is in fact what the code does - the manual does not correctly describe this behaviour.
>
> The manual has been amended for the next release.
>
> Thanks for letting us know.
>
> regards
>
> Hugh
>
>
> On 16 Sep 2010, at 15:31, Hugh Irvine wrote:
>
>    
>> Hello Michael -
>>
>> We'll investigate this a bit further.
>>
>> BTW - some people put a unique index on the accounting table using something like Acct-Session-Id + Timestamp to avoid duplicates.
>>
>> Otherwise you can use a stored procedure in the database to do whatever you need (or not).
>>
>> regards
>>
>> Hugh
>>
>>
>>
>> On 16 Sep 2010, at 14:43, Michael wrote:
>>
>>      
>>> Debian 4.0
>>> Perl, v5.8.8 built for x86_64-linux-gnu-thread-multi
>>> DBI v1.53
>>> mysqld Ver 5.0.32-Debian_7etch12-log for pc-linux-gnu on x86_64 (Debian etch distribution)
>>>
>>>
>>> since my radiator config is quite lengthy, i'll just post what i think is the relevant parts.  if you would like to know more, let me know.  There are also a few very custom things I do to identify what type of service the connection is coming from, handling of Acct Gigawords and Acct Octets, etc...  so, there's going to be a few things in the debug/config that you wont recognize.
>>>
>>> <Client>  section allows specific NAS's
>>>
>>> #All accounting packets for all NAS'S handled here:
>>> <Handler Request-Type = "Accounting-Request", \
>>>        Acct-Status-Type = Start|Stop>
>>>        Identifier accounting
>>>
>>>        SessionDatabase sessdb
>>>
>>>        PreProcessingHook file:"%D/conf/hook.PreProcessingHook"
>>>
>>>        AuthByPolicy
>>>        AuthBy SQL-1-Accounting
>>> </Handler>
>>> <AuthBy SQL>
>>>
>>>        Identifier      SQL-1-Accounting
>>>
>>>        DBSource        dbi:mysql:yyy:yyy
>>>        DBUsername      yyy
>>>        DBAuth          yyy
>>>
>>>
>>>        Timeout       3
>>>        FailureBackoffTime    300
>>>
>>>
>>>        # a few attributes added to radius packet via previous Hook file.
>>>        AccountingTable `acct`
>>>        AcctColumnDef   `submitted_un`,%u,formatted
>>>        AcctColumnDef   `auth_un`,auth-un
>>>        AcctColumnDef   `zone`,zone
>>>        AcctColumnDef   `uid`,uid
>>>        AcctColumnDef   `timestamp`,Timestamp
>>>        AcctColumnDef   `type`,Acct-Status-Type
>>>        AcctColumnDef   `acct_delay_time`,Acct-Delay-Time
>>>        AcctColumnDef   `upload`,total-input-octets
>>>        AcctColumnDef   `download`,total-output-octets
>>>        AcctColumnDef   `sess_id`,Acct-Session-Id
>>>        AcctColumnDef   `sess_time`,Acct-Session-Time
>>>        AcctColumnDef   `term_cause`,Acct-Terminate-Cause
>>>        AcctColumnDef   `nas_ip_address`,NAS-IP-Address
>>>        AcctColumnDef   `ip_address`,Framed-IP-Address
>>>        AcctColumnDef   `service`,service
>>>
>>>        AcctFailedLogFileName %L/accounting-missed/logfile.sql-failed.acct.%Y%m%d
>>> </AuthBy>
>>>
>>>
>>>
>>>
>>> Thu Sep 16 15:19:48 2010: DEBUG: Packet dump:
>>> *** Received from 192.168.100.1 port 1646 ....
>>> Code:       Accounting-Request
>>> Identifier: 75
>>> Authentic:<133><139><188><253><225><157><240>[Lo<175><236><127><144><161><31>
>>> Attributes:
>>>    Acct-Session-Id = "00000620"
>>>    Framed-Protocol = PPP
>>>    Framed-IP-Address = 192.168.100.100
>>>    User-Name = "dsltest"
>>>    Acct-Authentic = RADIUS
>>>    Acct-Status-Type = Start
>>>    NAS-Port-Type = Ethernet
>>>    NAS-Port = 100
>>>    NAS-Port-Id = "0/0/0/431"
>>>    Class = "FED,100,dsltest"
>>>    Service-Type = Framed-User
>>>    NAS-IP-Address = 192.168.100.1
>>>    Ascend-Session-Svr-Key = "AA39FB66"
>>>    NAS-Identifier = "TEST-LNS-1"
>>>    Acct-Delay-Time = 0
>>>
>>> Thu Sep 16 15:19:48 2010: DEBUG: hook.service-label: finding service-label for NAS-Port=(100)
>>> Thu Sep 16 15:19:48 2010: DEBUG: hook.service-label: service-label=(testing)
>>> Thu Sep 16 15:19:48 2010: DEBUG: hook.service-label: add_attr(service=testing)
>>> Thu Sep 16 15:19:48 2010: DEBUG: Handling request with Handler 'Request-Type = "Accounting-Request", Acct-Status-Type = Start|Stop', Identifier 'accounting'
>>> Thu Sep 16 15:19:48 2010: DEBUG: hook.PreProcessingHook: executing.
>>> Thu Sep 16 15:19:48 2010: DEBUG: hook.PreProcessingHook: changed Acct-Status-Type Start to login.
>>> Thu Sep 16 15:19:48 2010: DEBUG: hook.PreProcessingHook: loading attrs from Class.
>>> Thu Sep 16 15:19:48 2010: DEBUG: Handling with Radius::AuthSQL: SQL-1-Accounting
>>> Thu Sep 16 15:19:48 2010: DEBUG: Handling accounting with Radius::AuthSQL
>>> Thu Sep 16 15:19:48 2010: DEBUG: do query is: 'insert into `acct` (`acct_delay_time`,`auth_un`,`ip_address`,`nas_ip_address`,`service`,`sess_id`,`submitted_un`,`timestamp`,`type`,`uid`,`zone`) values ('0','dsltest','192.168.100.100','192.168.100.1','testing','00000620','dsltest','1284664788','login','100','FED')':
>>> Thu Sep 16 15:19:51 2010: ERR: do failed for 'insert into `acct` (`acct_delay_time`,`auth_un`,`ip_address`,`nas_ip_address`,`service`,`sess_id`,`submitted_un`,`timestamp`,`type`,`uid`,`zone`) values ('0','dsltest','192.168.100.100','192.168.100.1','testing','00000620','dsltest','1284664788','login','100','FED')': SQL Timeout
>>> Thu Sep 16 15:19:54 2010: ERR: do failed for 'insert into `acct` (`acct_delay_time`,`auth_un`,`ip_address`,`nas_ip_address`,`service`,`sess_id`,`submitted_un`,`timestamp`,`type`,`uid`,`zone`) values ('0','dsltest','192.168.100.100','192.168.100.1','testing','00000620','dsltest','1284664788','login','100','FED')': SQL Timeout
>>> Thu Sep 16 15:19:54 2010: DEBUG: AuthBy SQL result: IGNORE, Database failure
>>>
>>> ### timeout value 3 * 2 queries = 6 seconds and 2 possible accounting duplicates.
>>> ### omitted here is the exact same debug lines from another Start packet from the NAS since it timed out and retransmitted.
>>>
>>>
>>> ### here's the Stop packet coming through trying an insert again, not obeying FailureBackoffTime.
>>>
>>> *** Received from 192.168.100.1 port 1646 ....
>>> Code:       Accounting-Request
>>> Identifier: 76
>>> Authentic:<130><28><130><231>C<233><2><135><30><142><206><148><156>T~<160>
>>> Attributes:
>>>    Acct-Session-Id = "00000620"
>>>    Framed-Protocol = PPP
>>>    Framed-IP-Address = 192.168.100.100
>>>    User-Name = "dsltest"
>>>    Acct-Authentic = RADIUS
>>>    Acct-Session-Time = 4
>>>    Acct-Input-Octets = 880
>>>    Acct-Output-Octets = 116
>>>    Acct-Input-Packets = 12
>>>    Acct-Output-Packets = 7
>>>    Acct-Terminate-Cause = User-Request
>>>    Acct-Status-Type = Stop
>>>    NAS-Port-Type = Ethernet
>>>    NAS-Port = 100
>>>    NAS-Port-Id = "0/0/0/431"
>>>    Class = "FED,100,dsltest"
>>>    Service-Type = Framed-User
>>>    NAS-IP-Address = 192.168.100.1
>>>    Ascend-Session-Svr-Key = "AA39FB66"
>>>    NAS-Identifier = "TEST-LNS-1"
>>>    Acct-Delay-Time = 0
>>>
>>> Thu Sep 16 15:19:54 2010: DEBUG: hook.service-label: finding service-label for NAS-Port=(100)
>>> Thu Sep 16 15:19:54 2010: DEBUG: hook.service-label: service-label=(testing)
>>> Thu Sep 16 15:19:54 2010: DEBUG: hook.service-label: add_attr(service=testing)
>>> Thu Sep 16 15:19:54 2010: DEBUG: Handling request with Handler 'Request-Type = "Accounting-Request", Acct-Status-Type = Start|Stop', Identifier 'accounting'
>>> Thu Sep 16 15:19:54 2010: DEBUG: hook.PreProcessingHook: executing.
>>> Thu Sep 16 15:19:54 2010: DEBUG: hook.PreProcessingHook: changed Acct-Status-Type Stop to logout.
>>> Thu Sep 16 15:19:54 2010: DEBUG: hook.PreProcessingHook: loading attrs from Class.
>>> Thu Sep 16 15:19:54 2010: DEBUG: hook.PreProcessingHook: calculating total-output-octets and total-input-octets.
>>> Thu Sep 16 15:19:54 2010: DEBUG: Handling with Radius::AuthSQL: SQL-1-Accounting
>>> Thu Sep 16 15:19:54 2010: DEBUG: Handling accounting with Radius::AuthSQL
>>> Thu Sep 16 15:19:54 2010: DEBUG: do query is: 'insert into `acct` (`acct_delay_time`,`auth_un`,`download`,`ip_address`,`nas_ip_address`,`service`,`sess_id`,`sess_time`,`submitted_un`,`term_cause`,`timestamp`,`type`,`uid`,`upload`,`zone`) values ('0','dsltest','116','192.168.100.100','192.168.100.1','testing','00000620','4','dsltest','User-Request','1284664794','logout','100','880','FED')':
>>> Thu Sep 16 15:19:57 2010: ERR: do failed for 'insert into `acct` (`acct_delay_time`,`auth_un`,`download`,`ip_address`,`nas_ip_address`,`service`,`sess_id`,`sess_time`,`submitted_un`,`term_cause`,`timestamp`,`type`,`uid`,`upload`,`zone`) values ('0','dsltest','116','192.168.100.100','192.168.100.1','testing','00000620','4','dsltest','User-Request','1284664794','logout','100','880','FED')': SQL Timeout
>>> Thu Sep 16 15:20:00 2010: ERR: do failed for 'insert into `acct` (`acct_delay_time`,`auth_un`,`download`,`ip_address`,`nas_ip_address`,`service`,`sess_id`,`sess_time`,`submitted_un`,`term_cause`,`timestamp`,`type`,`uid`,`upload`,`zone`) values ('0','dsltest','116','192.168.100.100','192.168.100.1','testing','00000620','4','dsltest','User-Request','1284664794','logout','100','880','FED')': SQL Timeout
>>> Thu Sep 16 15:20:00 2010: DEBUG: AuthBy SQL result: IGNORE, Database failure
>>>
>>> ### omitted here is the exact same debug lines from another Stop packet from the NAS since it timed out and retransmitted.
>>>
>>>
>>>
>>>
>>> On 10-09-16 03:00 PM, Hugh Irvine wrote:
>>>        
>>>> Hello Michael -
>>>>
>>>> We'll need to see a copy of the configuration file (no secrets), together with a more complete trace 4 debug showing what is happening.
>>>>
>>>> We will also need to know what hardware/software platform you are running on, what version of Perl, what version of DBI/DBD, what SQL database, and anything else that might be useful.
>>>>
>>>> regards
>>>>
>>>> Hugh
>>>>
>>>>
>>>> On 16 Sep 2010, at 13:33, Michael wrote:
>>>>
>>>>          
>>>>> Hi,
>>>>>
>>>>> I'm having a couple issues with<AuthBy SQL>. Maybe it would be considered a bug i'm not sure.
>>>>>
>>>>>
>>>>> 1. the Timeout handling.
>>>>> ------------------------
>>>>>            
>>>>>>  From my testing, it appears that radiator times out at this value, but seems to retry the sql query a second time, creating in another timeout count.
>>>>>>              
>>>>> eg debug:
>>>>> Tue Sep 14 12:48:21 2010: DEBUG: Handling accounting with Radius::AuthSQL
>>>>> Tue Sep 14 12:48:21 2010: DEBUG: do query is: 'insert into `acct`<snip>
>>>>> Tue Sep 14 12:48:25 2010: ERR: do failed for 'insert into `acct`<snip>   SQL Timeout
>>>>> Tue Sep 14 12:48:29 2010: ERR: do failed for 'insert into `acct`<snip>   SQL Timeout
>>>>> Tue Sep 14 12:48:29 2010: DEBUG: AuthBy SQL result: IGNORE, Database failure
>>>>>
>>>>> Timeout is set for 4 seconds...
>>>>> so, query executed at 12:48:21, ERR timed out 4 seconds later, appeared to re-try but didn't say anything, and another ERR timeout 4 seconds after that.  That's 8 seconds of course.  It doubles the Timeout value.
>>>>>
>>>>> This is no good, for me.  If I set my SQL timeout value for 4 seconds, and my NAS timeout for 5 seconds, I expect my radiator to timeout before my NAS re-transmits.  my NAS will retry after 5 seconds because radiator hasn't responded.  And, radiator hasn't obeyed the timeout so it's still waiting for 8 seconds.  This causes the same accounting packet to enter radiator again, and causing another 8 seconds delay, and of course duplicate entries in the accounting logging since I'm also using AcctFailedLogFileName so the packet will eventually end up in the SQL table.
>>>>>
>>>>>
>>>>> 2. SQL Timeout issue #2.
>>>>> ------------------------
>>>>> using the same debug example above, when the SQL query times out, it doesn't seem to use the FailureBackoffTime value. It only seems to use FailureBackoffTime when there is a connection failure, not a timeout.  So, every query is still presented to the SQL server.  If the timeout is due to lets say a write lock, when the lock releases, all the queued insert statements are executed creating in sometimes up to 10 duplicate accounting entries.
>>>>>
>>>>>
>>>>> 3. AuthBy result after failure
>>>>> ------------------------------
>>>>> an IGNORE occurs when the SQL query fails, but if AcctFailedLogFileName is used, and successfully wrote the accounting packet to the file, should radiator not then reply with an ACCEPT?  Dumping the accounting packet to a file, but replying with IGNORE will cause the NAS to go to the next radius server possibly accepting the entry, therefore the packet in the acct failed logfile, is a double.  Even if there's only 1 radiator server in play, there will be an accounting packet entry in the failed log for each time the NAS retries.
>>>>>
>>>>>
>>>>> Thanks in advanced for any advice,
>>>>> Michael
>>>>>
>>>>> _______________________________________________
>>>>> radiator mailing list
>>>>> radiator at open.com.au
>>>>> http://www.open.com.au/mailman/listinfo/radiator
>>>>>            
>>>>
>>>>
>>>> NB:
>>>>
>>>> Have you read the reference manual ("doc/ref.html")?
>>>> Have you searched the mailing list archive (www.open.com.au/archives/radiator)?
>>>> Have you had a quick look on Google (www.google.com)?
>>>> Have you included a copy of your configuration file (no secrets),
>>>> together with a trace 4 debug showing what is happening?
>>>>
>>>>          
>>>        
>>
>>
>> NB:
>>
>> Have you read the reference manual ("doc/ref.html")?
>> Have you searched the mailing list archive (www.open.com.au/archives/radiator)?
>> Have you had a quick look on Google (www.google.com)?
>> Have you included a copy of your configuration file (no secrets),
>> together with a trace 4 debug showing what is happening?
>>
>> -- 
>> Radiator: the most portable, flexible and configurable RADIUS server
>> anywhere. Available on *NIX, *BSD, Windows, MacOS X.
>> Includes support for reliable RADIUS transport (RadSec),
>> and DIAMETER translation agent.
>> -
>> Nets: internetwork inventory and management - graphical, extensible,
>> flexible with hardware, software, platform and database independence.
>> -
>> CATool: Private Certificate Authority for Unix and Unix-like systems.
>>
>>
>>
>> _______________________________________________
>> radiator mailing list
>> radiator at open.com.au
>> http://www.open.com.au/mailman/listinfo/radiator
>>      
>
>
> NB:
>
> Have you read the reference manual ("doc/ref.html")?
> Have you searched the mailing list archive (www.open.com.au/archives/radiator)?
> Have you had a quick look on Google (www.google.com)?
> Have you included a copy of your configuration file (no secrets),
> together with a trace 4 debug showing what is happening?
>
>    



More information about the radiator mailing list