(RADIATOR) Multiple radius instances problem (possible remote consulting and professional services)

Hugh Irvine hugh at open.com.au
Fri Apr 27 19:00:21 CDT 2007


Hello Sergio -

I have designed systems like this in the past with local copies of  
the LDAP database and the MySQL database on each host.

Replication from a central master was used for LDAP and the MySQL  
user records, and the MySQL RADONLINE table was maintained  
independently on each machine.

As I have mentioned previously we are available on a contract basis  
for design consulting.

regards

Hugh



On 27 Apr 2007, at 09:02, Sergio Gonzalez wrote:

> Hello Hugh
>
> My comments are inline.
>
> Thanks a lot for the help.
>
> At 04:46 p.m. 26/04/2007, Hugh Irvine wrote:
>
>> Hello Sergio -
>>
>> Thanks for the additional information.
>>
>> It is not clear to me why the dialup instances take so much longer
>> than the DSL instances to do the authentication. You also don't show
>> how long the accounting is taking.
>
> SG: The dial-up instances take longer because the process involves
> both MySQL queries and LDAP binds in a pair of Perl hooks. In
> general, dial-up authentication works like this:
>
> - The RASes send the Access-Request.
> - A hook on the handler serving this request queries the RADONLINE
> table in MySQL for how many users of the same type as this request
> are on the RAS. There are two types of users: regular and
> by-demand. Unfortunately, the same PBX on each RAS answers for both
> types. The hook checks the number of ports for each service, and if
> the number of users of the request's type is exceeded, Radiator
> sends an Access-Reject.
> - If the limit for the request's type has not been reached, the
> hook checks whether the user is regular or by-demand. If by-demand,
> Radiator answers with an Access-Accept.
> - If the user is regular, the hook queries MySQL again to find out
> whether the user is a per-hour user or an unlimited user.
> - If the user is unlimited, the hook then tries to find which
> branch of the LDAP server matches the username/password pair (this
> does an LDAP search over the whole LDAP server and a bind for each
> branch that matches the username). If one matches, it sends an
> Access-Accept; if not, an Access-Reject.
> - If the user is per-hour, the hook queries the Accounting table to
> see whether the user has enough time left to connect. If so,
> Radiator sends an Access-Accept; if not, an Access-Reject.
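The dial-up decision flow above can be sketched in Python as follows. This is a minimal model of the logic, not the actual Perl hook: the MySQL and LDAP lookups are replaced by hypothetical in-memory dictionaries (`Backends` and its field names are illustrative), the request's user type is assumed to be known before the hook runs, "exceeded" is read as "no free port left for this type", and "enough time left" is assumed to mean more than zero seconds.

```python
from dataclasses import dataclass


@dataclass
class Backends:
    """Hypothetical in-memory stand-ins for the MySQL tables and the LDAP directory."""
    online_counts: dict   # (ras, user_type) -> sessions currently in RADONLINE
    port_limits: dict     # (ras, user_type) -> ports allotted to that user type
    user_plans: dict      # username -> "unlimited" or "per-hour"
    time_left: dict       # username -> seconds remaining, from the Accounting table
    ldap_branches: dict   # username -> password stored in each LDAP branch matching the name


def authorize(b: Backends, ras: str, username: str, password: str, user_type: str) -> str:
    """Return "Access-Accept" or "Access-Reject" following the dial-up steps above."""
    # Step 1: port check against RADONLINE for this RAS and user type.
    in_use = b.online_counts.get((ras, user_type), 0)
    if in_use >= b.port_limits.get((ras, user_type), 0):
        return "Access-Reject"
    # Step 2: by-demand users are accepted once a port is free.
    if user_type == "by-demand":
        return "Access-Accept"
    # Step 3: regular users need a second MySQL lookup for their plan.
    plan = b.user_plans.get(username)
    if plan == "unlimited":
        # Step 4: the LDAP search found these branches; "bind" against each one in turn.
        for branch_password in b.ldap_branches.get(username, []):
            if branch_password == password:
                return "Access-Accept"
        return "Access-Reject"
    if plan == "per-hour":
        # Step 5: accept only if the Accounting table shows time left (assumed: > 0 s).
        return "Access-Accept" if b.time_left.get(username, 0) > 0 else "Access-Reject"
    # Assumption: unknown users or plans are rejected.
    return "Access-Reject"
```

Counting the lookups shows why this path is slower: an unlimited regular user costs two MySQL queries plus an LDAP search and one bind per matching branch, while a by-demand user is settled right after the port check.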
>
>
>> In any case, if the problem is slow LDAP and SQL databases you should
>> address those issues first.
>
>
> SG: Unfortunately, the method for deciding whether a user can be
> accepted cannot be changed. Worse, there is also another type of
> dial-up user, handled in another handler that invokes an AuthBy Proxy
> clause.
>
>
>> I am guessing that there is some event like a DSL RAS rebooting that
>> is causing a burst of authentication requests that swamp the
>> authentication server(s).
>
> SG: Unfortunately, this is not the case. The DSL RASes simply have
> many users at peak hours. The service is nationwide: we are talking
> about 60,000 users today and 20,000 concurrent connections. The
> hardware and software I described in my last email was dimensioned
> for 250,000 users and around 90,000 concurrent connections.
>
>
>> How many RADIUS requests per second are hitting the boxes?
>
> SG: In total (DSL and dial-up), there must be around 60 req/sec at
> peak hours. Those requests have been split across the instances I
> mentioned in my last email.
>
>
>> BTW - the numbers you show for the SUN LDAP server are consistent
>> with what I have observed at other sites - it doesn't seem to be able
>> to process more than at the most 10 requests per second. This being
>> the case, whenever you have more than 10 requests per second arriving
>> in Radiator you will have a problem.
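Hugh's point can be expressed as a utilization figure. A minimal sketch, taking the 60 req/sec total peak and the roughly 10 req/sec LDAP ceiling from this thread as inputs; the function names are illustrative:

```python
def utilization(arrival_rate: float, capacity: float) -> float:
    """Offered load / capacity; at 1.0 or above the backlog grows without bound."""
    return arrival_rate / capacity


def max_backend_share(arrival_rate: float, capacity: float) -> float:
    """Largest fraction of requests that can reach a backend before it saturates."""
    return min(1.0, capacity / arrival_rate)


PEAK_RATE = 60.0      # req/s, total peak reported in this thread
LDAP_CAPACITY = 10.0  # req/s, approximate ceiling observed for the Sun LDAP server

# prints "LDAP utilization if every request hits LDAP: 6.0"
print(f"LDAP utilization if every request hits LDAP: {utilization(PEAK_RATE, LDAP_CAPACITY):.1f}")
# prints "Largest share of requests LDAP can absorb: 16.7%"
print(f"Largest share of requests LDAP can absorb: {max_backend_share(PEAK_RATE, LDAP_CAPACITY):.1%}")
```

This assumes one LDAP operation per request; the unlimited-user path does a search plus a bind per matching branch, so the share of traffic LDAP can actually absorb is smaller still.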
>
> SG: Would it be advisable to have an auth and an acct instance for,
> let's say, every two or three dial-up RASes? Also, how can the high
> request rate from the DSL RASes be handled? The main goal is to
> optimize the configuration of those three Sun servers.
>
>
>
>> There may also be a problem with inserting the accounting data into
>> the MySQL database, but you have not provided any information on  
>> that.
>
>
> SG: As we discussed in October last year, there are around 8
> million accounting records per month. Each insertion takes the same
> time as in October, around two hundredths of a second, and the hit
> rate is at most 8 req/sec.
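For comparison with the LDAP figures, the accounting numbers can be checked with a little arithmetic. A sketch, assuming a 30-day month and that inserts are performed one at a time on a single connection:

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month

records_per_month = 8_000_000
insert_time_s = 0.02   # about two hundredths of a second per insert
peak_rate = 8.0        # accounting req/s at peak

average_rate = records_per_month / SECONDS_PER_MONTH   # roughly 3.1 records/s
serial_ceiling = 1 / insert_time_s                     # inserts/s on one connection
peak_utilization = peak_rate * insert_time_s           # fraction of that ceiling in use

print(f"average {average_rate:.2f} rec/s, ceiling {serial_ceiling:.0f}/s, "
      f"peak utilization {peak_utilization:.0%}")
```

On these numbers the MySQL insert path runs at about 16% of its serial capacity at peak, which suggests accounting is not the bottleneck in the way LDAP is.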
>
>
>> regards
>>
>> Hugh
>>
>>
>>
>> On 27 Apr 2007, at 07:28, Sergio Gonzalez wrote:
>>
>>> Hello,
>>>
>>> A customer has the following configuration to authenticate more
>>> than 20,000 concurrent users every day:
>>>
>>> - Sun A: v240Z with 2 GB RAM, Solaris 9 64-bit, Radiator 3.14, Perl
>>> 5.8.7
>>> - Sun B: v240Z with 2 GB RAM, Solaris 9 64-bit, Radiator 3.14, Perl
>>> 5.8.7, MySQL Professional 5.0.17c
>>> - Sun C: v880 with 4 GB RAM, Solaris 9 64-bit, Radiator 3.14, Perl
>>> 5.8.7, MySQL Professional 5.0.17c
>>> - Sun D: v440 with 16 GB RAM, Solaris 9 64-bit, Sun LDAP Server 5
>>> - Sun E: v440 with 16 GB RAM, Solaris 9 64-bit, Sun LDAP Server 5
>>>
>>> Those radius servers answer requests from:
>>>
>>> - Around 35 dial-up RASes with more than 150 ports each
>>> - 4 DSL RASes with more than 7,000 ports each
>>>
>>> Each RADIUS server runs authentication and accounting instances.
>>> The authentication instances query the LDAP servers (in practice
>>> only one; if the first fails, they query the other) and also the
>>> MySQL servers (in the same fashion: the first, and if it fails, the
>>> second).
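The primary/secondary lookup order described here amounts to a simple ordered failover. A sketch in Python, where `query_with_failover` is a hypothetical helper and a failed server is assumed to surface as a `ConnectionError`:

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def query_with_failover(servers: Sequence[str], query: Callable[[str], T]) -> T:
    """Try each server in order and return the first successful result.

    Mirrors "ask the first; only if it fails, ask the second", as described
    for both the LDAP and the MySQL servers.
    """
    last_error: Optional[Exception] = None
    for server in servers:
        try:
            return query(server)
        except ConnectionError as exc:  # assumption: failures surface as ConnectionError
            last_error = exc
    raise ConnectionError(f"all servers failed: {list(servers)}") from last_error
```

One consequence of strict ordering is that the second server takes load only when the first fails, so at peak the authentication path is limited by a single LDAP server's throughput rather than the combined capacity of Sun D and Sun E.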
>>>
>>> Taking a Trace -1 and a LogMicroseconds from those instances I got:
>>>
>>> Dial-up instances: 8 req/sec max. Each authentication request takes
>>> 0.15 sec to complete, so one instance can process only about 7
>>> req/sec before requests start queuing in the UDP buffer.
>>> DSL instances: 25 req/sec max. Each authentication request takes
>>> sec to complete. This means
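The dial-up figures follow from treating each instance as handling one request at a time, which the step from 0.15 s per request to about 7 req/sec implies. A sketch of that arithmetic; the DSL figures are missing above, so only the dial-up numbers are used, and the function names are illustrative:

```python
import math


def per_instance_rate(service_time_s: float) -> float:
    """Requests/s one instance can complete when it handles requests one at a time."""
    return 1.0 / service_time_s


def instances_needed(arrival_rate: float, service_time_s: float) -> int:
    """Minimum instances to keep up with the average load, assuming an even spread.

    round() stops floating-point noise from pushing an exact integer product
    just above that integer, which would make ceil() overcount by one.
    """
    return math.ceil(round(arrival_rate * service_time_s, 9))


print(f"{per_instance_rate(0.15):.2f} req/s per dial-up instance")
print(f"{instances_needed(8, 0.15)} instance(s) needed for 8 req/s")
```

The instance count covers average load only: it assumes requests are spread perfectly evenly and ignores bursts, so in practice some headroom above this minimum would be needed.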
>>>
>>> The auth and acct requests are distributed among the three servers
>>> as follows:
>>>
>>> Sun A: 2 auth instances for dial-up and 2 acct instances for dial-up.
>>> Sun B: 2 auth instances for DSL and 2 acct instances for DSL.
>>> Sun C: 2 auth instances for DSL and 2 acct instances for DSL.
>>>
>>> Five days ago, the two dial-up auth instances on Sun A stalled. Not
>>> even radpwtst worked, yet the logfile showed the process up and
>>> running (many entries were written every second, i.e. many
>>> Access-Accepts and Access-Rejects, so from Radiator's point of view
>>> the whole process was working fine). A few seconds after the Sun A
>>> instances stalled, the auth instances on Sun C stalled too. The
>>> only way to recover was to deploy a config file for those instances
>>> with a "bypass" that simply accepts every request.
>>>
>>>
>>> On the three Radiator Sun servers, the udp_recv_hiwat parameter is
>>> set to more than 8 million and the UDP buffer is set to the
>>> maximum, 64k (the Solaris limit). Also, when the instances stall,
>>> many Access-Accepts never leave the boxes, and many Access-Requests
>>> coming from the RASes never reach the Radiator application. It
>>> seems to be a socket buffer overflow problem.
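Whatever its size, a UDP receive buffer cannot rescue an instance that is slower than its arrival rate; it only delays the overflow and adds queuing delay. A sketch of that arithmetic, using illustrative figures for a single dial-up instance receiving 8 req/sec, a hypothetical 200-byte RADIUS request, and ignoring per-datagram kernel overhead (which usually means the real buffer holds fewer packets):

```python
import math


def seconds_until_overflow(buffer_bytes: int, packet_bytes: int,
                           arrival_rate: float, service_rate: float) -> float:
    """Seconds before a UDP receive buffer fills, given a steady excess of arrivals.

    Returns math.inf when the instance keeps up (arrival_rate <= service_rate).
    """
    excess = arrival_rate - service_rate  # requests/s that pile up in the buffer
    if excess <= 0:
        return math.inf
    capacity_packets = buffer_bytes // packet_bytes
    return capacity_packets / excess


def queue_delay_s(queued_packets: int, service_rate: float) -> float:
    """Time the last queued request waits before the instance reaches it."""
    return queued_packets / service_rate


PACKET_BYTES = 200   # hypothetical average RADIUS request size
service = 1 / 0.15   # one dial-up instance, about 6.7 req/s
for buf in (64 * 1024, 8_000_000):
    t = seconds_until_overflow(buf, PACKET_BYTES, arrival_rate=8.0, service_rate=service)
    wait = queue_delay_s(buf // PACKET_BYTES, service)
    print(f"buffer {buf:>9} B: full after {t:7.0f} s, last request waits {wait:6.0f} s")
```

With a 64 KB buffer, roughly 327 queued requests mean the last one waits about 49 s, far longer than the few seconds a RAS typically waits before retrying. That is one possible reading of the symptoms described: the log stays busy answering requests that the RAS (and radpwtst) have already given up on.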
>>>
>>> How do I fix this?
>>>
>>>
>>> Best Regards.
>>>
>>> Sergio Gonzalez
>>>
>>>
>>>
>>
>>
>>
>> NB:
>>
>> Have you read the reference manual ("doc/ref.html")?
>> Have you searched the mailing list archive (www.open.com.au/archives/radiator)?
>> Have you had a quick look on Google (www.google.com)?
>> Have you included a copy of your configuration file (no secrets),
>> together with a trace 4 debug showing what is happening?
>> Have you checked the RadiusExpert wiki:
>> http://www.open.com.au/wiki/index.php/Main_Page
>>
>> --
>> Radiator: the most portable, flexible and configurable RADIUS server
>> anywhere. Available on *NIX, *BSD, Windows, MacOS X.
>> Includes support for reliable RADIUS transport (RadSec),
>> and DIAMETER translation agent.
>> -
>> Nets: internetwork inventory and management - graphical, extensible,
>> flexible with hardware, software, platform and database independence.
>> -
>> CATool: Private Certificate Authority for Unix and Unix-like systems.
>>




--
Archive at http://www.open.com.au/archives/radiator/
Announcements on radiator-announce at open.com.au
To unsubscribe, email 'majordomo at open.com.au' with
'unsubscribe radiator' in the body of the message.

