(RADIATOR) Multiple radius instances problem (possible remote consulting and professional services)
Sergio Gonzalez
sagonzal at sky.net.co
Thu Apr 26 16:28:01 CDT 2007
Hello,
A customer has the following configuration to authenticate more than
20.000 concurrent users every day:
- Sun A: v240Z with 2 Gb RAM, Solaris 9 64bit, Radiator 3.14, Perl 5.8.7
- Sun B: v240Z with 2 Gb RAM, Solaris 9 64bit, Radiator 3.14, Perl
5.8.7, MySQL Professional 5.0.17c
- Sun C: v880 with 4 Gb RAM, Solaris 9 64bit, Radiator 3.14, Perl
5.8.7, MySQL Professional 5.0.17c
- Sun D: v440 with 16 Gb RAM, Solaris 9 64bit, Sun LDAP Server 5
- Sun E: v440 with 16 Gb RAM, Solaris 9 64bit, Sun LDAP Server 5
Those radius servers answer requests from:
- Around 35 Dial-up RASes with more than 150 ports each
- 4 DSL RASes with more than 7.000 ports each
Each radius server runs authentication and accounting instances. The
authentication instances query the LDAP servers (in practice only one;
if the first fails, the other is queried) and also the MySQL servers
(in the same fashion as LDAP: the first, and if it fails, the second).
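The "first server, then the second if the first fails" behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not Radiator configuration or code: `BackendError`, `lookup_with_failover` and the backend callables are hypothetical names made up for the example.

```python
class BackendError(Exception):
    """Hypothetical error raised when a backend (LDAP or MySQL) is unreachable."""


def lookup_with_failover(backends, username):
    """Try each backend in order and return the first successful answer.

    `backends` is an ordered list of callables (primary first), each taking
    a username and either returning a result or raising BackendError.
    """
    last_error = None
    for lookup in backends:
        try:
            return lookup(username)
        except BackendError as exc:
            # Remember the failure and fall through to the next backend.
            last_error = exc
    raise BackendError(f"all backends failed: {last_error}")
```

Note that in a scheme like this, every request pays the full timeout of the first backend before the second one is tried, unless the failed backend is temporarily skipped.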
From a Trace -1 and LogMicroseconds log of those instances I got:
Dial-up instances: 8 req/sec max. Each authentication request takes
0.15 sec to complete. This means around 7 req/sec before requests
start going into the UDP queue.
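The dial-up arithmetic above can be checked with a small sketch. It assumes that one instance handles requests strictly one at a time, so its sustained rate is the inverse of the per-request time; the function names are made up for the illustration.

```python
def max_sustained_rate(service_time_s):
    """Requests per second one instance can finish, assuming one at a time."""
    return 1.0 / service_time_s


def backlog_growth(arrival_rate, service_time_s):
    """Requests per second that pile up in the queue when arrivals exceed capacity."""
    return max(0.0, arrival_rate - max_sustained_rate(service_time_s))


# Dial-up figures from the measurements: 0.15 sec per request, 8 req/sec peak.
print(f"{max_sustained_rate(0.15):.2f}")   # 6.67
print(f"{backlog_growth(8, 0.15):.2f}")    # 1.33
```

Under that assumption, a peak of 8 req/sec against a capacity of about 6.67 req/sec leaves roughly 1.33 req/sec accumulating in the queue for as long as the peak lasts.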
DSL instances: 25 req/sec max. Each authentication request takes sec
to complete. This means
The auth and acct requests were handled by the three servers like this:
Sun A: 2 auth instances for Dial-up and 2 acct instances for Dial-up.
Sun B: 2 auth instances for DSL and 2 acct instances for DSL.
Sun C: 2 auth instances for DSL and 2 acct instances for DSL.
Five days ago, the two dial-up auth instances on Sun A stalled.
Not even radpwtst worked, but judging by the logfile, the process
seemed to be up and running (a lot of entries were written every
second, that is, a lot of Access-Accept and Access-Reject, so the
whole process was working fine from Radiator's point of view). A few
seconds after the Sun A instances stalled, the auth instances on
Sun C stalled as well. The only way to recover from the
disaster was to give those instances a config file with a
"bypass" that simply accepts any request.
On the three Radiator Sun servers the udp_recv_hiwat parameter is set
to more than 8 million and the UDP buffer is set to the max, 64k
(Solaris limit). Also, when the instances stalled, a lot of
Access-Accepts never left the boxes, and a lot of Access-Requests
coming from the RASes never reached the Radiator application. It
seems to be a socket buffer overflow problem.
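One way to see what receive buffer a UDP socket is actually granted, as opposed to what was requested, is to set SO_RCVBUF and read it back. The sketch below is a generic Python illustration using the standard socket module, not anything taken from Radiator; `granted_rcvbuf` is a made-up helper name, and the value reported back depends on the operating system (some kernels clamp or adjust it).

```python
import socket


def granted_rcvbuf(requested_bytes):
    """Request a UDP receive buffer size and return what the OS reports back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Ask for the buffer size; the kernel may clamp or adjust it.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    requested = 64 * 1024  # the 64k figure mentioned above
    print(f"requested {requested} bytes, OS reports {granted_rcvbuf(requested)}")
```

If the reported size is much smaller than expected, incoming datagrams that arrive while the process is busy would be dropped by the kernel before the application ever sees them.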
How do I fix this?
Best Regards.
Sergio Gonzalez