No subject


Tue Jun 24 01:27:03 CDT 2008


From my own experience with Radiator, it looks like slow LDAP requests
are killing Radiator's performance... Or worse, LDAP getting stuck...
and everything waits.
We also had problems with a broken MySQL accounting DB that blocked
all requests.

Radiator being single-threaded, backend waits are your worst enemy. I
advise you to keep a local copy of your DB/LDAP on every server to avoid
network latency... and make good use of indexes.
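
A rough sketch of why backend waits dominate, assuming an instance
handles one request at a time and blocks for the whole backend lookup:
its ceiling is then about 1/latency requests per second. The function
name is hypothetical; 0.15 s is the dial-up figure quoted further down
the thread, and 0.02 s is only an illustrative value for a fast local
lookup, not a measurement.

```python
def max_requests_per_second(backend_latency_s: float, instances: int = 1) -> float:
    """Upper bound on throughput when each instance blocks on one lookup at a time.

    Simplified model: ignores Radiator's own processing time and any
    queueing in the kernel, so real numbers will be somewhat lower.
    """
    if backend_latency_s <= 0:
        raise ValueError("latency must be positive")
    return instances / backend_latency_s


if __name__ == "__main__":
    # 0.15 s comes from the thread; 0.02 s is an assumed local-lookup latency.
    for latency in (0.15, 0.02):
        rate = max_requests_per_second(latency)
        print(f"{latency:.2f} s per lookup -> about {rate:.1f} req/s per instance")
```

Under this model, cutting lookup latency (local replica, proper indexes)
raises the ceiling far more than adding hardware to the RADIUS host.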

Here, we have dual-Xeon Linux servers, each running 2 or 4
authentication instances and one OpenLDAP instance containing roughly
2 million entries. Works fine. If things heat up, each instance manages
to handle a few dozen requests/sec (50 should be OK)... If things really
heat up, we lose packets and get retries from the modems, but Radiator
never gets stuck (*); it just does its best until the load goes down.

Regards

Laurent

(*) Actually, we ran into trouble once with Radiator acting as a proxy
under huge load: we missed both requests from the BASes and answers from
the remote servers, so Radiator thought the remote RADIUS servers had
not answered and retried, and so on... and we could not handle the load.
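
The retry spiral in the footnote can be illustrated with a crude model,
given here only as a sketch: once the load exceeds what the server can
answer, unanswered requests are retransmitted and add to the load. The
function name, the retry count and the rates are all assumptions made
for illustration; this is not how Radiator or any particular BAS
implements retransmission.

```python
def offered_load(new_per_s: float, capacity_per_s: float,
                 max_retries: int, steps: int = 20) -> float:
    """Iterate a crude model of retransmission load.

    Each new request is sent once and then re-sent up to max_retries
    times for as long as it goes unanswered. The fraction answered is
    taken to be capacity / current load, capped at 1.
    """
    load = new_per_s
    for _ in range(steps):
        answered = min(1.0, capacity_per_s / load)
        unanswered = 1.0 - answered
        # Expected number of transmissions per new request.
        transmissions = sum(unanswered ** k for k in range(max_retries + 1))
        load = new_per_s * transmissions
    return load


if __name__ == "__main__":
    # Below capacity nothing is retried; above it, retries inflate the load.
    print(offered_load(new_per_s=40, capacity_per_s=50, max_retries=3))
    print(offered_load(new_per_s=60, capacity_per_s=50, max_retries=3))
```

The point of the sketch is only that a modest overload can turn into a
much larger one once retries kick in.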

________________________________

From: owner-radiator at open.com.au on behalf of Hugh Irvine
Date: Thu 26/04/2007 23:46
To: Sergio Gonzalez
Cc: radiator at open.com.au; info at open.com.au
Subject: Re: (RADIATOR) Multiple radius instances problem (possible
remote consulting and professional services)




Hello Sergio -

Thanks for the additional information.

It is not clear to me why the dialup instances take so much longer
than the DSL instances to do the authentication. You also don't show
how long the accounting is taking.

In any case, if the problem is slow LDAP and SQL databases you should
address those issues first.

I am guessing that there is some event like a DSL RAS rebooting that
is causing a burst of authentication requests that swamp the
authentication server(s).

How many RADIUS requests per second are hitting the boxes?

BTW - the numbers you show for the Sun LDAP server are consistent
with what I have observed at other sites - it doesn't seem to be able
to process more than about 10 requests per second at most. This being
the case, whenever you have more than 10 requests per second arriving
in Radiator you will have a problem.
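
As a rough illustration of the point above: if requests arrive faster
than the backend can serve them, the backlog grows at the difference
between the two rates and a fixed-size UDP receive buffer eventually
fills. The sketch below estimates how long that takes. The function
name and the 200-byte packet size are assumptions, and the kernel's
real buffer accounting adds per-packet overhead that is ignored here.

```python
from typing import Optional


def seconds_until_overflow(arrival_per_s: float, service_per_s: float,
                           buffer_bytes: int, packet_bytes: int) -> Optional[float]:
    """Estimate the time until a receive buffer fills under sustained overload.

    Returns None when the server keeps up (no backlog builds).
    """
    backlog_rate = arrival_per_s - service_per_s
    if backlog_rate <= 0:
        return None
    packets_that_fit = buffer_bytes // packet_bytes
    return packets_that_fit / backlog_rate


if __name__ == "__main__":
    # 64 KB buffer as mentioned in the thread; 200-byte requests are assumed.
    for arrival in (8, 25):
        t = seconds_until_overflow(arrival, 10, 64 * 1024, 200)
        print(arrival, "req/s ->", "keeps up" if t is None else f"full in ~{t:.1f} s")
```

Once the buffer is full, further requests are dropped before Radiator
ever sees them, and the RASes retransmit.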

There may also be a problem with inserting the accounting data into
the MySQL database, but you have not provided any information on that.

regards

Hugh



On 27 Apr 2007, at 07:28, Sergio Gonzalez wrote:

> Hello,
>
> A customer has the following configuration to authenticate more than
> 20,000 concurrent users every day:
>
> - Sun A: v240Z with 2 GB RAM, Solaris 9 64-bit, Radiator 3.14, Perl 5.8.7
> - Sun B: v240Z with 2 GB RAM, Solaris 9 64-bit, Radiator 3.14, Perl 5.8.7,
>   MySQL Professional 5.0.17c
> - Sun C: v880 with 4 GB RAM, Solaris 9 64-bit, Radiator 3.14, Perl 5.8.7,
>   MySQL Professional 5.0.17c
> - Sun D: v440 with 16 GB RAM, Solaris 9 64-bit, Sun LDAP Server 5
> - Sun E: v440 with 16 GB RAM, Solaris 9 64-bit, Sun LDAP Server 5
>
> Those radius servers answer requests from:
>
> - Around 35 dial-up RASes with more than 150 ports each
> - 4 DSL RASes with more than 7,000 ports each
>
> Each RADIUS server has authentication and accounting instances. The
> authentication instances query the LDAP server (in fact only one, but
> if the first fails, they query the other) and also the MySQL servers
> (in the same fashion as LDAP: the first, and if it fails, the second).
>
> Taking a Trace -1 and a LogMicroseconds from those instances I got:
>
> Dial-up instances: 8 req/sec max. Each authentication request takes
> 0.15 sec to complete. This means around 7 req/sec before going into
> the UDP queue.
> DSL instances: 25 req/sec max. Each authentication request takes
>   sec to complete. This means
>
> The auth and acct requests were distributed across the three servers
> like this:
>
> Sun A: 2 auth instances for Dial-up and 2 acct instances for Dial-up.
> Sun B: 2 auth instances for DSL and 2 acct instances for DSL.
> Sun C: 2 auth instances for DSL and 2 acct instances for DSL.
>
> Five days ago, the two dial-up auth instances on Sun A stalled. Not
> even radpwtst worked, but looking at the logfile, the process seemed
> to be up and running (a lot of entries were written every second, I
> mean a lot of Access-Accept and Access-Reject, so the whole process
> was working fine from Radiator's point of view). A few seconds after
> the Sun A instances stalled, the auth instances on Sun C stalled as
> well. The only way to recover from the disaster was to give those
> instances a config file with a "bypass" that simply accepts any
> request.
>
>
> On the three Radiator Sun servers the udp_recv_hiwat parameter is
> set to more than 8 million and the UDP buffer is set to the max,
> 64k (Solaris limit). Also, when the instances stalled, there were
> a lot of Access-Accepts that never left the boxes, and there were
> also a lot of Access-Requests coming from the RASes that never
> reached the Radiator application. It seems to be a socket buffer
> overflow problem.
>
> How do I fix this?
>
>
> Best Regards.
>
> Sergio Gonzalez
>
>
>
>



NB:

Have you read the reference manual ("doc/ref.html")?
Have you searched the mailing list archive
(www.open.com.au/archives/radiator)?
Have you had a quick look on Google (www.google.com)?
Have you included a copy of your configuration file (no secrets),
together with a trace 4 debug showing what is happening?
Have you checked the RadiusExpert wiki:
http://www.open.com.au/wiki/index.php/Main_Page

--
Radiator: the most portable, flexible and configurable RADIUS server
anywhere. Available on *NIX, *BSD, Windows, MacOS X.
Includes support for reliable RADIUS transport (RadSec),
and DIAMETER translation agent.
-
Nets: internetwork inventory and management - graphical, extensible,
flexible with hardware, software, platform and database independence.
-
CATool: Private Certificate Authority for Unix and Unix-like systems.


--
Archive at http://www.open.com.au/archives/radiator/
Announcements on radiator-announce at open.com.au
To unsubscribe, email 'majordomo at open.com.au' with
'unsubscribe radiator' in the body of the message.





