(RADIATOR) Suggestions for high volume system

Viraj Alankar valankar at ifxcorp.com
Sun Apr 28 08:06:52 CDT 2002


Hello,

I am wondering what the best design is for a high-volume RADIUS system. We are
looking at on the order of 100-150 requests/second (auth+acct) on average.
Does anyone here have a load-balancing setup? If so, I'd appreciate any
tips on how you set it up.

After using Radiator for quite a while, I've found that the main things that
cause slowdowns are database queries and network outages. I've noticed that
during network outages, some RASes (mostly Ascend) and proxy servers start
flooding the server once connectivity comes back. These appear to be
queued requests (mostly accounting) on those systems. This pretty much kills
our RADIUS server (CPU at 99%), and we often have to run Radiator in a very
basic configuration (no database, no authentication) for a while to cool
things down. At times I've even had to block some RAS traffic at our
firewall.
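For reference, the kind of basic configuration I fall back to looks roughly
like the following (just a sketch; the secret is a placeholder, and AuthBy
ACCEPT simply acknowledges every request, including accounting, without doing
any real work):

    # Emergency cool-down config: no DB, no real authentication
    <Client DEFAULT>
        Secret mysecret
    </Client>

    <Realm DEFAULT>
        <AuthBy ACCEPT>
        </AuthBy>
    </Realm>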

So I am just looking for some tips on how to set up a scalable system. We
have a test setup with a Foundry switch load balancing across two Radiator
servers via round-robin. However, in our tests the load balancing is uneven
whenever the source UDP port stays constant, which happens, for example, when
another Radiator is forwarding requests to it. It only seems to balance
properly when the source ports change. Does anyone have any ideas what could
be wrong here?

What I was thinking was to instead set up one Radiator system that uses the
AuthBy LOADBALANCE clause in place of the Foundry switch. Any thoughts on
this versus hardware load balancing?
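Something like the following is what I had in mind (a rough sketch only;
hostnames, the secret and the timeouts are placeholders, and the exact Host
syntax should be checked against the Radiator reference for your version):

    <Handler>
        <AuthBy LOADBALANCE>
            # Requests are shared across these hosts
            Host radius-b.example.com
            Host radius-c.example.com
            Secret proxysecret
            AuthPort 1645
            AcctPort 1646
            Retries 2
            RetryTimeout 3
        </AuthBy>
    </Handler>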

The next issue is database slowdowns. I am thinking the best setup would be
for the RASes to talk directly to Radiators that have no DB dependency at
all, and have those proxy to the servers that do have DB dependencies. For
example:

       A
      / \
     /   \
    B     C
   / \   / \
  D   E F   G

A = Radiator doing AuthBy LOADBALANCE to B and C (or a hardware switch)
B/C = Radiator with only AuthBy RADIUS clauses
D/E/F/G = Radiator with DB access

The B and C trees would be identical. Does this sound like a proper setup?  
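For the B/C layer I picture plain AuthBy RADIUS forwarding, something like
this (a sketch; hosts and secret are placeholders, and the second Host acts
as a fallback if the first does not reply):

    # B (and identically C): no DB anywhere in the request path
    <Handler>
        <AuthBy RADIUS>
            Host d.example.com
            Host e.example.com
            Secret proxysecret
            AuthPort 1645
            AcctPort 1646
            Retries 3
            RetryTimeout 5
        </AuthBy>
    </Handler>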

As far as the type of database access goes, we've mostly seen that accounting
is what causes problems. I believe this is due to our table designs. For
example, we have unique indexes to drop duplicate accounting, indexed on many
fields. At some point, when there is a lot of data, inserts become slow. I
was thinking that Radiator's access to the DB should be made as fast as
possible: Radiator would use the DB only as a sort of log table for
accounting (with no indexes at all), similar to writing to raw files. Then,
periodically, an external process would move this data to the real accounting
tables (with indexes, etc.). This way, the time Radiator spends in the DB for
accounting is kept to a minimum.
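Concretely, I imagine something like this on the D/E/F/G servers (a sketch;
the DB source, table and column names are made up for illustration). The
AcctSQLStatement does a bare insert into an unindexed ACCTLOG staging table,
and the external job would periodically do an INSERT ... SELECT into the real
indexed tables, dropping duplicates there, then clear the staging table:

    <AuthBy SQL>
        DBSource dbi:mysql:radius
        DBUsername radius
        DBAuth dbpassword
        # Raw insert into the unindexed staging table; no duplicate
        # checking here, the external mover handles that later
        AcctSQLStatement insert into ACCTLOG (USERNAME, ACCTSTATUSTYPE, ACCTSESSIONID, NASIDENTIFIER, ACCTDELAYTIME) values ('%n', '%{Acct-Status-Type}', '%{Acct-Session-Id}', '%{NAS-IP-Address}', '%{Acct-Delay-Time}')
    </AuthBy>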

Another problem we have is the number of Handlers. We handle requests
depending on the following:

RAS IP
RAS IP+DNIS
RAS IP+DNIS+Realm

With all of our devices, the number of handlers is getting quite large: we
have almost 500 at this point. I'm wondering what a reasonable upper bound on
this is, and whether there is a better way to handle it.
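For reference, each handler looks something like this (values are
placeholders; the DNIS arrives as Called-Station-Id, and Client-Identifier
matches the Identifier of a Client clause):

    <Handler Client-Identifier=ascend-pop1, Called-Station-Id=5551234, Realm=example.com>
        # AuthBy clauses for this particular RAS/DNIS/realm combination
    </Handler>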

Anyhow, I'd appreciate any info or tips anyone has on a large setup like this.

Thanks,

Viraj.
