[RADIATOR] Increase tacacs performance

Heikki Vatiainen hvn at open.com.au
Thu Mar 2 11:40:14 UTC 2023


On 1.3.2023 12.34, Schnurrenberger Tobias (ID) via radiator wrote:

> We are facing a problem with tacacs performance. In our network there is an external service that connects to all network devices every 5 minutes, which causes a huge load of tacacs requests. It peaks at approx. 4000 new TCP sessions per second (measured by conntrack -E -e NEW | pv -l -i 1 -r > /dev/null) on the tacacs server. This external service should not be changed.

Thanks for the configuration and details about your operating 
environment. I think there are a couple of options for updating the 
frontend. The backend already seems to be well set up for handling 
large amounts of requests.

> In the radiator tacacs log there are a lot of these error messages:
> ERR: ServerTACACSPLUS Stream sysread for 10.1.1.16 (10.1.1.16 port 60088) failed: . Peer probably disconnected.
> (Although this client connected with IPv4, most of them are configured to use IPv6 only.)

The default log level for this was updated for Radiator 4.27. With 4.27 
this is now a DEBUG level message. You can reconfigure it like this 
within <ServerTACACSPLUS>:

     DisconnectTraceLevel 4

Disconnects by a TACACS+ client are normal and expected. Some 
releases ago, the TCP stream handling was unified, and while there are 
some protocols where a disconnect by the client is unexpected, TACACS+ 
is not one of them. In short: this is not an error with TACACS+.

> And on the client side (network devices) we see messages, that they didn't get an answer from the tacacs servers, e.g.:
> %TACACS-3-TACACS_ERROR_MESSAGE: All servers failed to respond

This is likely caused by the frontend not being able to keep up with all 
the requests. As mentioned above, it is not related to the disconnect 
log messages on the Radiator side.

> We tried to configure the tacacs service as a farm with 16 children, hoping the load would be balanced. However, all we see is that the "frontend" process is using ~100% of one CPU and the farm children are staying relatively calm (0-20% CPU usage).

You could keep this for now since it's already working. Once the 
frontend is able to handle all requests, it should raise the backend 
load too.

> With all these things tried it seems that the ServerTACACSPLUS is just not fast enough. Is there any other option to increase the performance of our tacacs service?

I agree with this diagnosis. The first thing you could consider is 
utilising the 'AllowAuthorizeOnly 1' setting you have already configured.

This option allows Radiator to look up authorisation information for 
TACACS+ when no such information is already present. By default, 
GroupMemberAttr for authorisation information must be fetched during the 
authentication.

When AllowAuthorizeOnly is set, Radiator triggers an Access-Request that 
has 'Service-Type = Authorize-Only' but no User-Password attribute. In 
your case you could catch these requests with a specific Handler and 
then run the 'authorizeSQL' AuthBy only within this new Handler.

When you know you can handle 'Service-Type = Authorize-Only' TACACS+ 
derived access requests, you can enable FarmSize on the frontend.

When you do that, you can have parallel workers accepting and processing 
TACACS+ requests. It's likely that some related access and 
authorisation requests are picked up by different workers, but when that 
happens, the worker can authorise the TACACS+ authorisation requests 
separately.
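
As a rough sketch of the frontend change, assuming FarmSize goes in the 
global part of the configuration (outside any clause): the value 4 
below is only an illustration and would need tuning for your CPU count 
and load.

     # Global section of the frontend configuration.
     # The value 4 is an arbitrary example, not a recommendation.
     FarmSize 4

     <ServerTACACSPLUS>
         # ... your existing parameters stay as they are ...
         AllowAuthorizeOnly 1
     </ServerTACACSPLUS>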

> FRONTEND:

> <ServerTACACSPLUS>
> AuthorizationTimeout 86400
> Key %{GlobalVar:FailbackKey}
> Port 49
> AddToRequest NAS-Identifier=TACACS
> GroupMemberAttr X-MY-TACACSGROUP
> AllowAuthorizeOnly 1

The goodies tacacsplus example shows how to handle requests triggered by 
AllowAuthorizeOnly. You may want to do some initial testing with the 
goodies example to see how it behaves when FarmSize is > 1.


> BACKEND:

> # Handlers
> <Handler Request-Type=Accounting-Request>
> Identifier TacacsAcct
> AuthBy AlwaysAccept
> AcctLogFileName %L/acct-tacacs.log
> AcctLogFileFormatHook file:"%D/hooks/acctlogformat-tacacs.hook"
> </Handler>

Here's how to catch requests that are triggered when the same frontend 
worker does not process both the TACACS+ authentication and the 
subsequent authorisation request:

<Handler Service-Type=Authorize-Only>
    # Identifier, AuthByPolicy, etc.
    # No AuthBy authenticateSQL - there's no User-Password in the request
    AuthBy authorizeSQL
</Handler>

> <Handler>
> Identifier SQLtacacs
> AuthByPolicy ContinueWhileAccept
> AuthBy authenticateSQL
> AuthBy authorizeSQL
> AuthBy InternalReply
> RejectHasReason
> AuthLog authlog-tacacs
> </Handler>
> 
> # end

An alternative to FarmSize on the frontend is to run some type of TCP 
load balancer in front of separate Radiator instances listening on 
different TACACS+ ports. HAProxy could work, but I'd first see about 
FarmSize on the frontend with the backend set up so that it can handle 
authorize only requests.
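
For the load balancer alternative, the Radiator side could look roughly 
like the following: two frontend instances started from their own 
configuration files, differing only in the listening port. The file 
names and the ports 4901 and 4902 are made-up examples. The load 
balancer would then accept connections on port 49 and distribute them 
to these ports.

     # frontend-1.cfg (hypothetical file name)
     <ServerTACACSPLUS>
         Port 4901
         # ... remaining parameters as in your current frontend ...
     </ServerTACACSPLUS>

     # frontend-2.cfg (hypothetical file name)
     <ServerTACACSPLUS>
         Port 4902
         # ... remaining parameters as in your current frontend ...
     </ServerTACACSPLUS>

Keep in mind that with a plain TCP proxy the instances will typically 
see the load balancer's address as the connection source, not the 
address of the network device.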

Please let us know if the above helps.

Thanks,
Heikki

-- 
Heikki Vatiainen <hvn at open.com.au>

Radiator: the most portable, flexible and configurable RADIUS server
anywhere. SQL, proxy, DBM, files, LDAP, TACACS+, PAM, Active Directory,
EAP, TLS, TTLS, PEAP, WiMAX, RSA, Vasco, Yubikey, HOTP, TOTP,
DIAMETER etc. Full source on Unix, Windows, MacOSX, Solaris, VMS, etc.

