[RADIATOR] Increase tacacs performance

Schnurrenberger Tobias (ID) tobias.schnurrenberger at id.ethz.ch
Wed Mar 1 10:34:54 UTC 2023


Hi there

We are facing a problem with tacacs performance. In our network there is an external service that connects to all network devices every 5 minutes, which causes a huge load of tacacs requests. On the tacacs server it peaks at approx. 4000 new TCP sessions per second (measured with conntrack -E -e NEW | pv -l -i 1 -r > /dev/null). Changing this external service is not an option.

In the radiator tacacs log there are a lot of these error messages:
ERR: ServerTACACSPLUS Stream sysread for 10.1.1.16 (10.1.1.16 port 60088) failed: . Peer probably disconnected.
(This particular client connected over IPv4, but most clients are configured to use IPv6 only.)

On the client side (network devices) we see messages saying that they didn't get an answer from the tacacs servers, e.g.:
%TACACS-3-TACACS_ERROR_MESSAGE: All servers failed to respond

With tcpdump we verified that the packets from the network devices are received but not answered. We then observed that the following netstat counters rise by approx. 1000/s during these 5-minute runs:
    391369308 times the listen queue of a socket overflowed
    391369308 SYNs to LISTEN sockets dropped
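
The two counters above correspond to the ListenOverflows and ListenDrops fields of the TcpExt line pair in /proc/net/netstat. A minimal Python sketch for turning two samples of that file into per-second rates could look like this; the function names are made up for illustration, and it assumes a Linux kernel that exposes both fields:

```python
def parse_tcpext(text):
    """Return the TcpExt counters from the contents of /proc/net/netstat.

    The file holds pairs of lines: one with field names, one with values,
    both starting with the same prefix ("TcpExt:").
    """
    rows = [line.split() for line in text.splitlines()
            if line.startswith("TcpExt:")]
    if len(rows) < 2:
        return {}
    names, values = rows[0][1:], rows[1][1:]
    return dict(zip(names, (int(v) for v in values)))


def counter_rates(before, after, seconds,
                  keys=("ListenOverflows", "ListenDrops")):
    """Per-second increase of the selected counters between two samples."""
    return {key: (after[key] - before[key]) / seconds for key in keys}


def sample(path="/proc/net/netstat"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as handle:
        return parse_tcpext(handle.read())
```

Calling sample() twice, one second apart, and passing both results to counter_rates() gives the rate at which the accept queue overflows during a run.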

Therefore we raised the following sysctl parameters, but the tacacs behavior didn't change at all:
net.core.somaxconn = 1000000
net.ipv4.tcp_max_syn_backlog = 100000
net.core.netdev_max_backlog = 8000000
net.netfilter.nf_conntrack_max = 262144
net.nf_conntrack_max = 262144
fs.file-max = 3661655
net.netfilter.nf_conntrack_tcp_be_liberal = 1
net.core.rmem_max = 8388608
net.core.rmem_default = 8388608
net.core.wmem_max = 8388608
net.core.wmem_default = 8388608
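
To rule out that some of these settings did not take effect (for example after a reboot), the running values can be compared against the intended ones. A small Python sketch, assuming the usual Linux mapping of a sysctl name to a path under /proc/sys; the EXPECTED table only repeats three of the values listed above, and the helper names are hypothetical:

```python
# Intended values, copied from the sysctl list above (subset for brevity).
EXPECTED = {
    "net.core.somaxconn": "1000000",
    "net.ipv4.tcp_max_syn_backlog": "100000",
    "net.core.netdev_max_backlog": "8000000",
}


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return root + "/" + name.replace(".", "/")


def read_sysctl(name):
    """Read the running value of one sysctl as a string."""
    with open(sysctl_path(name)) as handle:
        return handle.read().strip()


def mismatches(expected, reader=read_sysctl):
    """Return {name: running value} for every sysctl that differs.

    A value of None means the sysctl could not be read.
    """
    result = {}
    for name, wanted in expected.items():
        try:
            running = reader(name)
        except OSError:
            running = None
        if running != wanted:
            result[name] = running
    return result
```

An empty result from mismatches(EXPECTED) means the kernel is running with the intended values.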

The Radiator config option SocketQueueLength is set to the same value as the socket buffers (8388608); see the config below.
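
On Linux, net.core.somaxconn is only an upper bound: a listening socket keeps the backlog it requested in its listen() call, capped by somaxconn at that moment. Whether SocketQueueLength influences the listen() backlog of ServerTACACSPLUS is not something this sketch assumes either way; it only reads what the kernel reports. For a socket in LISTEN state, ss shows the current accept-queue length in Recv-Q and the backlog limit in Send-Q. A minimal Python sketch (hypothetical function names) that extracts both for the TACACS+ port:

```python
import subprocess


def parse_listen_queues(ss_output, port):
    """Extract accept-queue usage for listeners on `port`.

    Expects the output of `ss -H -ltn`, i.e. one listener per line with
    the columns: State Recv-Q Send-Q Local-Address:Port Peer-Address:Port.
    """
    listeners = []
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue
        local = fields[3]
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        listeners.append({
            "local": local,
            "queued": int(fields[1]),   # connections waiting for accept()
            "backlog": int(fields[2]),  # limit in effect for this socket
        })
    return listeners


def listen_queues(port=49):
    """Run ss and report the listeners on the given port (Linux only)."""
    output = subprocess.run(["ss", "-H", "-ltn"], capture_output=True,
                            text=True, check=True).stdout
    return parse_listen_queues(output, port)
```

If the reported backlog for port 49 is far below the somaxconn value, the limit that matters is the one the server process passes to listen(), not the sysctl.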

We tried to configure the tacacs service as a farm with 16 children, hoping the load would be balanced. However, all we see is that the "frontend" process is using ~100% of one CPU and the farm children are staying relatively calm (0-20% CPU usage).

Having tried all of this, it seems that ServerTACACSPLUS is simply not fast enough. Are there any other options to increase the performance of our tacacs service?
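
To put the numbers in perspective: if one process has to accept and handle every new TCP session (the ~100% CPU on the frontend suggests this, but it is an assumption), the peak rate translates into a fixed average time budget per session. A small Python sketch of the arithmetic; the function name is hypothetical:

```python
def per_session_budget_us(sessions_per_second, processes=1):
    """Average time available per new session, in microseconds.

    Assumes the load is spread evenly over `processes` workers that each
    handle one session at a time.
    """
    if sessions_per_second <= 0 or processes <= 0:
        raise ValueError("rate and process count must be positive")
    return processes * 1_000_000 / sessions_per_second


# Peak rate from the measurement above, handled by a single process.
print(per_session_budget_us(4000))       # 250.0
# Same rate if 16 processes could accept connections in parallel.
print(per_session_budget_us(4000, 16))   # 4000.0
```

At 4000 new sessions per second a single process has on average 250 microseconds per session for the accept, the TACACS+ exchange, and the proxying to a farm child, which is why balancing only the backend does not relieve the frontend.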

The two servers are running virtualized with these specs:
8 x Xeon 6130 @2.1GHz
32 GB Memory
RHEL 8.7
Radiator 4.26-1
perl 5.36.0

Here's the config:

FRONTEND:
# Include secrets
include /my/radiator/tacacs.inc

# Globals
DefineGlobalVar ClientRateLimitInterval 60
DefineGlobalVar ClientRateLimitCount 10
DefineGlobalVar ClientRateLimitBlocktime 600

Trace 3

User radius
Group radius

LogDir /var/log/radius
LogFile %L/rad_auth-tacacs-frontend.log
PidFile %L/rad_auth-tacacs-frontend.pid

DbDir /etc/radiator
DictionaryFile %D/dictionary,%D/dictionary.local,%D/dictionary.asa

AuthPort
AcctPort

DisabledRuntimeChecks CVE-2014-0160

StartupHook file:"%D/hooks/startup-tacacs.hook"

LogMicroseconds

SocketQueueLength 8388608

# Clients
<Client DEFAULT>
Identifier default-client
TACACSPLUSKey %{GlobalVar:ClientKey}
</Client>

<ServerTACACSPLUS>
AuthorizationTimeout 86400
Key %{GlobalVar:FailbackKey}
Port 49
AddToRequest NAS-Identifier=TACACS
GroupMemberAttr X-MY-TACACSGROUP
AllowAuthorizeOnly 1

AuthorizeGroup read-write permit service=shell cmd= cisco-av-pair\* shell:roles\* {priv-lvl=15}
AuthorizeGroup read-write permit service=shell cmd\* {priv-lvl=15}
AuthorizeGroup read-write permit .*

AuthorizeGroup read-only permit service=shell cmd\* {priv-lvl=1}
AuthorizeGroup read-only permit .*
</ServerTACACSPLUS>

# StatsLog
<StatsLog FILE>
Filename %L/auth-tacacs.stats
</StatsLog>

# AuthBy
<AuthBy HASHBALANCE>
Identifier FarmChilds
FailureBackoffTime 60
Secret xxx
MaxFailedRequests 5
Retries 0
StripFromRequest X-MY-DB-ONLINE
LocalAddress ::1

<Host ::1>
AuthPort 22601
AcctPort 22601 
</Host>
<Host ::1>
AuthPort 22602
AcctPort 22602
</Host>
<Host ::1>
AuthPort 22603
AcctPort 22603
</Host>
<Host ::1>
AuthPort 22604
AcctPort 22604
</Host>
<Host ::1>
AuthPort 22605
AcctPort 22605
</Host>
<Host ::1>
AuthPort 22606
AcctPort 22606
</Host>
<Host ::1>
AuthPort 22607
AcctPort 22607
</Host>
<Host ::1>
AuthPort 22608
AcctPort 22608
</Host>
<Host ::1>
AuthPort 22609
AcctPort 22609 
</Host>
<Host ::1>
AuthPort 22610
AcctPort 22610
</Host>
<Host ::1>
AuthPort 22611
AcctPort 22611
</Host>
<Host ::1>
AuthPort 22612
AcctPort 22612
</Host>
<Host ::1>
AuthPort 22613
AcctPort 22613
</Host>
<Host ::1>
AuthPort 22614
AcctPort 22614
</Host>
<Host ::1>
AuthPort 22615
AcctPort 22615
</Host>
<Host ::1>
AuthPort 22616
AcctPort 22616
</Host>
HashAttributes %{X-MY-NAS-IP}:%{Calling-Station-Id}:%n
</AuthBy>

<AuthBy INTERNAL>
Identifier InternalAddMyInfo
NoEAP
AuthHook file:"%D/hooks/auth-farmchild-tacacs.hook"
AcctHook file:"%D/hooks/auth-farmchild-tacacs.hook"
DefaultResult ACCEPT
</AuthBy>

<AuthBy INTERNAL>
Identifier InternalReply
AuthHook file:"%D/hooks/auth-reply-tacacs.hook"
</AuthBy>

# Handlers
<Handler X-MY-BLOCK=1>
AuthBy InternalBlock
</Handler>

<Handler>
AuthByPolicy ContinueWhileAccept
AuthBy InternalAddMyInfo
AuthBy FarmChilds
RejectHasReason
PostAuthHook file:"%D/hooks/postauth-noratelimit.hook"
StripFromReply X-MY-DB-ONLINE
</Handler>

# end


BACKEND:
# Include secrets
include /my/radiator/tacacs.inc

# Globals
DefineFormattedGlobalVar AuthPort 22600
DefineFormattedGlobalVar AcctPort 22650

DefineGlobalVar ClientRateLimitInterval 60
DefineGlobalVar ClientRateLimitCount 10
DefineGlobalVar ClientRateLimitBlocktime 600

Trace 3

User radius
Group radius

LogDir /var/log/radius
LogFile %L/rad_auth-tacacs-backend-%O.log

PidFile %L/rad_auth-tacacs-backend.pid

DbDir /etc/radiator
DictionaryFile %D/dictionary,%D/dictionary.local,%D/dictionary.asa

BindAddress ::1
BindV6Only true

FarmSize 16
FarmChildHook file:"%D/hooks/farmchild.hook"

AuthPort %{GlobalVar:AuthPort}
AcctPort

DisabledRuntimeChecks CVE-2014-0160

StartupHook file:"%D/hooks/startup.hook"

# use Time::HiRes
LogMicroseconds


# Clients
<Client ::1>
Secret xxx
ClientHook file:"%D/hooks/client.hook"
DupInterval 0
</Client>

<StatsLog FILE>
Filename %L/auth-tacacs-backend-%O.stats
</StatsLog>

<AuthLog FILE>
Identifier authlog-tacacs
Filename %L/auth-tacacs.log
LogSuccess true
LogFailure true
LogFormatHook file:"%D/hooks/authlogfileformat-tacacs.hook"
</AuthLog>

# AuthBy
<AuthBy SQL>
Identifier authenticateSQL

DBSource %{GlobalVar:DB-Source}
DBUsername %{GlobalVar:DB-Username}
DBAuth %{GlobalVar:DB-Auth}
Timeout 1
SQLRetries 3
FailureBackoffTime 180

AuthSelect SELECT password FROM tacacs_user WHERE username in ?
AuthSelectParam %0
AuthColumnDef 0, User-Password, check

# NOTE: RcryptKey replaced in StartupHook
RcryptKey %{GlobalVar:RcryptKey}
NoDefault
</AuthBy>

<AuthBy SQL>
Identifier authorizeSQL

DBSource %{GlobalVar:DB-Source}
DBUsername %{GlobalVar:DB-Username}
DBAuth %{GlobalVar:DB-Auth}
Timeout 1
SQLRetries 3
FailureBackoffTime 180

AuthSelect SELECT * FROM tacacs_authorize(?,?)
AuthSelectParam %0
AuthSelectParam %{X-MY-NAS-IP}

AuthColumnDef 0, X-MY-SUCCESS, reply
AuthColumnDef 1, X-MY-TACACSGROUP, reply
AuthColumnDef 2, X-MY-MESSAGE, reply
AuthColumnDef 3, X-MY-RUNTIME-MS, reply

AddToReply Reply-Message=%{Reply:X-MY-MESSAGE}

# authorize only
NoDefault
NoCheckPassword
</AuthBy>

<AuthBy INTERNAL>
Identifier AlwaysAccept
DefaultResult ACCEPT
</AuthBy>

<AuthBy INTERNAL>
Identifier InternalReply
AuthHook file:"%D/hooks/auth-reply-tacacs.hook"
</AuthBy>

# Handlers
<Handler Request-Type=Accounting-Request>
Identifier TacacsAcct
AuthBy AlwaysAccept
AcctLogFileName %L/acct-tacacs.log
AcctLogFileFormatHook file:"%D/hooks/acctlogformat-tacacs.hook"
</Handler>

<Handler>
Identifier SQLtacacs
AuthByPolicy ContinueWhileAccept
AuthBy authenticateSQL
AuthBy authorizeSQL
AuthBy InternalReply
RejectHasReason
AuthLog authlog-tacacs
</Handler>

# end



Best regards,
Tobias

-------------------------------------------------------
ETH Zürich
Tobias Schnurrenberger
ID INFRA Network Applications
Binzmühlestrasse 130
8092 Zürich

tobias.schnurrenberger at id.ethz.ch
-------------------------------------------------------
