[RADIATOR] mysql DBI timed out connections and too many open files (sockets)
Rauno Tuul
raunz at hot.ee
Thu Mar 30 10:54:48 UTC 2017
Hi,
I've tracked down a quite unexpected issue with radiator in our live
environment and I don't know, how to solve it properly.
It's a radiator 4.11 live accounting instance with single mysql DB
configuration:
<AuthBy SQL>
DBSource dbi:mysql:dbname:x.x.x.x:3307
Timeout 2
FailureBackoffTime 10
Every time, when a DBI mysql connection times out, a new socket is
opened. These sockets, which never get a reply (e.g. conn refused) from
server, idle forever - each one consumes an "open file". Since the
server is Debian 8 with systemd, the radiator process is started with
the following systemd service configuration:
[Service]
Type=simple
Restart=on-failure
TimeoutSec=10
User=radius
ExecStart=/usr/bin/radiusd -foreground -config_file /etc/acct.cfg
As the Debian 8 systemd ignores system wide limits and starts the
process with systemd default limit of 1024
$ cat /proc/${pid_of_radiusd}/limits
Limit Soft Limit Hard Limit Units
Max open files 1024 1024
files
Basically, the radiusd will hit that limit after approx 1020 DBI reconnects:
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = -1 EMFILE (Too many open files)
As there are too many open files, also all logging fails, as the process
is unable to open a log file.
Even is the database comes online again for example 800 reconnects, life
goes on, but the previously opened 800 socket remain there until the
process exists. Problem is, that the old, timed out, sockets are never
freed. Example of a process open file list, which exhausted the resources:
$ lsof -p radiusd_pid
radiusd 6409 radius 0r CHR 1,3 0t0 6097 /dev/null
radiusd 6409 radius 1u unix 0xffff88007b997bc0 0t0 57168 socket
radiusd 6409 radius 2u unix 0xffff88007b997bc0 0t0 57168 socket
radiusd 6409 radius 3u IPv4 57184 0t0 UDP *:9141
radiusd 6409 radius 4u IPv4 57183 0t0 UDP
localhost:914
radiusd 6409 radius 5u sock 0,7 0t0 57195 can't
identify protocol
....1015 lines skipped...
radiusd 6409 radius 1020u sock 0,7 0t0 70498 can't
identify protocol
radiusd 6409 radius 1021u sock 0,7 0t0 70516 can't
identify protocol
radiusd 6409 radius 1022u sock 0,7 0t0 70525 can't
identify protocol
radiusd 6409 radius 1023u sock 0,7 0t0 70539 can't
identify protocol
I know, that I can tune the systemd service parameter LimitNOFILE=x to a
bigger value, but it doesn't change the behaviour. Eventually the kernel
sockets resource runs out.
Even if the radiusd hits the "Too many open files", it would be a great
help, if the process would exit. all resources would be freed and
systemd would restart it after 10 seconds. But the radiusd doesn't die
nor serve.
Related debian packages and versions:
perl 5.20.2-3+deb8u4
libdbi-perl 1.631-3+b1
libdbd-mysql-perl 4.028-2+b1
libmysqlclient18:amd64 5.5.47-0+deb8u1
With the specified configuration above, major problem occured when the
database server was offline for more than 2..3 hours (all the 1024
reconnection attempts timed out).
Suddenly the instance stopped answering and also the logfile stopped
with the following line and then nothing.
Sat Mar 25 01:29:06 2017: DEBUG: Connecting to
'dbi:mysql:dbname:x.x.x.x:3307'
Increasing the file limits is a poor improvement, not a solution to root
cause.
Is there something wrong with DBI?
Or is there a better way to avoid depletion of resources?
kind regards,
--
rauno
More information about the radiator
mailing list