[RADIATOR] mysql DBI timed out connections and too many open files (sockets)
Rauno Tuul
raunz at hot.ee
Thu Apr 20 10:31:57 UTC 2017
Hi,
The radiusd process has to have an effective (finite) open file limit rather than
"unlimited". Debian's systemd limits non-root processes to 1024 open files, and
CentOS 5 applies the same limit even to processes running as root.
Regardless of the limit, you should see the number of open sockets
held by the radiusd process increasing.
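For reference, the effective limit can be checked with /proc/<pid>/limits (shown
below) and raised while the leak is being investigated. A minimal sketch; the
systemd unit name "radiator.service" is an assumption, the init script path is the
one used later in this mail:

# Debian 8 (systemd): drop-in override; the unit name radiator.service is an assumption
mkdir -p /etc/systemd/system/radiator.service.d
cat > /etc/systemd/system/radiator.service.d/limits.conf <<'EOF'
[Service]
LimitNOFILE=65536
EOF
systemctl daemon-reload
systemctl restart radiator

# CentOS 5 (sysvinit): put this near the top of /etc/init.d/radiator, before radiusd starts
ulimit -n 65536

Raising the limit only postpones the collapse, of course; the leaked sockets still
accumulate.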
I did a quick test on a random CentOS 5.11 box with Radiator-4.11-1: installed
the RPM and changed only radius.cfg, to the following:
grep -v "^#" /etc/radiator/radius.cfg
LogDir /var/log/radius
DbDir /etc/radiator
Trace 4
<Client DEFAULT>
Secret asd
DupInterval 0
</Client>
<Realm DEFAULT>
<AuthBy SQL>
DBSource dbi:mysql:dbname:127.0.0.1:3310
Timeout 1
FailureBackoffTime 1
DBUsername someuser
DBAuth pointlesspassword
AuthSelect select 'random' from nowhere
AuthColumnDef 0, User-Name, check
NoDefault
</AuthBy>
</Realm>
Started the service:
# /etc/init.d/radiator start
# ps uaxww | grep radiusd
root 18839 0.2 1.0 27352 15712 pts/0 S 11:52 0:03
/usr/bin/perl /usr/bin/radiusd -pid_file /var/log/radius/radiusd.pid
-config_file /etc/radiator/radius.cfg -daemon
# cat /proc/`pgrep radiusd`/limits | grep files
Max open files 1024 1024 files
Then I blocked IP-level access to the non-existent MySQL instance:
# iptables -t filter -A INPUT -p TCP -i lo -s 127.0.0.1 --dport 3310 -j DROP
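With DROP (as opposed to REJECT) the SYN packets are silently discarded, so
connection attempts hang in SYN_SENT until the client gives up, which is exactly
what the DBI connect runs into. A quick probe confirms this (nc being installed is
an assumption): it should sit there for the full five seconds and time out rather
than return "Connection refused".

$ time nc -w 5 127.0.0.1 3310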
# head /var/log/radius/logfile
Thu Apr 20 11:52:07 2017: DEBUG: Finished reading configuration file
'/etc/radiator/radius.cfg'
Thu Apr 20 11:52:08 2017: DEBUG: Creating authentication port 0.0.0.0:1645
Thu Apr 20 11:52:08 2017: DEBUG: Creating accounting port 0.0.0.0:1646
Thu Apr 20 11:52:08 2017: NOTICE: Server started: Radiator 4.11
Thu Apr 20 11:56:13 2017: DEBUG: Packet dump:
*** Received from 127.0.0.1 port 50719 ....
Code: Access-Request
Identifier: 118
Authentic: <129>)$<226>3<157>"<3><169><226><219><216>'<216><250><225>
Attributes:
User-Name = "mikem"
Service-Type = Framed-User
NAS-IP-Address = 203.63.154.1
NAS-Identifier = "203.63.154.1"
NAS-Port = 1234
Called-Station-Id = "123456789"
Calling-Station-Id = "987654321"
NAS-Port-Type = Async
User-Password = <137>&<8>\<180>_^<201>6mqwT&<207><158>
Thu Apr 20 11:56:13 2017: DEBUG: Handling request with Handler
'Realm=DEFAULT', Identifier ''
Thu Apr 20 11:56:13 2017: DEBUG: Deleting session for mikem,
203.63.154.1, 1234
Thu Apr 20 11:56:13 2017: DEBUG: Handling with Radius::AuthSQL:
Thu Apr 20 11:56:13 2017: DEBUG: Handling with Radius::AuthSQL:
Thu Apr 20 11:56:13 2017: DEBUG: Connecting to
'dbi:mysql:dbname:127.0.0.1:3310'
Thu Apr 20 11:56:14 2017: ERR: Could not connect to SQL database with
DBI->connect dbi:mysql:dbname:127.0.0.1:3310, someuser, pointlesspassword:
Thu Apr 20 11:56:14 2017: ERR: Could not connect to any SQL database.
Request is ignored. Backing off for 1 seconds
Thu Apr 20 11:56:14 2017: DEBUG: AuthBy SQL result: IGNORE, User
database access error
Then all I had to do was hit it repeatedly with radpwtst:
$ for i in `seq 1 10000`; do radpwtst -secret asd -s 127.0.0.1 -noacct
-nostart -auth_port 1645 -timeout 1; done
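In a second terminal the leak can be watched while the loop runs; lsof is assumed
to be installed (it is also used below):

while true; do
    echo -n "`date` leaked socks: "
    lsof -p `pgrep radiusd` | grep -c sock
    sleep 60
done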
The connection attempts keep growing and the idle "sock" descriptors remain
behind:
# lsof -p `pgrep radiusd`
...
radiusd 18839 root 244u sock 0,5 0t0 7898980 can't identify
protocol
radiusd 18839 root 245u sock 0,5 0t0 7898989 can't identify
protocol
radiusd 18839 root 246u sock 0,5 0t0 7899002 can't identify
protocol
radiusd 18839 root 247u sock 0,5 0t0 7899011 can't identify
protocol
radiusd 18839 root 248u sock 0,5 0t0 7899020 can't identify
protocol
radiusd 18839 root 249u IPv4 7899029 0t0 TCP
127.0.0.1:33281->127.0.0.1:dyna-access (SYN_SENT)
radiusd 18839 root 250u IPv4 7899041 0t0 TCP
127.0.0.1:33282->127.0.0.1:dyna-access (SYN_SENT)
radiusd 18839 root 251u IPv4 7899050 0t0 TCP
127.0.0.1:33283->127.0.0.1:dyna-access (SYN_SENT)
radiusd 18839 root 252u IPv4 7899068 0t0 TCP
127.0.0.1:33284->127.0.0.1:dyna-access (SYN_SENT)
...
50 minutes later the logfile stops growing:
Thu Apr 20 12:46:40 2017: DEBUG: Packet dump:
*** Received from 127.0.0.1 port 40107 ....
Code: Access-Request
Identifier: 85
Authentic: <177><239><172>6<253>o<251><18><144><157><147>"<163>k<130>.
Attributes:
User-Name = "mikem"
Service-Type = Framed-User
NAS-IP-Address = 203.63.154.1
NAS-Identifier = "203.63.154.1"
NAS-Port = 1234
Called-Station-Id = "123456789"
Calling-Station-Id = "987654321"
NAS-Port-Type = Async
User-Password =
X<187><216><213>g<154><218><222><218><131><156>v<148>p2<198>
Thu Apr 20 12:46:40 2017: DEBUG: Handling request with Handler
'Realm=DEFAULT', Identifier ''
Thu Apr 20 12:46:40 2017: DEBUG: Deleting session for mikem,
203.63.154.1, 1234
Thu Apr 20 12:46:40 2017: DEBUG: Handling with Radius::AuthSQL:
Thu Apr 20 12:46:40 2017: DEBUG: Handling with Radius::AuthSQL:
Thu Apr 20 12:46:40 2017: DEBUG: Connecting to
'dbi:mysql:dbname:127.0.0.1:3310'
The full lsof listing is 1064 lines long and the open "sock" count has reached 1019.
# lsof -p `pgrep radiusd` | grep -c sock
1019
The main process has hit "Too many open files" and can no longer serve
requests:
# strace -p `pgrep radiusd`
stat64("/var/log/radius", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/var/log/radius/logfile", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE,
0666) = -1 EMFILE (Too many open files)
write(2, "Log could not append 'Thu Apr 20"..., 652) = 652
gettimeofday({1492682926, 331375}, NULL) = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2161, ...}) = 0
stat64("/var/log/radius", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/var/log/radius/logfile", O_WRONLY|O_CREAT|O_APPEND|O_LARGEFILE,
0666) = -1 EMFILE (Too many open files)
write(2, "Log could not append 'Thu Apr 20"..., 231) = 231
gettimeofday({1492682926, 332748}, NULL) = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2161, ...}) = 0
stat64("/var/log/radius", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
It took me an hour and a half to reproduce the issue in an entirely new
environment for Radiator, without any changes on the OS side.
The problem exists on both Debian and CentOS, so it is not a distro issue.
Please share the details of the attempt to reproduce the issue where the
problem did not occur; I would really like to know where the difference is.
kind regards,
--
rauno
On 20.04.2017 11:10, Heikki Vatiainen wrote:
> On 14.4.2017 13.56, Rauno Tuul wrote:
>
>> The issue occurs, when attempting to establish a connection. For example
>> the host is totally down or firewalled.
>> If the TCP connection has been established, then there's no problem.
>
> Thanks for confirming this. I talked with one of our people about this
> and he said the problem could not be reproduced. What he used was not a
> Debian 8. Also, a web search did not turn up similar problems, so this
> looks like something that is not a universal problem.
>
> In other words, no real update on this yet.
>
>> Easy way to reproduce the issue, is to drop mysql traffic with iptables.
>> Example setup:
>> <AuthBy SQL>
>> DBSource dbi:mysql:dbname:127.0.0.1:3306
>> Timeout 1
>> FailureBackoffTime 1
>> Then drop the traffic with 'iptables -A INPUT -i lo -p tcp -m tcp
>> --dport 3306 -j DROP' and hit the instance with radpwtst every second.
>> If the debian systemd set the open file limit to 1024, then the instance
>> will collapse within 20 minutes.
>
> Thanks for this additional information. I have made a note about this.
>
> Thanks,
> Heikki
>