[RADIATOR] DB logging

Karri Huhtanen kh at open.com.au
Thu Jul 18 13:34:09 UTC 2019


On Thu, 18 Jul 2019 at 15:22, Eric W. Bates <ericx at whoi.edu> wrote:
>
> Here's the scoop. I'm asking because this took me several months to
> notice the problem and diagnose and I'm wondering whether this is a
> logging bug or whether there are additional logging functions I can
> enable. I'm still using an old version of Radiator: 4.15 (so that may be
> my problem).

There is LogIgnore in newer versions of Radiator which helps to log
requests which have been ignored for some reason. There's also
EAPTLS_TraceState, which helps to trace PEAP/TTLS authentication
problems and ResponseTimeThreshold (mentioned here
https://www.slideshare.net/radiatorsoftware/tnc19-radiator-technical-workshop-meet-radiator-developers
), which helps to track how long request have been processed.

> I added a MySQL backup using automysqlbackup from the main Ubuntu
> repository and have been successfully making daily dumps for about 9
> months. By default this runs at 06:25 every morning. The accounting
> tables are quite large (I'm in the process of adding automatic pruning);
> so the dump takes about 20 minutes.

Have you monitored what is the load of the server during the backup?
>From your debug log timestamps it looks like that even very simple
authentications steps have been 2 or more minutes apart from each
other.

Do you run only single Radiator process/instance?

> Occasionally I have received sporadic complaints of authentication
> failures during that time window. When I have investigated, I have found
> no information in my trace level 4 logs that would indicate any database
> issue except for reconnection notices after the backup has completed. In
> addition, there are no packet dumps showing connection attempts; so I
> put the blame on my VPN.

I have a working theory about why this happens, but need a bit more
information from you to verify some of the environment/configuration
things.

What I think happens is that first your database backup starts. Host
and or database becomes unresponsive or slow to respond because of the
load. The database connection however does not timeout or go away.
Radiator gets a request (which looks like PEAP) and tries to query the
database. The query is sent to database, which does not respond or
responds only after significant delay. Radiator as a single thread
application sits there waiting for database to respond, but the
database either does not do that or responds so slowly that the RADIUS
clients, which originally sent the request, timeout.

Why this does not show in your Radiator version logs is probably the
combination of older Radiator version, PEAP and possibly AuthBy
RADMIN. We have increased logging features in the Radiator versions
since 4.15 to catch more easily this kind of errors. The earlier
versions of Radiator did not for example log if PEAP authentication
was interrupted (for example because of certificate problems, client
going away etc.). This might occur in your case so that first some
PEAP authentication is started and because database does not respond,
it is not finished probably because of the MSCHAP-V2
challenge/response cycle. As PEAP does not reach a decision, in older
Radiator, it is not logged to be timeoutted. The database on the other
hand may still have the connection up and Radiator waiting for
response so no error about not able to connect to database is logged
either. To the NAS client Radiator looks like a black hole as it does
not respond to its requests, so you get timeouts.

What kind of command line options are you using on radpwtst probe?
Does it do the PEAP/MSCHAP-V2 request mentioned there in your logs or
something standard RADIUS?

> I have set up Nagios to authenticate (via radpwtest) every 5 minutes and
> it does, in fact, report failures during the SQL backup, but again,
> there is ~0~ evidence in the logs that there is any trouble or even
> attempts to authenticate.
>
> Here's the Nagios log from Wednesday morning:
>    (date -d @1563345993 == Wed Jul 17 02:46:33 EDT 2019)
>
> [1563345993] SERVICE ALERT: radius2;Systems vpn
> auth;UNKNOWN;SOFT;1;UNKNOWN No reply
> [1563346005] SERVICE ALERT: radius2;ACOMMS vpn
> auth;UNKNOWN;SOFT;1;UNKNOWN No reply
> [1563346104] SERVICE ALERT: radius;Systems vpn
> auth;UNKNOWN;SOFT;1;UNKNOWN No reply
> [1563346106] SERVICE ALERT: radius;CUI vpn auth;UNKNOWN;SOFT;1;UNKNOWN
> No reply
> [1563346117] SERVICE ALERT: radius;ACOMMS vpn
> auth;UNKNOWN;SOFT;1;UNKNOWN No reply
> [1563346118] SERVICE ALERT: radius2;Systems vpn
> auth;UNKNOWN;SOFT;2;UNKNOWN No reply
> [1563346127] SERVICE ALERT: radius2;CUI vpn auth;UNKNOWN;SOFT;1;UNKNOWN
> No reply
> [1563346130] SERVICE ALERT: radius2;ACOMMS vpn
> auth;UNKNOWN;SOFT;2;UNKNOWN No reply
> [1563346230] SERVICE ALERT: radius;Systems vpn
> auth;UNKNOWN;SOFT;2;UNKNOWN No reply
> [1563346231] SERVICE ALERT: radius;CUI vpn auth;UNKNOWN;SOFT;2;UNKNOWN
> No reply
> [1563346242] SERVICE ALERT: radius;ACOMMS vpn
> auth;UNKNOWN;SOFT;2;UNKNOWN No reply
>
>    (date -d @1563346242 == Wed Jul 17 02:50:42 EDT 2019)
>
> During this time backup interval, the Nagios test simply times out.
> Radiator gives no reply.
>
> At the same time in the level 4 Radiator trace (this log is also EDT):
>
> Wed Jul 17 02:45:39 2019 038925: DEBUG: PEAP Tunnelled request Packet dump:
> Code:       Access-Request
> Identifier: UNDEF
> Authentic:  SB<128>R<145><190>;A'b<10><252>i%X<218>
> Attributes:
>         EAP-Message = <2>`<0><18><1>somebody at mydomain.edu
>         Message-Authenticator = <0><0><0><0><0><0><0><0><0><0><0><0><0><0><0><0>
>         NAS-IP-Address = 192.168.0.183
>         NAS-Identifier = "SafetyOffice"
>         NAS-Port = 0
>         Calling-Station-Id = "24-F0-94-D0-E7-DF"
>         User-Name = ""
>
> Wed Jul 17 02:45:39 2019 039238: DEBUG: Handling request with Handler
> 'TunnelledByPEAP = 1, NAS-IP-Address = /^192\.168\.0.*/', Identifier
> 'TunnelledByPeap-IP'
> Wed Jul 17 02:47:41 2019 420670: DEBUG: Handling with
> Radius::AuthRADMIN: CheckRAdminMSCHAPv2
> Wed Jul 17 02:49:41 2019 471263: DEBUG: Handling with
> Radius::AuthRADMIN: CheckRAdminMSCHAPv2
> Wed Jul 17 02:51:41 2019 475543: DEBUG: Handling with EAP: code 2, 96, 18, 1
> Wed Jul 17 02:53:41 2019 537452: DEBUG: Response type 1
> Wed Jul 17 02:55:41 2019 872823: DEBUG: EAP result: 3, EAP MSCHAP-V2
> Challenge
> Wed Jul 17 02:55:42 2019 298580: DEBUG: AuthBy RADMIN result: CHALLENGE,
> EAP MSCHAP-V2 Challenge
> Wed Jul 17 02:55:42 2019 298821: DEBUG: Access challenged for : EAP
> MSCHAP-V2 Challenge

If you look at timestamps there are several minutes between lines even
without accessing the database? Do you have any information about the
load of the system?

> Not only is there no log indicating that the DB has timed out, there is
> no log of the authentication attempts by Nagios. The log is basically
> empty for the time window of the DB backup.
>
> After the backup is complete, there is a log message:

This looks like my working theory may be correct. Not responding
database with PEAP and MSCHAP-V2 jams the Radiator process while it
tries to process the request and during that time Radiator does not
process any other requests, which explain why there are no log entries
either.

> Wed Jul 17 02:55:42 2019 920915: DEBUG: Connecting to
> 'dbi:mysql:radmin:whoiradius-255.whoinet.mydomain.edu'
>
> Is that really the only indication that the DB has gone away?

No, if database goes away so that it cannot be connected, there are
complaints about it and reconnection messages. But if the Radiator
process jams to wait for the database to respond, then there are no
log entries either.

br,

// kh

> On 7/17/19 5:07 AM, Karri Huhtanen wrote:
> > On Tue, 16 Jul 2019 at 14:30, Eric W. Bates <ericx at whoi.edu> wrote:
> >>
> >> I'm using MySQL as my backend for radiator and I think I'm having SQL
> >> timeouts whenever the database is backed up.
> >>
> >> Does Radiator log SQL errors and/or how do I get those to show up?
> >
> > Radiator only logs SQL errors when it tried to access the database and
> > cannot do that for some reason. These errors are logged in the main
> > Radiator logfile at least with Trace level 3. If you are trying to
> > catch the timeout I think setting up radpwtst to try to authenticate
> > every second might help to generate SQL requests also at the your
> > database is backed up. You can use for example cron or  watch and
> > screen to setup this kind of test. Please ensure that the
> > authentication request is that kind of authentication which makes
> > Radiator to access the database, otherwise this test does not help
> > you.
> >
> > br,
> >
> > // kh
> >
>
>


More information about the radiator mailing list