(RADIATOR) Radiator going down after Oracle SQL Timeout
Mariano Absatz
radiator at lists.com.ar
Fri Feb 1 12:39:37 CST 2002
Hi Hugh,
well... time passed and this happened again but, alas, there is no message
whatsoever that indicates what happened.
I kept using supervise which, instead of mailing me, is sending all of
standard output + standard error through a logger (multilog) which timestamps
it and writes it to a file.
Here's Radiator's log of the moment of the problem:
===================START OF RADIATOR LOG===========================
Wed Jan 30 01:21:44 2002: DEBUG: SNMPAgent: received request from
192.168.1.2, 128, 535302554, snmp-community
Wed Jan 30 01:22:15 2002: DEBUG: Packet dump:
*** Received from 10.133.56.33 port 1645 ....
Code: Access-Request
Identifier: 124
Authentic: <236>0<179><185>W@"-)A4<194>r?w-
Attributes:
NAS-IP-Address = 10.133.56.33
NAS-Port = 30
NAS-Port-Type = Async
User-Name = "uncliente at pbm"
Called-Station-Id = "0380"
Calling-Station-Id = "1141399338"
User-Password = "<<247>:<232><232>d<224><135>@<255>`QRs.<218>"
Service-Type = Framed-User
Framed-Protocol = PPP
Wed Jan 30 01:22:15 2002: DEBUG: Rewrote user name to uncliente at pbm
Wed Jan 30 01:22:15 2002: DEBUG: Check if Handler Pert-PreRegUser-Flag = 1,
Request-Type = Access-Request should be used to handle this request
Wed Jan 30 01:22:15 2002: DEBUG: Check if Handler Request-Type =
Access-Request should be used to handle this request
Wed Jan 30 01:22:15 2002: DEBUG: Handling request with Handler 'Request-Type
= Access-Request'
Wed Jan 30 01:22:15 2002: DEBUG: Rewrote user name to uncliente at pbm
Wed Jan 30 01:22:15 2002: DEBUG: SessDBUsers Deleting session for
uncliente at pbm, 10.133.56.33, 30
Wed Jan 30 01:22:15 2002: DEBUG: do query is: DELETE FROM USUARIOS_EN_LINEA
WHERE USUA_IP_NAS='10.133.56.33' AND USUA_PORT=030
Wed Jan 30 01:22:21 2002: DEBUG: Handling with Radius::AuthSQL
Wed Jan 30 01:22:21 2002: DEBUG: Handling with
Radius::AuthSQL:UserGetPassword
Wed Jan 30 01:22:21 2002: DEBUG: Query is: SELECT U.USU_CLAVE, S.SER_CODIGO,
S.SER_MAX_SESSION_CONCURRENTES, S.TIMEFRAMEID, S.SER_GEN_CHECK,
S.SER_GEN_REPLY, U.USU_IP_NRO_FIJA, U.USU_IP_MASC_FIJA,
U.USU_TIEMPO_RESTANTE, U.USU_BYTES_RESTANTES, U.USU_SUSPENDIDO,
U.USU_GEN_CHECK, U.USU_GEN_REPLY, VS.VISP_SER_VALID_DNIS FROM USUARIOS U,
VISP V, SERVICIOS S, VISP_SERVICIOS VS WHERE U.VISP_CODIGO = V.VISP_CODIGO
AND U.SER_CODIGO = S.SER_CODIGO AND U.USU_CODIGO = 'uncliente' AND
U.VISP_CODIGO = 'pbm' AND V.VISP_CODIGO = VS.VISP_CODIGO AND S.SER_CODIGO =
VS.SER_CODIGO AND '0380' LIKE VS.VISP_SER_VALID_DNIS
Wed Jan 30 01:22:37 2002: DEBUG: Radius::AuthSQL looks for match with
uncliente at pbm
Wed Jan 30 01:22:37 2002: DEBUG: Query is: SELECT USUA_IP_NAS, USUA_PORT,
USUA_SESION_ID FROM USUARIOS_EN_LINEA WHERE USU_CODIGO ='uncliente' AND
VISP_CODIGO='pbm'
Wed Jan 30 01:23:16 2002: DEBUG: Radius::AuthSQL ACCEPT:
Wed Jan 30 01:23:16 2002: DEBUG: Handling with PORTLIMITCHECK
Wed Jan 30 01:23:16 2002: DEBUG: Query is: SELECT COUNT(*) FROM
USUARIOS_EN_LINEA WHERE VISP_CODIGO = 'pbm' AND SER_CODIGO =
'Teletrabajo_PBM' AND '0380' LIKE VISP_SER_VALID_DNIS
Wed Jan 30 01:24:16 2002: ERR: Execute failed for 'SELECT COUNT(*)
FROM USUARIOS_EN_LINEA WHERE VISP_CODIGO = 'pbm' AND SER_CODIGO =
'Teletrabajo_PBM' AND '0380' LIKE VISP_SER_VALID_DNIS': SQL Timeout
******************** HERE'S WHERE RADIATOR DIED ************************
Wed Jan 30 01:25:58 2002: ERR: Could not connect to SQL database with
DBI->connect dbi:Oracle:host=db;sid=RADP, oraUser, oraPassword: timeout at
/usr/local/lib/perl5/site_perl/5.6.1/Radius/Util.pm line 507, <FILE>
line 20.
Wed Jan 30 01:25:58 2002: ERR: Could not connect to any SQL database. Request
is ignored. Backing off for 0 seconds
Wed Jan 30 01:25:58 2002: DEBUG: Reclaiming expired leases
Wed Jan 30 01:25:58 2002: DEBUG: do query is: UPDATE POOL_IP SET OCUPADA = 0,
TIME_STAMP = 1012364758 WHERE OCUPADA != 0 AND EXPIRA < 1012364758
Wed Jan 30 01:26:53 2002: WARNING: Unknown service name
Wed Jan 30 01:26:53 2002: INFO: Server started: Radiator 2.18.4 on radius1
Wed Jan 30 01:31:12 2002: DEBUG: SNMPAgent: received request from
192.168.1.2, 128, 913637970, snmp-community
Wed Jan 30 01:31:12 2002: DEBUG: SNMPAgent: received request from
192.168.1.2, 128, 913637970, snmp-community
Wed Jan 30 01:31:12 2002: DEBUG: SNMPAgent: received request from
192.168.1.2, 128, 913637970, snmp-community
Wed Jan 30 01:31:12 2002: DEBUG: SNMPAgent: received request from
192.168.1.2, 128, 913637970, snmp-community
Wed Jan 30 01:31:12 2002: DEBUG: SNMPAgent: received request from
192.168.1.2, 128, 913637970, snmp-community
Wed Jan 30 01:31:43 2002: DEBUG: SNMPAgent: received request from
192.168.1.2, 128, 243092869, snmp-community
Wed Jan 30 01:31:45 2002: DEBUG: SNMPAgent: received request from
192.168.1.2, 128, 243092869, snmp-community
Wed Jan 30 01:36:43 2002: DEBUG: SNMPAgent: received request from
192.168.1.2, 128, 261688663, snmp-community
Wed Jan 30 01:36:45 2002: DEBUG: SNMPAgent: received request from
192.168.1.2, 128, 261688663, snmp-community
Wed Jan 30 01:41:44 2002: DEBUG: SNMPAgent: received request from
192.168.1.2, 128, 1050711131, snmp-community
Wed Jan 30 01:46:44 2002: DEBUG: SNMPAgent: received request from
192.168.1.2, 128, 1153891048, snmp-community
Wed Jan 30 01:46:46 2002: DEBUG: SNMPAgent: received request from
192.168.1.2, 128, 1153891048, snmp-community
Wed Jan 30 01:47:16 2002: DEBUG: Packet dump:
*** Received from 10.133.56.33 port 1645 ....
Code: Access-Request
Identifier: 149
Authentic: <164>pc<205><217><179><130>M<169><140><204><21><204><6><201><248>
Attributes:
NAS-IP-Address = 10.133.56.33
NAS-Port = 24
NAS-Port-Type = Async
User-Name = "uncliente at pbm"
Called-Station-Id = "0380"
Calling-Station-Id = "1141399338"
User-Password =
"<31><137><188>PI<171>b\%<185><19><207><13><213><149>K"
Service-Type = Framed-User
Framed-Protocol = PPP
Wed Jan 30 01:47:16 2002: DEBUG: Rewrote user name to uncliente at pbm
Wed Jan 30 01:47:16 2002: NOTICE: Request from unknown client 10.133.56.33:
ignored
===================END OF RADIATOR LOG===========================
And following, is the log of Radiator's stdout+stderr:
2002-01-29 18:24:07.040854500 DBD::Oracle::db do failed: ORA-01401: inserted
value too large for column (DBD ERROR: OCIStmtExecute) at
/usr/local/lib/perl5/site_perl/5.6.1/Radius/SqlDb.pm line 232.
2002-01-30 01:24:56.250600500 STARTING
2002-01-30 01:24:56.251088500 /app/Radiator/bin/radiusd rad_instance=auth
BaseDir=/app/Radiator AuthPort=1812 AcctPort= SNMPPort=16112
-config_file /app/Radiator/etc/radius-main.cfg -foreground
As you can see there is a DBD message for hours before (that's something else
I'm reparing right now, but it's non-related), and the following message is
an "echo" I send right before exec'ing Radiator with the following command.
There's no message telling anything, I saved the core dump but I don't know
how to extract any info from there...
I'm using Radiator (2.18.4) and have all of my data on a remote Oracle
(8.1.6) server.
Both machines are Sun Netra with Solaris 8. Perl version is 5.6.1.
Any clues? my client wants some good explanation of what's going on...
TIA
El 13 Dec 2001 a las 12:13, Hugh Irvine escribió:
>
> Hello Mariano -
>
> What you describe below sounds to me like a problem with the DBD-Oracle
> module. I would suggest that you try to use the "restartWrapper" program that
> we provide in the distribution ("goodies/restartWrapper") instead of
> "supervise" (at least for debugging this problem). The restartWrapper program
> can be set up with a delay before restarting, and it can also be configured
> to email a designated email address with the exit status and any error
> messages that were written to stderr. We should then be able to see what is
> causing Radiator to die.
>
> regards
>
> Hugh
>
>
> On Thu, 13 Dec 2001 08:14, Mariano Absatz wrote:
> > Hi,
> >
> > I'm having the following problem:
> >
> > I'm using Radiator (2.18.4) and have all of my data on a remote Oracle
> > (8.1.6) server.
> >
> > Both machines are Sun Netra with Solaris 8. Perl version is 5.6.1.
> >
> > There are two instances of Radiator (one for authentication and the other
> > for accounting).
> >
> > The problem is the following. If the Oracle server goes down, the queries
> > time out (that's reasonable). The point is some times (not after every SQL
> > timeout, but after some of them), Radiator goes down. It seems to be that
> > this happens when the query in question is necessary as part of the
> > authentication (e.g. during a username lookup or simultaneous use or port
> > limit check), but not when it is nonessential (as a deletion from the
> > radonline table for the nas/port recently received or an insertion in an
> > AuthLog).
> >
> > On only one ocassion I saw the "Could not connect to any SQL database.
> > Request is ignored. Backing off for 600 second" message, but even that
> > time, Radiator went down.
> >
> > I'm using daemontool's supervise (http://cr.yp.to/daemontools.html) to keep
> > the servers running so the server starts up again almost immediately. I see
> > the messages when it is starting again in the log.
> >
> > The question is, why is Radiator silently shutting down rather than backing
> > off?
> >
> > One of the main problems is that on the almost immediate restart, the first
> > thing Radiator tries to do is to read the client list from the database. If
> > Oracle is still down, it won't read it, it won't retry, and (since there
> > are no hardwired <Client>'s in the config file, it won't accept anything
> > from any NAS.
> >
> > Regretfully, supervise's log is autorotated and autoerased on a size basis
> > and I don't have the output to correlate with Radiator's log.
> >
> > I'm attaching parts of the logs showing the SQL Timeout error immediately
> > followed by Radiator starting up again (via supervise).
> >
> > The "DEBUG: Adding Clients from SQL database" is the first message issued
> > by a NEW Radiator starting.
> >
> > I'm also attaching the whole set of configuration files (the main one is
> > radius-main.cfg) in a zip file.
>
> --
> Radiator: the most portable, flexible and configurable RADIUS server
> anywhere. Available on *NIX, *BSD, Windows 95/98/2000, NT, MacOS X.
> -
> Nets: internetwork inventory and management - graphical, extensible,
> flexible with hardware, software, platform and database independence.
--
Mariano Absatz
El Baby
----------------------------------------------------------
My mail reader can beat up your mail reader.
===
Archive at http://www.open.com.au/archives/radiator/
Announcements on radiator-announce at open.com.au
To unsubscribe, email 'majordomo at open.com.au' with
'unsubscribe radiator' in the body of the message.
More information about the radiator
mailing list