[RADIATOR] Possible internal race condition leading to 'duplicate' packets and 'unknown reply' messages?

Stefan Winter stefan.winter at restena.lu
Thu Feb 20 07:53:44 UTC 2020


Hi,


FWIW, the other end is my server :-)


For illustration, a typical sequence of such warnings is below (with a
different peer to whom my server talks more regularly, so more traffic
and more opportunity for races).


Interestingly, you can see that the comms between the two servers went
okay for the first 636 requests and 1:44 minutes (this is following a
fresh server restart this morning) but then these messages start coming.
It certainly doesn't happen for every request. And we haven't heard of
complaints about not working roaming, so this is likely not a fatal
problem.

If you look at the sequence of request IDs they are even not always
increasing, suggesting that the race condition if any is not restricted
to "reply received too quickly, not in queue yet" but it must be
something else... Of course, a duplicate reply with the same ID which
was already handled earlier could lead to this message, would explain
why there is no actual auth problem - and would be consistent with what
the other Stefan posted.


Greetings,


Stefan Winter


Thu Feb 20 08:35:11 2020 247798: NOTICE: Server started: Radiator 4.24
on tld1.eduroam.lu

Thu Feb 20 08:36:55 2020 031935: WARNING: Unknown reply received in
AuthRADSEC for request 637 from 2001:878:0:e000:82:e1:f2:6d
(2001:878:0:e000:82:e1:f2:6d:2083)
Thu Feb 20 08:36:55 2020 090479: WARNING: Unknown reply received in
AuthRADSEC for request 643 from 2001:878:0:e000:82:e1:f2:6d
(2001:878:0:e000:82:e1:f2:6d:2083)
Thu Feb 20 08:36:57 2020 012434: WARNING: Unknown reply received in
AuthRADSEC for request 814 from 2001:878:0:e000:82:e1:f2:6d
(2001:878:0:e000:82:e1:f2:6d:2083)
Thu Feb 20 08:37:08 2020 030662: WARNING: Unknown reply received in
AuthRADSEC for request 751 from 2001:878:0:e000:82:e1:f2:6d
(2001:878:0:e000:82:e1:f2:6d:2083)
Thu Feb 20 08:37:14 2020 026173: WARNING: Unknown reply received in
AuthRADSEC for request 934 from 2001:878:0:e000:82:e1:f2:6d
(2001:878:0:e000:82:e1:f2:6d:2083)
Thu Feb 20 08:37:20 2020 031046: WARNING: Unknown reply received in
AuthRADSEC for request 871 from 2001:878:0:e000:82:e1:f2:6d
(2001:878:0:e000:82:e1:f2:6d:2083)
Thu Feb 20 08:37:26 2020 027694: WARNING: Unknown reply received in
AuthRADSEC for request 967 from 2001:878:0:e000:82:e1:f2:6d
(2001:878:0:e000:82:e1:f2:6d:2083)
Thu Feb 20 08:37:39 2020 360532: WARNING: Unknown reply received in
AuthRADSEC for request 1006 from 2001:878:0:e000:82:e1:f2:6d
(2001:878:0:e000:82:e1:f2:6d:2083)
Thu Feb 20 08:37:39 2020 414663: WARNING: Unknown reply received in
AuthRADSEC for request 1007 from 2001:878:0:e000:82:e1:f2:6d
(2001:878:0:e000:82:e1:f2:6d:2083)
Thu Feb 20 08:38:19 2020 033411: WARNING: Unknown reply received in
AuthRADSEC for request 1124 from 2001:878:0:e000:82:e1:f2:6d
(2001:878:0:e000:82:e1:f2:6d:2083)


Am 18.02.20 um 23:49 schrieb Stefan Paetow:
> Hi, 
>
> Apologies for the long introduction to this problem, I'm new to Radiator, so I'm just trying to make sure I explain things correctly.
>
> So us (running the UK eduroam infrastructure) and our equivalent in another country are trying to resolve some spurious 'Unknown reply received in AuthRADSEC for request ...' messages they and us see in our Radiator logs. For clarity, we (Jisc) are running the latest (4.24.1) on both our servers, both accept both RADIUS and RadSec traffic. At this point it's not entirely clear whether this is limited to RadSec or not, but we're just working with the RadSec traffic for now. The other side has raised the issue with one of the servers, so we’ll stick to that for the moment.
>
> We have 24 Radiator processes running on this server. We have a NoReplyHook that will send an Access-Reject packet back to an originating ‘downstream’ when the home server for an Access-Request stops responding to packets in time. The code in the hook has (probably) not changed for several years, so it's assumed it's been fine. So we get a question out of the blue asking whether we filter out the Proxy-State (which would be very unusual given it's fundamental to making RADIUS proxying work). 
>
> So I've switched on the GlobalMessageLog for RadSec (with a hook for the other end's IP) and turns out in our log that our NoReplyHook code fires as appropriate, but because it doesn't include a Proxy-State, it then cannot be matched to the appropriate request packet on the downstream (and they see a 'AuthRADSEC Could not get extended identifier: No Proxy-State attribute found in reply' message, which prompted the question. In *our* log, we then see *another* RADIUS packet, with the same Identifier as the packet created by the NoReplyHook, being fired back to the downstream, this time *with* a Proxy-State that matches the incoming packet. 
>
> So, we seem to first cause this in the other end:
>
> <date time etc> AuthRADSEC Could not get extended identifier: No Proxy-State attribute found in reply from ...
> <date time etc> Unknown reply received in AuthRADSEC for request ...
>
> Our side looks like this:
>
> 1581703432.398767 Fri Feb 14 18:03:52 2020
> RADIUS message from <downstream> port 48376 to <our server> port 2083 ....
> Packet length = 516
> Code:       Access-Request
> Identifier: 209
> Authentic:  <215>u<248>-n<185>2<179>3<15><227><243>jI<127><249>
> Attributes:
> 	User-Name = "<redacted>"
> 	Chargeable-User-Identity = 
> 	Location-Capable = <0><0><0><1>
> 	Calling-Station-Id = "18-81-0e-58-3c-7a"
> 	Called-Station-Id = "bc-26-c7-be-49-10:eduroam"
> 	NAS-Port = <0><0><0><8>
> 	cisco-avpair = "audit-session-id=0515c70a0002c48dd4e0465e"
> 	Acct-Session-Id = "5e46e0d3/18:81:0e:58:3c:7a/196306"
> 	NAS-IP-Address = <10><199><21><5>
> 	NAS-Identifier = "HCVDL_WLC5K1"
> 	Airespace-WLAN-Id = <0><0><0><3>
> 	Service-Type = <0><0><0><2>
> 	Framed-MTU = <0><0><5><20>
> 	NAS-Port-Type = <0><0><0><19>
> 	Tunnel-Type = <0><0><0><13>
> 	Tunnel-Medium-Type = <0><0><0><6>
> 	Tunnel-Private-Group-ID = 921
> 	EAP-Message = <trimmed>
> 	State = <trimmed>
> 	Message-Authenticator = <242><183><209>b<235>_Q<6><171><241><180><208><246><172>g<15>
> 	Operator-Name = "<redacted>"
> 	Proxy-State = OSC-Extended-Id=26321
>
> 1581703432.498707 Fri Feb 14 18:03:52 2020
> RADIUS message from <our server> port 2083 to <downstream> port 48376 ....
> Packet length = 131
> Code:       Access-Reject
> Identifier: 209
> Authentic:  ...
> Attributes:
> 	Cisco-Abort-Cause = "Home organisation RADIUS server(s) not responding, so eduroam UK rejected the request"
> 	Message-Authenticator = ...
>
> 1581703432.593273 Fri Feb 14 18:03:52 2020
> RADIUS message from <our server> port 2083 to <downstream> port 48376 ....
> Packet length = 170
> Code:       Access-Reject
> Identifier: 209
> Authentic:  ...
> Attributes:
> 	Cisco-Abort-Cause = "Home organisation RADIUS server(s) not responding, so eduroam UK rejected the request"
> 	Message-Authenticator = ...
> 	Reply-Message = "Request Denied"
> 	Proxy-State = OSC-Extended-Id=26321
>
> You can see here that there's a duplicate Access-Reject packet. The Proxy-State of the second one matches the Proxy-State of the incoming Access-Request, the Identifier of all three packets is identical.
>
> Is there a race condition thing going on that potentially causes a 'mis-fire' of sorts? To make this more interesting, I then adjusted the NoReplyReject hook code to include the Proxy-State and a Reply-Message (to match the above 'second' packet's contents), but the Reply-Message I send in the hook matches the 'Cisco-Abort-Cause' attribute. So the new packet looks like this:
>
> Packet length = 240
> Code:       Access-Reject
> Identifier: 218
> Authentic:  ...
> Attributes:
> 	Cisco-Abort-Cause = "Home organisation RADIUS server(s) not responding, so eduroam UK rejected the request"
> 	Proxy-State = ...
> 	Reply-Message = "Home organisation RADIUS server(s) not responding, so eduroam UK rejected the request"
> 	Message-Authenticator = ...
>
> This has now resolved the spurious 'No Proxy-State attribute found' messages on the downstream end, but then I still see the occasional duplicate like this (and the downstream sees the 'Unknown reply from...' message for the second packet):
>
> 1582046688.805439 Tue Feb 18 17:24:48 2020
> RADIUS message from <our server> port 2083 to <downstream> port 35322 ....
> Packet length = 240
> Code:       Access-Reject
> Identifier: 130
> Authentic:  <146>{Y<251><133>s Zi<140><22>@<218><21><198><172>
> Attributes:
>     Cisco-Abort-Cause = "Home organisation RADIUS server(s) not responding, so eduroam UK rejected the request"
>     Proxy-State = OSC-Extended-Id=4738
>     Reply-Message = "Home organisation RADIUS server(s) not responding, so eduroam UK rejected the request"
>     Message-Authenticator = <196><245>^8<231>m<17>.<202>4<196>o<160>:<156>r
>
> 1582046688.900040 Tue Feb 18 17:24:48 2020
> RADIUS message from <our server> port 2083 to <downstream> port 35322 ....
> Packet length = 169
> Code:       Access-Reject
> Identifier: 130
> Authentic:  <135><235>T<0>;<165>Klf<156><135>k<5>B<182><205>
> Attributes:
>     Cisco-Abort-Cause = "Home organisation RADIUS server(s) not responding, so eduroam UK rejected the request"
>     Reply-Message = "Request Denied"
>     Message-Authenticator = <255><163><7>QRS<26><181><245><156><251><11>S<8><202><251>
>     Proxy-State = OSC-Extended-Id=4738
>
> See? The Identifier is the same, the Proxy-State is the same, but the Reply-Message is different. This leads me to think that there's something happening here that shouldn't?
>
> Given that these things are spurious from this downstream source, but both them and us see this from other sources as well (other eduroam providers elsewhere) which also run Radiator, I wonder... 
>
> This information comes from a live server that handles hundreds of requests a second, so I don't know how we could run this in debug mode without a hit in performance.
>
> Any suggestions are welcome?
>
> With Kind Regards
>
> Stefan Paetow
> Federated Roaming Technical Specialist
>
> t: +44 (0)1235 822 125
> gpg: 0x3FCE5142
> xmpp: stefanp at jabber.dev.ja.net
> skype: stefan.paetow.janet
>
> jisc.ac.uk
>
> Jisc is a registered charity (number 1149740) and a company limited by guarantee which is registered in England under Company No. 5747339, VAT No. GB 197 0632 86. Jisc’s registered office is: One Castlepark, Tower Hill, Bristol, BS2 0JA. T 0203 697 5800.
>
>
>
> _______________________________________________
> radiator mailing list
> radiator at lists.open.com.au
> https://lists.open.com.au/mailman/listinfo/radiator

-- 
Stefan WINTER
Ingenieur de Recherche
Fondation RESTENA - Réseau Téléinformatique de l'Education Nationale et de la Recherche
2, avenue de l'Université
L-4365 Esch-sur-Alzette

Tel: +352 424409 1
Fax: +352 422473

PGP key updated to 4096 Bit RSA - I will encrypt all mails if the recipient's key is known to me

http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xC0DE6A358A39DC66


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <https://lists.open.com.au/pipermail/radiator/attachments/20200220/f16adffe/attachment.sig>


More information about the radiator mailing list