[RADIATOR] Memory Leak on RHEL 8.5

Heikki Vatiainen hvn at open.com.au
Tue Apr 5 17:44:59 UTC 2022


On 4.4.2022 21.14, Wolfgang Breyha wrote:

> We (University of Vienna) recently noticed performance issues with our new
> Radiator Servers running on RHEL 8.5. These were caused by radiusd (4.26)
> itself eating memory until the machines started swapping and the IO-waits
> rose until radiusd wasn't able to handle the load anymore.

Hello Wolfgang, we can reproduce this with RHEL 8.5 and AlmaLinux and 
Rocky Linux too. With RHEL 9 beta the problem doesn't occur and memory 
usage is stable.

> The same config was running on RHEL 6 with radiator 4.18 without any
> problems or noticeable leaks for years.

We used the same config on all four of the above systems. It's based on 
your configuration, simplified further. Could you check whether you get 
the same result, that is, whether the config below also leaks memory on 
your test server?

> I tried to run radiusd under valgrind with our configuration and assuming
> that perl itself is not the cause I think the leak(s) happen(s) somewhere
> down the Net::SSLeay->openssl pipe. Since our config is rather complex I
> tried to set up a simple EAP example config to see if it happens there as
> well... and "luckily" it does. config is attached.

I did some more experiments and found out the following:
- Net::SSLeay compiled against the RHEL supplied OpenSSL 1.1.1k doesn't 
fix the leak. This is the same result you wrote about
- OpenSSL 1.1.1k compiled from source on RHEL 8 with Net::SSLeay 1.88 
does not leak
- OpenSSL 1.1.1k, Net::SSLeay 1.88 and Perl 5.26.3-threads-multi 
compiled on macOS 10.15.7 does not leak


In other words, on RHEL8 OpenSSL 1.1.1k that comes with the OS leaks but 
locally compiled unpatched OpenSSL 1.1.1k does not leak.

I then downloaded openssl-1.1.1k-5.el8_5.src.rpm from
https://repo.almalinux.org/almalinux/8/BaseOS/Source/Packages/
and did this to unpack it:

rpm2cpio < openssl-1.1.1k-5.el8_5.src.rpm|cpio -i

There are quite a few patches that are applied to RHEL and its derivatives.
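
To get an overview of those patches, the Patch lines can be pulled out of 
the spec file that the unpack step leaves in the current directory. This 
is only a minimal sketch: the spec file name openssl.spec is an 
assumption (check what the source RPM actually unpacked), and the 
function names are made up for this example.

```shell
#!/bin/sh
# list_patches: print the "PatchN: file" lines of an RPM spec file.
# The function name and the default spec file name are illustrative
# assumptions, not something taken from the source RPM.
list_patches() {
    spec="${1:-openssl.spec}"
    grep -E '^Patch[0-9]+:' "$spec"
}

# count_patches: number of patches declared in the spec file.
count_patches() {
    list_patches "$1" | wc -l | tr -d ' '
}
```

For example, 'list_patches openssl.spec | grep -i fips' narrows the list 
down to patches with FIPS in their name.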

From the above it seems that a combination of what Radiator does, 
OpenSSL 1.1.1k and the patches applied to it trigger a leak.

If you'd like to do more debugging, you can compile a local OpenSSL and 
Net::SSLeay like this:

In OpenSSL directory, for example $HOME/src/p5-net-ssleay/ :
./config --prefix=$HOME/opt/openssl-1.1.1k
make
make install_sw
make install_ssldirs

Then set up the environment and compile Net::SSLeay. In the Net::SSLeay directory:
export LD_LIBRARY_PATH=$HOME/opt/openssl-1.1.1k/lib
OPENSSL_PREFIX=$HOME/opt/openssl-1.1.1k perl Makefile.PL
make
make test # Check the locations in the test output
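
Besides reading the test output, the freshly built module can be asked 
which OpenSSL it reports and which shared libraries it resolves to. A 
sketch, assuming it is run in the Net::SSLeay build directory after 
'make' with LD_LIBRARY_PATH still set as above. The helper names are 
hypothetical.

```shell
#!/bin/sh
# filter_ssl_libs: keep only the libssl/libcrypto lines of ldd output
# read from stdin.
filter_ssl_libs() {
    grep -E 'lib(ssl|crypto)\.so'
}

# show_ssleay_linkage: print the OpenSSL version string seen by the
# freshly built Net::SSLeay, then the libssl/libcrypto files its compiled
# object resolves to. Assumes the standard blib/ layout created by 'make'.
show_ssleay_linkage() {
    perl -Iblib/lib -Iblib/arch -MNet::SSLeay \
        -e 'print Net::SSLeay::SSLeay_version(), "\n"'
    ldd blib/arch/auto/Net/SSLeay/SSLeay.so | filter_ssl_libs
}
```

If the ldd lines point to /usr/lib64 instead of 
$HOME/opt/openssl-1.1.1k/lib, the build or the runtime environment picked 
up the system libraries.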

You can then run Radiator directly from the command line like this:
perl -I $HOME/src/p5-net-ssleay/blib/lib/ \
  -I $HOME/src/p5-net-ssleay/blib/arch \
  /opt/radiator/radiator/radiusd \
  -dictionary /opt/radiator/radiator/dictionary -foreground -log_stdout \
  -trace 4 -config leak-test.conf

> I then start eapol_test (from wpa_supplicant RPM) with a config of
> network={
> eap=PEAP
> eapol_flags=0
> key_mgmt=IEEE8021X
> identity="testuser"
> anonymous_identity="anonymous"
> password="testpass"
> ca_cert="/etc/pki/tls/cert.pem"
> phase2="auth=MSCHAPV2"
> }
> in a loop and can watch radiusd eating memory.

I used exactly the same config in my testing. I even used eapol_test 
that comes with 'yum install wpa_supplicant', but I don't think 
eapol_test version matters.
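
The loop itself can be sketched roughly as follows. The assumptions here 
are that the network block is saved as eapol_test.conf, that radiusd 
listens on 127.0.0.1 port 1812 with the secret from the test config, and 
that the function names are only illustrative.

```shell
#!/bin/sh
# rss_kb: print the resident set size of a process as reported by ps
# (kilobytes on Linux).
rss_kb() {
    ps -o rss= -p "$1" | tr -d ' '
}

# run_batches PID BATCHES COUNT: run eapol_test COUNT times per batch and
# print the RSS of PID at the start and after each batch.
# Assumed for illustration: eapol_test.conf, 127.0.0.1:1812, 'mysecret'.
run_batches() {
    pid="$1"; batches="$2"; count="$3"
    echo "start $(rss_kb "$pid")"
    b=1
    while [ "$b" -le "$batches" ]; do
        i=0
        while [ "$i" -lt "$count" ]; do
            eapol_test -c eapol_test.conf -a 127.0.0.1 -p 1812 -s mysecret \
                >/dev/null 2>&1
            i=$((i + 1))
        done
        echo "batch $b $(rss_kb "$pid")"
        b=$((b + 1))
    done
}
```

With radiusd running in another terminal, something like 
'run_batches "$(pgrep -f radiusd | head -n 1)" 8 1000' prints one RSS 
sample per batch of 1000 authentications.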

The shortened Radiator config we used is below. It was run from the 
command line, and the certificates are the demo certs that come with 
Radiator in the /opt/radiator/radiator/certificates directory.

Trace 4
LogTraceId
LogMicroseconds
DbDir .
LogDir .
LogFile %L/radius.log
DictionaryFile /opt/radiator/radiator/dictionary
AuthPort 1645,1812
AcctPort 1646,1813

<Client DEFAULT>
         Secret  mysecret
</Client>

<AuthBy FILE>
     Identifier AuthTEST
     Filename %D/users
     EAPType PEAP,MSCHAP-V2
     EAPTLS_CAFile %D/demoCA/cacert.pem
     EAPTLS_CertificateFile %D/cert-srv.pem
     EAPTLS_PrivateKeyFile %D/key.pem
     EAPTLS_CertificateType PEM
     EAPTLS_MaxFragmentSize 1000
     EAPTLS_SessionResumption 0
     AutoMPPEKeys
</AuthBy>

<Handler TunnelledByPEAP=1>
     AuthBy AuthTEST
</Handler>

<Handler>
     AuthBy AuthTEST
</Handler>

> I started it in batches of 1000 and the RSS increased from fresh start...
> 36884->115592->124648->132256->135120->139868->139868->143116->147152
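
To put a rough number on the growth, the samples can be turned into 
per-batch deltas and an average per request. This is only a sketch: it 
assumes the RSS values are in kilobytes (the report does not say), and it 
treats the first jump from 36884 as mostly startup and warm-up, so that 
sample is left out of the average. The function names are hypothetical.

```shell
#!/bin/sh
# rss_deltas: read whitespace-separated RSS samples from stdin and print
# the growth between consecutive samples, one value per line.
rss_deltas() {
    awk '{ for (i = 1; i <= NF; i++) { if (n++) print $i - prev; prev = $i } }'
}

# growth_per_request PER_BATCH: average growth per request between the
# first and last sample on stdin, with PER_BATCH requests between samples.
growth_per_request() {
    awk -v per="$1" '
        { for (i = 1; i <= NF; i++) { if (!n++) first = $i; last = $i } }
        END { if (n > 1) printf "%.2f\n", (last - first) / ((n - 1) * per) }'
}
```

Feeding the samples from 115592 onwards to 'growth_per_request 1000' 
gives 4.51, so under these assumptions each authentication costs about 
4.5 kilobytes of RSS once the process has warmed up.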

The valgrind results below are interesting. I could not immediately find 
anything in the OpenSSL patches the distributions use that touches these 
functions, apart from the FIPS changes, which modify many parts of the code.

Do you think you could also try against a locally compiled OpenSSL? It 
seems that distribution patches affect the leak, but I'd like to get a 
confirmation for this.

Thanks again for the detailed report. If you think you could do more 
work with valgrind, I'd be interested to see the results.
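
For comparing runs against the system OpenSSL and a locally compiled 
one, a small wrapper and log summary could look like this. It is a 
sketch only: the log file name and the function names are assumptions, 
and the valgrind options are just one reasonable choice.

```shell
#!/bin/sh
# run_under_valgrind CMD...: run a command under valgrind with full leak
# checking, writing one log per process (%p expands to the PID).
run_under_valgrind() {
    valgrind --leak-check=full --show-leak-kinds=definite,possible \
        --log-file=valgrind.%p.log "$@"
}

# leak_summary LOGFILE: print "kind bytes blocks" for every definitely or
# possibly lost record header found in a valgrind log.
leak_summary() {
    sed -n -E \
        's/^==[0-9]+== ([0-9,]+) bytes in ([0-9,]+) blocks are (definitely|possibly) lost.*/\3 \1 \2/p' \
        "$1"
}
```

The radiusd command line would go after run_under_valgrind, and 
leak_summary on the resulting log gives a short list that is easy to 
compare between the two OpenSSL builds.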


> If I run this config with valgrind again I find a similar "definitely lost
> memory" section with a close amount of requests as with our full config:
> ==1420461== 233,728 bytes in 913 blocks are definitely lost in loss record
> 6,062 of 6,088
> ==1420461==    at 0x4C360A5: malloc (vg_replace_malloc.c:380)
> ==1420461==    by 0xA39690C: CRYPTO_zalloc (in /usr/lib64/libcrypto.so.1.1.1k)
> ==1420461==    by 0xA382AC3: EVP_PKEY_meth_new (in
> /usr/lib64/libcrypto.so.1.1.1k)
> ==1420461==    by 0xCF3CAD7: ??? (in /usr/lib64/engines-1.1/pkcs11.so)
> ==1420461==    by 0xA3648E4: ENGINE_get_pkey_meth (in
> /usr/lib64/libcrypto.so.1.1.1k)
> ==1420461==    by 0xA382EA4: ??? (in /usr/lib64/libcrypto.so.1.1.1k)
> ==1420461==    by 0xA37E543: ??? (in /usr/lib64/libcrypto.so.1.1.1k)
> ==1420461==    by 0x9FD5A41: ??? (in /usr/lib64/libssl.so.1.1.1k)
> ==1420461==    by 0x9FC833E: ??? (in /usr/lib64/libssl.so.1.1.1k)
> ==1420461==    by 0x9FB3C97: SSL_do_handshake (in /usr/lib64/libssl.so.1.1.1k)
> ==1420461==    by 0x9D3ACA3: ??? (in
> /usr/lib64/perl5/vendor_perl/auto/Net/SSLeay/SSLeay.so)
> ==1420461==    by 0x4F2F4B8: Perl_pp_entersub (in /usr/lib64/libperl.so.5.26.3)
> 
> and a "possibly lost memory"
> ==1420461== 640,000 bytes in 1,000 blocks are possibly lost in loss record
> 6,079 of 6,088
> ==1420461==    at 0x4C360A5: malloc (vg_replace_malloc.c:380)
> ==1420461==    by 0xA39690C: CRYPTO_zalloc (in /usr/lib64/libcrypto.so.1.1.1k)
> ==1420461==    by 0x9FBA5AC: SSL_SESSION_new (in /usr/lib64/libssl.so.1.1.1k)
> ==1420461==    by 0x9FBAE06: ??? (in /usr/lib64/libssl.so.1.1.1k)
> ==1420461==    by 0x9FD9E78: ??? (in /usr/lib64/libssl.so.1.1.1k)
> ==1420461==    by 0x9FC855A: ??? (in /usr/lib64/libssl.so.1.1.1k)
> ==1420461==    by 0x9FB3C97: SSL_do_handshake (in /usr/lib64/libssl.so.1.1.1k)
> ==1420461==    by 0x9D3ACA3: ??? (in
> /usr/lib64/perl5/vendor_perl/auto/Net/SSLeay/SSLeay.so)
> ==1420461==    by 0x4F2F4B8: Perl_pp_entersub (in /usr/lib64/libperl.so.5.26.3)
> ==1420461==    by 0x4F27324: Perl_runops_standard (in
> /usr/lib64/libperl.so.5.26.3)
> ==1420461==    by 0x4EA6FFC: perl_run (in /usr/lib64/libperl.so.5.26.3)
> ==1420461==    by 0x108ED9: ??? (in /usr/bin/perl)
> 
> The machines run a fully patched RHEL 8.5 with the current
> radiator-4.26-1.el8.noarch from your website
> openssl-libs-1.1.1k-5.el8_5.x86_64
> perl-Net-SSLeay-1.88-1.module+el8.3.0+6446+594cad75.x86_64
> 
> I also tried to build a new Net::SSLeay-1.92. Same results.
> 
> If we can't find the cause it seems we need to restart radiator periodically.

Hopefully there's another solution. It might be that the OS (RHEL, Alma, 
Rocky) patches, and how they affect Radiator, need to be checked at some 
point.

Thanks,
Heikki

-- 
Heikki Vatiainen
OSC, makers of Radiator
Visit radiatorsoftware.com for Radiator AAA server software

