<div dir="ltr">Hi Heikki, Wolfgang, <div><br></div><div>I would suggest that you raise a ticket with Red Hat against OpenSSL about the memory leak, including exactly the analysis you did here, because it shows that one of the patches in the source RPM introduces the leak. </div><div><br></div><div>:-)</div><div><br></div><div>Stefan</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 5 Apr 2022 at 18:49, Heikki Vatiainen <<a href="mailto:hvn@open.com.au">hvn@open.com.au</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 4.4.2022 21.14, Wolfgang Breyha wrote:<br>
<br>
> We (University of Vienna) recently noticed performance issues with our new<br>
> Radiator Servers running on RHEL 8.5. These were caused by radiusd (4.26)<br>
> itself eating memory until the machines started swapping and the IO waits<br>
> rose until radiusd wasn't able to handle the load anymore.<br>
<br>
Hello Wolfgang, we can reproduce this with RHEL 8.5 and AlmaLinux and <br>
Rocky Linux too. With RHEL 9 beta the problem doesn't occur and memory <br>
usage is stable.<br>
<br>
> The same config was running on RHEL 6 with radiator 4.18 without any<br>
> problems or noticeable leaks for years.<br>
<br>
We used the same config on all four of the systems above. It is based on <br>
your configuration, simplified further. Could you check whether you get <br>
the same result, that is, whether the config below also shows growing <br>
memory use on your test server?<br>
<br>
> I tried to run radiusd under valgrind with our configuration and assuming<br>
> that perl itself is not the cause I think the leak(s) happen(s) somewhere<br>
> down the Net::SSLeay->openssl pipe. Since our config is rather complex I<br>
> tried to set up a simple EAP example config to see if it happens there as<br>
> well... and "luckily" it does. config is attached.<br>
<br>
I did some more experiments and found out the following:<br>
- Net::SSLeay compiled against the RHEL-supplied OpenSSL 1.1.1k doesn't fix <br>
the leak. This matches what you reported<br>
- OpenSSL 1.1.1k compiled from source on RHEL 8 with Net::SSLeay 1.88 <br>
does not leak<br>
- OpenSSL 1.1.1k, Net::SSLeay 1.88 and Perl 5.26.3-threads-multi <br>
compiled on macOS 10.15.7 do not leak<br>
<br>
<br>
In other words, on RHEL 8 the OpenSSL 1.1.1k that comes with the OS leaks, but <br>
a locally compiled, unpatched OpenSSL 1.1.1k does not.<br>
<br>
I then downloaded openssl-1.1.1k-5.el8_5.src.rpm from<br>
<a href="https://repo.almalinux.org/almalinux/8/BaseOS/Source/Packages/" rel="noreferrer" target="_blank">https://repo.almalinux.org/almalinux/8/BaseOS/Source/Packages/</a><br>
and did this to unpack it:<br>
<br>
rpm2cpio < openssl-1.1.1k-5.el8_5.src.rpm|cpio -i<br>
<br>
There are quite a few patches that are applied to RHEL and its derivatives.<br>
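<br>
To see which patches the package applies, one option is to list the Patch lines in the spec file that the unpack step leaves in the current directory.<br>
This is only a sketch: the spec file name openssl.spec and the helper name list_patches are assumptions, not something checked against this particular source RPM.<br>

```shell
# list_patches SPECFILE
# Print "PatchN filename" for every Patch line in an RPM spec file.
# list_patches is a hypothetical helper name; the spec file is passed as $1.
list_patches() {
    sed -n 's/^\(Patch[0-9]*\):[[:space:]]*\(.*\)$/\1 \2/p' "$1"
}

# Example (assumes the unpacked spec file is named openssl.spec):
#   list_patches openssl.spec | wc -l
```

Having the list makes it easier to rebuild with subsets of the patches and narrow down which one matters.<br>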
<br>
From the above it seems that a combination of what Radiator does, <br>
OpenSSL 1.1.1k and the patches applied to it triggers a leak.<br>
<br>
If you'd like to do more debugging, you can compile a local OpenSSL and <br>
Net::SSLeay like this:<br>
<br>
In OpenSSL directory, for example $HOME/src/p5-net-ssleay/ :<br>
./config --prefix=$HOME/opt/openssl-1.1.1k<br>
make<br>
make install_sw<br>
make install_ssldirs<br>
<br>
Then set up the environment and compile Net::SSLeay. In the Net::SSLeay directory:<br>
export LD_LIBRARY_PATH=$HOME/opt/openssl-1.1.1k/lib<br>
OPENSSL_PREFIX=$HOME/opt/openssl-1.1.1k perl Makefile.PL<br>
make<br>
make test # Check the locations in the test output<br>
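<br>
To double-check that the freshly built Net::SSLeay resolves to the local OpenSSL rather than the system one, something like the following may help.<br>
It is a sketch under assumptions: ssl_libs is a hypothetical helper, and blib/arch/auto/Net/SSLeay/SSLeay.so is where the built module would normally be expected.<br>

```shell
# ssl_libs: filter `ldd` output (on stdin) down to the OpenSSL libraries,
# printing each library name followed by the path it resolves to.
# ssl_libs is a hypothetical helper name.
ssl_libs() {
    awk '/libssl|libcrypto/ { print $1, $3 }'
}

# Example, run in the Net::SSLeay build directory with LD_LIBRARY_PATH set
# as above (the SSLeay.so path is an assumption about the build layout):
#   ldd blib/arch/auto/Net/SSLeay/SSLeay.so | ssl_libs
#   perl -Iblib/lib -Iblib/arch -MNet::SSLeay \
#       -e 'print Net::SSLeay::SSLeay_version(), "\n"'
```

If the paths printed point to /usr/lib64 instead of $HOME/opt/openssl-1.1.1k/lib, the test is still running against the distribution's OpenSSL.<br>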
<br>
You can then run Radiator directly from the command line like this:<br>
perl -I $HOME/src/p5-net-ssleay/blib/lib/ -I <br>
$HOME/src/p5-net-ssleay/blib/arch /opt/radiator//radiator/radiusd <br>
-dictionary /opt/radiator/radiator/dictionary -foreground -log_stdout <br>
-trace 4 -config leak-test.conf<br>
<br>
> I then start eapol_test (from wpa_supplicant RPM) with a config of<br>
> network={<br>
> eap=PEAP<br>
> eapol_flags=0<br>
> key_mgmt=IEEE8021X<br>
> identity="testuser"<br>
> anonymous_identity="anonymous"<br>
> password="testpass"<br>
> ca_cert="/etc/pki/tls/cert.pem"<br>
> phase2="auth=MSCHAPV2"<br>
> }<br>
> in a loop and can watch radiusd eating memory.<br>
<br>
I used exactly the same config in my testing. I even used the eapol_test <br>
that comes with 'yum install wpa_supplicant', but I don't think the <br>
eapol_test version matters.<br>
<br>
The shortened Radiator config we used is below. It was run from the command <br>
line, and the certificates are the demo certs that come with Radiator in the <br>
/opt/radiator/radiator/certificates directory.<br>
<br>
Trace 4<br>
LogTraceId<br>
LogMicroseconds<br>
DbDir .<br>
LogDir .<br>
LogFile %L/radius.log<br>
DictionaryFile /opt/radiator/radiator/dictionary<br>
AuthPort 1645,1812<br>
AcctPort 1646,1813<br>
<br>
<Client DEFAULT><br>
Secret mysecret<br>
</Client><br>
<br>
<AuthBy FILE><br>
Identifier AuthTEST<br>
Filename %D/users<br>
EAPType PEAP,MSCHAP-V2<br>
EAPTLS_CAFile %D/demoCA/cacert.pem<br>
EAPTLS_CertificateFile %D/cert-srv.pem<br>
EAPTLS_PrivateKeyFile %D/key.pem<br>
EAPTLS_CertificateType PEM<br>
EAPTLS_MaxFragmentSize 1000<br>
EAPTLS_SessionResumption 0<br>
AutoMPPEKeys<br>
</AuthBy><br>
<br>
<Handler TunnelledByPEAP=1><br>
AuthBy AuthTEST<br>
</Handler><br>
<br>
<Handler><br>
AuthBy AuthTEST<br>
</Handler><br>
<br>
> I started it in batches of 1000 and the RSS increased from fresh start...<br>
> 36884->115592->124648->132256->135120->139868->139868->143116->147152<br>
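<br>
The batch measurement described above could be scripted roughly as follows.<br>
This is a minimal sketch, not the script either of us used: rss_kb and run_batches are hypothetical helper names, eapol-peap.conf stands for a file holding the network block quoted earlier, and 127.0.0.1, port 1812 and mysecret assume a local test server using the config in this mail. It reads /proc, so it is Linux only.<br>

```shell
# rss_kb PID: print the resident set size of a process in kB (Linux /proc).
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# run_batches PID BATCHES PER_BATCH
# Send PER_BATCH eapol_test authentications BATCHES times and print the
# RSS of process PID (radiusd) at the start and after every batch.
# The eapol_test arguments below are assumptions for a local test setup.
run_batches() {
    pid=$1
    batches=$2
    per_batch=$3
    echo "start: $(rss_kb "$pid") kB"
    b=1
    while [ "$b" -le "$batches" ]; do
        i=0
        while [ "$i" -lt "$per_batch" ]; do
            eapol_test -c eapol-peap.conf -a 127.0.0.1 -p 1812 \
                -s mysecret >/dev/null 2>&1
            i=$((i + 1))
        done
        echo "batch $b: $(rss_kb "$pid") kB"
        b=$((b + 1))
    done
}

# Example: eight batches of 1000 requests against a running radiusd
#   run_batches "$(pgrep -f radiusd | head -n 1)" 8 1000
```

A stable server should level off after the first batch or two; a steady climb across batches points to a leak.<br>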
<br>
The valgrind results below are interesting. I could not immediately find <br>
anything in the OpenSSL patches the distributions apply that touches these <br>
functions, apart from the FIPS changes, which modify many parts of the code.<br>
<br>
Do you think you could also try against a locally compiled OpenSSL? It <br>
seems that distribution patches affect the leak, but I'd like to get a <br>
confirmation for this.<br>
<br>
Thanks again for the detailed report. If you think you could do more <br>
work with valgrind, I'd be interested to see the results.<br>
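<br>
For comparing valgrind runs between the distribution OpenSSL and a local build, it may help to reduce each log to a single number.<br>
The following is a sketch only: sum_definitely_lost is a hypothetical helper and valgrind.log is an assumed log file name. It relies on the "N bytes in M blocks are definitely lost" loss record lines that valgrind prints with --leak-check=full.<br>

```shell
# sum_definitely_lost [LOGFILE...]
# Sum the bytes in all "definitely lost" loss records of a valgrind log.
# Reads the files given as arguments, or stdin when none are given.
sum_definitely_lost() {
    awk '/are definitely lost/ { gsub(",", "", $2); total += $2 }
         END { print total + 0 }' "$@"
}

# Example run (valgrind.log is an assumed file name):
#   valgrind --leak-check=full --show-leak-kinds=definite,possible \
#       --num-callers=30 --log-file=valgrind.log \
#       perl /opt/radiator/radiator/radiusd -foreground -log_stdout \
#       -config leak-test.conf
#   sum_definitely_lost valgrind.log
```

For the record quoted below, 233,728 bytes in 913 blocks works out to 256 bytes per block, so the total should grow roughly linearly with the number of handshakes if the leak is per request.<br>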
<br>
<br>
> If I run this config with valgrind again I find a similar "definitely lost<br>
> memory" section with a close amount of requests as with our full config:<br>
> ==1420461== 233,728 bytes in 913 blocks are definitely lost in loss record<br>
> 6,062 of 6,088<br>
> ==1420461== at 0x4C360A5: malloc (vg_replace_malloc.c:380)<br>
> ==1420461== by 0xA39690C: CRYPTO_zalloc (in /usr/lib64/libcrypto.so.1.1.1k)<br>
> ==1420461== by 0xA382AC3: EVP_PKEY_meth_new (in<br>
> /usr/lib64/libcrypto.so.1.1.1k)<br>
> ==1420461== by 0xCF3CAD7: ??? (in /usr/lib64/engines-1.1/pkcs11.so)<br>
> ==1420461== by 0xA3648E4: ENGINE_get_pkey_meth (in<br>
> /usr/lib64/libcrypto.so.1.1.1k)<br>
> ==1420461== by 0xA382EA4: ??? (in /usr/lib64/libcrypto.so.1.1.1k)<br>
> ==1420461== by 0xA37E543: ??? (in /usr/lib64/libcrypto.so.1.1.1k)<br>
> ==1420461== by 0x9FD5A41: ??? (in /usr/lib64/libssl.so.1.1.1k)<br>
> ==1420461== by 0x9FC833E: ??? (in /usr/lib64/libssl.so.1.1.1k)<br>
> ==1420461== by 0x9FB3C97: SSL_do_handshake (in /usr/lib64/libssl.so.1.1.1k)<br>
> ==1420461== by 0x9D3ACA3: ??? (in<br>
> /usr/lib64/perl5/vendor_perl/auto/Net/SSLeay/SSLeay.so)<br>
> ==1420461== by 0x4F2F4B8: Perl_pp_entersub (in /usr/lib64/libperl.so.5.26.3)<br>
> <br>
> and a "possibly lost memory"<br>
> ==1420461== 640,000 bytes in 1,000 blocks are possibly lost in loss record<br>
> 6,079 of 6,088<br>
> ==1420461== at 0x4C360A5: malloc (vg_replace_malloc.c:380)<br>
> ==1420461== by 0xA39690C: CRYPTO_zalloc (in /usr/lib64/libcrypto.so.1.1.1k)<br>
> ==1420461== by 0x9FBA5AC: SSL_SESSION_new (in /usr/lib64/libssl.so.1.1.1k)<br>
> ==1420461== by 0x9FBAE06: ??? (in /usr/lib64/libssl.so.1.1.1k)<br>
> ==1420461== by 0x9FD9E78: ??? (in /usr/lib64/libssl.so.1.1.1k)<br>
> ==1420461== by 0x9FC855A: ??? (in /usr/lib64/libssl.so.1.1.1k)<br>
> ==1420461== by 0x9FB3C97: SSL_do_handshake (in /usr/lib64/libssl.so.1.1.1k)<br>
> ==1420461== by 0x9D3ACA3: ??? (in<br>
> /usr/lib64/perl5/vendor_perl/auto/Net/SSLeay/SSLeay.so)<br>
> ==1420461== by 0x4F2F4B8: Perl_pp_entersub (in /usr/lib64/libperl.so.5.26.3)<br>
> ==1420461== by 0x4F27324: Perl_runops_standard (in<br>
> /usr/lib64/libperl.so.5.26.3)<br>
> ==1420461== by 0x4EA6FFC: perl_run (in /usr/lib64/libperl.so.5.26.3)<br>
> ==1420461== by 0x108ED9: ??? (in /usr/bin/perl)<br>
> <br>
> The machines run a fully patched RHEL 8.5 with the current<br>
> radiator-4.26-1.el8.noarch from your website<br>
> openssl-libs-1.1.1k-5.el8_5.x86_64<br>
> perl-Net-SSLeay-1.88-1.module+el8.3.0+6446+594cad75.x86_64<br>
> <br>
> I also tried to build a new Net::SSLeay-1.92. Same results.<br>
> <br>
> If we can't find the cause it seems we need to restart radiator periodically.<br>
<br>
Hopefully there's another solution. It might be that the OS (RHEL, Alma, <br>
Rocky) patches, and how they affect Radiator, need to be checked at some <br>
point.<br>
<br>
Thanks,<br>
Heikki<br>
<br>
-- <br>
Heikki Vatiainen<br>
OSC, makers of Radiator<br>
Visit <a href="http://radiatorsoftware.com" rel="noreferrer" target="_blank">radiatorsoftware.com</a> for Radiator AAA server software<br>
</blockquote></div>