[RADIATOR] AuthHEIMDALDIGEST and defunct kdigest processes

Johan Wassberg jocar at su.se
Thu Jan 31 13:10:48 UTC 2019


Hi!

We are using Radiator with AuthHEIMDALDIGEST and recently upgraded from 4.15 to
4.22. We have noticed that 4.22 leaves a lot of defunct `kdigest` processes,
which over time cause Radiator to crash because it has trouble forking new
`kdigest` processes.

We found that AuthHEIMDALDIGEST has been modified between the versions
and that the new version runs `waitpid` on the child process. Our guess is
that there is a race condition between the file descriptors closing
and the child exiting (which is when `SIGCHLD` is delivered): if `waitpid`
with `WNOHANG` runs before the child has exited, it returns immediately and
has no effect at all.
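
A minimal standalone sketch of the suspected race, not taken from Radiator: the child here is a plain `fork` with a `sleep` standing in for a `kdigest` that has not finished exiting yet, which is an assumption made purely for illustration.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Fork a child that is still running when the parent first calls waitpid.
my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    sleep 1;    # stand-in for kdigest still shutting down
    exit 0;
}

# Non-blocking reap right away: the child has not exited yet,
# so waitpid returns 0 and nothing is reaped.
my $early = waitpid($pid, WNOHANG);

# A blocking waitpid returns the child's pid once it has really exited.
my $late = waitpid($pid, 0);

print "WNOHANG returned $early, blocking waitpid returned ",
    ($late == $pid ? "the child pid" : $late), "\n";
```

If the non-blocking call were the only one, the child would stay defunct after exiting until something else reaped it.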

To test this we modified AuthHEIMDALDIGEST.pm:
```
--- AuthHEIMDALDIGEST.pm     2019-01-31 10:51:41.860109152 +0100
+++ AuthHEIMDALDIGEST.pm.mod    2019-01-31 10:51:54.949232773 +0100
@@ -239,7 +239,10 @@
        $self->log($main::LOG_ERR, "AuthHEIMDALDIGEST Unexpected output from kdigest: $output.", $p);
     }
     close($read); close($write);
-    waitpid($child_pid, WNOHANG); # Reap it
+    my $return = waitpid($child_pid, WNOHANG); # Reap it
+    if ($return <= 0) {
+       $self->log($main::LOG_ERR, "AuthHEIMDALDIGEST kdigest_digest waitpid missed child ($child_pid), waitpid returned $return");
+    }

     return 1 if ($status && $status eq "ok");
     return 0;
@@ -294,7 +297,10 @@
        $self->log($main::LOG_ERR, "AuthHEIMDALDIGEST Unexpected output from kdigest: $output.", $p);
     }
     close($read); close($write);
-    waitpid($child_pid, WNOHANG); # Reap it
+    my $return = waitpid($child_pid, WNOHANG); # Reap it
+    if ($return <= 0) {
+       $self->log($main::LOG_ERR, "AuthHEIMDALDIGEST kdigest_challenge waitpid missed child ($child_pid), waitpid returned $return") ;
+    }

     unless ($context->{kdigest_challenge} && $context->{kdigest_opaque})
     {
```

That gives the following debug output:
```
Thu Jan 31 09:39:36 2019: DEBUG: Handling request with Handler 'NAS-Identifier=/Example/, RecvFromAddress=/^127.0.0.1/', Identifier ''
Thu Jan 31 09:39:36 2019: DEBUG: SessINTERNAL: Deleting session for jocar, 127.0.0.1,
Thu Jan 31 09:39:36 2019: DEBUG: Handling with Radius::AuthHEIMDALDIGEST: InnerEAP
Thu Jan 31 09:39:36 2019: DEBUG: Radius::AuthHEIMDALDIGEST looks for match with jocar [jocar]
Thu Jan 31 09:39:36 2019: DEBUG: AuthHEIMDALDIGEST challenge command: /usr/sbin/kdigest digest-server-init --type=CHAP --kerberos-realm=EXAMPLE.COM
Thu Jan 31 09:39:36 2019: DEBUG: AuthHEIMDALDIGEST challenge command output: type=CHAP
Thu Jan 31 09:39:36 2019: DEBUG: AuthHEIMDALDIGEST challenge command output: server-nonce=4e62aecb9f98579ad93b4ea84223b9d6
Thu Jan 31 09:39:36 2019: DEBUG: AuthHEIMDALDIGEST challenge command output: identifier=3A
Thu Jan 31 09:39:36 2019: DEBUG: AuthHEIMDALDIGEST challenge command output: opaque=4e62aecb9f98579ad93b4ea84223b9d64e62aecb9f98579ad93b4ea84223b9d6

Thu Jan 31 09:39:36 2019: ERR: AuthHEIMDALDIGEST kdigest_challenge waitpid missed child (22311), waitpid returned 0

Thu Jan 31 09:39:36 2019: DEBUG: AuthHEIMDALDIGEST digest command: /usr/sbin/kdigest digest-server-request --type=CHAP --username=jocar --opaque=4e62aecb9f98579ad93b4ea84223b9d64e62aecb9f98579ad93b4ea84223b9d6 --server-identifier=01 --server-nonce=4e62aecb9f98579ad93b4ea84223b9d6 --client-response=4e62aecb9f98579ad93b4ea84223b9d6 --kerberos-realm=EXAMPLE.COM
Thu Jan 31 09:39:36 2019: DEBUG: AuthHEIMDALDIGEST digest command output: status=failed
Thu Jan 31 09:39:36 2019: DEBUG: AuthHEIMDALDIGEST digest command output: tickets=no

Thu Jan 31 09:39:36 2019: ERR: AuthHEIMDALDIGEST kdigest_digest waitpid missed child (22312), waitpid returned 0

Thu Jan 31 09:39:36 2019: DEBUG: Radius::AuthHEIMDALDIGEST REJECT: AuthBy HEIMDALDIGEST Password check failed: jocar [jocar]
Thu Jan 31 09:39:36 2019: DEBUG: AuthBy HEIMDALDIGEST result: REJECT, AuthBy HEIMDALDIGEST Password check failed
Thu Jan 31 09:39:36 2019: INFO: Access rejected for jocar: AuthBy HEIMDALDIGEST Password check failed
```

One solution could be to run `waitpid` with "-1" in a loop instead of
the real pid, thereby reaping all children that have exited by the time of
the next authentication. Another solution could be to remove `WNOHANG`, making
`waitpid` block until the child exits. We are not sure whether that has any
performance implications, or why `waitpid` was implemented with `WNOHANG` in
the first place.
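
A rough sketch of the first idea, as a standalone script rather than a patch to AuthHEIMDALDIGEST.pm. The helper name `reap_exited_children` is hypothetical, and the forked children only stand in for `kdigest`.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical helper: reap every child that has already exited,
# without blocking. Returns the list of reaped pids.
sub reap_exited_children {
    my @reaped;
    # waitpid(-1, WNOHANG) returns a pid for each exited child,
    # 0 if children exist but none has exited, and -1 if there are none.
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        push @reaped, $pid;
    }
    return @reaped;
}

# Demo: start three short-lived children (stand-ins for kdigest).
my @children;
for (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    exit 0 if $pid == 0;    # child exits immediately
    push @children, $pid;
}

sleep 1;    # stand-in for the time until the next authentication
my @reaped = reap_exited_children();
print "reaped ", scalar(@reaped), " of ", scalar(@children), " children\n";
```

Note that `waitpid(-1, ...)` reaps any child of the process, so this only fits if nothing else in the server needs the exit status of its own children. The second idea would simply be `waitpid($child_pid, 0)` after the `close` calls, at the cost of blocking until `kdigest` has exited.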

Let me know if there is anything else I can provide to help resolve
this issue.

-- 
jocar


