lists.arthurdejong.org

Re: [nssldap] nss-ldap timeouts when used with nscd and gnutls

Howard Wilkinson wrote:
Douglas E. Engert wrote:


Arthur de Jong wrote:
On Tue, 2009-04-21 at 15:22 -0500, Douglas E. Engert wrote:
Your analysis makes sense to me. But at the moment I'm no longer
interested in nss-ldap since nss-ldapd (+ slapd nssov) works better
and offers easier administration.
Sounds interesting, but we are trying to stick with what is offered by
Ubuntu.

FWIW some releases of Ubuntu have nss-ldapd (libnss-ldapd) but I would
avoid version 0.5. The 0.6.7 release is known to work quite well and is
included in Debian stable. There is however no packaged version of the
nssov in slapd as far as I know (but you can use nss-ldapd without it).

Thanks, we will have to look at that.

I did see in the archives that Howard Wilkinson on Dec 9, 2008
"Mega patch against nss_ldap 264" said:

"My intention with this is to make the critical path through the code run
 the minimal code when a connection to the LDAP server exists, make
 recovery on failure more resilient, and provide for multiple SASL mechs
 without need to alter the ldap-nss code."

Yes, I said this, but I have yet to finish this piece of code. What I have done runs better than it did before, but it does not address some of the stability issues I found.

You will need to apply the patch and see how you get on. I am hoping to find time next month to revisit this, but as I am having trouble finding paying work (as most of the UK seems to be) this may slip if somebody finds something else for me to do.


OK, I was not sure where this major modification stood.

The major piece of work that is needed, apart from fixing my patch to be style compatible with the rest of nss_ldap, is to remove some recursion from the code that breaks if the underlying connection to the LDAP server disconnects. This needs to be replaced with a list-walking operation so that the reconnects can recover and continue if the remote server has gone away. I forget which piece of code this is, but I think it was in the groups generation operation.
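
To illustrate the idea of replacing recursion with a list walk, here is a minimal sketch. It is not nss_ldap code (which is C) and not taken from the patch; it is written in Python only to keep it short, and `expand_groups`, `lookup`, `reconnect` and `ConnectionLost` are hypothetical names. The point it tries to show is that the pending work lives in an explicit list, so a dropped connection can be retried for the current entry without unwinding a call stack.

```python
from collections import deque


class ConnectionLost(Exception):
    """Raised by the (hypothetical) lookup when the server has gone away."""


def expand_groups(start, lookup, reconnect, max_retries=3):
    """Collect every group reachable from `start` without recursion.

    `lookup(name)` returns the direct parent groups of `name` and
    `reconnect()` re-establishes the connection; both are assumptions
    made for this sketch, not nss_ldap functions.
    """
    pending = deque([start])  # explicit work list replaces the call stack
    seen = {start}
    while pending:
        name = pending[0]  # peek only; pop once the lookup has succeeded
        for attempt in range(max_retries + 1):
            try:
                parents = lookup(name)
                break
            except ConnectionLost:
                if attempt == max_retries:
                    raise  # give up after the last retry
                reconnect()  # state is intact, so just retry this entry
        pending.popleft()
        for parent in parents:
            if parent not in seen:  # guards against membership loops
                seen.add(parent)
                pending.append(parent)
    return seen
```

Because the entry is only removed from the list after a successful lookup, a reconnect in the middle of the walk simply resumes where it left off.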

Your change may address the two bugs I turned in today, #391 and #392.
If so, that would be great. I was hoping to get #392 into the code upstream
of Debian and Ubuntu so they would pick it up. The #392 change really just
adds two calls to do_close() when a connection has an error or times out.
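
The close-on-failure idea can be sketched roughly as follows. This is not the #392 patch and not nss_ldap code; it is a Python illustration in which `Session`, `request` and the `connect` factory are hypothetical, and the connection object is assumed to offer `search()` and `close()`. Only the pattern is the point: on a timeout or connection error, drop the connection so that the next request opens a fresh one instead of reusing a dead handle.

```python
class Session:
    """Holds one connection and discards it after a failure (sketch only)."""

    def __init__(self, connect):
        self._connect = connect  # hypothetical factory returning a connection
        self._conn = None

    def close(self):
        """Drop the current connection, if any, so the next call reconnects."""
        if self._conn is not None:
            try:
                self._conn.close()
            finally:
                self._conn = None

    def request(self, query):
        if self._conn is None:
            self._conn = self._connect()  # lazily (re)connect
        try:
            return self._conn.search(query)
        except (TimeoutError, ConnectionError):
            # The active request still fails, but the stale connection
            # is discarded rather than kept around for later requests.
            self.close()
            raise
```

As in the description above, the request that hit the error is not rescued; the benefit is only that later requests start from a clean connection.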

This is not a perfect fix, as the active request may still fail. But what
we see is that nscd stops working, and callers such as sshd, cron, ls, etc.
detect that nscd is not working and make calls to LDAP directly, bypassing
nscd. So nothing appears to fail, but an ls can take 15 seconds, or a login
30 seconds, more than expected.

If it handles the cases where do_result fails, and reconnects to any
server on timeout and connection errors, that may fix the issue I have seen.


Since we're working hard on a PAM module (actually Howard Chu is doing
all the hard work at the moment) as a side effect we may also make it
more easily possible to use the nss-ldapd NSS module together with a
packaged slapd-nssov package (if such a package would be made).

(it's a bit awkward to post a more or less nss-ldapd promotional message
on the nss_ldap list)


I had intended to get the nss_ldap work finished and then look at porting the functionality into the nss-ldapd environment. But again, time has not been on my side.

If I can help then please feel free to ping me.

Howard.




--

 Douglas E. Engert  <DEEngert@anl.gov>
 Argonne National Laboratory
 9700 South Cass Avenue
 Argonne, Illinois  60439
 (630) 252-5444