lists.arthurdejong.org

Re: [nssldap] nss-ldap timeouts when used with nscd and gnutls



Douglas E. Engert wrote:
We have seen a number of issues with nss-ldap when going from
Ubuntu Dapper to Ubuntu Hardy. (Intrepid has shown similar problems.)
Dapper clients and Solaris 9 and 10 using Sun's nss ldap work
fine with our two ldap servers.

Hardy, based on nss-ldap_258, has the problems. The code for 260
and 264 appears to have the same problems.

Your analysis makes sense to me, but at the moment I'm no longer interested in nss-ldap, since nss-ldapd (+ slapd nssov) works better and offers easier administration.

First problem:

The /etc/ldap.conf file implies the default for timeout is 30 seconds,
but it is unlimited in the code. This has caused nscd to lock up: it
keeps accepting requests while all its worker threads wait on the
nss-ldap lock and one thread sits in ldap_result waiting for the
response. netstat -a shows the connection is in CLOSE_WAIT. The systems
keep running slowly, as each caller of nscd times out waiting on nscd,
then goes ahead and does the LDAP request itself. Nscd uses one file
descriptor for each request, eventually runs out of file descriptors,
and starts using 100% CPU.

Setting timeout 30 at least helps get out of this situation.
Suggestion: in util.c: set result->ldc_timelimit = 30;  (See attachment)
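
The suggested default could look roughly like the sketch below. This is
only an illustration, not the attached patch: the struct and function
names (sketch_config, sketch_init_config, sketch_apply_timelimit) are
hypothetical stand-ins, and only the field name ldc_timelimit is taken
from the message above. The rule that a non-positive configured value
leaves the default in place is also an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Default suggested in the message: 30 seconds instead of unlimited. */
#define SKETCH_DEFAULT_TIMELIMIT 30

/* Hypothetical stand-in for the nss-ldap configuration structure. */
struct sketch_config {
    int ldc_timelimit;   /* search time limit in seconds */
};

/* Initialise the config with a finite default time limit. */
void sketch_init_config(struct sketch_config *result)
{
    assert(result != NULL);
    result->ldc_timelimit = SKETCH_DEFAULT_TIMELIMIT;
}

/* Assumption: a value read from the config file overrides the
 * default only when it is positive. */
void sketch_apply_timelimit(struct sketch_config *result, int configured)
{
    assert(result != NULL);
    if (configured > 0)
        result->ldc_timelimit = configured;
}
```

With a finite default, a hung ldap_result call returns after the limit
even when the administrator has not set a value explicitly.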

Second problem:

In ldap-nss.c, if do_result gets a timeout (or error), it writes
"nss_ldap: could not get LDAP result" to syslog and sets stat = NSS_UNAVAIL;

But the __session.ls_state is still set to LS_CONNECTED_TO_DSA
and the next operation tries to use the same connection which will also
time out.

Suggestion: in ldap-nss.c (see attachment)
Add a call to do_close() in the two places where do_result gets a timeout
or other connection error. This change will cause the next request to
reconnect. It may take 30 seconds, but the new connection will not time
out again.
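
The effect of closing the session on failure can be sketched with a
small state machine. This is a simplified model under stated
assumptions, not the nss-ldap code or the attached patch: every
sketch_* name is hypothetical. sketch_close stands in for do_close,
the callback stands in for do_result, SKETCH_CONNECTED for
LS_CONNECTED_TO_DSA, and SKETCH_UNAVAIL for NSS_UNAVAIL.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_state  { SKETCH_UNINITIALIZED, SKETCH_CONNECTED };
enum sketch_status { SKETCH_SUCCESS, SKETCH_UNAVAIL };

/* Hypothetical stand-in for the global session. */
struct sketch_session {
    enum sketch_state state;
    int connects;            /* how many times a connection was opened */
};

/* Stand-in for waiting on a result: 0 is success, nonzero is a
 * timeout or connection error. */
typedef int (*sketch_result_fn)(void);

/* Mark the session as closed so it is not reused. */
static void sketch_close(struct sketch_session *s)
{
    s->state = SKETCH_UNINITIALIZED;
}

/* Open a new connection only if the session is not connected. */
static void sketch_ensure_connected(struct sketch_session *s)
{
    if (s->state != SKETCH_CONNECTED) {
        s->state = SKETCH_CONNECTED;
        s->connects++;
    }
}

/* One request: on a failed result, close the session so the next
 * request reconnects instead of reusing the dead connection. */
enum sketch_status sketch_request(struct sketch_session *s,
                                  sketch_result_fn wait_result)
{
    assert(s != NULL && wait_result != NULL);
    sketch_ensure_connected(s);
    if (wait_result() != 0) {
        sketch_close(s);
        return SKETCH_UNAVAIL;
    }
    return SKETCH_SUCCESS;
}

/* Two canned results for exercising the sketch. */
int sketch_result_timeout(void) { return -1; }
int sketch_result_ok(void)      { return 0; }
```

Without the sketch_close call, the state would stay connected after the
failure and the second request would reuse the same stale connection.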


These problems may be related to the Ubuntu conversion from using OpenSSL
to using GnuTLS. It may be that OpenSSL or GnuTLS fails to shut down the
connection correctly, or fails to tell ldap_search that the connection is
down.

In any case, if do_result fails with some timeout or connection problem,
the conservative thing to do is to go through do_with_reconnect and try
a different server.

Has anyone seen any similar problems?

What we are testing now is using the Intrepid version of nss-ldap based on
260 on Hardy with the attached changes.

Packages being used:
       libnss-ldap     260-1ubuntu2-dee1   (-dee1 has my changes)
       libldap-2.4-2   2.4.9-0ubuntu0.8.04.2
       libgnutls13     2.0.4-1ubuntu2.3
       nscd            2.7-10ubuntu4



--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/