
nslcd errors talking to IPVS cluster of LDAP servers




Hi,

Our shop runs a bunch of Debian lenny servers, some with LDAP-based shell access using the libnss-ldap package. We decided to give libnss-ldapd a try on a new server. We ran into problems with our LDAP setup.

We have three LDAP servers that are hidden behind an IPVS/ldirectord/heartbeat cluster (for load-balancing and simpler client configuration). So the cluster presents a single IP address, and LDAP requests to it are handed off transparently to one of the real servers.

The first symptom we noticed was that our nightly osiris scan of the system would sometimes report all of the LDAP user accounts as missing, then report them as re-added on a later night.

We traced that issue to log messages like these:

Oct  3 12:42:51 adonis nslcd[1517]: [cdfac0] ldap_result() failed: Can't contact LDAP server
Oct  3 12:42:51 adonis nslcd[1517]: [cdfac0] ldap_abandon() failed to abandon search: Other (e.g., implementation specific) error
Oct  3 12:42:52 adonis nslcd[1517]: [cdfac0] connected to LDAP server ldap://ldap.teamgleim.com
Oct  3 13:30:20 adonis nslcd[1517]: [578454] ldap_search_ext() failed: Can't contact LDAP server
Oct  3 13:30:20 adonis nslcd[1517]: [578454] no available LDAP server found, sleeping 1 seconds
Oct  3 13:30:21 adonis nslcd[1517]: [578454] no available LDAP server found
Oct  3 13:30:21 adonis nslcd[1517]: [578454] no available LDAP server found, sleeping 29 seconds
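For anyone who wants to pull these failures out of a larger syslog, here is a rough Python sketch that counts failure messages per bracketed tag. The regular expression is only inferred from the lines above, and the names LINE_RE, FAILURE_MARKERS, parse_lines and failures_per_tag are made up for illustration; I am also assuming the bracketed hex value identifies a request or session, which I have not confirmed.

```python
import re
from collections import Counter

# Pattern inferred from the sample lines only; other nslcd versions or
# syslog formats may need adjustments.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+nslcd\[(?P<pid>\d+)\]:\s+"
    r"\[(?P<tag>[0-9a-f]+)\]\s+(?P<message>.*)$"
)

# Substrings treated as failures here; this list is an assumption.
FAILURE_MARKERS = ("failed", "no available LDAP server")

SAMPLE = """\
Oct  3 12:42:51 adonis nslcd[1517]: [cdfac0] ldap_result() failed: Can't contact LDAP server
Oct  3 12:42:51 adonis nslcd[1517]: [cdfac0] ldap_abandon() failed to abandon search: Other (e.g., implementation specific) error
Oct  3 12:42:52 adonis nslcd[1517]: [cdfac0] connected to LDAP server ldap://ldap.teamgleim.com
Oct  3 13:30:20 adonis nslcd[1517]: [578454] ldap_search_ext() failed: Can't contact LDAP server
Oct  3 13:30:20 adonis nslcd[1517]: [578454] no available LDAP server found, sleeping 1 seconds
Oct  3 13:30:21 adonis nslcd[1517]: [578454] no available LDAP server found
Oct  3 13:30:21 adonis nslcd[1517]: [578454] no available LDAP server found, sleeping 29 seconds
"""


def parse_lines(text):
    """Return one dict per line that matches the nslcd pattern."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def failures_per_tag(entries):
    """Count failure messages, grouped by the bracketed tag."""
    counts = Counter()
    for entry in entries:
        if any(marker in entry["message"] for marker in FAILURE_MARKERS):
            counts[entry["tag"]] += 1
    return counts
```

Calling failures_per_tag(parse_lines(SAMPLE)) groups the failures by tag, which makes it easier to see that each burst of errors belongs to a single request rather than being spread across many.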

It would eventually reconnect, but I'm guessing osiris had already timed out waiting for a response and considered the user accounts to be missing.

I tried several things:

* Setting an idle timeout (idle_timelimit in nslcd.conf) of 280 seconds did not clear the errors.

* Restarting nslcd would clear the errors for more than an hour, but then they would start again.

* Having a cron job run "getent passwd" every four minutes (thus preventing nslcd from losing its connection to the LDAP server) *did* clear the errors.

* Finally, changing the nslcd LDAP URI from the cluster address to an explicit list of the three real LDAP servers *did* clear the errors.
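The cron workaround can also be approximated in a small script. The following is only a sketch of the idea, not what we actually ran: it assumes that enumerating the passwd database through Python's pwd module goes through NSS in the same way "getent passwd" does, and the names touch_passwd_database and keep_alive are hypothetical.

```python
import pwd
import time


def touch_passwd_database():
    """Enumerate all passwd entries, roughly like `getent passwd`.

    On a host where NSS is configured for LDAP this should cause a
    lookup through nslcd (an assumption, not verified here).
    """
    return len(pwd.getpwall())


def keep_alive(interval_seconds=240, rounds=None, sleep=time.sleep):
    """Enumerate the passwd database every `interval_seconds`.

    `rounds=None` loops forever; a number stops after that many lookups.
    `sleep` is injectable so the loop can be exercised without waiting.
    Returns the number of lookups performed.
    """
    done = 0
    while rounds is None or done < rounds:
        touch_passwd_database()
        done += 1
        # Do not sleep after the final round.
        if rounds is None or done < rounds:
            sleep(interval_seconds)
    return done
```

The default of 240 seconds matches the four-minute cron interval. Calling keep_alive(rounds=1) performs a single lookup and returns, which is a quick way to check that the enumeration works on a given host.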

Not being an expert in the code, I can only guess that nslcd has problems if it tries "reconnecting" to an LDAP server and actually gets connected to a different server -- some sort of state information about the previous connection must be maintained somewhere.

For now, we'll probably stick with libnss-ldap since we're familiar with it, but I wanted to mention the issue in case there's something simple I'm missing.

-- Ken Gaillot <kjgaillo@gleim.com>
Network Operations Center, Gleim Publications