nslcd errors talking to IPVS cluster of LDAP servers
- From: Ken Gaillot <kjgaillo [at] gleim.com>
- To: nss-pam-ldapd-users [at] lists.arthurdejong.org
- Subject: nslcd errors talking to IPVS cluster of LDAP servers
- Date: Thu, 07 Oct 2010 11:02:00 -0400
Hi,
Our shop runs a bunch of Debian lenny servers, some with LDAP-based
shell access using the libnss-ldap package. We decided to give
libnss-ldapd a try on a new server, but ran into problems with our
LDAP setup.
We have three LDAP servers that are hidden behind an
IPVS/ldirectord/heartbeat cluster (for load-balancing and simpler client
configuration). So the cluster presents a single IP address, and LDAP
requests to it are handed off transparently to one of the real servers.
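To make the "handed off transparently" part concrete, here is a minimal Python sketch of a virtual IP that schedules each *new* connection to the next real server in round-robin order. It assumes a plain round-robin scheduler (IPVS supports several schedulers, and the post doesn't say which one this cluster uses), and the server names are hypothetical. What it illustrates is that once a client's connection is torn down, its next connection to the same cluster address can land on a different real server.

```python
from itertools import cycle


class VirtualIP:
    """Toy model of a load-balanced virtual IP (round-robin scheduling assumed)."""

    def __init__(self, real_servers):
        self._next_server = cycle(real_servers)
        self._connections = {}  # client id -> real server currently serving it

    def connect(self, client):
        # Each new connection is scheduled to the next real server in turn.
        server = next(self._next_server)
        self._connections[client] = server
        return server

    def drop(self, client):
        # Simulate the connection being torn down (idle expiry, server restart, ...).
        self._connections.pop(client, None)

    def server_for(self, client):
        return self._connections.get(client)


# Hypothetical real-server names; the post does not list them.
vip = VirtualIP(["ldap1", "ldap2", "ldap3"])
first = vip.connect("nslcd")    # first connection goes to ldap1
vip.connect("other-client")     # another client is scheduled to ldap2
vip.drop("nslcd")
second = vip.connect("nslcd")   # the reconnect is scheduled to ldap3
print(first, second)            # prints: ldap1 ldap3
```

From the client's point of view both connections went to the same address; only the balancer knows they reached different servers.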
The first symptom we noticed was that our nightly osiris scan of the
system would sometimes report that all of the LDAP user accounts were
missing (and then report them as re-added on a later night).
We traced that issue to log messages like these:
Oct 3 12:42:51 adonis nslcd[1517]: [cdfac0] ldap_result() failed: Can't contact LDAP server
Oct 3 12:42:51 adonis nslcd[1517]: [cdfac0] ldap_abandon() failed to abandon search: Other (e.g., implementation specific) error
Oct 3 12:42:52 adonis nslcd[1517]: [cdfac0] connected to LDAP server ldap://ldap.teamgleim.com
Oct 3 13:30:20 adonis nslcd[1517]: [578454] ldap_search_ext() failed: Can't contact LDAP server
Oct 3 13:30:20 adonis nslcd[1517]: [578454] no available LDAP server found, sleeping 1 seconds
Oct 3 13:30:21 adonis nslcd[1517]: [578454] no available LDAP server found
Oct 3 13:30:21 adonis nslcd[1517]: [578454] no available LDAP server found, sleeping 29 seconds
nslcd would eventually reconnect, but I'm guessing osiris had already
timed out waiting for a response and considered the user accounts to
be missing.
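To check whether failures like the ones in the log excerpt line up with the nightly scan, it can help to pull just the failure lines out of syslog. A minimal Python sketch, assuming the classic syslog layout shown above (month, day, time, host, `nslcd[pid]:`, then nslcd's bracketed session id); the list of failure phrases is taken from the messages quoted here and is not exhaustive, and the function name is hypothetical.

```python
import re

# Matches lines like:
# Oct 3 13:30:20 adonis nslcd[1517]: [578454] ldap_search_ext() failed: Can't contact LDAP server
# (\s+ also tolerates syslog's double space before single-digit days.)
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"nslcd\[(?P<pid>\d+)\]:\s+\[(?P<session>[0-9a-f]+)\]\s+(?P<msg>.*)$"
)

# Failure phrases seen in the excerpt above (not an exhaustive list).
FAILURE_PHRASES = ("Can't contact LDAP server", "no available LDAP server found")


def failures(lines):
    """Yield (timestamp, session id, message) for nslcd lines reporting a failure."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and any(phrase in m.group("msg") for phrase in FAILURE_PHRASES):
            yield m.group("stamp"), m.group("session"), m.group("msg")


sample = [
    "Oct 3 12:42:51 adonis nslcd[1517]: [cdfac0] ldap_result() failed: Can't contact LDAP server",
    "Oct 3 12:42:52 adonis nslcd[1517]: [cdfac0] connected to LDAP server ldap://ldap.teamgleim.com",
    "Oct 3 13:30:21 adonis nslcd[1517]: [578454] no available LDAP server found",
]
for stamp, session, msg in failures(sample):
    print(stamp, session, msg)
```

Feeding it the system log and comparing the timestamps against the osiris run times would show whether the "missing accounts" nights coincide with these outages.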
I tried several things:
* Setting an idle_timeout of 280 seconds did not clear the errors.
* Restarting nslcd would clear the errors for more than an hour, but
then they would start again.
* Having a cron job run "getent passwd" every four minutes (thus
preventing nslcd from losing its connection to the LDAP server) *did*
clear the errors.
* Finally, changing the nslcd LDAP URI from the cluster address to an
explicit list of the three real LDAP servers *did* clear the errors.
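The cron workaround in the list above amounts to never letting nslcd's connection sit idle for long. Here is a minimal Python sketch of the same idea as a long-running loop instead of a cron entry; the 240-second interval mirrors the four minutes mentioned above, while the function names are hypothetical. The command runner and the sleep function are injectable so the loop can be exercised without touching a real system.

```python
import subprocess
import time


def run_getent():
    """Run `getent passwd`, discarding its output, and return the exit status."""
    result = subprocess.run(
        ["getent", "passwd"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode


def keepalive(interval=240, iterations=None, run=run_getent, sleep=time.sleep):
    """Run a lookup every `interval` seconds so nslcd's LDAP connection stays busy.

    With iterations=None the loop runs forever; pass a number to stop after that
    many lookups. Returns how many lookups exited non-zero (finite runs only).
    """
    done = 0
    failed = 0
    while iterations is None or done < iterations:
        if run() != 0:
            failed += 1
        done += 1
        # Sleep between lookups, but not after the last one of a finite run.
        if iterations is None or done < iterations:
            sleep(interval)
    return failed


# Run forever with the four-minute interval from the post:
# keepalive()
```

This only hides the problem by keeping traffic flowing; like the cron job, it says nothing about why an idle connection through the cluster address fails.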
Not being an expert in the code, I can only guess that nslcd has
problems if it tries "reconnecting" to an LDAP server and actually gets
connected to a different server -- some sort of state information about
the previous connection must be maintained somewhere.
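For contrast with the cluster address, here is a generic sketch of client-side failover over an explicit list of servers, the configuration that did clear the errors: the client walks the list itself, so it always knows which real server a new connection reached. This illustrates the idea only; it is not nslcd's code, the URIs are hypothetical placeholders, and `try_connect` is a hypothetical stand-in for whatever connection attempt the client makes.

```python
from typing import Callable, Sequence


def connect_first_available(uris: Sequence[str], try_connect: Callable[[str], bool]) -> str:
    """Return the first URI in `uris` for which `try_connect` succeeds.

    Because the client walks the list itself, it knows exactly which
    real server the new connection went to.
    """
    for uri in uris:
        if try_connect(uri):
            return uri
    raise ConnectionError("no available LDAP server found")


# Hypothetical URIs standing in for the three real servers.
servers = ["ldap://ldap1.example.com", "ldap://ldap2.example.com", "ldap://ldap3.example.com"]
down = {"ldap://ldap1.example.com"}  # pretend the first server is unreachable
chosen = connect_first_available(servers, lambda uri: uri not in down)
print(chosen)
```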
For now, we'll probably stick with libnss-ldap since we're familiar with
it, but I wanted to mention the issue in case there's something simple
I'm missing.
-- Ken Gaillot <kjgaillo@gleim.com>
Network Operations Center, Gleim Publications