lists.arthurdejong.org

[nssldap] Multiple LDAP servers, single URI, server shutting down, hangs or fails!

We have what I think of as a 'standard' mixed environment, and everything works under normal operation, BUT when one of our LDAP servers is shutting down we get failures. I think this is a shortcoming in the OpenLDAP library's handling of the 'uri' setting, but I would like more information and wondered if anybody can shed some light on this. I have traced through the library code to the 'ldap_connect_to_host' routine in the os-ip.c file in the OpenLDAP library and think this is where the problem arises, but I have no direct evidence.

Our setup is as follows. We use Active Directory as our LDAP/KDC supplier (these are Win2K3R2 boxes, but I have seen this with other flavours). In this particular environment we have 2 servers, both fairly lightly loaded most of the time. However, one of these servers runs Exchange 2000 and, when shutting down, can take up to 25 minutes to reach the point where the network interface stops responding to pings.

The Unix side is configured with nss_ldap (264 + my Kerberos patches) and uses Kerberos SASL connections to the LDAP service under AD.

The system is also configured to use pam_krb5 as the authenticator, which may amplify the problem, as the KDC seems to shut down before the LDAP service.

The ldap.conf file contains a single 'uri' statement, which looks like this:

uri ldap://active-directory-domain

A lookup of the domain name returns multiple addresses, in our case 192.168.10.1 and 192.168.10.3. (The second is our Exchange server.)
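To make the lookup behaviour concrete, here is a minimal Python sketch of what "one name, several addresses" looks like from the client side. It is only an illustration using the standard resolver interface, not what the OpenLDAP library does internally; the function name resolve_all and the default port 389 are my own choices.

```python
import socket


def resolve_all(hostname, port=389):
    """Return every distinct address the resolver reports for hostname.

    resolve_all is a hypothetical helper for illustration; port 389 is
    assumed as the plain LDAP port.
    """
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = []
    for family, socktype, proto, canonname, sockaddr in infos:
        ip = sockaddr[0]  # first element of sockaddr is the address string
        if ip not in addresses:
            addresses.append(ip)  # keep resolver order, drop duplicates
    return addresses
```

Calling resolve_all("active-directory-domain") on one of our boxes should list both servers, in whatever order the resolver hands them back. A client that only ever uses the first entry has no fallback if that machine is the one shutting down.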

While the Exchange server machine is shutting down we get login failures (pam_krb5 reports an incorrect password) and 'getent passwd' does not report user entries.

We run NSCD on our boxes just to complicate matters.

It looks to me as though the LDAP code connects to the LDAP server on the machine that is shutting down but, as it cannot get service, reports a failure, which results in the upper-level code not listing the users. That is, the socket is still accepting connections but the LDAP server has already died on the Active Directory box. This is potentially a Microsoft bug, but we should be working around it, as a partially crashed server would give the same results elsewhere.
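The kind of workaround I have in mind can be sketched as follows. This is a rough Python illustration of the idea, not a description of how OpenLDAP or nss_ldap behave today: a successful TCP connect is not treated as success until a service-level check has also passed, and any failure moves on to the next address. The names first_working_server and probe are hypothetical; probe stands in for whatever cheap LDAP operation would prove the service is alive.

```python
import socket


def first_working_server(addresses, probe, connect=socket.create_connection,
                         port=389, timeout=5.0):
    """Return the first address that accepts a connection AND passes probe.

    probe is a hypothetical callable that takes the open connection and
    returns True only if the LDAP service itself answers. connect is
    injectable so the logic can be exercised without a network.
    """
    for address in addresses:
        try:
            conn = connect((address, port), timeout=timeout)
        except OSError:
            continue  # nothing listening at all: try the next address
        try:
            if probe(conn):
                return address
        except OSError:
            pass  # socket accepted us, but the service behind it is gone
        finally:
            conn.close()
    return None  # no server gave usable service
```

With the two addresses above, a half-dead 192.168.10.3 would be skipped in favour of 192.168.10.1 rather than causing the whole lookup to fail.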

Now I could use a URI with multiple host names, and it looks like this might work a bit better, as the code seems to have a mechanism to iterate through the hosts. But I was wondering whether this should be fixed in the OpenLDAP library, especially as listing the domain name allows us to add and remove AD servers dynamically, with DNS providing the lookup.

As an alternative or an addition, should we be handling the sites and services information in the DNS and binding via SRV lookups? Again, is this a job for the OpenLDAP library, or should nss_ldap handle this?
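For the SRV route, the client would look up the LDAP service records for the domain (in the _ldap._tcp form of RFC 2782) and then try the targets in order of preference. The Python sketch below covers only the ordering step and assumes the records have already been fetched by some other means. SrvRecord and order_srv_records are hypothetical names, and sorting by weight is a deterministic simplification: RFC 2782 actually calls for a weighted random choice among records of equal priority.

```python
from collections import namedtuple

# Hypothetical container for the fields of one SRV record.
SrvRecord = namedtuple("SrvRecord", "priority weight port target")


def order_srv_records(records):
    """Order SRV records: lowest priority value first, then heaviest weight.

    Simplification: RFC 2782 selects randomly in proportion to weight
    within a priority level; this sketch just sorts by weight.
    """
    return sorted(records, key=lambda r: (r.priority, -r.weight))
```

The resulting list could then be walked in the same try-the-next-one fashion as a URI with multiple hosts.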

I am struggling to work out which of the OpenLDAP mailing lists would be appropriate for discussing this, and was hoping somebody here could point me down that path.

Regards, Howard.