lists.arthurdejong.org

RE: [nssldap] RE: nss_ldap: reconnected to LDAP server ldap://localhost after 1 attempt


Brian,
 
please see my replies inline below.
 
Howard.
 
Coherent Technology Limited, 23 Northampton Square, Finsbury, London EC1V 0HL, 
United Kingdom
Telephone: +44 20 7690 7075 Mobile: +44 7980 639379
Company Email: coherent@cohtech.com Website: http://www.cohtech.com/

________________________________

From: owner-nssldap@padl.com on behalf of Brian J. Murrell
Sent: Mon 2009-11-23 13:49
To: nssldap@padl.com
Subject: [nssldap] RE: nss_ldap: reconnected to LDAP server ldap://localhost 
after 1 attempt



On Mon, 2009-11-23 at 10:43 +0000, Howard Wilkinson wrote:
> Brian,

Hi Howard,

> some version numbers might help along with the OS details.

Yeah, somewhat.  But the overriding issue is that this has been stable
(and not doing what it's doing now) for many, many months (over a year's
worth of them probably), since it was installed and it's Ubuntu Hardy
(LTS) so it's quite established.  In fact LTS is a two year O/S with
renewing coming up in the spring, so it's nearly two years established
with this new behaviour only a few weeks old.

But for versions... O/S is Ubuntu Hardy (8.04), which includes ntp
4.2.4p4+dfsg-3ubuntu2.2 and libnss-ldap 258-1ubuntu3.

OK, not too ancient then!

> At a guess you have a resource exhaustion somewhere,

Yeah.  That's the sort of thing that occurred to me too, but this is a
pretty light duty server with fewer clients than I have fingers and load
has not changed at all in the last many years.
 
But load pattern may have changed!

> have you restarted the box to check the problem still exists.

Funny enough, I had to restart it this morning for otherwise unrelated
reasons, and yeah, still doing it.
 
So something in the environment then, maybe!

> Are any of your filing systems filling up (/tmp, /var/tmp, /var/log, ...)

Nope.
 
Well then the easy solution is not there - check permissions as well!

> You could look at my patches to the latest nss_ldap - they include a rewrite 
> of the reconnection logic which is more robust than that available in the 
> current mainstream.

Yeah.  I'm aware of your patches, but as I said, this has been stable
for nearly two years until a couple of weeks ago.  There really should
be no reason to add new code to have it return to stability.  It really
only needs analysing why the connection has started failing so
frequently all of a sudden.  And we do know that it's nss_ldap that's
closing the connection, not the LDAP server.
 
nss_ldap closing the connection suggests bad data or bad config. How is your 
ldap.conf set up? Could you have an expired certificate for TLS or a bad 
Kerberos principal?

> Finally, you might want to check that you do not have bad data in the LDAP 
> environment.

Hrm.  Perhaps.

I guess what I was more looking for was some debug that could be enabled
in nss_ldap to tell me about connection events.  And even if there is no
debug of that nature, I'm happy to insert some.  I first need to be able
to replicate the connection dropping problem in a test harness though,
and for that I need some understanding of how nss_ldap is supposed to work.

If I were to write a little program to fetch some NSS information, is
the nss_ldap library supposed to create a new connection to the LDAP
server on a first query of NSS info (gethostbyname(), say) and then
maintain that connection for any further queries until the process dies?
That I can test quite easily I'd say.

Yes, the connection is kept open within the same process. There are a number of 
tricks in the code to determine if a fork has happened and close things 
(threading causes this as well). Any chance you have updated the kernel or glibc 
recently, which could have changed the threading environment?
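
A rough sketch of the kind of test program discussed above, written in Python rather than C purely for brevity. It assumes a Linux /proc filesystem, and the helper names (open_socket_fds, lookup, watch) are made up for illustration; none of this comes from nss_ldap itself. It repeats a passwd lookup through NSS and reports whenever the set of sockets held by the process changes, which would be consistent with a connection being opened, dropped or re-established. If nscd is running, lookups are answered by nscd and no LDAP socket is expected to show up in the test process.

```python
import os
import pwd
import time


def open_socket_fds():
    """Return the set of (fd, link target) pairs for sockets held by this process.

    Relies on the Linux /proc layout; returns an empty set elsewhere.
    """
    sockets = set()
    fd_dir = "/proc/self/fd"
    if not os.path.isdir(fd_dir):
        return sockets
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # fd was closed between listdir() and readlink()
        if target.startswith("socket:"):
            # target looks like "socket:[12345]"; the inode number changes
            # when a connection is closed and a new one is opened.
            sockets.add((int(name), target))
    return sockets


def lookup(user):
    """Do one passwd lookup through NSS; True if the user was found."""
    try:
        pwd.getpwnam(user)
        return True
    except KeyError:
        return False


def watch(user, rounds=5, delay=1.0):
    """Repeat the lookup and print a line whenever the socket set changes."""
    previous = open_socket_fds()
    history = []
    for i in range(rounds):
        found = lookup(user)
        current = open_socket_fds()
        if current != previous:
            print("round %d: sockets changed: closed=%s opened=%s"
                  % (i, sorted(previous - current), sorted(current - previous)))
        history.append((found, current))
        previous = current
        time.sleep(delay)
    return history
```

Calling watch("someuser", rounds=20, delay=5.0) for an account that lives only in LDAP should, if the connection is held for the life of the process, print at most one change (when the first lookup opens it). Repeated changes would point at the connection being torn down and rebuilt between queries.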
 
Is NSCD in this mix?
 
Howard.
b.