lists.arthurdejong.org
RSS feed

RE: [nssldap] nss_ldap: reconnected to LDAP server ldap://localhost after 1 attempt

[Date Prev][Date Next] [Thread Prev][Thread Next]

RE: [nssldap] nss_ldap: reconnected to LDAP server ldap://localhost after 1 attempt



Brian,
 
some version numbers might help along with the OS details.
 
At a guess you have a resource exhaustion somewhere, have you restarted the box 
to check the problem still exists. Are any of your filing systems filling up 
(/tmp, /var/tmp, /var/log, ...)
 
You could look at my patches to the latest nss_ldap - they include a rewrite of 
the reconnection logic which is more robust that that available in the current 
mainstream.
 
Finally, you might want to check that you do not have bad data in the LDAP 
environment.
 
Howard.
 
Coherent Technology Limited, 23 Northampton Square, Finsbury, London EC1V 0HL, 
United Kingdom
Telephone: +44 20 7690 7075 Mobile: +44 7980 639379
Company Email: coherent@cohtech.com Website: http://www.cohtech.com 
<http://www.cohtech.com/>  

________________________________

From: owner-nssldap@padl.com on behalf of Brian J. Murrell
Sent: Sun 2009-11-22 16:07
To: nssldap@padl.com
Subject: [nssldap] nss_ldap: reconnected to LDAP server ldap://localhost after 
1 attempt



On my "super-server" which is running both my LDAP server and my mail
server (so they communicate on loopback) as of late, my mail server has
been complaining as such:

Nov 22 10:47:01 linux cyrus/lmtpunix[11640]: nss_ldap: reconnecting to LDAP 
server...
Nov 22 10:47:01 linux cyrus/lmtpunix[11640]: nss_ldap: reconnected to LDAP 
server ldap://localhost after 1 attempt
Nov 22 10:47:01 linux cyrus/lmtpunix[11640]: nss_ldap: reconnecting to LDAP 
server...
Nov 22 10:47:01 linux cyrus/lmtpunix[11640]: nss_ldap: reconnected to LDAP 
server ldap://localhost after 1 attempt
Nov 22 10:47:13 linux cyrus/lmtpunix[11640]: nss_ldap: reconnecting to LDAP 
server...
Nov 22 10:47:14 linux cyrus/lmtpunix[11640]: nss_ldap: reconnected to LDAP 
server ldap://localhost after 1 attempt
Nov 22 10:47:14 linux cyrus/lmtpunix[11640]: nss_ldap: reconnecting to LDAP 
server...
Nov 22 10:47:14 linux cyrus/lmtpunix[11640]: nss_ldap: reconnected to LDAP 
server ldap://localhost after 1 attempt
Nov 22 10:50:04 linux cyrus/lmtpunix[11749]: nss_ldap: reconnecting to LDAP 
server...
Nov 22 10:50:05 linux cyrus/lmtpunix[11749]: nss_ldap: reconnected to LDAP 
server ldap://localhost after 1 attempt
Nov 22 10:50:05 linux cyrus/lmtpunix[11749]: nss_ldap: reconnecting to LDAP 
server...
Nov 22 10:50:05 linux cyrus/lmtpunix[11749]: nss_ldap: reconnected to LDAP 
server ldap://localhost after 1 attempt
Nov 22 10:51:07 linux cyrus/lmtpunix[11789]: nss_ldap: reconnecting to LDAP 
server...
Nov 22 10:51:07 linux cyrus/lmtpunix[11789]: nss_ldap: reconnected to LDAP 
server ldap://localhost after 1 attempt
Nov 22 10:51:07 linux cyrus/lmtpunix[11789]: nss_ldap: reconnecting to LDAP 
server...
Nov 22 10:51:07 linux cyrus/lmtpunix[11789]: nss_ldap: reconnected to LDAP 
server ldap://localhost after 1 attempt

This is new behaviour on existing software (i.e. no version upgrades in
quite a long time -- certainly no in the time frame that this has
started).

I'm at a loss as to how to start debugging it though.

I have straced the lmtpd process and can see that it's closing the
connection to the ldap server and reopening it frequently:

getsockname(16, {sa_family=AF_INET, sin_port=htons(53883), 
sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
getpeername(16, {sa_family=AF_INET, sin_port=htons(389), 
sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
stat64("/etc/ldap.conf", {st_mode=S_IFREG|0644, st_size=9210, ...}) = 0
geteuid32()                             = 76
getsockname(16, {sa_family=AF_INET, sin_port=htons(53883), 
sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
getpeername(16, {sa_family=AF_INET, sin_port=htons(389), 
sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
write(16, "0\5\2\1\4B\0", 7)            = 7
shutdown(16, 2 /* send and receive */)  = 0
close(16)                               = 0
shutdown(-1, 2 /* send and receive */)  = -1 EBADF (Bad file descriptor)
close(-1)                               = -1 EBADF (Bad file descriptor)
...
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 16
setsockopt(16, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
setsockopt(16, SOL_TCP, TCP_NODELAY, [1], 4) = 0
fcntl64(16, F_GETFL)                    = 0x2 (flags O_RDWR)
fcntl64(16, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(16, {sa_family=AF_INET, sin_port=htons(389), 
sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in 
progress)
poll([{fd=16, events=POLLOUT|POLLERR|POLLHUP, revents=POLLOUT}], 1, 30000) = 1
getpeername(16, {sa_family=AF_INET, sin_port=htons(389), 
sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
fcntl64(16, F_GETFL)                    = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl64(16, F_SETFL, O_RDWR)            = 0
time(NULL)                              = 1258904423
time(NULL)                              = 1258904423
write(16, "0\f\2\1\1`\7\2\1\3\4\0\200\0", 14) = 14
gettimeofday({1258904423, 677944}, NULL) = 0
poll([{fd=16, events=POLLIN|POLLPRI|POLLERR|POLLHUP, revents=POLLIN}], 1, 
30000) = 1
read(16, "0\f\2\1\1a\7\n", 8)           = 8
read(16, "\1\0\4\0\4\0", 6)             = 6

and tshark confirms very short lived connections as well with the client
being the one to terminate the connection, not the LDAP server:

  2.978355    127.0.0.1 -> 127.0.0.1    TCP 54633 > 389 [SYN] Seq=0 Win=32792 
Len=0 MSS=16396 TSV=1617515729 TSER=0 WS=6
  2.978558    127.0.0.1 -> 127.0.0.1    TCP 389 > 54633 [SYN, ACK] Seq=0 Ack=1 
Win=32768 Len=0 MSS=16396 TSV=1617515729 TSER=1617515729 WS=6
  2.978666    127.0.0.1 -> 127.0.0.1    TCP 54633 > 389 [ACK] Seq=1 Ack=1 
Win=32832 Len=0 TSV=1617515729 TSER=1617515729
  2.979937    127.0.0.1 -> 127.0.0.1    LDAP bindRequest(1) "<ROOT>" simple
  2.980035    127.0.0.1 -> 127.0.0.1    TCP 389 > 54633 [ACK] Seq=1 Ack=15 
Win=32768 Len=0 TSV=1617515729 TSER=1617515729
  2.980356    127.0.0.1 -> 127.0.0.1    LDAP bindResponse(1) success
  2.981019    127.0.0.1 -> 127.0.0.1    TCP 54633 > 389 [ACK] Seq=15 Ack=15 
Win=32832 Len=0 TSV=1617515729 TSER=1617515729
  2.981369    127.0.0.1 -> 127.0.0.1    LDAP searchRequest(2) 
"dc=interlinx,dc=bc,dc=ca" wholeSubtree
...
  2.983428    127.0.0.1 -> 127.0.0.1    LDAP searchResEntry(2) 
"cn=xxx,ou=Group,dc=interlinx,dc=bc,dc=ca"
  2.984569    127.0.0.1 -> 127.0.0.1    LDAP searchResEntry(2) 
"cn=xxx,ou=Group,dc=interlinx,dc=bc,dc=ca"
  2.999450    127.0.0.1 -> 127.0.0.1    TCP 54633 > 389 [ACK] Seq=187 Ack=229 
Win=32832 Len=0 TSV=1617515734 TSER=1617515730
  2.999556    127.0.0.1 -> 127.0.0.1    LDAP searchResEntry(2) 
"cn=xxx,ou=Group,dc=interlinx,dc=bc,dc=ca" | searchResEntry(2) 
"cn=xxx,ou=Group,dc=interlinx,dc=bc,dc=ca" | searchResEntry(2) ...  [Packet 
size limited during capture]
  3.017154    127.0.0.1 -> 127.0.0.1    TCP 54623 > 389 [FIN, ACK] Seq=3064 
Ack=46603 Win=49536 Len=0 TSV=1617515738 TSER=1617515016
  3.018338    127.0.0.1 -> 127.0.0.1    TCP 389 > 54623 [FIN, ACK] Seq=46603 
Ack=3065 Win=48896 Len=0 TSV=1617515739 TSER=1617515738
  3.018437    127.0.0.1 -> 127.0.0.1    TCP 54623 > 389 [ACK] Seq=3065 
Ack=46604 Win=49536 Len=0 TSV=1617515739 TSER=1617515739
  3.038089    127.0.0.1 -> 127.0.0.1    TCP 54633 > 389 [ACK] Seq=187 Ack=2308 
Win=36992 Len=0 TSV=1617515744 TSER=1617515734

So the question is, why would nss_ldap be behaving this way, all of a
sudden?

Cheers,
b.