lists.arthurdejong.org

Re: very slow initialization after reboot





Hi,

This sounds to me more like an OS-related issue than an nslcd one, as
it happens only after a reboot.

Your system seems to use systemd. Delays at boot time are known to
occur for services that depend on network-online.target (such as
nslcd), as they wait for that unit to complete before actually
starting.
And with systemd, knowing when the host is "online" is a wild guess.
For example, disabling IPv6 with sysctl only will cause the kind of
issue you are experiencing, as systemd will patiently and stubbornly
try to activate an IPv6 interface no matter what, hence the "2 mins"
delay before it finally gives up and considers the host online.

So, before trying to debug nslcd, check your systemd logs for any
related "systemd-networkd-wait-online" messages...
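
As a rough sketch of that log check (not a definitive recipe), the
following Python wrapper pulls the current boot's journal for the
wait-online unit. It assumes the host uses systemd-networkd; with
NetworkManager the corresponding unit would be
NetworkManager-wait-online.service. The function names are mine, only
the journalctl options are real.

```python
import shutil
import subprocess

# Assumption: the host uses systemd-networkd. With NetworkManager the
# equivalent unit is NetworkManager-wait-online.service.
WAIT_ONLINE_UNIT = "systemd-networkd-wait-online.service"


def journalctl_command(unit=WAIT_ONLINE_UNIT):
    """Build a journalctl call limited to one unit and the current boot."""
    return ["journalctl", "--no-pager", "-b", "-o", "short-precise", "-u", unit]


def show_unit_log(unit=WAIT_ONLINE_UNIT):
    """Return the unit's log for this boot, or None if journalctl is missing."""
    if shutil.which("journalctl") is None:
        return None
    result = subprocess.run(
        journalctl_command(unit), capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    log = show_unit_log()
    print(log if log is not None else "journalctl not found on this host")
```

Comparing the timestamps of that unit with those of nslcd.service
(pass "nslcd.service" as the unit) should show whether the delay sits
in front of nslcd rather than inside it.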

--
Mat


On Tue, Nov 12, 2019 at 4:44 PM Manhong Dai <daimh@umich.edu> wrote:
>
> Hi Arthur,
>
>      I did some logging per your advice. All the files are under
> https://y.mbni.org/nslcd-debug/
>
>      'nslcd-tcpdump.mp4' shows that the node was not sending any tcp
> packets out until the one-minute pause was over.
>
>      'nslcd-strace.mp4' shows how I strace-ed it, and copied the trace
> file into three segments, which were tarred in 'nslcd.trace.tar.gz'.
>
>      If you need more information, feel free to let me know, please.
>
>
> Best,
>
> Manhong
>
>
> On 11/11/19 5:13 PM, Arthur de Jong wrote:
> > On Mon, 2019-11-11 at 14:03 -0500, Manhong Dai wrote:
> >> After reboot, the first 'id <USER>' took about two minutes and
> >> then failed. Then all following 'id' commands work fine. During the
> >> two minutes of waiting period, I tcpdump-ed the packets on both the
> >> LDAP client and LDAP server,  but didn't detect any packets until the
> >> first 'id' command failed.
> > Hi Manhong,
> >
> > The logs show that the initial connection seems to be set up but the
> > BIND operation takes a very long time. It is unclear to me why this
> > takes so long.
> >
> > In any case the maximum time to wait for a response can be set with the
> > timelimit option. This should ensure that the process does not block
> > for too long. Then the reconnect logic of nslcd will kick in (see the
> > reconnect_sleeptime and reconnect_retrytime options).
> >
> > If this can be traced to some networking or firewall issue, a way
> > to reset the reconnect timers is to send a SIGUSR1 signal to nslcd
> > (assuming you use a recent version of nss-pam-ldapd). On Debian-based
> > systems for example, the /etc/network/if-up.d/nslcd file ensures that
> > the timers are reset every time networking is restored.
> >
> > More ideas for debugging this further are running nslcd under strace
> > (start it as "strace -t -f -o /var/log/nslcd.trace nslcd -d") to
> > actually see which operation is blocking so long, looking to see if any
> > network traffic is actually being sent, seeing whether ldapsearch is
> > able to perform search queries during the blocking time, trying to
> > connect with netcat to port 389 of the LDAP server to see if there is a
> > networking issue and looking at the LDAP server logs.
> >
> > Hope this helps,
> >
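
The netcat check suggested in the quoted message can be approximated
in Python when netcat is not installed. This is a minimal sketch: the
function name and the default timeout are my own choices, and it only
tests that a TCP connection to port 389 can be opened, not that the
LDAP BIND succeeds.

```python
import socket
import sys


def can_connect(host, port=389, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


if __name__ == "__main__":
    # Pass the LDAP server name as the first argument; "localhost" is
    # only a placeholder default.
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    print(host, "reachable" if can_connect(host) else "not reachable")
```

Running it in a loop right after boot, during the blocking period,
would show whether the network path to the LDAP server is up before
nslcd gets its first answer.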



-- 
Mathieu