lists.arthurdejong.org

Re: [nssldap] Reconnect logic in nss_ldap

Howard Wilkinson wrote:
There are a number of timeouts and a backoff policy in the current
implementation of nss_ldap (265), which I have reimplemented in the set of
patches posted as part of bug number 412. These are driven by timers that
count in whole seconds. The relevant configuration items are:



* bind_policy - which can take 4 values: hard, hard_init, hard_open & soft.
  Currently all of the hard values are treated the same.
* nss_reconnect_tries - which defaults to 5 and limits the number of times a
  connection attempt will be made before the code gives up.
* nss_reconnect_sleeptime - which defaults to 4 and is the minimum amount of
  time the code will sleep between connection attempts. This is a number of
  seconds.
* nss_reconnect_maxsleeptime - which defaults to 64 and is the maximum amount
  of time the code should sleep between connection attempts. The actual sleep
  time starts at nss_reconnect_sleeptime and doubles each time the
  connections have failed until it exceeds nss_reconnect_maxsleeptime. So
  setting this to 65 will allow the last sleep to be 128 seconds.
* nss_reconnect_maxconntries - which defaults to 2 and is misnamed. This is
  the maximum number of connection tries that will happen before the code
  starts to use the backoff algorithm. While the try count is below this
  number the code will retry immediately.
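
The doubling rule described for nss_reconnect_maxsleeptime can be sketched
roughly as follows. This is one reading that is consistent with the "65 gives
128" example, not the actual nss_ldap code; the function name next_backoff is
hypothetical.

```c
#include <assert.h>

/* Hypothetical helper, not taken from nss_ldap: given the previous sleep in
 * seconds (0 before the first sleep), return the next one. */
static int next_backoff(int prev, int sleeptime, int maxsleeptime)
{
    assert(sleeptime > 0 && maxsleeptime >= sleeptime);
    if (prev == 0)
        return sleeptime;   /* first sleep: nss_reconnect_sleeptime */
    if (prev < maxsleeptime)
        return prev * 2;    /* below the limit: double, with no clamp */
    return prev;            /* limit reached or passed: stop growing */
}
```

Under this reading a limit of 64 gives sleeps of 4, 8, 16, 32, 64, 64, ...
while a limit of 65 lets the sequence run on to 128, because 64 is still
below the limit when it is doubled.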

With a soft bind_policy the code will give up immediately if a connection to
all of the servers provisioned fails. With a hard bind_policy the code will
enter the retry loops.

The exponential backoff algorithm is clunky and probably should be
configurable as one of: exponential, linear, constant, progressive, where:

* exponential is as currently implemented and doubles the timeout on each
  loop.
* linear is where the timeout is equal to the number of tries times the
  initial sleep time.
* constant just sleeps for the same time every time around the loop.
* progressive uses a second increment which is added onto the last sleep
  time to produce the next one every time round the loop.

The logic around the maxsleeptime should be changed so that it does what it
says and limits the backoff to this maximum.
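
As a minimal sketch of how the four proposed policies and a strict maximum
could fit together, assuming a separate increment setting for the progressive
case. The enum, the function name next_sleep and its parameters are all
hypothetical and do not come from the patches.

```c
#include <assert.h>

/* Hypothetical policy names mirroring the four options proposed above. */
enum backoff_policy {
    BACKOFF_EXPONENTIAL,
    BACKOFF_LINEAR,
    BACKOFF_CONSTANT,
    BACKOFF_PROGRESSIVE
};

/* tries: 1-based number of the sleep about to happen.
 * prev:  previous sleep in seconds (0 before the first sleep).
 * base:  initial sleep time; incr: increment used by "progressive".
 * max:   hard upper limit on the returned sleep. */
static unsigned next_sleep(enum backoff_policy policy, unsigned tries,
                           unsigned prev, unsigned base, unsigned incr,
                           unsigned max)
{
    unsigned next;

    assert(tries >= 1 && base > 0);
    switch (policy) {
    case BACKOFF_EXPONENTIAL:
        next = prev ? prev * 2 : base;     /* double the last sleep */
        break;
    case BACKOFF_LINEAR:
        next = tries * base;               /* tries times initial sleep */
        break;
    case BACKOFF_PROGRESSIVE:
        next = prev ? prev + incr : base;  /* add a fixed increment */
        break;
    case BACKOFF_CONSTANT:
    default:
        next = base;                       /* same sleep every time */
        break;
    }
    return next > max ? max : next;        /* never exceed the maximum */
}
```

The clamp on the last line is the suggested change in behaviour: whatever the
policy computes, the result never goes above the configured maximum.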

The maxconntries variable should be aliased to another name which is more
meaningful (suggestions welcomed) such as nss_reconnect_nosleeptries.

Also, with modern processors and communications networks it probably makes
sense to allow the sleep times to be expressed as fractions of a second.
Given that, I would propose changing the code to use nanosleep if available,
usleep if it is not, and sleep as a last resort.
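
A sub-second sleep built on POSIX nanosleep might look something like the
sketch below. It covers only the nanosleep case (the usleep and sleep
fallbacks are left out), and the helper names and the choice of milliseconds
as the unit are assumptions made for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose nanosleep() in strict -std modes */
#endif
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical helper: split a millisecond delay into a struct timespec. */
static struct timespec ms_to_timespec(long ms)
{
    struct timespec ts;

    assert(ms >= 0);
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    return ts;
}

/* Hypothetical helper: sleep for ms milliseconds, resuming the sleep if a
 * signal interrupts it. Returns 0 on success, -1 on any other error. */
static int reconnect_sleep_ms(long ms)
{
    struct timespec req = ms_to_timespec(ms);
    struct timespec rem;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;  /* carry on with the time still remaining */
    }
    return 0;
}
```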

Does anybody have any comments or additional suggestions to make around this
subject? I will probably implement patches in this area this weekend, so
responses ASAP please.

Too many nss_reconnect parameters, too much to document/remember. You might try implementing something similar to the OpenLDAP syncrepl retry parameter, which is a list of <interval> <number> pairs. This obviates the need to have different algorithms embedded in the code.
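
A retry list of interval/number pairs can be walked with very little code. The
sketch below is only an illustration of the idea, not OpenLDAP's parser; the
function name retry_delay and its return convention are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: return the delay in seconds before retry number n
 * (0-based) for a spec made of interval/number pairs, where a number of "+"
 * means "repeat indefinitely". Returns -1 when the list is exhausted or the
 * spec is malformed. */
static long retry_delay(const char *spec, long n)
{
    const char *p = spec;
    char *end;

    assert(spec != NULL && n >= 0);
    for (;;) {
        long interval, count;

        interval = strtol(p, &end, 10);
        if (end == p || interval < 0)
            return -1;              /* no more pairs, or not a number */
        p = end;
        while (*p == ' ')
            p++;
        if (*p == '+')
            return interval;        /* this interval repeats forever */
        count = strtol(p, &end, 10);
        if (end == p || count < 0)
            return -1;              /* interval without a valid number */
        p = end;
        if (n < count)
            return interval;        /* retry n falls inside this pair */
        n -= count;                 /* skip this pair and look further */
    }
}
```

The caller would sleep for the returned number of seconds before each retry
and give up once the function returns -1.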

E.g., "10 1 20 1 40 1 80 1 160 +" would be a simple exponential backoff, with a maximum delay of 160 seconds repeated indefinitely.

"20 10" would be a constant 20 second retry, repeated 10 times, and then stopping. And so on.

I don't believe sub-second resolution is useful here, and it certainly isn't worth the additional effort in finding all of the system-dependent variations on the theme.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/