lists.arthurdejong.org

Re: [nssldap] Solaris 10 update 5 - nss_ldap makes nscd dump core





On Jan 14, 2010, at 2:54 AM, Howard Wilkinson wrote:

I finally cracked this yesterday: it is a logic bomb inside the new NSS layer on Solaris 10 u5+. The logic processes TRYAGAIN in a lower layer than the buffer growth, so the ERANGE has no effect until the tryagain loop terminates. The default setting for this loop is forever. I have posted a 'final' set of patches to the bug set for nss_ldap (bug number 421) which will work if the nsswitch on Solaris includes a limit on the tryagain behaviour. Thus

passwd: files ldap [TRYAGAIN=5]

will hit the ldap server 5 times (and fail), return to the upper layer, get a larger buffer, call in again and get the result. Not ideal, but it functions. I am trying to get this reported as an official bug to Sun.
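
The interaction described above can be sketched as a small simulation. This is a minimal model, not Solaris code: every name in it (sim_backend, sim_switch_lookup, sim_getent, max_tries) is hypothetical, and it assumes the TRYAGAIN limit counts total calls into the backend, as the "5 times" above suggests. It only illustrates why the erange flag is not acted on until the inner retry loop gives up.

```c
#include <stddef.h>

/* Hypothetical status codes; they mirror, but are not, the Solaris NSS ones. */
enum sim_status { SIM_SUCCESS, SIM_TRYAGAIN };

struct sim_backend {
    size_t needed; /* buffer size the entry requires */
    int    calls;  /* how many times the backend was invoked */
};

/* Backend: signals "buffer too small" by setting *erange and returning TRYAGAIN. */
static enum sim_status sim_backend_lookup(struct sim_backend *be,
                                          size_t buflen, int *erange)
{
    be->calls++;
    if (buflen < be->needed) {
        *erange = 1;
        return SIM_TRYAGAIN;
    }
    *erange = 0;
    return SIM_SUCCESS;
}

/* Lower layer: retries on TRYAGAIN with the SAME buffer, at most max_tries
 * calls in total. With no limit this loop would never return for a buffer
 * that is too small. */
static enum sim_status sim_switch_lookup(struct sim_backend *be, size_t buflen,
                                         int max_tries, int *erange)
{
    enum sim_status s;
    int tries = 0;

    do {
        s = sim_backend_lookup(be, buflen, erange);
        tries++;
    } while (s == SIM_TRYAGAIN && tries < max_tries);
    return s;
}

/* Upper layer: only here does erange lead to a larger buffer.
 * Returns the buffer size that finally worked, or 0 on a non-ERANGE failure. */
static size_t sim_getent(struct sim_backend *be, size_t buflen, int max_tries)
{
    int erange = 0;

    while (sim_switch_lookup(be, buflen, max_tries, &erange) == SIM_TRYAGAIN) {
        if (!erange)
            return 0;
        buflen *= 2;
    }
    return buflen;
}
```

With a 1024-byte starting buffer, an entry needing 1500 bytes and a limit of 5, the model calls the backend five times with the small buffer and once more with a 2048-byte one.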

Nice!


Interestingly the Solaris native client just fails.

I would still like to view any code you have working. Just to check I have not missed any wrinkles, but the build I have on the client site looks stable under the load tests I have run so far.

That should be posted today. The code also includes changes we made to support HP-UX Trusted Mode and basic Solaris SPARKS work. We are making those and all other changes in the posted source base available to the community for general use.

Cheers,

-Matt


Coherent Technology Limited, 23 Northampton Square, Finsbury, London EC1V 0HL, United Kingdom
Telephone: +44 20 3355 6467 Mobile: +44 7980 639379
Company Email: coherent@cohtech.com Website: http://www.cohtech.com

________________________________

From: tedcheng [tedcheng [at] cohtech.com]
Sent: Thu 2010-01-14 01:05
To: Matthew Hardin
Cc: Howard Wilkinson; Thomas Glanzmann; Luke Howard; nssldap@padl.com; Bernhard.Thalmayr@Sun.COM
Subject: Re: [nssldap] Solaris 10 update 5 - nss_ldap makes nscd dump core






The SPARKS code did not change the NSS_TRYAGAIN/erange behavior, e.g.,

#ifdef SOLARIS_SPARKS
#define LOOKUP_GETENT(args, be, filter, selector, parser, req_buflen) \
       NSS_STATUS s; \
       if (NSS_ARGS(args)->buf.buflen < req_buflen) { \
               NSS_ARGS(args)->erange = 1; \
               return NSS_TRYAGAIN; \
       } \

In the case in which the code gets into the TRYAGAIN loop, is a larger
buffer provided in the new try?
In our build, we adjusted NGROUPS to 64 to move things forward. This
constant affects buffer size calculation.
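
For comparison, the conventional caller-side contract that the question above refers to can be sketched as follows. This is a minimal illustration using the POSIX getgrnam_r interface, not code from nss_ldap or from the patches; the helper name lookup_group, the 1024-byte starting size and the 1 MiB cap are assumptions made for the example. On Solaris the POSIX form of getgrnam_r may also need _POSIX_PTHREAD_SEMANTICS to be defined.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <sys/types.h>

/* Look up a group by name, doubling the scratch buffer whenever the
 * library reports ERANGE (buffer too small).
 * Returns 0 if found, 1 if not found, -1 on error.
 * On success *buf_out owns the memory that grp's pointers refer to and
 * must be freed by the caller; otherwise *buf_out is NULL. */
static int lookup_group(const char *name, struct group *grp, char **buf_out)
{
    size_t len = 1024;              /* assumed starting size */
    const size_t cap = 1024 * 1024; /* assumed upper bound   */
    char *buf = NULL;

    *buf_out = NULL;
    for (;;) {
        struct group *result = NULL;
        char *bigger = realloc(buf, len);
        int rc;

        if (bigger == NULL) {
            free(buf);
            return -1;
        }
        buf = bigger;

        rc = getgrnam_r(name, grp, buf, len, &result);
        if (rc == 0 && result != NULL) {
            *buf_out = buf;
            return 0;
        }
        if (rc == ERANGE && len < cap) {
            len *= 2;               /* retry with a larger buffer */
            continue;
        }
        free(buf);
        return (rc == 0) ? 1 : -1;
    }
}
```

A group with many members is the case that forces the extra passes through the loop, which is why a constant feeding the buffer size calculation matters.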

Regards,

Ted C. Cheng


On Wed, 13 Jan 2010 16:58:44 -0700, Matthew Hardin <mhardin@symas.com>
wrote:
Hi Howard,

On Jan 12, 2010, at 2:51 AM, Howard Wilkinson wrote:

Having got the Solaris stuff built I am still seeing some stability
issues. A major one seems to be that the new Solaris code does not
support the NSS_TRYAGAIN/ERANGE interface feature to signal buffer
too small. The code seems to just get into a TRYAGAIN loop - do you
have any information that would suggest whether the interface change
has preserved this behaviour?

Not really, but I'm going to involve the engineer that did this work.
He may be able to shed some light on your question. His name's Ted
Cheng and I think he'll be joining the discussion soon. Who's our
friend Bernhard at Sun, and can he help us? We'd like to crack this
one too.


I would still like to see your code to compare what you have done
with mine. It is likely that I have missed things.

Yeah... sorry about that <looks embarrassed>. I got pulled off into
another direction and it dropped off my radar. I'll get that posted
ASAP.

Take care,

-Matt

Matthew Hardin
Symas Corporation - The LDAP Guys
http://www.symas.com <http://www.symas.com/>




________________________________

From: Howard Wilkinson
Sent: Fri 2010-01-08 16:59
To: Howard Wilkinson; Matthew Hardin
Cc: Thomas Glanzmann; Luke Howard; nssldap@padl.com;
Bernhard.Thalmayr@Sun.COM
Subject: RE: [nssldap] Solaris 10 update 5 - nss_ldap makes nscd
dump core


I have posted another version of the patch to the bugzilla. This now
uses heap allocation rather than stack allocation. The stack was
failing when large groups were being used.

I have also added a piece of conditional compilation to remove the
workaround for a bug in the OpenLDAP library when not being compiled
against OpenLDAP. This was making the Sun native library fail.



________________________________

From: owner-nssldap@padl.com on behalf of Howard Wilkinson
Sent: Wed 2010-01-06 16:34
To: Matthew Hardin
Cc: Thomas Glanzmann; Luke Howard; nssldap@padl.com;
Bernhard.Thalmayr@Sun.COM
Subject: RE: [nssldap] Solaris 10 update 5 - nss_ldap makes nscd
dump core



I have pushed another patch out to the bugzilla which now seems to
work on Solaris 10 and on Linux. I have run some extensive tests on
Linux and some limited tests on Solaris and it all seems to be
functioning fine so far.

I have made the decision to use stack allocation for some buffer
space that is needed when called from the Solaris nscd (this code is
done at runtime so will also happen on Linux if the interface ever
goes that way) and have made room for a stack checking function
(which I have yet to work out how to do).

I may choose to replace this with heap allocated data but will wait
for experience reports before deciding this.

Any experiences using this code would be most gratefully received.

Luke, any chance of getting this integrated into the mainstream
release?


________________________________

From: Matthew Hardin [mhardin [at] symas.com]
Sent: Tue 2010-01-05 21:09
To: Howard Wilkinson
Cc: Thomas Glanzmann; Luke Howard; nssldap@padl.com;
Bernhard.Thalmayr@Sun.COM
Subject: Re: [nssldap] Solaris 10 update 5 - nss_ldap makes nscd
dump core




On Jan 5, 2010, at 2:05 AM, Howard Wilkinson wrote:

Matthew,

I have a partially working implementation. NSCD is calling into
nss_ldap and getting the results back but is not returning the
result back to the getent call. So any pointers would be gratefully
received. I am trying to get this working for a deployment in the
next few weeks so if you have patches I can try that would be very
helpful.

We're going to post the source code today or tomorrow without
encumbrances and you are free to use it as-is or extract and use
whatever information you find useful (well, attribution would be
nice). I'll follow up with a download url when the code is available.


Do you have any idea which hat dropping makes NSCD stop working?
Reading the OpenSolaris code it does look as though there are a lot
of dependencies on a stateful interface in the NSS2 facilities.

Unfortunately not. I do know that merely updating the time stamp on
the nsswitch.conf file will cause NSCD to start working again (until
it gets tired again). We are as puzzled as you are.
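
The time stamp workaround mentioned above amounts to what touch(1) does. A minimal sketch using the POSIX utime call follows; the helper names touch_file and mtime_of are hypothetical, and whether this is enough to revive nscd on a given system is only what the report above suggests, not something the code verifies.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <utime.h>

/* Set the access and modification times of path to the current time.
 * Returns 0 on success, -1 on failure (errno is set by utime). */
static int touch_file(const char *path)
{
    if (utime(path, NULL) != 0) {
        perror(path);
        return -1;
    }
    return 0;
}

/* Return the modification time of path, or (time_t)-1 on failure. */
static time_t mtime_of(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return (time_t)-1;
    return st.st_mtime;
}
```

Calling touch_file("/etc/nsswitch.conf") would need sufficient privileges to modify that file's time stamps.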

Cheers,

-Matt


Howard.



________________________________

From: Matthew Hardin [mhardin [at] symas.com]
Sent: Mon 2010-01-04 18:00
To: Howard Wilkinson
Cc: Thomas Glanzmann; lukeh@padl.com; nssldap@padl.com;
Bernhard.Thalmayr@Sun.COM
Subject: Re: [nssldap] Solaris 10 update 5 - nss_ldap makes nscd
dump core



We've developed the SPARKS changes needed for Sol10u5 and later (this
is why nscd was dumping core), but nscd on Sol10u5 and later seems
very fragile and stops working at the drop of a hat. We've been
sitting on the code until we've worked this out, but that's been
going very slowly. We would be happy to share what we have if anyone
is interested.

-Matt

On Jan 4, 2010, at 9:52 AM, Howard Wilkinson wrote:

You will need to apply this after all of my other patches. See
http://bugzilla.padl.com/show_bug.cgi?id=412

This is very much a work in progress - I am focussing on getting
getent passwd working with nscd running. If I can crack that then
the other changes are 'obvious'.

This has been compiled on both Sparc and x86 Solaris 10, but only
tested on Sparc so far.

Let me know how you get on!



________________________________

From: Thomas Glanzmann [thomas [at] glanzmann.de]
Sent: Mon 2010-01-04 16:04
To: Howard Wilkinson
Subject: Re: [nssldap] Solaris 10 update 5 - nss_ldap makes nscd
dump core



Hello Howard,

Any help that you, or anybody else can give to fix this problem will
be gratefully received. This has been a problem for over a week now.

Could you please send me your build instructions and patches? I'll be
happy to help track down the getent bug. I'm also interested in
providing a backport.

    Thomas


<nss_ldap-265-solarisfixes.patch>