Possible Webcheck bug?


From: "m.v.wesstein" <m.v.wesstein [at] hccnet.nl>
To: webcheck-users [at] lists.arthurdejong.org
Subject: Possible Webcheck bug?
Date: Sat, 05 Mar 2011 04:15:33 +0100

Hello again

Before submitting a bug report, I'll just ask in case you're already aware.

So far Webcheck hasn't failed me, even on a positively big site, but now I've found a site that gives me problems. The crawler goes through the <baseurl>/dirname/ structure fine, but then starts again with <baseurl>//dirname/ and after that goes on to <baseurl>////dirname/ before I killed it. Notice the additional slashes between the baseurl and the rest.
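To make the symptom concrete, here is a minimal Python sketch of how the repeated-slash variants could be folded back into one URL. This is only an illustration, not Webcheck's actual code: collapse_slashes is a hypothetical helper name, "dirname" is the placeholder used above, and it assumes the server treats //dirname/ and /dirname/ as the same resource (which seems to be the case here, but is not guaranteed in general).

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url):
    """Collapse runs of '/' in the path part of a URL into a single '/'.

    Hypothetical helper: only the path is touched, so the '//' after the
    scheme, the query string and the fragment are left alone.
    """
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


# The three variants described above end up as a single entry.
seen = set()
for url in (
    "http://carendt.us/dirname/",
    "http://carendt.us//dirname/",
    "http://carendt.us////dirname/",
):
    seen.add(collapse_slashes(url))
```

With something like this in place, a crawler that keeps a set of already-visited URLs would see the //dirname/ and ////dirname/ links as duplicates rather than as new pages.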

Admittedly the site isn't mine, but the owner is currently seriously ill and in hospice care, and the expectation is that he will not survive for much longer. The gentleman has built up a considerable wealth of info in his field of expertise and it would be a real shame if it was lost. Hence my idea to use Webcheck to get me a list of URLs to fetch, then strip out all non-baseurl links and have a loop set up in bash to get each file.
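The "strip out all non-baseurl links" step could look roughly like the Python sketch below. It assumes the list of URLs is already available as plain strings; the names BASE_HOSTS, on_site and filter_urls are made up for this example, and the host names are the ones from the links listed in this message.

```python
from urllib.parse import urlsplit

# Assumption: these three host names all point to the same site.
BASE_HOSTS = {"carendt.us", "www.carendt.us", "www.carendt.com"}


def on_site(url):
    """Return True if the URL's host is one of the base hosts."""
    host = urlsplit(url).hostname
    return host is not None and host in BASE_HOSTS


def filter_urls(urls):
    """Keep on-site URLs only, dropping blanks and duplicates, in order."""
    seen = set()
    kept = []
    for url in urls:
        url = url.strip()
        if url and on_site(url) and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept


# Example input: one on-site URL (twice) and one external link.
candidates = [
    "http://carendt.us/dirname/",
    "http://example.org/elsewhere",
    "http://carendt.us/dirname/",
]
to_fetch = filter_urls(candidates)
```

The resulting list can be written out one URL per line, which is a convenient form for a fetch loop in bash.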

The following links have been tested, with the same results, pointing to the same site:

http://carendt.us/
http://www.carendt.us/
http://www.carendt.com/

Webcheck version: 1.10.4 on Debian Lenny. I have the webcheck.dat files (uncompressed) but not in debug mode I'm afraid...

Is it possible to redirect the screen output of Webcheck to a text file with the >> operator? I think so, but I'm not sure.

If you need more info let me know, I'll get it to you ASAP (but it may take a while as I have to work over the weekend...)


Regards, Vincent Wesstein
the Netherlands
--
To unsubscribe send an email to
webcheck-users-unsubscribe@lists.arthurdejong.org or see
http://lists.arthurdejong.org/webcheck-users
