Possible Webcheck bug?
- From: "m.v.wesstein" <m.v.wesstein [at] hccnet.nl>
- To: webcheck-users [at] lists.arthurdejong.org
- Subject: Possible Webcheck bug?
- Date: Sat, 05 Mar 2011 04:15:33 +0100
Hello again
Before submitting a bug report, I'll just ask in case you're already aware
of this. So far Webcheck hasn't failed me, even on a really big site, but
now I have found a site that gives me problems. The crawler goes through the
<baseurl>/dirname/ structure fine, but then starts again with
<baseurl>//dirname/ and, after that, had moved on to <baseurl>////dirname/
by the time I killed it. Notice the additional slashes between the baseurl
and the rest.
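
To make the symptom concrete, here is a minimal sketch of the kind of URL
normalisation that would treat all of those variants as the same page. It is
not Webcheck's code: normalize_url is a hypothetical helper, and it assumes
absolute URLs whose doubled slashes carry no meaning (it would also squeeze
a "//" inside a query string).

```shell
#!/usr/bin/env bash
# Collapse runs of slashes after the "scheme://" separator.
# Hypothetical helper for illustration, not part of Webcheck.
normalize_url() {
    local url="$1"
    local scheme="${url%%://*}"   # e.g. "http"
    local rest="${url#*://}"      # e.g. "carendt.us////dirname/"
    # Squeeze every run of two or more slashes into a single slash.
    rest=$(printf '%s\n' "$rest" | sed -E 's#/{2,}#/#g')
    printf '%s://%s\n' "$scheme" "$rest"
}

normalize_url "http://carendt.us////dirname/"
```

With this, <baseurl>//dirname/ and <baseurl>////dirname/ both reduce to
<baseurl>/dirname/, so a crawler comparing normalised URLs would not visit
the same directory again under a new name.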
Admittedly the site isn't mine, but the owner is currently seriously ill
and in hospice care, and he is not expected to survive for much
longer. The gentleman has built up a considerable wealth of information in
his field of expertise and it would be a real shame if it were lost. Hence my
idea to use Webcheck to get a list of URLs to fetch, then strip out
all non-baseurl links and set up a loop in bash to get each file.
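
The filter-and-fetch plan could look roughly like the sketch below. It
assumes the URL list is a plain text file with one URL per line; the function
names and the FETCH override are hypothetical, and wget with -x (recreate the
directory structure) and -nc (skip files already downloaded) is just one
possible fetcher.

```shell
#!/usr/bin/env bash
# Keep only URLs on the site's own hosts (the three names listed in this
# mail). Reads one URL per line on stdin and removes duplicates.
filter_site_urls() {
    grep -E '^https?://(www\.)?carendt\.(us|com)(/|$)' | sort -u
}

# Fetch every URL listed in the file given as $1.
# FETCH is a hypothetical override so the loop can be dry-run with "echo";
# it is left unquoted on purpose so "wget -x -nc" splits into words.
fetch_all() {
    local url
    while IFS= read -r url; do
        ${FETCH:-wget -x -nc} "$url" || echo "failed: $url" >&2
    done < "$1"
}

# Intended use (file names are examples only):
#   filter_site_urls < all-urls.txt > site-urls.txt
#   FETCH=echo fetch_all site-urls.txt    # dry run, prints each URL
#   fetch_all site-urls.txt               # real download
```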
The following links have been tested, with the same results, pointing to
the same site:
http://carendt.us/
http://www.carendt.us/
http://www.carendt.com/
Webcheck version: 1.10.4 on Debian Lenny. I have the webcheck.dat files
(uncompressed), but they were not produced in debug mode, I'm afraid...
Is it possible to redirect the screen output of Webcheck to a text file
with the >> operator? I think so, but I'm not sure.
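
For reference, this is ordinary shell redirection rather than anything
specific to Webcheck. A small sketch, in which run_logged and the log file
name are hypothetical and the webcheck invocation in the comment is only an
assumed example:

```shell
#!/usr/bin/env bash
# Run a command and append both its stdout and stderr to a log file.
# ">>" appends to the file (">" would overwrite it); "2>&1" sends error
# messages to the same place as normal output.
run_logged() {
    local log="$1"
    shift
    "$@" >> "$log" 2>&1
}

# Intended use (not run here):
#   run_logged webcheck.log webcheck http://carendt.us/
```

Without the 2>&1 part, anything the program writes to stderr would still
appear on the screen instead of in the file.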
If you need more info let me know, I'll get it to you ASAP (but it may
take a while as I have to work over the weekend...)
Regards, Vincent Wesstein
the Netherlands