Warning while parsing
- From: "Jaroslav Lhotak" <lhotakj [at] rferl.org>
- To: <webcheck-users [at] lists.arthurdejong.org>
- Subject: Warning while parsing
- Date: Tue, 8 Nov 2011 16:03:40 +0100
Hi,

First, I'd like to thank you for your tool. I have started using it experimentally on our sites (www.svobodanews.ru), and here is what I am getting fairly often. Is this something that should concern me: is it a problem with the page, or with the way you parse it?

Please find attached the gzipped webcheck.dat and the command line I ran.

Many thanks,
Jarda

root@linux:/home/testing/webcheck# ./webcheck.sh
webcheck: checking site....
webcheck: getting robots.txt for http://www.svobodanews.ru
webcheck: http://www.svobodanews.ru/
webcheck: http://www.svobodanews.ru/js__ver2_6.0.0.19935.1/init.jsx
webcheck: http://www.svobodanews.ru/howtolisten/waves.html
webcheck: http://www.svobodanews.ru/video/27303.html
webcheck: http://www.svobodanews.ru/content/article/24383999.html
webcheck: Warning: problem parsing page: 'ascii' codec can't decode byte 0xc3 in position 44: ordinal not in range(128)
Traceback (most recent call last):
  File "/usr/share/webcheck/crawler.py", line 549, in fetch
    parsermodule.parse(content, self)
  File "/usr/share/webcheck/parsers/html/__init__.py", line 121, in parse
    calltidy.parse(content, link)
  File "/usr/share/webcheck/parsers/html/calltidy.py", line 35, in parse
    link.add_pageproblem(parsers.html.htmlunescape(unicode(err)))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 44: ordinal not in range(128)
webcheck: http://www.svobodanews.ru/video/6865a8b3-ff18-4ebd-9e45-eba1c8db44ed.jpgx?w=113&h=64
webcheck: http://www.svobodanews.ru/img/networking/bw_ybkm.gif
webcheck: http://www.svobodanews.ru/video/2160905.html?isArticle=1
webcheck: http://www.svobodanews.ru/jssettings__ver2_6.0.0.19935.1/default.jsx?c=1
webcheck: http://www.svobodanews.ru/content/article/24382127.html
^Z
[14]+ Stopped ./webcheck.sh

==================
Jaroslav Lhoták
Internet Project Manager - Internet Technology
Radio Free Europe / Radio Liberty Inc.
phone +420-2-2112-2031
Attachment:
webcheck.dat.gz
Description: Binary data
Attachment:
webcheck.sh.gz
Description: Binary data
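
For readers who land on the same traceback: the failing call is `unicode(err)`. In Python 2, that call implicitly decodes a byte string with the ASCII codec, so any byte above 0x7f (here 0xc3, which is typically the first byte of a two-byte UTF-8 sequence) raises `UnicodeDecodeError`. The sketch below reproduces the same failure explicitly in Python 3 and shows one tolerant way to turn such a message into text. The helper `to_text`, the sample message `raw`, and the choice of UTF-8 are assumptions made for illustration; this is not webcheck's code and not necessarily how the issue was fixed upstream.

```python
def to_text(message, encoding="utf-8"):
    """Return message as text; undecodable bytes become U+FFFD instead of raising.

    Hypothetical helper for illustration only, not part of webcheck.
    """
    if isinstance(message, bytes):
        # errors="replace" substitutes U+FFFD for bytes that are invalid
        # in the assumed encoding, so decoding can never raise here.
        return message.decode(encoding, errors="replace")
    return str(message)


# Made-up tidy-style message containing UTF-8 bytes (0xc3 0xa9 encodes "é").
raw = b"line 1 column 1 - Warning: unknown attribute \xc3\xa9"

# Decoding as ASCII fails in the same way as the traceback in the report.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding with an explicit encoding and a lenient error handler succeeds.
decoded = to_text(raw)
```

The lenient error handler trades accuracy for robustness: a page whose error text is not actually UTF-8 would show replacement characters in the report rather than abort parsing of that page.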
- Warning while parsing, Jaroslav Lhotak
- Re: Warning while parsing, Arthur de Jong
- RE: Warning while parsing, Jaroslav Lhotak