And the issue with HTML escaping

From: Devin Bayer <l [at] t-0.be>
To: webcheck-users <webcheck-users [at] lists.arthurdejong.org>
Subject: And the issue with HTML escaping
Date: Wed, 9 Nov 2011 14:03:23 +0100

webcheck: INFO: http://www.highresolution.info/
webcheck: DEBUG: parsing using webcheck.parsers.html
webcheck: DEBUG: crawler.Link.set_encoding('utf-8')
webcheck: DEBUG: html encoding: utf-8
webcheck: WARNING: page has unknown encoding: utf-8
webcheck: ERROR: problem parsing page: decoding Unicode is not supported
Traceback (most recent call last):
  File "/home/dev/linkcheck/webcheck/webcheck/crawler.py", line 372, in parse
    parsermodule.parse(content, link)
  File "/home/dev/linkcheck/webcheck/webcheck/parsers/html/__init__.py", line 118, in parse
    _parsefunction(content, link)
  File "/home/dev/linkcheck/webcheck/webcheck/parsers/html/htmlparser.py", line 292, in parse
    link.author = _maketxt(parser.author, link.encoding).strip()
  File "/home/dev/linkcheck/webcheck/webcheck/parsers/html/htmlparser.py", line 265, in _maketxt
    return htmlunescape(unicode(txt, errors='replace'))
TypeError: decoding Unicode is not supported
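
The TypeError in the last frame is what Python 2 raises when unicode() is
given an errors (or encoding) argument for a value that is already a unicode
object, which suggests parser.author had already been decoded by the time
_maketxt ran. The sketch below is not webcheck's code and not the patch from
this thread. It is written for Python 3, where str() plays the role of
unicode() and the message reads "decoding str is not supported", and to_text
is a hypothetical helper name. It only decodes when the value is still bytes:

```python
def to_text(txt, encoding='utf-8'):
    """Return txt as text, decoding only when it is still bytes.

    to_text is a hypothetical helper for illustration, not part of webcheck.
    """
    if isinstance(txt, bytes):
        # Undecoded input: decode it, replacing invalid byte sequences.
        return txt.decode(encoding or 'utf-8', errors='replace')
    # Already decoded text: asking str() to decode it again would raise
    # TypeError, so return it unchanged.
    return txt


# Reproduce the failure mode from the traceback: requesting a decode
# (here via errors=...) on a value that is already text.
failure = None
try:
    str('Devin', errors='replace')
except TypeError as exc:
    failure = str(exc)

print(failure)
print(to_text(b'caf\xc3\xa9'))   # bytes are decoded
print(to_text('caf\u00e9'))      # text passes through untouched
```

Checking the type before decoding keeps one code path for both raw and
already-decoded values, at the cost of hiding where the double decode comes
from. Finding the caller that decodes early may be the better long-term fix.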
