lists.arthurdejong.org

And the issue with HTML escaping




webcheck: INFO: http://www.highresolution.info/
webcheck: DEBUG: parsing using webcheck.parsers.html
webcheck: DEBUG: crawler.Link.set_encoding('utf-8')
webcheck: DEBUG: html encoding: utf-8
webcheck: WARNING: page has unknown encoding: utf-8
webcheck: ERROR: problem parsing page: decoding Unicode is not supported
Traceback (most recent call last):
  File "/home/dev/linkcheck/webcheck/webcheck/crawler.py", line 372, in parse
    parsermodule.parse(content, link)
  File "/home/dev/linkcheck/webcheck/webcheck/parsers/html/__init__.py", line 118, in parse
    _parsefunction(content, link)
  File "/home/dev/linkcheck/webcheck/webcheck/parsers/html/htmlparser.py", line 292, in parse
    link.author = _maketxt(parser.author, link.encoding).strip()
  File "/home/dev/linkcheck/webcheck/webcheck/parsers/html/htmlparser.py", line 265, in _maketxt
    return htmlunescape(unicode(txt, errors='replace'))
TypeError: decoding Unicode is not supported
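
The TypeError above is what Python 2 raises when unicode() is given an
errors argument together with a value that is already a unicode object, so
it looks as if txt had been decoded before it reached _maketxt. A minimal
sketch of one possible guard follows, written for Python 3 so that it runs
as-is. The name make_text is hypothetical, html.unescape from the standard
library stands in for webcheck's htmlunescape, and the UTF-8 fallback is an
assumption, not webcheck's behaviour.

```python
import html


def make_text(txt, encoding=None):
    """Hypothetical stand-in for _maketxt: return unescaped text, or None."""
    if txt is None:
        return None
    if isinstance(txt, bytes):
        # Only byte strings need decoding. The UTF-8 fallback is an
        # assumption made for this sketch.
        txt = txt.decode(encoding or 'utf-8', errors='replace')
    # txt is text at this point; decoding it a second time is what
    # triggers the "decoding ... is not supported" TypeError.
    return html.unescape(txt)
```

With this guard both make_text(b'Tom &amp; Jerry', 'utf-8') and
make_text('Tom &amp; Jerry') go through the same unescaping step, whether
or not the parser has already decoded the value.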
