webcheck commit: r460 - in webcheck/webcheck: . parsers/html
- From: Commits of the webcheck project <webcheck-commits [at] lists.arthurdejong.org>
- To: webcheck-commits [at] lists.arthurdejong.org
- Reply-to: webcheck-users [at] lists.arthurdejong.org
- Subject: webcheck commit: r460 - in webcheck/webcheck: . parsers/html
- Date: Tue, 8 Nov 2011 22:58:50 +0100 (CET)
Author: arthur
Date: Tue Nov 8 22:58:48 2011
New Revision: 460
URL: http://arthurdejong.org/viewvc/webcheck?revision=460&view=revision
Log:
fix encoding issues with strings passed to/from tidy
Modified:
webcheck/webcheck/config.py
webcheck/webcheck/parsers/html/calltidy.py
Modified: webcheck/webcheck/config.py
==============================================================================
--- webcheck/webcheck/config.py Fri Nov 4 10:13:40 2011 (r459)
+++ webcheck/webcheck/config.py Tue Nov 8 22:58:48 2011 (r460)
@@ -109,4 +109,4 @@
accessibility_check=1,
show_errors=6,
show_warnings=1,
- char_encoding='raw')
+ char_encoding='utf8')
Modified: webcheck/webcheck/parsers/html/calltidy.py
==============================================================================
--- webcheck/webcheck/parsers/html/calltidy.py Fri Nov 4 10:13:40 2011 (r459)
+++ webcheck/webcheck/parsers/html/calltidy.py Tue Nov 8 22:58:48 2011 (r460)
@@ -31,7 +31,9 @@
link."""
# only call tidy on internal pages
if link.is_internal:
+ # force encoding of the content to UTF-8
+ content = content.decode(link.encoding).encode('utf-8')
t = tidy.parseString(content, **config.TIDY_OPTIONS)
for err in t.errors:
# error messages are escaped so we unescape them
- link.add_pageproblem(htmlunescape(unicode(err)))
+ link.add_pageproblem(htmlunescape(unicode(str(err), 'utf-8', 'replace')))
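
The change above normalizes the page content to UTF-8 before it is handed to tidy and decodes tidy's error messages as UTF-8 afterwards. The committed code is Python 2 (it uses the unicode builtin). The following is only a rough Python 3 sketch of the same round trip, not the project's code: the helper names to_utf8 and message_to_text are hypothetical, and falling back to UTF-8 with 'replace' when no encoding is known is an assumption that the commit itself does not make.

```python
def to_utf8(content, source_encoding):
    """Re-encode raw page bytes as UTF-8.

    content is a bytes object and source_encoding is the encoding the
    page was served in. Falling back to UTF-8 and replacing undecodable
    bytes is an assumption of this sketch; the commit decodes strictly
    with link.encoding.
    """
    text = content.decode(source_encoding or 'utf-8', 'replace')
    return text.encode('utf-8')


def message_to_text(raw_message):
    """Decode a UTF-8 encoded error message, replacing invalid bytes.

    This mirrors unicode(str(err), 'utf-8', 'replace') from the diff.
    """
    return raw_message.decode('utf-8', 'replace')
```

With char_encoding set to 'utf8' in the tidy options, both directions then agree on a single encoding, so the page's original encoding only matters at the decode step.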