lists.arthurdejong.org

webcheck commit: r460 - in webcheck/webcheck: . parsers/html

Author: arthur
Date: Tue Nov  8 22:58:48 2011
New Revision: 460
URL: http://arthurdejong.org/viewvc/webcheck?revision=460&view=revision

Log:
fix encoding issues with strings passed to/from tidy

Modified:
   webcheck/webcheck/config.py
   webcheck/webcheck/parsers/html/calltidy.py

Modified: webcheck/webcheck/config.py
==============================================================================
--- webcheck/webcheck/config.py Fri Nov  4 10:13:40 2011        (r459)
+++ webcheck/webcheck/config.py Tue Nov  8 22:58:48 2011        (r460)
@@ -109,4 +109,4 @@
                     accessibility_check=1,
                     show_errors=6,
                     show_warnings=1,
-                    char_encoding='raw')
+                    char_encoding='utf8')
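With `char_encoding='utf8'` in place of `'raw'`, tidy interprets its input as UTF-8. Any page served in another charset therefore has to be transcoded before it reaches tidy, which is what the `calltidy.py` hunk below does with `content.decode(link.encoding).encode('utf-8')`. The following is a minimal Python 3 sketch of that transcoding step. The helper name `to_utf8` is hypothetical, and it assumes the page's declared encoding is a codec name Python recognizes. Like the commit, it decodes strictly, so bytes that are invalid in the declared charset raise `UnicodeDecodeError`.

```python
def to_utf8(content: bytes, declared_encoding: str) -> bytes:
    """Re-encode page bytes from their declared charset to UTF-8.

    Hypothetical helper mirroring the commit's
    content.decode(link.encoding).encode('utf-8') step; strict decoding
    raises UnicodeDecodeError on bytes invalid in declared_encoding.
    """
    return content.decode(declared_encoding).encode("utf-8")


# A Latin-1 page: 'é' is the single byte 0xE9 ...
page = "café".encode("iso-8859-1")
# ... and becomes the two-byte UTF-8 sequence 0xC3 0xA9 after transcoding.
print(to_utf8(page, "iso-8859-1"))  # b'caf\xc3\xa9'
```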

Modified: webcheck/webcheck/parsers/html/calltidy.py
==============================================================================
--- webcheck/webcheck/parsers/html/calltidy.py  Fri Nov  4 10:13:40 2011        (r459)
+++ webcheck/webcheck/parsers/html/calltidy.py  Tue Nov  8 22:58:48 2011        (r460)
@@ -31,7 +31,9 @@
     link."""
     # only call tidy on internal pages
     if link.is_internal:
+        # force encoding of the content to UTF-8
+        content = content.decode(link.encoding).encode('utf-8')
         t = tidy.parseString(content, **config.TIDY_OPTIONS)
         for err in t.errors:
             # error messages are escaped so we unescape them
-            link.add_pageproblem(htmlunescape(unicode(err)))
+            link.add_pageproblem(htmlunescape(unicode(str(err), 'utf-8', 'replace')))
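On the way back from tidy, the new code first takes the error's byte-string form with `str(err)`. It then decodes that form explicitly as UTF-8 using the `'replace'` error handler, so a malformed byte becomes U+FFFD instead of aborting the check. The following is a rough Python 3 analogue of that step. The helper name `decode_tidy_message` is hypothetical, and the standard library's `html.unescape` stands in for webcheck's own `htmlunescape`.

```python
import html


def decode_tidy_message(raw: bytes) -> str:
    """Turn a raw tidy error message into readable text.

    Hypothetical helper: decodes as UTF-8 with the 'replace' handler
    (invalid bytes become U+FFFD instead of raising), then unescapes
    HTML entities. html.unescape is a stand-in for webcheck's htmlunescape.
    """
    return html.unescape(raw.decode("utf-8", "replace"))


print(decode_tidy_message(b"Warning: &lt;caf\xc3\xa9&gt; is not recognized"))
# Warning: <café> is not recognized
```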