Re: webcheck max depth patch
- From: Devin Bayer <l [at] t-0.be>
- To: Arthur de Jong <arthur [at] arthurdejong.org>
- Cc: webcheck-users <webcheck-users [at] lists.arthurdejong.org>
- Subject: Re: webcheck max depth patch
- Date: Fri, 4 Nov 2011 12:15:09 +0100
On 2011-11-04, at 10:13, Arthur de Jong wrote:
> On Wed, 2011-11-02 at 16:42 +0100, Devin Bayer wrote:
>> The following patch, against SVN trunk, does two things:
>
> Thanks for the patch.
You're welcome. I have found webcheck quite useful and wish more site authors
would use it too :)
I think a few more patches may come from me soon.
>> 1. Fixes encoding issues with the legacy HTMLParser and SQL
>
> I don't really understand what you're doing here:
>
> - parser.feed(content)
> + parser.feed(content.decode('ascii', errors='ignore').encode())
>
> It seems that you are converting from ASCII to the local encoding. The
> encoding should already be used in most places and internally webcheck
> should use unicode strings as much as possible.
What happens is:
1. decode content, treating it as ascii, but ignoring errors. Returns a
unicode()
2. encode that unicode as ascii, returning a str()
So, basically, if the content was not valid ASCII before, it is now, because we
re-encoded it and discarded the invalid characters. I could send you the URLs
that require this. I think it happened when non-ASCII characters were in the
tag names.
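As a rough sketch of those two steps, here is the same round trip written in Python 3 syntax (bytes in, bytes out). The helper name strip_non_ascii and the sample input are made up for illustration and are not part of the patch:

```python
# Sketch only: strip_non_ascii and the sample bytes are hypothetical.
def strip_non_ascii(content: bytes) -> bytes:
    # Step 1: decode as ASCII, silently dropping every byte >= 0x80.
    text = content.decode('ascii', errors='ignore')
    # Step 2: re-encode the now pure-ASCII text back to a byte string.
    return text.encode('ascii')


# A tag name containing a UTF-8 encoded "é" (two non-ASCII bytes).
sample = b'<caf\xc3\xa9 href="x">'
print(strip_non_ascii(sample))  # b'<caf href="x">'
```

The cost is that the offending characters are lost rather than converted, which is acceptable here only because the goal is to keep the parser from failing.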
> Also, this change:
>
> - self.pageproblems.append(PageProblem(message=message))
> + self.pageproblems.append(PageProblem(message=message.decode(errors='replace')))
>
> I think SQLAlchemy should already handle both strings and unicode
> objects transparently.
You would think so, but SQLAlchemy's error message was very verbose and
explained you should not pass it 8-bit strings, only ASCII or unicode objects.
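To illustrate what errors='replace' does to such a message, here is a small sketch in Python 3 syntax. The helper to_text is hypothetical, and ASCII is assumed as the codec because that is the usual default for a bare decode() on a Python 2 str:

```python
# Sketch only: to_text is a hypothetical helper, not webcheck or SQLAlchemy code.
def to_text(message):
    # Text is passed through untouched; only byte strings need decoding.
    if isinstance(message, bytes):
        # Each undecodable byte becomes U+FFFD instead of raising an error.
        return message.decode('ascii', errors='replace')
    return message


print(to_text(b'bad \xff char') == 'bad \ufffd char')  # True
print(to_text('already text'))  # already text
```

Unlike errors='ignore', this keeps a visible marker where a byte could not be decoded, so the stored message still shows that something was there.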
Cheers,
Devin