lists.arthurdejong.org

Re: webcheck max depth patch

On Wed, 2011-11-02 at 16:42 +0100, Devin Bayer wrote: 
> The following patch, against SVN trunk, does two things:

Thanks for the patch.

> 1. Fixes encoding issues with the legacy HTMLParser and SQL

I don't really understand what you're doing here:

-        parser.feed(content)
+        parser.feed(content.decode('ascii', errors='ignore').encode())

It seems that you are converting from ASCII to the local encoding. The
page's encoding should already be handled in most places, and internally
webcheck should use unicode strings as much as possible.
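
To illustrate the concern, here is a minimal sketch of what such a
decode/encode round trip does to non-ASCII content, compared with
decoding once using the page's own encoding. It is written for Python 3
and uses made-up sample data; it is not webcheck code, and the UTF-8
encoding of the sample is an assumption.

```python
# Hypothetical sample: a page body encoded as UTF-8 (not taken from webcheck).
content = u'<p>caf\xe9</p>'.encode('utf-8')

# The round trip from the hunk above: bytes that are not ASCII are
# silently dropped before the text is encoded again.
stripped = content.decode('ascii', errors='ignore').encode()

# Decoding once with the page's encoding keeps the text intact as unicode.
text = content.decode('utf-8')

print(repr(stripped))
print(repr(text))
```

The first result has lost the accented character, while the second keeps
it as a unicode string that can be passed around internally.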

Also, this change:

-        self.pageproblems.append(PageProblem(message=message))
+        self.pageproblems.append(PageProblem(message=message.decode(errors='replace')))

I think SQLAlchemy should already handle both strings and unicode
objects transparently.

> 2. Adds a MAX_DEPTH config option

Thanks. Looks interesting, I've committed it. I did change it to support
having MAX_DEPTH set to None (the default for backwards compatibility).
I'm just wondering if we still need the postprocessing to determine the
depth.

Also, it would be nice if links that are too deep would be considered
yanked (we do need some postprocessing for this) and not external.
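
A rough sketch of the idea, assuming a depth counter per link. All names
here (is_too_deep, classify, the 'yanked' label as a string) are
hypothetical and only for illustration; this is not the committed code,
and whether the limit is inclusive is also an assumption.

```python
# Hypothetical names throughout; this is not the committed webcheck code.
MAX_DEPTH = None  # None means no limit (the backwards-compatible default)


def is_too_deep(depth, max_depth=MAX_DEPTH):
    """Return True if a link at the given depth should not be followed."""
    # Assumption: a link exactly at max_depth is still followed.
    return max_depth is not None and depth > max_depth


def classify(is_internal, depth, max_depth=MAX_DEPTH):
    """Label a link as 'external', 'yanked' or 'internal'."""
    if not is_internal:
        return 'external'
    if is_too_deep(depth, max_depth):
        # Too deep but still on the site: yanked rather than external.
        return 'yanked'
    return 'internal'


print(classify(True, 10))               # no limit configured
print(classify(True, 3, max_depth=2))   # beyond the configured limit
```

With MAX_DEPTH left at None every internal link stays internal, which
matches the old behaviour.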

Thanks again for your patch.

-- 
-- arthur - arthur@arthurdejong.org - http://arthurdejong.org --
-- 