Re: meta data checking plugin for webcheck

On Fri, 2014-01-10 at 15:47 +0100, Fabien Quatravaux wrote:
> I discovered webcheck today and I found it very useful.
> I would like to improve it and develop a plugin to check for meta data
> (meta tags and some tags), but I can't find any
> documentation about how to write a plugin. Specifically, I would like
> to know if the python parser has already extracted the meta tags, and
> where I can find them.

If you want to add functionality to webcheck, I recommend you use the
Git version. The plugin structure is somewhat simpler in that version.

The content parsing code is still mostly hard-coded though (I have some
ideas about also using the plugins for that).

The best place to do this extra parsing for now is probably to add an
extra call at the end of
and pass the soup variable that can be queried for HTML structure.
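
As a rough illustration of what such an extra call could do with the
soup, here is a minimal sketch. The function name extract_meta is
hypothetical and not part of webcheck; it assumes soup is a
BeautifulSoup document and only relies on findAll() and Tag.get().

```python
def extract_meta(soup):
    """Return a dict mapping meta tag names to their content values.

    Hypothetical helper, not part of webcheck. Assumes `soup` is a
    BeautifulSoup document; only findAll() and Tag.get() are used.
    """
    meta = {}
    for tag in soup.findAll('meta'):
        # Handle both <meta name="..."> and <meta http-equiv="...">.
        key = tag.get('name') or tag.get('http-equiv')
        content = tag.get('content')
        # Skip tags without a usable key or content (e.g. <meta charset>).
        if key and content is not None:
            meta[key.lower()] = content
    return meta
```

A plugin could then check the returned dict, for example to report
pages without a "description" or "keywords" entry.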

Another thing is that the database schema currently has hard-coded
properties (see webcheck.db). It is probably better to make a
LinkProperty class and use that for title, size, mime type, etc. That
would also be a good place to store other meta data of crawled pages.
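
To make the idea of a generic property store more concrete, here is a
minimal sketch using the standard sqlite3 module. It is only meant to
show the shape of a name/value table per link; the table names, column
names and helper functions are assumptions for illustration and do not
reflect the actual code in webcheck.db.

```python
import sqlite3


def create_schema(conn):
    # One row per crawled link (hypothetical table layout).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links ("
        " id INTEGER PRIMARY KEY,"
        " url TEXT UNIQUE NOT NULL)")
    # One row per (link, property name) instead of hard-coded columns.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS link_properties ("
        " link_id INTEGER NOT NULL REFERENCES links(id),"
        " name TEXT NOT NULL,"
        " value TEXT,"
        " PRIMARY KEY (link_id, name))")


def set_property(conn, url, name, value):
    # Create the link row on first use, then insert or overwrite the property.
    conn.execute("INSERT OR IGNORE INTO links (url) VALUES (?)", (url,))
    link_id = conn.execute(
        "SELECT id FROM links WHERE url = ?", (url,)).fetchone()[0]
    conn.execute(
        "INSERT OR REPLACE INTO link_properties (link_id, name, value)"
        " VALUES (?, ?, ?)", (link_id, name, value))


def get_properties(conn, url):
    # Return all stored properties of a link as a dict (values are text).
    rows = conn.execute(
        "SELECT p.name, p.value FROM link_properties p"
        " JOIN links l ON l.id = p.link_id WHERE l.url = ?", (url,))
    return dict(rows.fetchall())
```

With a layout like this, title, size and mime type become ordinary
properties, and a meta data plugin can store additional names without a
schema change.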

If you have any code to share, I'm willing to integrate it into


-- arthur