
webcheck commit: r469 - webcheck

Author: devin
Date: Wed Nov 16 13:28:08 2011
New Revision: 469
URL: http://arthurdejong.org/viewvc/webcheck?revision=469&view=revision

Log:
update NEWS, README and HACKING

Modified:
   webcheck/HACKING
   webcheck/NEWS
   webcheck/README

Modified: webcheck/HACKING
==============================================================================
--- webcheck/HACKING    Wed Nov 16 13:07:51 2011        (r468)
+++ webcheck/HACKING    Wed Nov 16 13:28:08 2011        (r469)
@@ -6,20 +6,21 @@
 function. This graphs should present a simple overview of the modules and
 order of calling the functions.
 
-webcheck.py                 - main program, command line parsing, etc
+webcheck/                   - top-level namespace
+ \- cmd.py                  - main program entry point, command line parsing, etc
  \- config.py               - configuration settings (imported from most other
  |                            modules)
- \- debugio.py              - functions for printing output (imported from
- |                            most other modules)
+ \- util.py                 - common functions imported from most other modules
+ |
  \- crawler.py              - module with loop and logic for traversing a
  |   |                        website and storing all the information about
  |   |                        the website that is used later
- |   \- schemes/__init__.py - front-end module to make available scheme
- |   |   |                    modules for fetching content
- |   |   \- schemes/*.py    - per scheme (ftp/file/http) a module
- |   \- parsers/__init.py   - front-end module to handle parsing of content
- |       \- parsers/*.py    - parser modules for content (html and dummy css
- |                            currently)
+ \- myurllib.py             - module for ftp/file/http url fetching
+ |
+ \- parsers/__init__.py     - front-end module to handle parsing of content
+ |  \- html/                - parser modules for html content
+ |  \- css.py               - parser module for css (dummy currently)
+ |
  \- plugins/__init__.py     - front-end module for plugin modules, this calls
      |                        all configured plugins and has some helper
      |                        functions for plugins
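
The new layout splits content handling into a front-end (parsers/__init__.py) and per-type parser modules (html/, css.py). The Python sketch below shows one way such a front-end could dispatch on content type. It assumes selection by MIME type. The names PARSERS, get_parser, parse_html and parse_css are hypothetical and are not taken from webcheck. The CSS function also does more than the dummy css.py module described above, so treat it as illustrative only.

```python
import re
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collect href/src attribute values from HTML start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def parse_html(content):
    """Hypothetical HTML parser module: return the linked URLs in order."""
    collector = _LinkCollector()
    collector.feed(content)
    collector.close()
    return collector.links


# Matches url(...) references, with or without quotes.
_CSS_URL = re.compile(r"""url\(\s*['"]?([^'")\s]+)['"]?\s*\)""")


def parse_css(content):
    """Hypothetical CSS parser module: return url(...) references."""
    return _CSS_URL.findall(content)


# Front-end table mapping a MIME type to the parser function that handles it.
PARSERS = {
    "text/html": parse_html,
    "application/xhtml+xml": parse_html,
    "text/css": parse_css,
}


def get_parser(content_type):
    """Return the parser for a Content-Type header value, or None if unsupported."""
    mimetype = content_type.split(";")[0].strip().lower()
    return PARSERS.get(mimetype)
```

With this design a new content type needs only a new module and one table entry. The crawler asks the front-end for a parser and never imports the per-type modules directly.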

Modified: webcheck/NEWS
==============================================================================
--- webcheck/NEWS       Wed Nov 16 13:07:51 2011        (r468)
+++ webcheck/NEWS       Wed Nov 16 13:28:08 2011        (r469)
@@ -1,3 +1,12 @@
+changes from 1.10.4 to 1.10.5 (alpha)
+-----------------------------
+
+* added setup.py for pypi/egg-based installation
+* support --levels option to control max depth
+* detect and report on endless redirects
+* move to sqlite for storing crawler state
+
+
 changes from 1.10.3 to 1.10.4
 -----------------------------
 
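
A single sketch can illustrate three of the crawler changes listed in NEWS: the --levels depth limit, detection of endless redirects and sqlite-backed crawler state. This is a minimal sketch, not webcheck's actual crawler.py. The SITE table stands in for real fetching. The names crawl, resolve, MAX_REDIRECTS and the links table schema are hypothetical, and counting the start page as level 0 is an assumption. It uses Python's standard sqlite3 module with an in-memory database.

```python
import sqlite3
from collections import deque

# Hypothetical stand-in for fetching: url -> (redirect_target_or_None, linked_urls)
SITE = {
    "http://example.com/": (None, ["http://example.com/a", "http://example.com/loop1"]),
    "http://example.com/a": (None, ["http://example.com/a/b"]),
    "http://example.com/a/b": (None, ["http://example.com/a/b/c"]),
    "http://example.com/a/b/c": (None, []),
    "http://example.com/loop1": ("http://example.com/loop2", []),
    "http://example.com/loop2": ("http://example.com/loop1", []),
}

MAX_REDIRECTS = 10  # assumed cap on redirect chains, not taken from webcheck


def resolve(url):
    """Follow redirects from url; return (final_url, None) or (None, error)."""
    chain = [url]
    while True:
        target, _links = SITE[url]
        if target is None:
            return url, None
        if target in chain or len(chain) > MAX_REDIRECTS:
            # Endless redirect: either a cycle or an overly long chain.
            return None, "endless redirect: " + " -> ".join(chain + [target])
        chain.append(target)
        url = target


def crawl(start, levels):
    """Breadth-first crawl storing state in sqlite; links deeper than `levels` are not followed."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE links (url TEXT PRIMARY KEY, depth INTEGER NOT NULL, error TEXT)")
    db.execute("INSERT INTO links (url, depth) VALUES (?, ?)", (start, 0))
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        final, error = resolve(url)
        if error is not None:
            # Record the problem so a report can list it later.
            db.execute("UPDATE links SET error = ? WHERE url = ?", (error, url))
            continue
        if depth >= levels:
            continue  # the page is recorded, but its links are not followed
        for child in SITE[final][1]:
            known = db.execute("SELECT 1 FROM links WHERE url = ?", (child,)).fetchone()
            if known is None:
                db.execute("INSERT INTO links (url, depth) VALUES (?, ?)", (child, depth + 1))
                queue.append((child, depth + 1))
    db.commit()
    return db
```

For example, crawl("http://example.com/", levels=2) records the pages at depths 0 to 2 and never visits /a/b/c. It also stores an endless-redirect error for /loop1. Because the state lives in sqlite, later reporting steps can query it with plain SQL instead of walking in-memory objects.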

Modified: webcheck/README
==============================================================================
--- webcheck/README     Wed Nov 16 13:07:51 2011        (r468)
+++ webcheck/README     Wed Nov 16 13:28:08 2011        (r469)
@@ -64,7 +64,13 @@
 
 INSTALLING WEBCHECK
 ===================
+This will install the latest version from PyPi.
 
+  % easy_install webcheck
+
+
+MANUAL INSTALLATION
+===================
 Installation is relatively easy. These installation instructions are for
 Unix-like systems. Other operating systems may differ.
 
@@ -78,12 +84,6 @@
   3. Put the manual page in the MANPATH.
      % ln -s /opt/webcheck-1.10.4/webcheck.1 /usr/local/man/man1/webcheck.1
 
-webcheck does not use Distutils because that tool is meant to install Python
-modules which should end up in the default Python path (from what the author
-understands of Distutils). Since webcheck does not expose any public API, it
-is an application with only private modules. A (maintainable) setup.py which
-installs webcheck outside the public patch is welcome.
-
 
 RUNNING WEBCHECK
 ================