lists.arthurdejong.org

webcheck commit: r453 - webcheck/webcheck

Author: arthur
Date: Sat Oct  8 16:12:30 2011
New Revision: 453
URL: http://arthurdejong.org/viewvc/webcheck?revision=453&view=revision

Log:
also handle exceptions while parsing (e.g. when reading the response times out)

Modified:
   webcheck/webcheck/crawler.py

Modified: webcheck/webcheck/crawler.py
==============================================================================
--- webcheck/webcheck/crawler.py        Sat Oct  8 16:04:03 2011        (r452)
+++ webcheck/webcheck/crawler.py        Sat Oct  8 16:12:30 2011        (r453)
@@ -363,14 +363,17 @@
         if parsermodule is None:
             debugio.debug('crawler.Link.fetch(): unsupported content-type: %s' % link.mimetype)
             return
-        # skip parsing of content if we were returned nothing
-        content = response.read()
-        if content is None:
-            return
-        # parse the content
-        debugio.debug('crawler.Link.fetch(): parsing using %s' % parsermodule.__name__)
         try:
+            # skip parsing of content if we were returned nothing
+            content = response.read()
+            if content is None:
+                return
+            # parse the content
+            debugio.debug('crawler.Link.fetch(): parsing using %s' % parsermodule.__name__)
             parsermodule.parse(content, link)
+        except KeyboardInterrupt:
+            # handle this in a higher-level exception handler
+            raise
         except Exception, e:
             import traceback
             traceback.print_exc()
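The change follows a common pattern: do the fallible I/O (the `response.read()` call that may time out) inside the same `try` as the parse, let `KeyboardInterrupt` escape to an outer handler, and log everything else so one bad link does not abort the crawl. Below is a minimal Python 3 sketch of that pattern (the diff itself is Python 2, hence `except Exception, e`). `FakeResponse` and `fetch_and_parse` are hypothetical names for illustration, not webcheck's API, and the diff does not show what the real handler does after `traceback.print_exc()`, so returning `False` there is an assumption. Note that since Python 2.5, `KeyboardInterrupt` derives from `BaseException` rather than `Exception`, so `except Exception` would not catch it anyway; the explicit clause mainly documents the intent.

```python
import socket
import traceback


class FakeResponse:
    """Hypothetical stand-in for an HTTP response whose read() may fail."""

    def __init__(self, content=None, error=None):
        self._content = content
        self._error = error

    def read(self):
        # simulate a read that times out (or otherwise fails)
        if self._error is not None:
            raise self._error
        return self._content


def fetch_and_parse(response, parse):
    """Read and parse inside one try block (hypothetical helper).

    Returns True if the parser ran, False if nothing was parsed.
    """
    try:
        # reading is inside the try, so a timeout here is handled below
        content = response.read()
        # skip parsing if nothing was returned
        if content is None:
            return False
        parse(content)
        return True
    except KeyboardInterrupt:
        # leave Ctrl-C to a higher-level handler
        raise
    except Exception:
        # assumption: log the error and move on to the next link
        traceback.print_exc()
        return False


if __name__ == "__main__":
    pages = []
    # normal case: the content is read and handed to the parser
    print(fetch_and_parse(FakeResponse(content="<html></html>"), pages.append))  # True
    # read() times out: the traceback goes to stderr and the crawl continues
    print(fetch_and_parse(FakeResponse(error=socket.timeout("timed out")), pages.append))  # False
```

Keeping the `KeyboardInterrupt` clause ahead of the generic handler means an interrupted crawl still stops promptly, while network and parser errors on a single page are only reported.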