
python-stdnum commit: r42 - in python-stdnum: . stdnum stdnum/isbn tests




Author: arthur
Date: Wed Nov 24 23:09:28 2010
New Revision: 42
URL: http://arthurdejong.org/viewvc/python-stdnum?view=rev&revision=42

Log:
implement a new numdb module to hold information on hierarchically organised 
numbers and switch the isbn module to use this format instead
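
As an illustration of what "hierarchically organised" means here, the
following is a minimal sketch of splitting an ISBN-13 level by level. It is
not the numdb API: the names parse_ranges, match, GROUPS, REGISTRANTS and
split_isbn13 are hypothetical, and only the range strings are taken from the
isbn.dat file added in this commit.

```python
# Illustrative sketch only: a hypothetical two-level table, not the numdb API.

def parse_ranges(text):
    """Turn '0-5,600-649' into [('0', '5'), ('600', '649')]."""
    return [tuple(item.split('-')) for item in text.split(',')]


def match(ranges, number):
    """Return the leading digits of number that fall inside one of the
    ranges, or '' when no range applies."""
    for low, high in ranges:
        part = number[:len(low)]
        if len(part) == len(low) and low <= part <= high:
            return part
    return ''


# Range data copied from the isbn.dat file added in this commit.
GROUPS = {
    '978': parse_ranges('0-5,600-649,7-7,80-94,950-989,9900-9989,99900-99999'),
}
REGISTRANTS = {
    ('978', '90'): parse_ranges(
        '00-19,200-499,5000-6999,70000-79999,800000-849999,8500-8999,90-90,'
        '910000-939999,94-94,950000-999999'),
}


def split_isbn13(number):
    """Split a 13-digit ISBN into (prefix, group, registrant, item, check)."""
    prefix, rest = number[:3], number[3:]
    # First level: the group ranges decide how long the group prefix is.
    group = match(GROUPS.get(prefix, []), rest)
    rest = rest[len(group):]
    # Second level: the registrant ranges defined for that group.
    registrant = match(REGISTRANTS.get((prefix, group), []), rest)
    rest = rest[len(registrant):]
    return prefix, group, registrant, rest[:-1], rest[-1:]


print(split_isbn13('9789024538270'))  # ('978', '90', '245', '3827', '0')
```

Each level only needs the ranges defined for the prefix found at the level
above it, which is why a nested layout is a natural fit for this data.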

Added:
   python-stdnum/getisbn.py   (contents, props changed)
      - copied, changed from r41, python-stdnum/stdnum/isbn/ranges.py
   python-stdnum/stdnum/isbn.dat
   python-stdnum/stdnum/isbn.py
      - copied, changed from r41, python-stdnum/stdnum/isbn/__init__.py
   python-stdnum/stdnum/numdb.py
   python-stdnum/test.dat
Deleted:
   python-stdnum/stdnum/isbn/
Modified:
   python-stdnum/tests/test_isbn.doctest

Copied and modified: python-stdnum/getisbn.py (from r41, python-stdnum/stdnum/isbn/ranges.py)
==============================================================================
--- python-stdnum/stdnum/isbn/ranges.py Sat Sep 11 11:13:47 2010        (r41, copy source)
+++ python-stdnum/getisbn.py    Wed Nov 24 23:09:28 2010        (r42)
@@ -1,4 +1,6 @@
-# ranges.py - list of ISBN prefix data and utility functions
+#!/usr/bin/env python
+
+# getisbn.py - script to get ISBN prefix data
 #
 # Copyright (C) 2010 Arthur de Jong
 #
@@ -17,374 +19,87 @@
 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
 # 02110-1301 USA
 
-"""This module contains that current ISBN group and registrant prefixes as
-they are registered with the International ISBN Agency. This information
-is needed to correctly split an ISBN into an EAN.UCC prefix, a group prefix,
-a registrant, an item number and a check-digit."""
+"""This script downloads XML data from the International ISBN Agency
+website and provides a compact form of all group prefixes, and registrant
+ranges for those prefixes suitable for the numdb module. This data is needed
+to correctly split ISBNs into an EAN.UCC prefix, a group prefix, a registrant,
+an item number and a check-digit."""
 
-# The place where the current version of RangeMessage.xml can be downloaded.
-_download_url = 'http://www.isbn-international.org/agency?rmxml=1'
+import xml.sax
+import urllib
 
-# What follows is a representation of the prefixes that are defined by
-# International ISBN Agency to correctly split ISBNs. See the download()
-# and output() methods on how to download and generate this data.
-
-# generated from RangeMessage.xml, downloaded from
-# http://www.isbn-international.org/agency?rmxml=1
-# serial 0aad2b046ddd9b30e080cb2b24afc868
-# date Thu, 20 May 2010 18:36:55 GMT
-_prefixes = """
-978 0-5 600-649 7-7 80-94 950-989 9900-9989 99900-99999
-978-0 00-19 200-699 7000-8499 85000-89999 900000-949999 9500000-9999999
-978-1 00-09 100-399 4000-5499 55000-86979 869800-998999 9990000-9999999
-978-2 00-19 200-349 35000-39999 400-699 7000-8399 84000-89999 900000-949999
-978-2 9500000-9999999
-978-3 00-02 030-033 0340-0369 03700-03999 04-19 200-699 7000-8499 85000-89999
-978-3 900000-949999 9500000-9539999 95400-96999 9700000-9899999 99000-99499
-978-3 99500-99999
-978-4 00-19 200-699 7000-8499 85000-89999 900000-949999 9500000-9999999
-978-5 00-19 200-420 4210-4299 430-430 4310-4399 440-440 4410-4499 450-699
-978-5 7000-8499 85000-89999 900000-909999 91000-91999 9200-9299 93000-94999
-978-5 9500000-9500999 9501-9799 98000-98999 9900000-9909999 9910-9999
-978-600 00-09 100-499 5000-8999 90000-99999
-978-601 00-19 200-699 7000-7999 80000-84999 85-99
-978-602 00-19 200-799 8000-9499 95000-99999
-978-603 00-04 05-49 500-799 8000-8999 90000-99999
-978-604 0-4 50-89 900-979 9800-9999
-978-605 01-09 100-399 4000-5999 60000-89999 90-99
-978-606 0-0 10-49 500-799 8000-9199 92000-99999
-978-607 00-39 400-749 7500-9499 95000-99999
-978-608 0-0 10-19 200-449 4500-6499 65000-69999 7-9
-978-609 00-39 400-799 8000-9499 95000-99999
-978-612 00-29 300-399 4000-4499 45000-49999 50-99
-978-613 0-9
-978-614 00-39 400-799 8000-9499 95000-99999
-978-615 00-09 100-499 5000-7999 80000-89999
-978-616 00-19 200-699 7000-8999 90000-99999
-978-617 00-49 500-699 7000-8999 90000-99999
-978-7 00-09 100-499 5000-7999 80000-89999 900000-999999
-978-80 00-19 200-699 7000-8499 85000-89999 900000-999999
-978-81 00-19 200-699 7000-8499 85000-89999 900000-999999
-978-82 00-19 200-699 7000-8999 90000-98999 990000-999999
-978-83 00-19 200-599 60000-69999 7000-8499 85000-89999 900000-999999
-978-84 00-14 15000-19999 200-699 7000-8499 85000-89999 9000-9199
-978-84 920000-923999 92400-92999 930000-949999 95000-96999 9700-9999
-978-85 00-19 200-599 60000-69999 7000-8499 85000-89999 900000-979999
-978-85 98000-99999
-978-86 00-29 300-599 6000-7999 80000-89999 900000-999999
-978-87 00-29 400-649 7000-7999 85000-94999 970000-999999
-978-88 00-19 200-599 6000-8499 85000-89999 900000-949999 95000-99999
-978-89 00-24 250-549 5500-8499 85000-94999 950000-999999
-978-90 00-19 200-499 5000-6999 70000-79999 800000-849999 8500-8999 90-90
-978-90 910000-939999 94-94 950000-999999
-978-91 0-1 20-49 500-649 7000-7999 85000-94999 970000-999999
-978-92 0-5 60-79 800-899 9000-9499 95000-98999 990000-999999
-978-93 00-09 100-499 5000-7999 80000-94999 950000-999999
-978-94 000-599 6000-8999 90000-99999
-978-950 00-49 500-899 9000-9899 99000-99999
-978-951 0-1 20-54 550-889 8900-9499 95000-99999
-978-952 00-19 200-499 5000-5999 60-65 6600-6699 67000-69999 7000-7999 80-94
-978-952 9500-9899 99000-99999
-978-953 0-0 10-14 150-549 55000-59999 6000-9499 95000-99999
-978-954 00-28 2900-2999 300-799 8000-8999 90000-92999 9300-9999
-978-955 0000-1999 20-49 50000-54999 550-799 8000-9499 95000-99999
-978-956 00-19 200-699 7000-9999
-978-957 00-02 0300-0499 05-19 2000-2099 21-27 28000-30999 31-43 440-819
-978-957 8200-9699 97000-99999
-978-958 00-56 57000-59999 600-799 8000-9499 95000-99999
-978-959 00-19 200-699 7000-8499 85000-99999
-978-960 00-19 200-659 6600-6899 690-699 7000-8499 85000-92999 93-93 9400-9799
-978-960 98000-99999
-978-961 00-19 200-599 6000-8999 90000-94999
-978-962 00-19 200-699 7000-8499 85000-86999 8700-8999 900-999
-978-963 00-19 200-699 7000-8499 85000-89999 9000-9999
-978-964 00-14 150-249 2500-2999 300-549 5500-8999 90000-96999 970-989
-978-964 9900-9999
-978-965 00-19 200-599 7000-7999 90000-99999
-978-966 00-14 1500-1699 170-199 2000-2999 300-699 7000-8999 90000-99999
-978-967 00-00 0100-0999 10000-19999 300-499 5000-5999 60-89 900-989 9900-9989
-978-967 99900-99999
-978-968 01-39 400-499 5000-7999 800-899 9000-9999
-978-969 0-1 20-39 400-799 8000-9999
-978-970 01-59 600-899 9000-9099 91000-96999 9700-9999
-978-971 000-015 0160-0199 02-02 0300-0599 06-09 10-49 500-849 8500-9099
-978-971 91000-98999 9900-9999
-978-972 0-1 20-54 550-799 8000-9499 95000-99999
-978-973 0-0 100-169 1700-1999 20-54 550-759 7600-8499 85000-88999 8900-9499
-978-973 95000-99999
-978-974 00-19 200-699 7000-8499 85000-89999 90000-94999 9500-9999
-978-975 00000-00999 01-01 02-24 250-599 6000-9199 92000-98999 990-999
-978-976 0-3 40-59 600-799 8000-9499 95000-99999
-978-977 00-19 200-499 5000-6999 700-999
-978-978 000-199 2000-2999 30000-79999 8000-8999 900-999
-978-979 000-099 1000-1499 15000-19999 20-29 3000-3999 400-799 8000-9499
-978-979 95000-99999
-978-980 00-19 200-599 6000-9999
-978-981 00-11 1200-1999 200-289 2900-9999
-978-982 00-09 100-699 70-89 9000-9799 98000-99999
-978-983 00-01 020-199 2000-3999 40000-44999 45-49 50-79 800-899 9000-9899
-978-983 99000-99999
-978-984 00-39 400-799 8000-8999 90000-99999
-978-985 00-39 400-599 6000-8999 90000-99999
-978-986 00-11 120-559 5600-7999 80000-99999
-978-987 00-09 1000-1999 20000-29999 30-49 500-899 9000-9499 95000-99999
-978-988 00-16 17000-19999 200-799 8000-9699 97000-99999
-978-989 0-1 20-54 550-799 8000-9499 95000-99999
-978-9927 00-09 100-399 4000-4999
-978-9928 00-09 100-399 4000-4999
-978-9929 0-3 40-54 550-799 8000-9999
-978-9930 00-49 500-939 9400-9999
-978-9931 00-29 300-899 9000-9999
-978-9932 00-39 400-849 8500-9999
-978-9933 0-0 10-39 400-899 9000-9999
-978-9934 0-0 10-49 500-799 8000-9999
-978-9935 0-0 10-39 400-899 9000-9999
-978-9936 0-1 20-39 400-799 8000-9999
-978-9937 0-2 30-49 500-799 8000-9999
-978-9938 00-79 800-949 9500-9999
-978-9939 0-4 50-79 800-899 9000-9999
-978-9940 0-1 20-49 500-899 9000-9999
-978-9941 0-0 10-39 400-899 9000-9999
-978-9942 00-89 900-984 9850-9999
-978-9943 00-29 300-399 4000-9999
-978-9944 0000-0999 100-499 5000-5999 60-69 700-799 80-89 900-999
-978-9945 00-00 010-079 08-39 400-569 57-57 580-849 8500-9999
-978-9946 0-1 20-39 400-899 9000-9999
-978-9947 0-1 20-79 800-999
-978-9948 00-39 400-849 8500-9999
-978-9949 0-0 10-39 400-899 9000-9999
-978-9950 00-29 300-849 8500-9999
-978-9951 00-39 400-849 8500-9999
-978-9952 0-1 20-39 400-799 8000-9999
-978-9953 0-0 10-39 400-599 60-89 9000-9999
-978-9954 0-1 20-39 400-799 8000-9999
-978-9955 00-39 400-929 9300-9999
-978-9956 0-0 10-39 400-899 9000-9999
-978-9957 00-39 400-699 70-84 8500-8799 88-99
-978-9958 0-0 10-49 500-899 9000-9999
-978-9959 0-1 20-79 800-949 9500-9999
-978-9960 00-59 600-899 9000-9999
-978-9961 0-2 30-69 700-949 9500-9999
-978-9962 00-54 5500-5599 56-59 600-849 8500-9999
-978-9963 0-2 30-54 550-734 7350-7499 7500-9999
-978-9964 0-6 70-94 950-999
-978-9965 00-39 400-899 9000-9999
-978-9966 000-149 1500-1999 20-69 7000-7499 750-959 9600-9999
-978-9967 00-39 400-899 9000-9999
-978-9968 00-49 500-939 9400-9999
-978-9970 00-39 400-899 9000-9999
-978-9971 0-5 60-89 900-989 9900-9999
-978-9972 00-09 1-1 200-249 2500-2999 30-59 600-899 9000-9999
-978-9973 00-05 060-089 0900-0999 10-69 700-969 9700-9999
-978-9974 0-2 30-54 550-749 7500-9499 95-99
-978-9975 0-0 100-399 4000-4499 45-89 900-949 9500-9999
-978-9976 0-5 60-89 900-989 9900-9999
-978-9977 00-89 900-989 9900-9999
-978-9978 00-29 300-399 40-94 950-989 9900-9999
-978-9979 0-4 50-64 650-659 66-75 760-899 9000-9999
-978-9980 0-3 40-89 900-989 9900-9999
-978-9981 00-09 100-159 1600-1999 20-79 800-949 9500-9999
-978-9982 00-79 800-989 9900-9999
-978-9983 80-94 950-989 9900-9999
-978-9984 00-49 500-899 9000-9999
-978-9985 0-4 50-79 800-899 9000-9999
-978-9986 00-39 400-899 9000-9399 940-969 97-99
-978-9987 00-39 400-879 8800-9999
-978-9988 0-2 30-54 550-749 7500-9999
-978-9989 0-0 100-199 2000-2999 30-59 600-949 9500-9999
-978-99901 00-49 500-799 80-99
-978-99903 0-1 20-89 900-999
-978-99904 0-5 60-89 900-999
-978-99905 0-3 40-79 800-999
-978-99906 0-2 30-59 600-699 70-89 90-94 950-999
-978-99908 0-0 10-89 900-999
-978-99909 0-3 40-94 950-999
-978-99910 0-2 30-89 900-999
-978-99911 00-59 600-999
-978-99912 0-3 400-599 60-89 900-999
-978-99913 0-2 30-35 600-604
-978-99914 0-4 50-89 900-999
-978-99915 0-4 50-79 800-999
-978-99916 0-2 30-69 700-999
-978-99917 0-2 30-89 900-999
-978-99918 0-3 40-79 800-999
-978-99919 0-2 300-399 40-69 900-999
-978-99920 0-4 50-89 900-999
-978-99921 0-1 20-69 700-799 8-8 90-99
-978-99922 0-3 40-69 700-999
-978-99923 0-1 20-79 800-999
-978-99924 0-1 20-79 800-999
-978-99925 0-3 40-79 800-999
-978-99926 0-0 10-59 600-999
-978-99927 0-2 30-59 600-999
-978-99928 0-0 10-79 800-999
-978-99929 0-4 50-79 800-999
-978-99930 0-4 50-79 800-999
-978-99931 0-4 50-79 800-999
-978-99932 0-0 10-59 600-699 7-7 80-99
-978-99933 0-2 30-59 600-999
-978-99934 0-1 20-79 800-999
-978-99935 0-2 30-59 600-699 7-8 90-99
-978-99936 0-0 10-59 600-999
-978-99937 0-1 20-59 600-999
-978-99938 0-1 20-59 600-899 90-99
-978-99939 0-5 60-89 900-999
-978-99940 0-0 10-69 700-999
-978-99941 0-2 30-79 800-999
-978-99942 0-4 50-79 800-999
-978-99943 0-2 30-59 600-999
-978-99944 0-4 50-79 800-999
-978-99945 0-5 60-89 900-999
-978-99946 0-2 30-59 600-999
-978-99947 0-2 30-69 700-999
-978-99948 0-4 50-79 800-999
-978-99949 0-1 20-89 900-999
-978-99950 0-4 50-79 800-999
-978-99952 0-4 50-79 800-999
-978-99953 0-2 30-79 800-939 94-99
-978-99954 0-2 30-69 700-999
-978-99955 0-1 20-59 600-799 80-89 90-99
-978-99956 00-59 600-859 86-99
-978-99957 0-1 20-79 800-999
-978-99958 0-4 50-94 950-999
-978-99959 0-2 30-59 600-999
-978-99960 0-0 10-94 950-999
-978-99961 0-3 40-89 900-999
-978-99962 0-4 50-79 800-999
-978-99963 00-49 500-999
-978-99964 0-1 20-79 800-999
-978-99965 0-3 40-79 800-999
-978-99966 0-2 30-69 700-799
-978-99967 0-1 20-59 600-899
-979 10-10
-979-10 00-19 200-699 7000-8999 90000-97599 976000-999999
-"""
-
-def _expand():
-    """Ensures that the prefix list is expanded as a dictionary to allow
-    easy lookups. The default text form is compact but not very efficient."""
-    global _prefixes
-    if type(_prefixes) == dict:
-        return
-    # build a new dictionary of ranges from the string
-    new_prefixes = dict()
-    for line in _prefixes.splitlines():
-        if line:
-            ( prefix, r ) = line.split(' ', 1)
-            range_list = new_prefixes.setdefault(prefix, [])
-            for r in r.split(' '):
-                low, high = r.split('-')
-                range_list.append((len(low), low, high))
-    # save the dictionary
-    _prefixes = new_prefixes
-
-def lookup(prefix, number):
-    """Look up the specified prefix and split the provided number split in
-    the correct parts. If the prefix cannot be found or the number is not
-    in any of the defined ranges a tuple with one element is returned.
-    The prefix and number together are expected to form a complete ISBN13
-    number.
-
-    >>> lookup('978', '9024538270')
-    ('90', '24538270')
-    >>> lookup('978-0', '471117094')
-    ('471', '117094')
-    """
-    _expand()
-    try:
-        for length, low, high in _prefixes[prefix]:
-            if low <= number[:length] <= high:
-                return number[:length], number[length:]
-    except KeyError:
-        pass
-    return ( '', number )
-
-def load(fp):
-    """Loads the data from the specified file descriptor. The provided file
-    should match the format of the RangeMessage.xml file."""
-    # this is in-line to avoid importing xml.sax for normal use
-    import xml.sax
-    # initialise data
-    global _prefixes
-    _prefixes = dict()
-    # SAX handler class
-    class RangeHandler(xml.sax.ContentHandler):
-        def __init__(self):
-            self._gather = None
-            self._prefix = None
-            self._range = None
-            self._length = None
-        def startElement(self, name, attrs):
-            if name in ( 'MessageSerialNumber', 'MessageDate', 'Prefix',
-                         'Range', 'Length',  ):
-                self._gather = ''
-        def characters(self, content):
-            if self._gather is not None:
-                self._gather += content
-        def endElement(self, name):
-            if name == 'MessageSerialNumber':
-                global _download_serial
-                _download_serial = self._gather.strip()
-            elif name == 'MessageDate':
-                global _download_date
-                _download_date = self._gather.strip()
-            elif name == 'Prefix':
-                self._prefix = self._gather.strip()
-            elif name == 'Range':
-                self._range = self._gather.strip()
-            elif name == 'Length':
-                self._length = int(self._gather.strip())
-            elif name == 'Rule' and self._length:
-                r = ( self._length, ) + tuple( x[:self._length] for x in self._range.split('-') )
-                _prefixes.setdefault(self._prefix, []).append(r)
-            self._gather = None
-    # start the actual parsing
-    parser = xml.sax.make_parser()
-    parser.setContentHandler(RangeHandler())
-    parser.parse(fp)
 
-def download(url=None):
-    """Download the RangeMessage.xml data from the International ISBN Agency
-    website or from the specified URL."""
-    import urllib
-    load(urllib.urlopen(url or _download_url))
+# The place where the current version of RangeMessage.xml can be downloaded.
+download_url = 'http://www.isbn-international.org/agency?rmxml=1'
 
-def _wrap(text, max_len, sep=' '):
+def _wrap(text):
     """Generator that returns lines of text that are no longer than
-    max_len. The sep arguments is the string to split on."""
+    max_len."""
     while text:
         i = len(text)
-        if i > max_len:
-            i = text.rindex(' ', 20, max_len)
+        if i > 73:
+            i = text.rindex(',', 20, 73)
         yield text[:i]
         text = text[i+1:]
 
-def output(fp=None):
-    """Print the downloaded range data to stdout (or a file if one is
-    provided) in the compact text format suitable for inclusion in this
-    module."""
-    _expand()
-    if not fp:
-        import sys
-        fp = sys.stdout
-    # first print the header if we can
-    try:
-        fp.write('# generated from RangeMessage.xml, downloaded from\n'
-                 '# %(url)s\n'
-                 '# serial %(serial)s\n'
-                 '# date %(date)s\n'
-                 '_prefixes = """\n' % { 'url':    _download_url,
-                                         'serial': _download_serial,
-                                         'date':   _download_date })
-        headerprinted = True
-    except NameError:
-        headerprinted = False
-    # print the actual prefixes
-    prefixes = _prefixes.items()
-    prefixes.sort()
-    for prefix, ranges in prefixes:
-        for line in _wrap(' '.join(r[1] + '-' + r[2] for r in ranges), 77 - len(prefix)):
-            fp.write('%s %s\n' % ( prefix, line ) )
-    # print the footer if the header was printed
-    if headerprinted:
-        fp.write('"""\n')
+
+class RangeHandler(xml.sax.ContentHandler):
+
+    def __init__(self):
+        self._gather = None
+        self._prefix = None
+        self._agency = None
+        self._range = None
+        self._length = None
+        self._ranges = []
+        self._last = None
+        self._topranges = {}
+
+    def startElement(self, name, attrs):
+        if name in ( 'MessageSerialNumber', 'MessageDate', 'Prefix',
+                     'Agency', 'Range', 'Length',  ):
+            self._gather = ''
+
+    def characters(self, content):
+        if self._gather is not None:
+            self._gather += content
+
+    def endElement(self, name):
+        if name == 'MessageSerialNumber':
+            print '# file serial %s' % self._gather.strip()
+        elif name == 'MessageDate':
+            print '# file date %s' % self._gather.strip()
+        elif name == 'Prefix':
+            self._prefix = self._gather.strip()
+        elif name == 'Agency':
+            self._agency = self._gather.strip()
+        elif name == 'Range':
+            self._range = self._gather.strip()
+        elif name == 'Length':
+            self._length = int(self._gather.strip())
+        elif name == 'Rule' and self._length:
+            self._ranges.append(tuple( x[:self._length] for x in self._range.split('-') ))
+        elif name == 'Rules':
+            if '-' in self._prefix:
+                p, a = self._prefix.split('-')
+                if p != self._last:
+                    print p
+                    self._last = p
+                    for line in _wrap(','.join(r[0] + '-' + r[1] for r in self._topranges[p])):
+                        print ' %s' % ( line )
+                print ' %s agency="%s"' % ( a, self._agency )
+                for line in _wrap(','.join(r[0] + '-' + r[1] for r in self._ranges)):
+                    print '  %s' % ( line )
+            else:
+                self._topranges[self._prefix] = self._ranges
+            self._ranges = []
+        self._gather = None
+
+
+if __name__ == '__main__':
+    print '# generated from RangeMessage.xml, downloaded from'
+    print '# %s' % download_url
+    parser = xml.sax.make_parser()
+    parser.setContentHandler(RangeHandler())
+    parser.parse(urllib.urlopen(download_url))
+    #parser.parse('RangeMessage.xml')

Added: python-stdnum/stdnum/isbn.dat
==============================================================================
--- /dev/null   00:00:00 1970   (empty, because file is newly added)
+++ python-stdnum/stdnum/isbn.dat       Wed Nov 24 23:09:28 2010        (r42)
@@ -0,0 +1,436 @@
+# generated from RangeMessage.xml, downloaded from
+# http://www.isbn-international.org/agency?rmxml=1
+# file serial 7a5c26a5-62dc-463d-adab-2e214f2a1316
+# file date Tue, 16 Nov 2010 10:49:43 GMT
+978
+ 0-5,600-649,7-7,80-94,950-989,9900-9989,99900-99999
+ 0 agency="English language"
+  00-19,200-699,7000-8499,85000-89999,900000-949999,9500000-9999999
+ 1 agency="English language"
+  00-09,100-399,4000-5499,55000-86979,869800-998999,9990000-9999999
+ 2 agency="French language"
+  00-19,200-349,35000-39999,400-699,7000-8399,84000-89999,900000-949999
+  9500000-9999999
+ 3 agency="German language"
+  00-02,030-033,0340-0369,03700-03999,04-19,200-699,7000-8499,85000-89999
+  900000-949999,9500000-9539999,95400-96999,9700000-9899999,99000-99499
+  99500-99999
+ 4 agency="Japan"
+  00-19,200-699,7000-8499,85000-89999,900000-949999,9500000-9999999
+ 5 agency="Russian Federation and former USSR"
+  00-19,200-420,4210-4299,430-430,4310-4399,440-440,4410-4499,450-699
+  7000-8499,85000-89999,900000-909999,91000-91999,9200-9299,93000-94999
+  9500000-9500999,9501-9799,98000-98999,9900000-9909999,9910-9999
+ 600 agency="Iran"
+  00-09,100-499,5000-8999,90000-99999
+ 601 agency="Kazakhstan"
+  00-19,200-699,7000-7999,80000-84999,85-99
+ 602 agency="Indonesia"
+  00-19,200-799,8000-9499,95000-99999
+ 603 agency="Saudi Arabia"
+  00-04,05-49,500-799,8000-8999,90000-99999
+ 604 agency="Vietnam"
+  0-4,50-89,900-979,9800-9999
+ 605 agency="Turkey"
+  01-09,100-399,4000-5999,60000-89999,90-99
+ 606 agency="Romania"
+  0-0,10-49,500-799,8000-9199,92000-99999
+ 607 agency="Mexico"
+  00-39,400-749,7500-9499,95000-99999
+ 608 agency="Macedonia"
+  0-0,10-19,200-449,4500-6499,65000-69999,7-9
+ 609 agency="Lithuania"
+  00-39,400-799,8000-9499,95000-99999
+ 611 agency="Thailand"
+ 612 agency="Peru"
+  00-29,300-399,4000-4499,45000-49999,50-99
+ 613 agency="Mauritius"
+  0-9
+ 614 agency="Lebanon"
+  00-39,400-799,8000-9499,95000-99999
+ 615 agency="Hungary"
+  00-09,100-499,5000-7999,80000-89999
+ 616 agency="Thailand"
+  00-19,200-699,7000-8999,90000-99999
+ 617 agency="Ukraine"
+  00-49,500-699,7000-8999,90000-99999
+ 7 agency="China, People's Republic"
+  00-09,100-499,5000-7999,80000-89999,900000-999999
+ 80 agency="Czech Republic and Slovakia"
+  00-19,200-699,7000-8499,85000-89999,900000-999999
+ 81 agency="India"
+  00-19,200-699,7000-8499,85000-89999,900000-999999
+ 82 agency="Norway"
+  00-19,200-699,7000-8999,90000-98999,990000-999999
+ 83 agency="Poland"
+  00-19,200-599,60000-69999,7000-8499,85000-89999,900000-999999
+ 84 agency="Spain"
+  00-14,15000-19999,200-699,7000-8499,85000-89999,9000-9199,920000-923999
+  92400-92999,930000-949999,95000-96999,9700-9999
+ 85 agency="Brazil"
+  00-19,200-599,60000-69999,7000-8499,85000-89999,900000-979999,98000-99999
+ 86 agency="Serbia and Montenegro"
+  00-29,300-599,6000-7999,80000-89999,900000-999999
+ 87 agency="Denmark"
+  00-29,400-649,7000-7999,85000-94999,970000-999999
+ 88 agency="Italy"
+  00-19,200-599,6000-8499,85000-89999,900000-949999,95000-99999
+ 89 agency="Korea, Republic"
+  00-24,250-549,5500-8499,85000-94999,950000-969999,97000-98999,990-999
+ 90 agency="Netherlands"
+  00-19,200-499,5000-6999,70000-79999,800000-849999,8500-8999,90-90
+  910000-939999,94-94,950000-999999
+ 91 agency="Sweden"
+  0-1,20-49,500-649,7000-7999,85000-94999,970000-999999
+ 92 agency="International NGO Publishers and EC Organizations"
+  0-5,60-79,800-899,9000-9499,95000-98999,990000-999999
+ 93 agency="India"
+  00-09,100-499,5000-7999,80000-94999,950000-999999
+ 94 agency="Netherlands"
+  000-599,6000-8999,90000-99999
+ 950 agency="Argentina"
+  00-49,500-899,9000-9899,99000-99999
+ 951 agency="Finland"
+  0-1,20-54,550-889,8900-9499,95000-99999
+ 952 agency="Finland"
+  00-19,200-499,5000-5999,60-65,6600-6699,67000-69999,7000-7999,80-94
+  9500-9899,99000-99999
+ 953 agency="Croatia"
+  0-0,10-14,150-549,55000-59999,6000-9499,95000-99999
+ 954 agency="Bulgaria"
+  00-28,2900-2999,300-799,8000-8999,90000-92999,9300-9999
+ 955 agency="Sri Lanka"
+  0000-1999,20-49,50000-54999,550-799,8000-9499,95000-99999
+ 956 agency="Chile"
+  00-19,200-699,7000-9999
+ 957 agency="Taiwan"
+  00-02,0300-0499,05-19,2000-2099,21-27,28000-30999,31-43,440-819
+  8200-9699,97000-99999
+ 958 agency="Colombia"
+  00-56,57000-59999,600-799,8000-9499,95000-99999
+ 959 agency="Cuba"
+  00-19,200-699,7000-8499,85000-99999
+ 960 agency="Greece"
+  00-19,200-659,6600-6899,690-699,7000-8499,85000-92999,93-93,9400-9799
+  98000-99999
+ 961 agency="Slovenia"
+  00-19,200-599,6000-8999,90000-94999
+ 962 agency="Hong Kong, China"
+  00-19,200-699,7000-8499,85000-86999,8700-8999,900-999
+ 963 agency="Hungary"
+  00-19,200-699,7000-8499,85000-89999,9000-9999
+ 964 agency="Iran"
+  00-14,150-249,2500-2999,300-549,5500-8999,90000-96999,970-989,9900-9999
+ 965 agency="Israel"
+  00-19,200-599,7000-7999,90000-99999
+ 966 agency="Ukraine"
+  00-14,1500-1699,170-199,2000-2999,300-699,7000-8999,90000-99999
+ 967 agency="Malaysia"
+  00-00,0100-0999,10000-19999,300-499,5000-5999,60-89,900-989,9900-9989
+  99900-99999
+ 968 agency="Mexico"
+  01-39,400-499,5000-7999,800-899,9000-9999
+ 969 agency="Pakistan"
+  0-1,20-39,400-799,8000-9999
+ 970 agency="Mexico"
+  01-59,600-899,9000-9099,91000-96999,9700-9999
+ 971 agency="Philippines"
+  000-015,0160-0199,02-02,0300-0599,06-09,10-49,500-849,8500-9099
+  91000-98999,9900-9999
+ 972 agency="Portugal"
+  0-1,20-54,550-799,8000-9499,95000-99999
+ 973 agency="Romania"
+  0-0,100-169,1700-1999,20-54,550-759,7600-8499,85000-88999,8900-9499
+  95000-99999
+ 974 agency="Thailand"
+  00-19,200-699,7000-8499,85000-89999,90000-94999,9500-9999
+ 975 agency="Turkey"
+  00000-01999,02-24,250-599,6000-9199,92000-98999,990-999
+ 976 agency="Caribbean Community"
+  0-3,40-59,600-799,8000-9499,95000-99999
+ 977 agency="Egypt"
+  00-19,200-499,5000-6999,700-999
+ 978 agency="Nigeria"
+  000-199,2000-2999,30000-79999,8000-8999,900-999
+ 979 agency="Indonesia"
+  000-099,1000-1499,15000-19999,20-29,3000-3999,400-799,8000-9499
+  95000-99999
+ 980 agency="Venezuela"
+  00-19,200-599,6000-9999
+ 981 agency="Singapore"
+  00-11,1200-1999,200-289,2900-9999
+ 982 agency="South Pacific"
+  00-09,100-699,70-89,9000-9799,98000-99999
+ 983 agency="Malaysia"
+  00-01,020-199,2000-3999,40000-44999,45-49,50-79,800-899,9000-9899
+  99000-99999
+ 984 agency="Bangladesh"
+  00-39,400-799,8000-8999,90000-99999
+ 985 agency="Belarus"
+  00-39,400-599,6000-8999,90000-99999
+ 986 agency="Taiwan"
+  00-11,120-559,5600-7999,80000-99999
+ 987 agency="Argentina"
+  00-09,1000-1999,20000-29999,30-49,500-899,9000-9499,95000-99999
+ 988 agency="Hong Kong, China"
+  00-16,17000-19999,200-799,8000-9699,97000-99999
+ 989 agency="Portugal"
+  0-1,20-54,550-799,8000-9499,95000-99999
+ 9927 agency="Qatar"
+  00-09,100-399,4000-4999
+ 9928 agency="Albania"
+  00-09,100-399,4000-4999
+ 9929 agency="Guatemala"
+  0-3,40-54,550-799,8000-9999
+ 9930 agency="Costa Rica"
+  00-49,500-939,9400-9999
+ 9931 agency="Algeria"
+  00-29,300-899,9000-9999
+ 9932 agency="Lao People's Democratic Republic"
+  00-39,400-849,8500-9999
+ 9933 agency="Syria"
+  0-0,10-39,400-899,9000-9999
+ 9934 agency="Latvia"
+  0-0,10-49,500-799,8000-9999
+ 9935 agency="Iceland"
+  0-0,10-39,400-899,9000-9999
+ 9936 agency="Afghanistan"
+  0-1,20-39,400-799,8000-9999
+ 9937 agency="Nepal"
+  0-2,30-49,500-799,8000-9999
+ 9938 agency="Tunisia"
+  00-79,800-949,9500-9999
+ 9939 agency="Armenia"
+  0-4,50-79,800-899,9000-9999
+ 9940 agency="Montenegro"
+  0-1,20-49,500-899,9000-9999
+ 9941 agency="Georgia"
+  0-0,10-39,400-899,9000-9999
+ 9942 agency="Ecuador"
+  00-89,900-984,9850-9999
+ 9943 agency="Uzbekistan"
+  00-29,300-399,4000-9999
+ 9944 agency="Turkey"
+  0000-0999,100-499,5000-5999,60-69,700-799,80-89,900-999
+ 9945 agency="Dominican Republic"
+  00-00,010-079,08-39,400-569,57-57,580-849,8500-9999
+ 9946 agency="Korea, P.D.R."
+  0-1,20-39,400-899,9000-9999
+ 9947 agency="Algeria"
+  0-1,20-79,800-999
+ 9948 agency="United Arab Emirates"
+  00-39,400-849,8500-9999
+ 9949 agency="Estonia"
+  0-0,10-39,400-899,9000-9999
+ 9950 agency="Palestine"
+  00-29,300-849,8500-9999
+ 9951 agency="Kosova"
+  00-39,400-849,8500-9999
+ 9952 agency="Azerbaijan"
+  0-1,20-39,400-799,8000-9999
+ 9953 agency="Lebanon"
+  0-0,10-39,400-599,60-89,9000-9999
+ 9954 agency="Morocco"
+  0-1,20-39,400-799,8000-9999
+ 9955 agency="Lithuania"
+  00-39,400-929,9300-9999
+ 9956 agency="Cameroon"
+  0-0,10-39,400-899,9000-9999
+ 9957 agency="Jordan"
+  00-39,400-699,70-84,8500-8799,88-99
+ 9958 agency="Bosnia and Herzegovina"
+  0-0,10-18,1900-1999,20-49,500-899,9000-9999
+ 9959 agency="Libya"
+  0-1,20-79,800-949,9500-9999
+ 9960 agency="Saudi Arabia"
+  00-59,600-899,9000-9999
+ 9961 agency="Algeria"
+  0-2,30-69,700-949,9500-9999
+ 9962 agency="Panama"
+  00-54,5500-5599,56-59,600-849,8500-9999
+ 9963 agency="Cyprus"
+  0-2,30-54,550-734,7350-7499,7500-9999
+ 9964 agency="Ghana"
+  0-6,70-94,950-999
+ 9965 agency="Kazakhstan"
+  00-39,400-899,9000-9999
+ 9966 agency="Kenya"
+  000-149,1500-1999,20-69,7000-7499,750-959,9600-9999
+ 9967 agency="Kyrgyz Republic"
+  00-39,400-899,9000-9999
+ 9968 agency="Costa Rica"
+  00-49,500-939,9400-9999
+ 9970 agency="Uganda"
+  00-39,400-899,9000-9999
+ 9971 agency="Singapore"
+  0-5,60-89,900-989,9900-9999
+ 9972 agency="Peru"
+  00-09,1-1,200-249,2500-2999,30-59,600-899,9000-9999
+ 9973 agency="Tunisia"
+  00-05,060-089,0900-0999,10-69,700-969,9700-9999
+ 9974 agency="Uruguay"
+  0-2,30-54,550-749,7500-9499,95-99
+ 9975 agency="Moldova"
+  0-0,100-399,4000-4499,45-89,900-949,9500-9999
+ 9976 agency="Tanzania"
+  0-5,60-89,900-989,9900-9999
+ 9977 agency="Costa Rica"
+  00-89,900-989,9900-9999
+ 9978 agency="Ecuador"
+  00-29,300-399,40-94,950-989,9900-9999
+ 9979 agency="Iceland"
+  0-4,50-64,650-659,66-75,760-899,9000-9999
+ 9980 agency="Papua New Guinea"
+  0-3,40-89,900-989,9900-9999
+ 9981 agency="Morocco"
+  00-09,100-159,1600-1999,20-79,800-949,9500-9999
+ 9982 agency="Zambia"
+  00-79,800-989,9900-9999
+ 9983 agency="Gambia"
+  80-94,950-989,9900-9999
+ 9984 agency="Latvia"
+  00-49,500-899,9000-9999
+ 9985 agency="Estonia"
+  0-4,50-79,800-899,9000-9999
+ 9986 agency="Lithuania"
+  00-39,400-899,9000-9399,940-969,97-99
+ 9987 agency="Tanzania"
+  00-39,400-879,8800-9999
+ 9988 agency="Ghana"
+  0-2,30-54,550-749,7500-9999
+ 9989 agency="Macedonia"
+  0-0,100-199,2000-2999,30-59,600-949,9500-9999
+ 99901 agency="Bahrain"
+  00-49,500-799,80-99
+ 99902 agency="Gabon"
+ 99903 agency="Mauritius"
+  0-1,20-89,900-999
+ 99904 agency="Netherlands Antilles and Aruba"
+  0-5,60-89,900-999
+ 99905 agency="Bolivia"
+  0-3,40-79,800-999
+ 99906 agency="Kuwait"
+  0-2,30-59,600-699,70-89,90-94,950-999
+ 99908 agency="Malawi"
+  0-0,10-89,900-999
+ 99909 agency="Malta"
+  0-3,40-94,950-999
+ 99910 agency="Sierra Leone"
+  0-2,30-89,900-999
+ 99911 agency="Lesotho"
+  00-59,600-999
+ 99912 agency="Botswana"
+  0-3,400-599,60-89,900-999
+ 99913 agency="Andorra"
+  0-2,30-35,600-604
+ 99914 agency="Suriname"
+  0-4,50-89,900-999
+ 99915 agency="Maldives"
+  0-4,50-79,800-999
+ 99916 agency="Namibia"
+  0-2,30-69,700-999
+ 99917 agency="Brunei Darussalam"
+  0-2,30-89,900-999
+ 99918 agency="Faroe Islands"
+  0-3,40-79,800-999
+ 99919 agency="Benin"
+  0-2,300-399,40-69,900-999
+ 99920 agency="Andorra"
+  0-4,50-89,900-999
+ 99921 agency="Qatar"
+  0-1,20-69,700-799,8-8,90-99
+ 99922 agency="Guatemala"
+  0-3,40-69,700-999
+ 99923 agency="El Salvador"
+  0-1,20-79,800-999
+ 99924 agency="Nicaragua"
+  0-1,20-79,800-999
+ 99925 agency="Paraguay"
+  0-3,40-79,800-999
+ 99926 agency="Honduras"
+  0-0,10-59,600-999
+ 99927 agency="Albania"
+  0-2,30-59,600-999
+ 99928 agency="Georgia"
+  0-0,10-79,800-999
+ 99929 agency="Mongolia"
+  0-4,50-79,800-999
+ 99930 agency="Armenia"
+  0-4,50-79,800-999
+ 99931 agency="Seychelles"
+  0-4,50-79,800-999
+ 99932 agency="Malta"
+  0-0,10-59,600-699,7-7,80-99
+ 99933 agency="Nepal"
+  0-2,30-59,600-999
+ 99934 agency="Dominican Republic"
+  0-1,20-79,800-999
+ 99935 agency="Haiti"
+  0-2,30-59,600-699,7-8,90-99
+ 99936 agency="Bhutan"
+  0-0,10-59,600-999
+ 99937 agency="Macau"
+  0-1,20-59,600-999
+ 99938 agency="Srpska, Republic of"
+  0-1,20-59,600-899,90-99
+ 99939 agency="Guatemala"
+  0-5,60-89,900-999
+ 99940 agency="Georgia"
+  0-0,10-69,700-999
+ 99941 agency="Armenia"
+  0-2,30-79,800-999
+ 99942 agency="Sudan"
+  0-4,50-79,800-999
+ 99943 agency="Albania"
+  0-2,30-59,600-999
+ 99944 agency="Ethiopia"
+  0-4,50-79,800-999
+ 99945 agency="Namibia"
+  0-5,60-89,900-999
+ 99946 agency="Nepal"
+  0-2,30-59,600-999
+ 99947 agency="Tajikistan"
+  0-2,30-69,700-999
+ 99948 agency="Eritrea"
+  0-4,50-79,800-999
+ 99949 agency="Mauritius"
+  0-1,20-89,900-999
+ 99950 agency="Cambodia"
+  0-4,50-79,800-999
+ 99951 agency="Congo"
+ 99952 agency="Mali"
+  0-4,50-79,800-999
+ 99953 agency="Paraguay"
+  0-2,30-79,800-939,94-99
+ 99954 agency="Bolivia"
+  0-2,30-69,700-999
+ 99955 agency="Srpska, Republic of"
+  0-1,20-59,600-799,80-89,90-99
+ 99956 agency="Albania"
+  00-59,600-859,86-99
+ 99957 agency="Malta"
+  0-1,20-79,800-999
+ 99958 agency="Bahrain"
+  0-4,50-94,950-999
+ 99959 agency="Luxembourg"
+  0-2,30-59,600-999
+ 99960 agency="Malawi"
+  0-0,10-94,950-999
+ 99961 agency="El Salvador"
+  0-3,40-89,900-999
+ 99962 agency="Mongolia"
+  0-4,50-79,800-999
+ 99963 agency="Cambodia"
+  00-49,500-999
+ 99964 agency="Nicaragua"
+  0-1,20-79,800-999
+ 99965 agency="Macau"
+  0-3,40-79,800-999
+ 99966 agency="Kuwait"
+  0-2,30-69,700-799
+ 99967 agency="Paraguay"
+  0-1,20-59,600-899
+979
+ 10-10
+ 10 agency="France"
+  00-19,200-699,7000-8999,90000-97599,976000-999999
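
As a rough illustration of how the registrant ranges listed above are meant
to be read (a sketch, not code from this commit): the number of digits in a
range's lower bound decides how many digits are consumed, and the bounds are
compared as strings. The helper name match_registrant is hypothetical; the
range line is the one for 979-10 (France) above.

```python
# Registrant ranges for 979-10 (France), copied from the data above.
FRANCE_RANGES = '00-19,200-699,7000-8999,90000-97599,976000-999999'


def match_registrant(ranges, digits):
    """Return the registrant part of digits, or None if no range matches.

    Hypothetical helper: the width of each range's lower bound tells how
    many leading digits to compare against that range.
    """
    for rnge in ranges.split(','):
        low, _, high = rnge.partition('-')
        high = high or low  # a single value is a range of one
        candidate = digits[:len(low)]
        # compare equal-width digit strings lexicographically
        if len(candidate) == len(low) and low <= candidate <= high:
            return candidate
    return None


short_registrant = match_registrant(FRANCE_RANGES, '1234567')
long_registrant = match_registrant(FRANCE_RANGES, '9760001')
```

Because every bound within a range has the same width, plain string
comparison gives the same answer as numeric comparison here.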

Copied and modified: python-stdnum/stdnum/isbn.py (from r41, 
python-stdnum/stdnum/isbn/__init__.py)
==============================================================================
--- python-stdnum/stdnum/isbn/__init__.py       Sat Sep 11 11:13:47 2010        (r41, copy source)
+++ python-stdnum/stdnum/isbn.py        Wed Nov 24 23:09:28 2010        (r42)
@@ -108,7 +108,7 @@
     """Split the specified ISBN into an EAN.UCC prefix, a group prefix, a
     registrant, an item number and a check-digit. If the number is in ISBN10
     format the returned EAN.UCC prefix is '978'."""
-    import ranges
+    from stdnum import numdb
     # clean up number
     number = compact(number)
     # get Bookland prefix if any
@@ -118,12 +118,13 @@
     else:
         oprefix = prefix = number[:3]
         number = number[3:]
-    # get group
-    group, number = ranges.lookup(prefix, number)
-    publisher, number = ranges.lookup('%s-%s' % (prefix, group), number)
-    itemnr = number[:-1]
-    check = number[-1]
-    return ( oprefix, group, publisher, itemnr, check )
+    # split the number
+    result = numdb.get('isbn').split(prefix+number[:-1])[1:]
+    itemnr = result.pop()
+    group = result.pop(0) if result else ''
+    publisher = result.pop(0) if result else ''
+    # return results
+    return ( oprefix, group, publisher, itemnr, number[-1] )
 
 def format(number, separator='-'):
     """Reformat the passed number to the standard format with the EAN.UCC

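The pop-based unpacking in the new split() code can be tried in isolation.
This is a minimal sketch, assuming the list passed in is what
numdb.get('isbn').split(...) returns after the leading EAN.UCC prefix has
been sliced off; the function name unpack is hypothetical. The sample
inputs mirror the cases in tests/test_isbn.doctest where the group or
registrant is unknown.

```python
def unpack(parts):
    """Map a list of number parts to (group, publisher, itemnr).

    Hypothetical helper mirroring the unpacking in isbn.split(): the last
    part is always the item number, anything before it fills the group and
    then the publisher, and missing parts become empty strings.
    """
    result = list(parts)  # work on a copy so the caller's list is untouched
    itemnr = result.pop()
    group = result.pop(0) if result else ''
    publisher = result.pop(0) if result else ''
    return group, publisher, itemnr


full = unpack(['90', '70002', '34'])
no_publisher = unpack(['99996', '7827'])
no_group = unpack(['201234567'])
```

When the database has no match below a level, the remaining digits arrive
as a single trailing part, which is why they end up in the item number.
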
Added: python-stdnum/stdnum/numdb.py
==============================================================================
--- /dev/null   00:00:00 1970   (empty, because file is newly added)
+++ python-stdnum/stdnum/numdb.py       Wed Nov 24 23:09:28 2010        (r42)
@@ -0,0 +1,160 @@
+
+# numdb.py - module for handling hierarchically organised numbers
+#
+# Copyright (C) 2010 Arthur de Jong
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+# 02110-1301 USA
+
+"""This module contains functions for reading and querying a database for
+storing numbers that use a hierarchical format (e.g. ISBN, IBAN, phone
+numbers, etc).
+
+To read a database from a file:
+
+>>> dbfile = read(open('test.dat', 'r'))
+
+To split a number:
+
+>>> dbfile.split('01006')
+['0', '100', '6']
+>>> dbfile.split('902006')
+['90', '20', '06']
+>>> dbfile.split('909856')
+['90', '985', '6']
+
+To split the number and get properties for each part:
+
+>>> dbfile.info('01006')
+[('0', {'prop1': 'foo'}), ('100', {'prop2': 'bar'}), ('6', {})]
+>>> dbfile.info('02006')
+[('0', {'prop1': 'foo'}), ('200', {'prop2': 'bar', 'prop3': 'baz'}), ('6', {})]
+>>> dbfile.info('03456')
+[('0', {'prop1': 'foo'}), ('345', {'prop2': 'bar', 'prop3': 'baz'}), ('6', {})]
+>>> dbfile.info('902006')
+[('90', {'prop1': 'booz'}), ('20', {'prop2': 'foo'}), ('06', {})]
+>>> dbfile.info('909856')
+[('90', {'prop1': 'booz'}), ('985', {'prop2': 'fooz'}), ('6', {})]
+>>> dbfile.info('9889')
+[('98', {'prop1': 'booz'}), ('89', {'prop2': 'foo'})]
+"""
+
+import re
+from pkg_resources import resource_stream
+
+_line_re = re.compile('^(?P<indent> *)(?P<ranges>([0-9a-zA-Z]+(-[0-9a-zA-Z]+)?)(,[0-9a-zA-Z]+(-[0-9a-zA-Z]+)?)*) *(?P<props>.*)$')
+_prop_re = re.compile('(?P<prop>[0-9a-zA-Z-_]+)="(?P<value>[^"]*)"')
+
+# this is a cache of open databases
+_open_databases = {}
+
+# the prefixes attribute of NumDB is structured as follows:
+# prefixes = [
+#   [ length, low, high, props, children ]
+#   ...
+# ]
+# where children is a prefixes structure in its own right
+# (there is no expected ordering within the list)
+
+
+class NumDB(object):
+
+    def __init__(self):
+        self.prefixes = []
+
+    @staticmethod
+    def _merge(results):
+        """Merge the provided list of possible results into a single result
+        list (this is a generator)."""
+        results.append([])
+        for parts in map(None, *results):
+            # regroup parts into parts list and properties list
+            partlist, proplist = zip(*(x for x in parts if x))
+            part = min(partlist, key=len)
+            props = {}
+            for p in proplist:
+                props.update(p)
+            yield part, props
+
+    @staticmethod
+    def _find(number, prefixes):
+        """Look up the specified number in the list of prefixes. This
+        returns what info() returns but works recursively."""
+        if not number:
+            return []
+        results = []
+        if prefixes:
+            for length, low, high, props, children in prefixes:
+                if low <= number[:length] <= high:
+                    results.append([ (number[:length], props) ] +
+                                   NumDB._find(number[length:], children))
+        # not-found fallback
+        if not results:
+            return [ ( number, {} ) ]
+        # merge the results into a single result
+        return list(NumDB._merge(results))
+
+    def info(self, number):
+        """Split the provided number in components and associate properties
+        with each component. This returns a list of tuples. Each tuple
+        consists of a string (a part of the number) and a dict of properties.
+        """
+        return NumDB._find(number, self.prefixes)
+
+    def split(self, number):
+        """Split the provided number in components. This returns a list of
+        the components identified."""
+        return [part for part, props in self.info(number)]
+
+
+def _parse(fp):
+    """Read lines of text from the file pointer and generate indent, length,
+    low, high, properties tuples."""
+    for line in fp.xreadlines():
+        # ignore comments and blank lines
+        if line[0] == '#' or line.strip() == '':
+            continue
+        # any other line should parse
+        match = _line_re.search(line)
+        indent = len(match.group('indent'))
+        ranges = match.group('ranges')
+        props = dict(_prop_re.findall(match.group('props')))
+        for rnge in ranges.split(','):
+            if '-' in rnge:
+                low, high = rnge.split('-')
+            else:
+                low, high = rnge, rnge
+            yield ( indent, len(low), low, high, props )
+
+def read(fp):
+    """Return a new database with the data read from the specified file."""
+    last_indent = 0
+    db = NumDB()
+    stack = { 0: db.prefixes }
+    for indent, length, low, high, props in _parse(fp):
+        if indent > last_indent:
+            # populate the children field of the last indent
+            if stack[last_indent][-1][4] is None:
+                stack[last_indent][-1][4] = []
+            stack[indent] = stack[last_indent][-1][4]
+        stack[indent].append([length, low, high, props, None])
+        last_indent = indent
+    return db
+
+def get(name):
+    """Opens a database with the specified name to perform queries on."""
+    if name not in _open_databases:
+        _open_databases[name] = read(resource_stream(__name__, name + '.dat'))
+    return _open_databases[name]
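
The lookup in numdb.py relies on Python 2 idioms (map(None, ...) to zip
lists of unequal length). The following is a rough Python 3 port of _find()
and _merge(), offered as a sketch rather than as part of this commit: it
assumes itertools.zip_longest is an adequate stand-in for map(None, ...),
and the PREFIXES list is written out by hand from test.dat.

```python
from itertools import zip_longest

# Hand-written equivalent of test.dat: [length, low, high, props, children]
PREFIXES = [
    [1, '0', '8', {'prop1': 'foo'}, [
        [3, '100', '999', {'prop2': 'bar'}, None],
        [3, '200', '200', {'prop3': 'baz'}, None],
        [3, '300', '399', {'prop3': 'baz'}, None],
    ]],
    [2, '90', '99', {'prop1': 'booz'}, [
        [2, '00', '89', {'prop2': 'foo'}, None],
        [3, '900', '999', {'prop2': 'fooz'}, None],
    ]],
]


def merge(results):
    """Combine several candidate splits position by position."""
    for parts in zip_longest(*results):
        # drop the None padding, then separate parts from properties
        partlist, proplist = zip(*(x for x in parts if x))
        props = {}
        for p in proplist:
            props.update(p)
        # the shortest candidate part wins, properties are merged
        yield min(partlist, key=len), props


def find(number, prefixes):
    """Recursively split number using the nested prefixes structure."""
    if not number:
        return []
    results = []
    for length, low, high, props, children in prefixes or []:
        if low <= number[:length] <= high:
            results.append([(number[:length], props)] +
                           find(number[length:], children))
    if not results:
        # nothing matched: the rest of the number is one unknown part
        return [(number, {})]
    return list(merge(results))


info_merged = find('02006', PREFIXES)
parts_short = [part for part, props in find('902006', PREFIXES)]
```

With this data, '02006' matches both the 100-999 and the 200 entries, so the
properties of both are merged into the second part, as in the module
docstring.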

Added: python-stdnum/test.dat
==============================================================================
--- /dev/null   00:00:00 1970   (empty, because file is newly added)
+++ python-stdnum/test.dat      Wed Nov 24 23:09:28 2010        (r42)
@@ -0,0 +1,7 @@
+# this is a comment line
+0-8 prop1="foo"
+  100-999 prop2="bar"
+  200,300-399 prop3="baz"
+90-99 prop1="booz"
+  00-89 prop2="foo"
+  900-999 prop2="fooz"
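
To make the file format concrete, here is a minimal Python 3 sketch of how a
file like test.dat could be turned into the nested prefixes structure. It
follows the approach of _parse() and read() in numdb.py but is not the
committed code: it iterates over the file object directly instead of calling
xreadlines(), reads from an in-memory string, and returns the bare list
rather than a NumDB object.

```python
import io
import re

TEST_DAT = '''# this is a comment line
0-8 prop1="foo"
  100-999 prop2="bar"
  200,300-399 prop3="baz"
90-99 prop1="booz"
  00-89 prop2="foo"
  900-999 prop2="fooz"
'''

LINE_RE = re.compile(r'^(?P<indent> *)(?P<ranges>[0-9a-zA-Z]+(?:-[0-9a-zA-Z]+)?(?:,[0-9a-zA-Z]+(?:-[0-9a-zA-Z]+)?)*) *(?P<props>.*)$')
PROP_RE = re.compile(r'(?P<prop>[0-9a-zA-Z_-]+)="(?P<value>[^"]*)"')


def parse(fp):
    """Yield (indent, length, low, high, props) for every range in fp."""
    for line in fp:
        # skip comments and blank lines
        if line.startswith('#') or not line.strip():
            continue
        match = LINE_RE.search(line)
        indent = len(match.group('indent'))
        props = dict(PROP_RE.findall(match.group('props')))
        for rnge in match.group('ranges').split(','):
            low, _, high = rnge.partition('-')
            yield indent, len(low), low, high or low, props


def read(fp):
    """Build the nested [length, low, high, props, children] lists."""
    prefixes = []
    stack = {0: prefixes}  # indent level -> list currently being filled
    last_indent = 0
    for indent, length, low, high, props in parse(fp):
        if indent > last_indent:
            # deeper indent: attach a children list to the previous entry
            parent = stack[last_indent][-1]
            if parent[4] is None:
                parent[4] = []
            stack[indent] = parent[4]
        stack[indent].append([length, low, high, props, None])
        last_indent = indent
    return prefixes


prefixes = read(io.StringIO(TEST_DAT))
```

Note that a comma-separated line such as "200,300-399" produces one entry
per range, and all of those entries share the same properties dict.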

Modified: python-stdnum/tests/test_isbn.doctest
==============================================================================
--- python-stdnum/tests/test_isbn.doctest       Sat Sep 11 11:13:47 2010        (r41)
+++ python-stdnum/tests/test_isbn.doctest       Wed Nov 24 23:09:28 2010        (r42)
@@ -77,82 +77,3 @@
 ('', '99996', '', '7827', '0')
 >>> isbn.split('979-20-1234567-8')
 ('979', '', '', '201234567', '8')
-
-
-Some tests for the ranges module. This is more an internal module so
-tests here are not very critical.
-
->>> from stdnum.isbn import ranges
->>> list(ranges._wrap(2 * 'abc def ghijklmn opqr stuvwx yz', 40))[0]
-'abc def ghijklmn opqr stuvwx yzabc def'
-
-
-Test output function. Bit of a limited test but we see if the serialised
-form of the prefix/ranges list contains at least the same prefixes as the
-current _prefixes list.
-
->>> import StringIO
->>> output = StringIO.StringIO()
->>> ranges.output(output)
->>> k = set( x.split(' ')[0] for x in StringIO.StringIO(output.getvalue()).readlines() )
->>> k == set(ranges._prefixes.keys())
-True
-
-
-Make an XML file with some prefix definitions and load that into the
-ranges module.
-
-First save the current ranges so we can restore later.
-
->>> save_prefixes = ranges._prefixes
-
-Write the XML to a file.
-
->>> import tempfile
->>> xmlfile = tempfile.NamedTemporaryFile(delete=False)
->>> xmlfile.write("""<?xml version='1.0' encoding='utf-8'?>
-... <ISBNRangeMessage>
-... <MessageSerialNumber>0aad2b046ddd9b30e080cb2b24afc868</MessageSerialNumber>
-... <MessageDate>Thu, 20 May 2010 18:36:55 GMT</MessageDate>
-...  <EAN.UCCPrefixes><EAN.UCC>
-...   <Prefix>978</Prefix>
-...   <Rules>
-...    <Rule><Range>0000000-5999999</Range><Length>1</Length></Rule>
-...    <Rule><Range>6000000-6499999</Range><Length>3</Length></Rule>
-...    <Rule><Range>6500000-6999999</Range><Length>0</Length></Rule>
-...   </Rules>
-...  </EAN.UCC></EAN.UCCPrefixes>
-...  <RegistrationGroups>
-...   <Group>
-...    <Prefix>978-0</Prefix>
-...    <Rules>
-...     <Rule><Range>0000000-1999999</Range><Length>2</Length></Rule>
-...     <Rule><Range>2000000-6999999</Range><Length>3</Length></Rule>
-...    </Rules>
-...   </Group>
-...  </RegistrationGroups>
-... </ISBNRangeMessage>
-... """)
->>> xmlfile.close()
-
-Load the XML file by URL and output it to another string. Check if the
-content of the XML has been
-
->>> import urllib
->>> ranges.download('file://' + urllib.pathname2url(xmlfile.name))
->>> import sys
->>> ranges.output()
-# generated from RangeMessage.xml, downloaded from
-# http://www.isbn-international.org/agency?rmxml=1
-# serial 0aad2b046ddd9b30e080cb2b24afc868
-# date Thu, 20 May 2010 18:36:55 GMT
-_prefixes = """
-978 0-5 600-649
-978-0 00-19 200-699
-"""
-
-Restore the original ranges and clean up.
-
->>> ranges._prefixes = save_prefixes
->>> import os
->>> os.unlink(xmlfile.name)