python-stdnum commit: r42 - in python-stdnum: . stdnum stdnum/isbn tests
- From: Commits of the python-stdnum project <python-stdnum-commits [at] lists.arthurdejong.org>
- To: python-stdnum-commits [at] lists.arthurdejong.org
- Reply-to: python-stdnum-users [at] lists.arthurdejong.org
- Subject: python-stdnum commit: r42 - in python-stdnum: . stdnum stdnum/isbn tests
- Date: Wed, 24 Nov 2010 23:09:30 +0100 (CET)
Author: arthur
Date: Wed Nov 24 23:09:28 2010
New Revision: 42
URL: http://arthurdejong.org/viewvc/python-stdnum?view=rev&revision=42
Log:
implement a new numdb module to hold information on hierarchically organised
numbers and switch the isbn module to use this format instead
Added:
python-stdnum/getisbn.py (contents, props changed)
- copied, changed from r41, python-stdnum/stdnum/isbn/ranges.py
python-stdnum/stdnum/isbn.dat
python-stdnum/stdnum/isbn.py
- copied, changed from r41, python-stdnum/stdnum/isbn/__init__.py
python-stdnum/stdnum/numdb.py
python-stdnum/test.dat
Deleted:
python-stdnum/stdnum/isbn/
Modified:
python-stdnum/tests/test_isbn.doctest
Copied and modified: python-stdnum/getisbn.py (from r41, python-stdnum/stdnum/isbn/ranges.py)
==============================================================================
--- python-stdnum/stdnum/isbn/ranges.py Sat Sep 11 11:13:47 2010 (r41, copy source)
+++ python-stdnum/getisbn.py Wed Nov 24 23:09:28 2010 (r42)
@@ -1,4 +1,6 @@
-# ranges.py - list of ISBN prefix data and utility functions
+#!/usr/bin/env python
+
+# getisbn.py - script to get ISBN prefix data
#
# Copyright (C) 2010 Arthur de Jong
#
@@ -17,374 +19,87 @@
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
# 02110-1301 USA
-"""This module contains the current ISBN group and registrant prefixes as
-they are registered with the International ISBN Agency. This information
-is needed to correctly split an ISBN into an EAN.UCC prefix, a group prefix,
-a registrant, an item number and a check-digit."""
+"""This script downloads XML data from the International ISBN Agency
+website and provides a compact form of all group prefixes and registrant
+ranges for those prefixes suitable for the numdb module. This data is needed
+to correctly split ISBNs into an EAN.UCC prefix, a group prefix, a registrant,
+an item number and a check-digit."""
-# The place where the current version of RangeMessage.xml can be downloaded.
-_download_url = 'http://www.isbn-international.org/agency?rmxml=1'
+import xml.sax
+import urllib
-# What follows is a representation of the prefixes that are defined by the
-# International ISBN Agency to correctly split ISBNs. See the download()
-# and output() functions for how to download and generate this data.
-
-# generated from RangeMessage.xml, downloaded from
-# http://www.isbn-international.org/agency?rmxml=1
-# serial 0aad2b046ddd9b30e080cb2b24afc868
-# date Thu, 20 May 2010 18:36:55 GMT
-_prefixes = """
-978 0-5 600-649 7-7 80-94 950-989 9900-9989 99900-99999
-978-0 00-19 200-699 7000-8499 85000-89999 900000-949999 9500000-9999999
-978-1 00-09 100-399 4000-5499 55000-86979 869800-998999 9990000-9999999
-978-2 00-19 200-349 35000-39999 400-699 7000-8399 84000-89999 900000-949999
-978-2 9500000-9999999
-978-3 00-02 030-033 0340-0369 03700-03999 04-19 200-699 7000-8499 85000-89999
-978-3 900000-949999 9500000-9539999 95400-96999 9700000-9899999 99000-99499
-978-3 99500-99999
-978-4 00-19 200-699 7000-8499 85000-89999 900000-949999 9500000-9999999
-978-5 00-19 200-420 4210-4299 430-430 4310-4399 440-440 4410-4499 450-699
-978-5 7000-8499 85000-89999 900000-909999 91000-91999 9200-9299 93000-94999
-978-5 9500000-9500999 9501-9799 98000-98999 9900000-9909999 9910-9999
-978-600 00-09 100-499 5000-8999 90000-99999
-978-601 00-19 200-699 7000-7999 80000-84999 85-99
-978-602 00-19 200-799 8000-9499 95000-99999
-978-603 00-04 05-49 500-799 8000-8999 90000-99999
-978-604 0-4 50-89 900-979 9800-9999
-978-605 01-09 100-399 4000-5999 60000-89999 90-99
-978-606 0-0 10-49 500-799 8000-9199 92000-99999
-978-607 00-39 400-749 7500-9499 95000-99999
-978-608 0-0 10-19 200-449 4500-6499 65000-69999 7-9
-978-609 00-39 400-799 8000-9499 95000-99999
-978-612 00-29 300-399 4000-4499 45000-49999 50-99
-978-613 0-9
-978-614 00-39 400-799 8000-9499 95000-99999
-978-615 00-09 100-499 5000-7999 80000-89999
-978-616 00-19 200-699 7000-8999 90000-99999
-978-617 00-49 500-699 7000-8999 90000-99999
-978-7 00-09 100-499 5000-7999 80000-89999 900000-999999
-978-80 00-19 200-699 7000-8499 85000-89999 900000-999999
-978-81 00-19 200-699 7000-8499 85000-89999 900000-999999
-978-82 00-19 200-699 7000-8999 90000-98999 990000-999999
-978-83 00-19 200-599 60000-69999 7000-8499 85000-89999 900000-999999
-978-84 00-14 15000-19999 200-699 7000-8499 85000-89999 9000-9199
-978-84 920000-923999 92400-92999 930000-949999 95000-96999 9700-9999
-978-85 00-19 200-599 60000-69999 7000-8499 85000-89999 900000-979999
-978-85 98000-99999
-978-86 00-29 300-599 6000-7999 80000-89999 900000-999999
-978-87 00-29 400-649 7000-7999 85000-94999 970000-999999
-978-88 00-19 200-599 6000-8499 85000-89999 900000-949999 95000-99999
-978-89 00-24 250-549 5500-8499 85000-94999 950000-999999
-978-90 00-19 200-499 5000-6999 70000-79999 800000-849999 8500-8999 90-90
-978-90 910000-939999 94-94 950000-999999
-978-91 0-1 20-49 500-649 7000-7999 85000-94999 970000-999999
-978-92 0-5 60-79 800-899 9000-9499 95000-98999 990000-999999
-978-93 00-09 100-499 5000-7999 80000-94999 950000-999999
-978-94 000-599 6000-8999 90000-99999
-978-950 00-49 500-899 9000-9899 99000-99999
-978-951 0-1 20-54 550-889 8900-9499 95000-99999
-978-952 00-19 200-499 5000-5999 60-65 6600-6699 67000-69999 7000-7999 80-94
-978-952 9500-9899 99000-99999
-978-953 0-0 10-14 150-549 55000-59999 6000-9499 95000-99999
-978-954 00-28 2900-2999 300-799 8000-8999 90000-92999 9300-9999
-978-955 0000-1999 20-49 50000-54999 550-799 8000-9499 95000-99999
-978-956 00-19 200-699 7000-9999
-978-957 00-02 0300-0499 05-19 2000-2099 21-27 28000-30999 31-43 440-819
-978-957 8200-9699 97000-99999
-978-958 00-56 57000-59999 600-799 8000-9499 95000-99999
-978-959 00-19 200-699 7000-8499 85000-99999
-978-960 00-19 200-659 6600-6899 690-699 7000-8499 85000-92999 93-93 9400-9799
-978-960 98000-99999
-978-961 00-19 200-599 6000-8999 90000-94999
-978-962 00-19 200-699 7000-8499 85000-86999 8700-8999 900-999
-978-963 00-19 200-699 7000-8499 85000-89999 9000-9999
-978-964 00-14 150-249 2500-2999 300-549 5500-8999 90000-96999 970-989
-978-964 9900-9999
-978-965 00-19 200-599 7000-7999 90000-99999
-978-966 00-14 1500-1699 170-199 2000-2999 300-699 7000-8999 90000-99999
-978-967 00-00 0100-0999 10000-19999 300-499 5000-5999 60-89 900-989 9900-9989
-978-967 99900-99999
-978-968 01-39 400-499 5000-7999 800-899 9000-9999
-978-969 0-1 20-39 400-799 8000-9999
-978-970 01-59 600-899 9000-9099 91000-96999 9700-9999
-978-971 000-015 0160-0199 02-02 0300-0599 06-09 10-49 500-849 8500-9099
-978-971 91000-98999 9900-9999
-978-972 0-1 20-54 550-799 8000-9499 95000-99999
-978-973 0-0 100-169 1700-1999 20-54 550-759 7600-8499 85000-88999 8900-9499
-978-973 95000-99999
-978-974 00-19 200-699 7000-8499 85000-89999 90000-94999 9500-9999
-978-975 00000-00999 01-01 02-24 250-599 6000-9199 92000-98999 990-999
-978-976 0-3 40-59 600-799 8000-9499 95000-99999
-978-977 00-19 200-499 5000-6999 700-999
-978-978 000-199 2000-2999 30000-79999 8000-8999 900-999
-978-979 000-099 1000-1499 15000-19999 20-29 3000-3999 400-799 8000-9499
-978-979 95000-99999
-978-980 00-19 200-599 6000-9999
-978-981 00-11 1200-1999 200-289 2900-9999
-978-982 00-09 100-699 70-89 9000-9799 98000-99999
-978-983 00-01 020-199 2000-3999 40000-44999 45-49 50-79 800-899 9000-9899
-978-983 99000-99999
-978-984 00-39 400-799 8000-8999 90000-99999
-978-985 00-39 400-599 6000-8999 90000-99999
-978-986 00-11 120-559 5600-7999 80000-99999
-978-987 00-09 1000-1999 20000-29999 30-49 500-899 9000-9499 95000-99999
-978-988 00-16 17000-19999 200-799 8000-9699 97000-99999
-978-989 0-1 20-54 550-799 8000-9499 95000-99999
-978-9927 00-09 100-399 4000-4999
-978-9928 00-09 100-399 4000-4999
-978-9929 0-3 40-54 550-799 8000-9999
-978-9930 00-49 500-939 9400-9999
-978-9931 00-29 300-899 9000-9999
-978-9932 00-39 400-849 8500-9999
-978-9933 0-0 10-39 400-899 9000-9999
-978-9934 0-0 10-49 500-799 8000-9999
-978-9935 0-0 10-39 400-899 9000-9999
-978-9936 0-1 20-39 400-799 8000-9999
-978-9937 0-2 30-49 500-799 8000-9999
-978-9938 00-79 800-949 9500-9999
-978-9939 0-4 50-79 800-899 9000-9999
-978-9940 0-1 20-49 500-899 9000-9999
-978-9941 0-0 10-39 400-899 9000-9999
-978-9942 00-89 900-984 9850-9999
-978-9943 00-29 300-399 4000-9999
-978-9944 0000-0999 100-499 5000-5999 60-69 700-799 80-89 900-999
-978-9945 00-00 010-079 08-39 400-569 57-57 580-849 8500-9999
-978-9946 0-1 20-39 400-899 9000-9999
-978-9947 0-1 20-79 800-999
-978-9948 00-39 400-849 8500-9999
-978-9949 0-0 10-39 400-899 9000-9999
-978-9950 00-29 300-849 8500-9999
-978-9951 00-39 400-849 8500-9999
-978-9952 0-1 20-39 400-799 8000-9999
-978-9953 0-0 10-39 400-599 60-89 9000-9999
-978-9954 0-1 20-39 400-799 8000-9999
-978-9955 00-39 400-929 9300-9999
-978-9956 0-0 10-39 400-899 9000-9999
-978-9957 00-39 400-699 70-84 8500-8799 88-99
-978-9958 0-0 10-49 500-899 9000-9999
-978-9959 0-1 20-79 800-949 9500-9999
-978-9960 00-59 600-899 9000-9999
-978-9961 0-2 30-69 700-949 9500-9999
-978-9962 00-54 5500-5599 56-59 600-849 8500-9999
-978-9963 0-2 30-54 550-734 7350-7499 7500-9999
-978-9964 0-6 70-94 950-999
-978-9965 00-39 400-899 9000-9999
-978-9966 000-149 1500-1999 20-69 7000-7499 750-959 9600-9999
-978-9967 00-39 400-899 9000-9999
-978-9968 00-49 500-939 9400-9999
-978-9970 00-39 400-899 9000-9999
-978-9971 0-5 60-89 900-989 9900-9999
-978-9972 00-09 1-1 200-249 2500-2999 30-59 600-899 9000-9999
-978-9973 00-05 060-089 0900-0999 10-69 700-969 9700-9999
-978-9974 0-2 30-54 550-749 7500-9499 95-99
-978-9975 0-0 100-399 4000-4499 45-89 900-949 9500-9999
-978-9976 0-5 60-89 900-989 9900-9999
-978-9977 00-89 900-989 9900-9999
-978-9978 00-29 300-399 40-94 950-989 9900-9999
-978-9979 0-4 50-64 650-659 66-75 760-899 9000-9999
-978-9980 0-3 40-89 900-989 9900-9999
-978-9981 00-09 100-159 1600-1999 20-79 800-949 9500-9999
-978-9982 00-79 800-989 9900-9999
-978-9983 80-94 950-989 9900-9999
-978-9984 00-49 500-899 9000-9999
-978-9985 0-4 50-79 800-899 9000-9999
-978-9986 00-39 400-899 9000-9399 940-969 97-99
-978-9987 00-39 400-879 8800-9999
-978-9988 0-2 30-54 550-749 7500-9999
-978-9989 0-0 100-199 2000-2999 30-59 600-949 9500-9999
-978-99901 00-49 500-799 80-99
-978-99903 0-1 20-89 900-999
-978-99904 0-5 60-89 900-999
-978-99905 0-3 40-79 800-999
-978-99906 0-2 30-59 600-699 70-89 90-94 950-999
-978-99908 0-0 10-89 900-999
-978-99909 0-3 40-94 950-999
-978-99910 0-2 30-89 900-999
-978-99911 00-59 600-999
-978-99912 0-3 400-599 60-89 900-999
-978-99913 0-2 30-35 600-604
-978-99914 0-4 50-89 900-999
-978-99915 0-4 50-79 800-999
-978-99916 0-2 30-69 700-999
-978-99917 0-2 30-89 900-999
-978-99918 0-3 40-79 800-999
-978-99919 0-2 300-399 40-69 900-999
-978-99920 0-4 50-89 900-999
-978-99921 0-1 20-69 700-799 8-8 90-99
-978-99922 0-3 40-69 700-999
-978-99923 0-1 20-79 800-999
-978-99924 0-1 20-79 800-999
-978-99925 0-3 40-79 800-999
-978-99926 0-0 10-59 600-999
-978-99927 0-2 30-59 600-999
-978-99928 0-0 10-79 800-999
-978-99929 0-4 50-79 800-999
-978-99930 0-4 50-79 800-999
-978-99931 0-4 50-79 800-999
-978-99932 0-0 10-59 600-699 7-7 80-99
-978-99933 0-2 30-59 600-999
-978-99934 0-1 20-79 800-999
-978-99935 0-2 30-59 600-699 7-8 90-99
-978-99936 0-0 10-59 600-999
-978-99937 0-1 20-59 600-999
-978-99938 0-1 20-59 600-899 90-99
-978-99939 0-5 60-89 900-999
-978-99940 0-0 10-69 700-999
-978-99941 0-2 30-79 800-999
-978-99942 0-4 50-79 800-999
-978-99943 0-2 30-59 600-999
-978-99944 0-4 50-79 800-999
-978-99945 0-5 60-89 900-999
-978-99946 0-2 30-59 600-999
-978-99947 0-2 30-69 700-999
-978-99948 0-4 50-79 800-999
-978-99949 0-1 20-89 900-999
-978-99950 0-4 50-79 800-999
-978-99952 0-4 50-79 800-999
-978-99953 0-2 30-79 800-939 94-99
-978-99954 0-2 30-69 700-999
-978-99955 0-1 20-59 600-799 80-89 90-99
-978-99956 00-59 600-859 86-99
-978-99957 0-1 20-79 800-999
-978-99958 0-4 50-94 950-999
-978-99959 0-2 30-59 600-999
-978-99960 0-0 10-94 950-999
-978-99961 0-3 40-89 900-999
-978-99962 0-4 50-79 800-999
-978-99963 00-49 500-999
-978-99964 0-1 20-79 800-999
-978-99965 0-3 40-79 800-999
-978-99966 0-2 30-69 700-799
-978-99967 0-1 20-59 600-899
-979 10-10
-979-10 00-19 200-699 7000-8999 90000-97599 976000-999999
-"""
-
-def _expand():
-    """Ensures that the prefix list is expanded as a dictionary to allow
-    easy lookups. The default text form is compact but not very efficient."""
-    global _prefixes
-    if type(_prefixes) == dict:
-        return
-    # build a new dictionary of ranges from the string
-    new_prefixes = dict()
-    for line in _prefixes.splitlines():
-        if line:
-            ( prefix, r ) = line.split(' ', 1)
-            range_list = new_prefixes.setdefault(prefix, [])
-            for r in r.split(' '):
-                low, high = r.split('-')
-                range_list.append((len(low), low, high))
-    # save the dictionary
-    _prefixes = new_prefixes
-
-def lookup(prefix, number):
-    """Look up the specified prefix and split the provided number into
-    the correct parts. If the prefix cannot be found or the number is not
-    in any of the defined ranges a tuple with an empty first element is
-    returned. The prefix and number together are expected to form a
-    complete ISBN13 number.
-
-    >>> lookup('978', '9024538270')
-    ('90', '24538270')
-    >>> lookup('978-0', '471117094')
-    ('471', '117094')
-    """
-    _expand()
-    try:
-        for length, low, high in _prefixes[prefix]:
-            if low <= number[:length] <= high:
-                return number[:length], number[length:]
-    except KeyError:
-        pass
-    return ( '', number )
-
-def load(fp):
-    """Loads the data from the specified file descriptor. The provided file
-    should match the format of the RangeMessage.xml file."""
-    # this is in-line to avoid importing xml.sax for normal use
-    import xml.sax
-    # initialise data
-    global _prefixes
-    _prefixes = dict()
-    # SAX handler class
-    class RangeHandler(xml.sax.ContentHandler):
-        def __init__(self):
-            self._gather = None
-            self._prefix = None
-            self._range = None
-            self._length = None
-        def startElement(self, name, attrs):
-            if name in ( 'MessageSerialNumber', 'MessageDate', 'Prefix',
-                         'Range', 'Length', ):
-                self._gather = ''
-        def characters(self, content):
-            if self._gather is not None:
-                self._gather += content
-        def endElement(self, name):
-            if name == 'MessageSerialNumber':
-                global _download_serial
-                _download_serial = self._gather.strip()
-            elif name == 'MessageDate':
-                global _download_date
-                _download_date = self._gather.strip()
-            elif name == 'Prefix':
-                self._prefix = self._gather.strip()
-            elif name == 'Range':
-                self._range = self._gather.strip()
-            elif name == 'Length':
-                self._length = int(self._gather.strip())
-            elif name == 'Rule' and self._length:
-                r = ( self._length, ) + tuple( x[:self._length] for x in self._range.split('-') )
-                _prefixes.setdefault(self._prefix, []).append(r)
-            self._gather = None
-    # start the actual parsing
-    parser = xml.sax.make_parser()
-    parser.setContentHandler(RangeHandler())
-    parser.parse(fp)
-def download(url=None):
-    """Download the RangeMessage.xml data from the International ISBN Agency
-    website or from the specified URL."""
-    import urllib
-    load(urllib.urlopen(url or _download_url))
+# The place where the current version of RangeMessage.xml can be downloaded.
+download_url = 'http://www.isbn-international.org/agency?rmxml=1'
-def _wrap(text, max_len, sep=' '):
+def _wrap(text):
     """Generator that returns lines of text that are no longer than
-    max_len. The sep argument is the string to split on."""
+    73 characters, split at commas."""
     while text:
         i = len(text)
-        if i > max_len:
-            i = text.rindex(' ', 20, max_len)
+        if i > 73:
+            i = text.rindex(',', 20, 73)
         yield text[:i]
         text = text[i+1:]
-def output(fp=None):
-    """Print the downloaded range data to stdout (or a file if one is
-    provided) in the compact text format suitable for inclusion in this
-    module."""
-    _expand()
-    if not fp:
-        import sys
-        fp = sys.stdout
-    # first print the header if we can
-    try:
-        fp.write('# generated from RangeMessage.xml, downloaded from\n'
-                 '# %(url)s\n'
-                 '# serial %(serial)s\n'
-                 '# date %(date)s\n'
-                 '_prefixes = """\n' % { 'url': _download_url,
-                                         'serial': _download_serial,
-                                         'date': _download_date })
-        headerprinted = True
-    except NameError:
-        headerprinted = False
-    # print the actual prefixes
-    prefixes = _prefixes.items()
-    prefixes.sort()
-    for prefix, ranges in prefixes:
-        for line in _wrap(' '.join(r[1] + '-' + r[2] for r in ranges), 77 - len(prefix)):
-            fp.write('%s %s\n' % ( prefix, line ) )
-    # print the footer if the header was printed
-    if headerprinted:
-        fp.write('"""\n')
+
+class RangeHandler(xml.sax.ContentHandler):
+
+    def __init__(self):
+        self._gather = None
+        self._prefix = None
+        self._agency = None
+        self._range = None
+        self._length = None
+        self._ranges = []
+        self._last = None
+        self._topranges = {}
+
+    def startElement(self, name, attrs):
+        if name in ( 'MessageSerialNumber', 'MessageDate', 'Prefix',
+                     'Agency', 'Range', 'Length', ):
+            self._gather = ''
+
+    def characters(self, content):
+        if self._gather is not None:
+            self._gather += content
+
+    def endElement(self, name):
+        if name == 'MessageSerialNumber':
+            print '# file serial %s' % self._gather.strip()
+        elif name == 'MessageDate':
+            print '# file date %s' % self._gather.strip()
+        elif name == 'Prefix':
+            self._prefix = self._gather.strip()
+        elif name == 'Agency':
+            self._agency = self._gather.strip()
+        elif name == 'Range':
+            self._range = self._gather.strip()
+        elif name == 'Length':
+            self._length = int(self._gather.strip())
+        elif name == 'Rule' and self._length:
+            self._ranges.append(tuple( x[:self._length] for x in self._range.split('-') ))
+        elif name == 'Rules':
+            if '-' in self._prefix:
+                p, a = self._prefix.split('-')
+                if p != self._last:
+                    print p
+                    self._last = p
+                    for line in _wrap(','.join(r[0] + '-' + r[1] for r in self._topranges[p])):
+                        print ' %s' % ( line )
+                print ' %s agency="%s"' % ( a, self._agency )
+                for line in _wrap(','.join(r[0] + '-' + r[1] for r in self._ranges)):
+                    print ' %s' % ( line )
+            else:
+                self._topranges[self._prefix] = self._ranges
+            self._ranges = []
+        self._gather = None
+
+
+if __name__ == '__main__':
+    print '# generated from RangeMessage.xml, downloaded from'
+    print '# %s' % download_url
+    parser = xml.sax.make_parser()
+    parser.setContentHandler(RangeHandler())
+    parser.parse(urllib.urlopen(download_url))
+    #parser.parse('RangeMessage.xml')
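The handler above reduces every Rule element to its Range truncated to Length digits, skips rules whose Length is 0, and writes the EAN.UCC ranges once, just before the first registration group of each prefix. A minimal Python 3 sketch of that conversion follows. It runs on a small hand-written XML fragment whose layout is an assumption (only the element names are taken from the handler), collects lines in a list instead of printing them, and leaves out the 73-character wrapping and the serial/date header. It also indents registrant ranges one step deeper than their agency line to make the hierarchy explicit; that indentation is an assumption as well.

```python
import xml.sax

# Hand-written, heavily trimmed stand-in for RangeMessage.xml (assumed layout).
SAMPLE = b"""<?xml version="1.0"?>
<ISBNRangeMessage>
 <EAN.UCC>
  <Prefix>978</Prefix>
  <Agency>International ISBN Agency</Agency>
  <Rules>
   <Rule><Range>0000000-5999999</Range><Length>1</Length></Rule>
   <Rule><Range>6000000-6499999</Range><Length>3</Length></Rule>
   <Rule><Range>6500000-6999999</Range><Length>0</Length></Rule>
  </Rules>
 </EAN.UCC>
 <Group>
  <Prefix>978-0</Prefix>
  <Agency>English language</Agency>
  <Rules>
   <Rule><Range>0000000-1999999</Range><Length>2</Length></Rule>
   <Rule><Range>2000000-6999999</Range><Length>3</Length></Rule>
  </Rules>
 </Group>
</ISBNRangeMessage>
"""


class RangeHandler(xml.sax.ContentHandler):
    """Python 3 sketch of the handler; output lines are kept in self.lines."""

    def __init__(self):
        super().__init__()
        self._gather = None    # text collected for the current element
        self._prefix = self._agency = self._range = None
        self._length = None
        self._ranges = []      # (low, high) pairs of the current Rules block
        self._topranges = {}   # EAN.UCC prefix -> its group prefix ranges
        self._last = None      # last EAN.UCC prefix written out
        self.lines = []

    def startElement(self, name, attrs):
        if name in ('Prefix', 'Agency', 'Range', 'Length'):
            self._gather = ''

    def characters(self, content):
        if self._gather is not None:
            self._gather += content

    def endElement(self, name):
        if name == 'Prefix':
            self._prefix = self._gather.strip()
        elif name == 'Agency':
            self._agency = self._gather.strip()
        elif name == 'Range':
            self._range = self._gather.strip()
        elif name == 'Length':
            self._length = int(self._gather.strip())
        elif name == 'Rule' and self._length:
            # Length 0 (unassigned range) is skipped; otherwise keep only
            # the first Length digits of both bounds
            low, high = self._range.split('-')
            self._ranges.append((low[:self._length], high[:self._length]))
        elif name == 'Rules':
            if '-' in self._prefix:
                ean, group = self._prefix.split('-')
                if ean != self._last:
                    # first group under this EAN.UCC prefix
                    self._last = ean
                    self.lines.append(ean)
                    self.lines.append(' ' + ','.join(
                        '%s-%s' % r for r in self._topranges[ean]))
                self.lines.append(' %s agency="%s"' % (group, self._agency))
                self.lines.append('  ' + ','.join(
                    '%s-%s' % r for r in self._ranges))
            else:
                self._topranges[self._prefix] = self._ranges
            self._ranges = []
        self._gather = None


handler = RangeHandler()
xml.sax.parseString(SAMPLE, handler)
print('\n'.join(handler.lines))
# 978
#  0-5,600-649
#  0 agency="English language"
#   00-19,200-699
```

Truncating both bounds to Length digits is what turns a seven-digit rule such as 0000000-1999999 into the short 00-19 form seen in isbn.dat.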
Added: python-stdnum/stdnum/isbn.dat
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ python-stdnum/stdnum/isbn.dat Wed Nov 24 23:09:28 2010 (r42)
@@ -0,0 +1,436 @@
+# generated from RangeMessage.xml, downloaded from
+# http://www.isbn-international.org/agency?rmxml=1
+# file serial 7a5c26a5-62dc-463d-adab-2e214f2a1316
+# file date Tue, 16 Nov 2010 10:49:43 GMT
+978
+ 0-5,600-649,7-7,80-94,950-989,9900-9989,99900-99999
+ 0 agency="English language"
+ 00-19,200-699,7000-8499,85000-89999,900000-949999,9500000-9999999
+ 1 agency="English language"
+ 00-09,100-399,4000-5499,55000-86979,869800-998999,9990000-9999999
+ 2 agency="French language"
+ 00-19,200-349,35000-39999,400-699,7000-8399,84000-89999,900000-949999
+ 9500000-9999999
+ 3 agency="German language"
+ 00-02,030-033,0340-0369,03700-03999,04-19,200-699,7000-8499,85000-89999
+ 900000-949999,9500000-9539999,95400-96999,9700000-9899999,99000-99499
+ 99500-99999
+ 4 agency="Japan"
+ 00-19,200-699,7000-8499,85000-89999,900000-949999,9500000-9999999
+ 5 agency="Russian Federation and former USSR"
+ 00-19,200-420,4210-4299,430-430,4310-4399,440-440,4410-4499,450-699
+ 7000-8499,85000-89999,900000-909999,91000-91999,9200-9299,93000-94999
+ 9500000-9500999,9501-9799,98000-98999,9900000-9909999,9910-9999
+ 600 agency="Iran"
+ 00-09,100-499,5000-8999,90000-99999
+ 601 agency="Kazakhstan"
+ 00-19,200-699,7000-7999,80000-84999,85-99
+ 602 agency="Indonesia"
+ 00-19,200-799,8000-9499,95000-99999
+ 603 agency="Saudi Arabia"
+ 00-04,05-49,500-799,8000-8999,90000-99999
+ 604 agency="Vietnam"
+ 0-4,50-89,900-979,9800-9999
+ 605 agency="Turkey"
+ 01-09,100-399,4000-5999,60000-89999,90-99
+ 606 agency="Romania"
+ 0-0,10-49,500-799,8000-9199,92000-99999
+ 607 agency="Mexico"
+ 00-39,400-749,7500-9499,95000-99999
+ 608 agency="Macedonia"
+ 0-0,10-19,200-449,4500-6499,65000-69999,7-9
+ 609 agency="Lithuania"
+ 00-39,400-799,8000-9499,95000-99999
+ 611 agency="Thailand"
+ 612 agency="Peru"
+ 00-29,300-399,4000-4499,45000-49999,50-99
+ 613 agency="Mauritius"
+ 0-9
+ 614 agency="Lebanon"
+ 00-39,400-799,8000-9499,95000-99999
+ 615 agency="Hungary"
+ 00-09,100-499,5000-7999,80000-89999
+ 616 agency="Thailand"
+ 00-19,200-699,7000-8999,90000-99999
+ 617 agency="Ukraine"
+ 00-49,500-699,7000-8999,90000-99999
+ 7 agency="China, People's Republic"
+ 00-09,100-499,5000-7999,80000-89999,900000-999999
+ 80 agency="Czech Republic and Slovakia"
+ 00-19,200-699,7000-8499,85000-89999,900000-999999
+ 81 agency="India"
+ 00-19,200-699,7000-8499,85000-89999,900000-999999
+ 82 agency="Norway"
+ 00-19,200-699,7000-8999,90000-98999,990000-999999
+ 83 agency="Poland"
+ 00-19,200-599,60000-69999,7000-8499,85000-89999,900000-999999
+ 84 agency="Spain"
+ 00-14,15000-19999,200-699,7000-8499,85000-89999,9000-9199,920000-923999
+ 92400-92999,930000-949999,95000-96999,9700-9999
+ 85 agency="Brazil"
+ 00-19,200-599,60000-69999,7000-8499,85000-89999,900000-979999,98000-99999
+ 86 agency="Serbia and Montenegro"
+ 00-29,300-599,6000-7999,80000-89999,900000-999999
+ 87 agency="Denmark"
+ 00-29,400-649,7000-7999,85000-94999,970000-999999
+ 88 agency="Italy"
+ 00-19,200-599,6000-8499,85000-89999,900000-949999,95000-99999
+ 89 agency="Korea, Republic"
+ 00-24,250-549,5500-8499,85000-94999,950000-969999,97000-98999,990-999
+ 90 agency="Netherlands"
+ 00-19,200-499,5000-6999,70000-79999,800000-849999,8500-8999,90-90
+ 910000-939999,94-94,950000-999999
+ 91 agency="Sweden"
+ 0-1,20-49,500-649,7000-7999,85000-94999,970000-999999
+ 92 agency="International NGO Publishers and EC Organizations"
+ 0-5,60-79,800-899,9000-9499,95000-98999,990000-999999
+ 93 agency="India"
+ 00-09,100-499,5000-7999,80000-94999,950000-999999
+ 94 agency="Netherlands"
+ 000-599,6000-8999,90000-99999
+ 950 agency="Argentina"
+ 00-49,500-899,9000-9899,99000-99999
+ 951 agency="Finland"
+ 0-1,20-54,550-889,8900-9499,95000-99999
+ 952 agency="Finland"
+ 00-19,200-499,5000-5999,60-65,6600-6699,67000-69999,7000-7999,80-94
+ 9500-9899,99000-99999
+ 953 agency="Croatia"
+ 0-0,10-14,150-549,55000-59999,6000-9499,95000-99999
+ 954 agency="Bulgaria"
+ 00-28,2900-2999,300-799,8000-8999,90000-92999,9300-9999
+ 955 agency="Sri Lanka"
+ 0000-1999,20-49,50000-54999,550-799,8000-9499,95000-99999
+ 956 agency="Chile"
+ 00-19,200-699,7000-9999
+ 957 agency="Taiwan"
+ 00-02,0300-0499,05-19,2000-2099,21-27,28000-30999,31-43,440-819
+ 8200-9699,97000-99999
+ 958 agency="Colombia"
+ 00-56,57000-59999,600-799,8000-9499,95000-99999
+ 959 agency="Cuba"
+ 00-19,200-699,7000-8499,85000-99999
+ 960 agency="Greece"
+ 00-19,200-659,6600-6899,690-699,7000-8499,85000-92999,93-93,9400-9799
+ 98000-99999
+ 961 agency="Slovenia"
+ 00-19,200-599,6000-8999,90000-94999
+ 962 agency="Hong Kong, China"
+ 00-19,200-699,7000-8499,85000-86999,8700-8999,900-999
+ 963 agency="Hungary"
+ 00-19,200-699,7000-8499,85000-89999,9000-9999
+ 964 agency="Iran"
+ 00-14,150-249,2500-2999,300-549,5500-8999,90000-96999,970-989,9900-9999
+ 965 agency="Israel"
+ 00-19,200-599,7000-7999,90000-99999
+ 966 agency="Ukraine"
+ 00-14,1500-1699,170-199,2000-2999,300-699,7000-8999,90000-99999
+ 967 agency="Malaysia"
+ 00-00,0100-0999,10000-19999,300-499,5000-5999,60-89,900-989,9900-9989
+ 99900-99999
+ 968 agency="Mexico"
+ 01-39,400-499,5000-7999,800-899,9000-9999
+ 969 agency="Pakistan"
+ 0-1,20-39,400-799,8000-9999
+ 970 agency="Mexico"
+ 01-59,600-899,9000-9099,91000-96999,9700-9999
+ 971 agency="Philippines"
+ 000-015,0160-0199,02-02,0300-0599,06-09,10-49,500-849,8500-9099
+ 91000-98999,9900-9999
+ 972 agency="Portugal"
+ 0-1,20-54,550-799,8000-9499,95000-99999
+ 973 agency="Romania"
+ 0-0,100-169,1700-1999,20-54,550-759,7600-8499,85000-88999,8900-9499
+ 95000-99999
+ 974 agency="Thailand"
+ 00-19,200-699,7000-8499,85000-89999,90000-94999,9500-9999
+ 975 agency="Turkey"
+ 00000-01999,02-24,250-599,6000-9199,92000-98999,990-999
+ 976 agency="Caribbean Community"
+ 0-3,40-59,600-799,8000-9499,95000-99999
+ 977 agency="Egypt"
+ 00-19,200-499,5000-6999,700-999
+ 978 agency="Nigeria"
+ 000-199,2000-2999,30000-79999,8000-8999,900-999
+ 979 agency="Indonesia"
+ 000-099,1000-1499,15000-19999,20-29,3000-3999,400-799,8000-9499
+ 95000-99999
+ 980 agency="Venezuela"
+ 00-19,200-599,6000-9999
+ 981 agency="Singapore"
+ 00-11,1200-1999,200-289,2900-9999
+ 982 agency="South Pacific"
+ 00-09,100-699,70-89,9000-9799,98000-99999
+ 983 agency="Malaysia"
+ 00-01,020-199,2000-3999,40000-44999,45-49,50-79,800-899,9000-9899
+ 99000-99999
+ 984 agency="Bangladesh"
+ 00-39,400-799,8000-8999,90000-99999
+ 985 agency="Belarus"
+ 00-39,400-599,6000-8999,90000-99999
+ 986 agency="Taiwan"
+ 00-11,120-559,5600-7999,80000-99999
+ 987 agency="Argentina"
+ 00-09,1000-1999,20000-29999,30-49,500-899,9000-9499,95000-99999
+ 988 agency="Hong Kong, China"
+ 00-16,17000-19999,200-799,8000-9699,97000-99999
+ 989 agency="Portugal"
+ 0-1,20-54,550-799,8000-9499,95000-99999
+ 9927 agency="Qatar"
+ 00-09,100-399,4000-4999
+ 9928 agency="Albania"
+ 00-09,100-399,4000-4999
+ 9929 agency="Guatemala"
+ 0-3,40-54,550-799,8000-9999
+ 9930 agency="Costa Rica"
+ 00-49,500-939,9400-9999
+ 9931 agency="Algeria"
+ 00-29,300-899,9000-9999
+ 9932 agency="Lao People's Democratic Republic"
+ 00-39,400-849,8500-9999
+ 9933 agency="Syria"
+ 0-0,10-39,400-899,9000-9999
+ 9934 agency="Latvia"
+ 0-0,10-49,500-799,8000-9999
+ 9935 agency="Iceland"
+ 0-0,10-39,400-899,9000-9999
+ 9936 agency="Afghanistan"
+ 0-1,20-39,400-799,8000-9999
+ 9937 agency="Nepal"
+ 0-2,30-49,500-799,8000-9999
+ 9938 agency="Tunisia"
+ 00-79,800-949,9500-9999
+ 9939 agency="Armenia"
+ 0-4,50-79,800-899,9000-9999
+ 9940 agency="Montenegro"
+ 0-1,20-49,500-899,9000-9999
+ 9941 agency="Georgia"
+ 0-0,10-39,400-899,9000-9999
+ 9942 agency="Ecuador"
+ 00-89,900-984,9850-9999
+ 9943 agency="Uzbekistan"
+ 00-29,300-399,4000-9999
+ 9944 agency="Turkey"
+ 0000-0999,100-499,5000-5999,60-69,700-799,80-89,900-999
+ 9945 agency="Dominican Republic"
+ 00-00,010-079,08-39,400-569,57-57,580-849,8500-9999
+ 9946 agency="Korea, P.D.R."
+ 0-1,20-39,400-899,9000-9999
+ 9947 agency="Algeria"
+ 0-1,20-79,800-999
+ 9948 agency="United Arab Emirates"
+ 00-39,400-849,8500-9999
+ 9949 agency="Estonia"
+ 0-0,10-39,400-899,9000-9999
+ 9950 agency="Palestine"
+ 00-29,300-849,8500-9999
+ 9951 agency="Kosova"
+ 00-39,400-849,8500-9999
+ 9952 agency="Azerbaijan"
+ 0-1,20-39,400-799,8000-9999
+ 9953 agency="Lebanon"
+ 0-0,10-39,400-599,60-89,9000-9999
+ 9954 agency="Morocco"
+ 0-1,20-39,400-799,8000-9999
+ 9955 agency="Lithuania"
+ 00-39,400-929,9300-9999
+ 9956 agency="Cameroon"
+ 0-0,10-39,400-899,9000-9999
+ 9957 agency="Jordan"
+ 00-39,400-699,70-84,8500-8799,88-99
+ 9958 agency="Bosnia and Herzegovina"
+ 0-0,10-18,1900-1999,20-49,500-899,9000-9999
+ 9959 agency="Libya"
+ 0-1,20-79,800-949,9500-9999
+ 9960 agency="Saudi Arabia"
+ 00-59,600-899,9000-9999
+ 9961 agency="Algeria"
+ 0-2,30-69,700-949,9500-9999
+ 9962 agency="Panama"
+ 00-54,5500-5599,56-59,600-849,8500-9999
+ 9963 agency="Cyprus"
+ 0-2,30-54,550-734,7350-7499,7500-9999
+ 9964 agency="Ghana"
+ 0-6,70-94,950-999
+ 9965 agency="Kazakhstan"
+ 00-39,400-899,9000-9999
+ 9966 agency="Kenya"
+ 000-149,1500-1999,20-69,7000-7499,750-959,9600-9999
+ 9967 agency="Kyrgyz Republic"
+ 00-39,400-899,9000-9999
+ 9968 agency="Costa Rica"
+ 00-49,500-939,9400-9999
+ 9970 agency="Uganda"
+ 00-39,400-899,9000-9999
+ 9971 agency="Singapore"
+ 0-5,60-89,900-989,9900-9999
+ 9972 agency="Peru"
+ 00-09,1-1,200-249,2500-2999,30-59,600-899,9000-9999
+ 9973 agency="Tunisia"
+ 00-05,060-089,0900-0999,10-69,700-969,9700-9999
+ 9974 agency="Uruguay"
+ 0-2,30-54,550-749,7500-9499,95-99
+ 9975 agency="Moldova"
+ 0-0,100-399,4000-4499,45-89,900-949,9500-9999
+ 9976 agency="Tanzania"
+ 0-5,60-89,900-989,9900-9999
+ 9977 agency="Costa Rica"
+ 00-89,900-989,9900-9999
+ 9978 agency="Ecuador"
+ 00-29,300-399,40-94,950-989,9900-9999
+ 9979 agency="Iceland"
+ 0-4,50-64,650-659,66-75,760-899,9000-9999
+ 9980 agency="Papua New Guinea"
+ 0-3,40-89,900-989,9900-9999
+ 9981 agency="Morocco"
+ 00-09,100-159,1600-1999,20-79,800-949,9500-9999
+ 9982 agency="Zambia"
+ 00-79,800-989,9900-9999
+ 9983 agency="Gambia"
+ 80-94,950-989,9900-9999
+ 9984 agency="Latvia"
+ 00-49,500-899,9000-9999
+ 9985 agency="Estonia"
+ 0-4,50-79,800-899,9000-9999
+ 9986 agency="Lithuania"
+ 00-39,400-899,9000-9399,940-969,97-99
+ 9987 agency="Tanzania"
+ 00-39,400-879,8800-9999
+ 9988 agency="Ghana"
+ 0-2,30-54,550-749,7500-9999
+ 9989 agency="Macedonia"
+ 0-0,100-199,2000-2999,30-59,600-949,9500-9999
+ 99901 agency="Bahrain"
+ 00-49,500-799,80-99
+ 99902 agency="Gabon"
+ 99903 agency="Mauritius"
+ 0-1,20-89,900-999
+ 99904 agency="Netherlands Antilles and Aruba"
+ 0-5,60-89,900-999
+ 99905 agency="Bolivia"
+ 0-3,40-79,800-999
+ 99906 agency="Kuwait"
+ 0-2,30-59,600-699,70-89,90-94,950-999
+ 99908 agency="Malawi"
+ 0-0,10-89,900-999
+ 99909 agency="Malta"
+ 0-3,40-94,950-999
+ 99910 agency="Sierra Leone"
+ 0-2,30-89,900-999
+ 99911 agency="Lesotho"
+ 00-59,600-999
+ 99912 agency="Botswana"
+ 0-3,400-599,60-89,900-999
+ 99913 agency="Andorra"
+ 0-2,30-35,600-604
+ 99914 agency="Suriname"
+ 0-4,50-89,900-999
+ 99915 agency="Maldives"
+ 0-4,50-79,800-999
+ 99916 agency="Namibia"
+ 0-2,30-69,700-999
+ 99917 agency="Brunei Darussalam"
+ 0-2,30-89,900-999
+ 99918 agency="Faroe Islands"
+ 0-3,40-79,800-999
+ 99919 agency="Benin"
+ 0-2,300-399,40-69,900-999
+ 99920 agency="Andorra"
+ 0-4,50-89,900-999
+ 99921 agency="Qatar"
+ 0-1,20-69,700-799,8-8,90-99
+ 99922 agency="Guatemala"
+ 0-3,40-69,700-999
+ 99923 agency="El Salvador"
+ 0-1,20-79,800-999
+ 99924 agency="Nicaragua"
+ 0-1,20-79,800-999
+ 99925 agency="Paraguay"
+ 0-3,40-79,800-999
+ 99926 agency="Honduras"
+ 0-0,10-59,600-999
+ 99927 agency="Albania"
+ 0-2,30-59,600-999
+ 99928 agency="Georgia"
+ 0-0,10-79,800-999
+ 99929 agency="Mongolia"
+ 0-4,50-79,800-999
+ 99930 agency="Armenia"
+ 0-4,50-79,800-999
+ 99931 agency="Seychelles"
+ 0-4,50-79,800-999
+ 99932 agency="Malta"
+ 0-0,10-59,600-699,7-7,80-99
+ 99933 agency="Nepal"
+ 0-2,30-59,600-999
+ 99934 agency="Dominican Republic"
+ 0-1,20-79,800-999
+ 99935 agency="Haiti"
+ 0-2,30-59,600-699,7-8,90-99
+ 99936 agency="Bhutan"
+ 0-0,10-59,600-999
+ 99937 agency="Macau"
+ 0-1,20-59,600-999
+ 99938 agency="Srpska, Republic of"
+ 0-1,20-59,600-899,90-99
+ 99939 agency="Guatemala"
+ 0-5,60-89,900-999
+ 99940 agency="Georgia"
+ 0-0,10-69,700-999
+ 99941 agency="Armenia"
+ 0-2,30-79,800-999
+ 99942 agency="Sudan"
+ 0-4,50-79,800-999
+ 99943 agency="Albania"
+ 0-2,30-59,600-999
+ 99944 agency="Ethiopia"
+ 0-4,50-79,800-999
+ 99945 agency="Namibia"
+ 0-5,60-89,900-999
+ 99946 agency="Nepal"
+ 0-2,30-59,600-999
+ 99947 agency="Tajikistan"
+ 0-2,30-69,700-999
+ 99948 agency="Eritrea"
+ 0-4,50-79,800-999
+ 99949 agency="Mauritius"
+ 0-1,20-89,900-999
+ 99950 agency="Cambodia"
+ 0-4,50-79,800-999
+ 99951 agency="Congo"
+ 99952 agency="Mali"
+ 0-4,50-79,800-999
+ 99953 agency="Paraguay"
+ 0-2,30-79,800-939,94-99
+ 99954 agency="Bolivia"
+ 0-2,30-69,700-999
+ 99955 agency="Srpska, Republic of"
+ 0-1,20-59,600-799,80-89,90-99
+ 99956 agency="Albania"
+ 00-59,600-859,86-99
+ 99957 agency="Malta"
+ 0-1,20-79,800-999
+ 99958 agency="Bahrain"
+ 0-4,50-94,950-999
+ 99959 agency="Luxembourg"
+ 0-2,30-59,600-999
+ 99960 agency="Malawi"
+ 0-0,10-94,950-999
+ 99961 agency="El Salvador"
+ 0-3,40-89,900-999
+ 99962 agency="Mongolia"
+ 0-4,50-79,800-999
+ 99963 agency="Cambodia"
+ 00-49,500-999
+ 99964 agency="Nicaragua"
+ 0-1,20-79,800-999
+ 99965 agency="Macau"
+ 0-3,40-79,800-999
+ 99966 agency="Kuwait"
+ 0-2,30-69,700-799
+ 99967 agency="Paraguay"
+ 0-1,20-59,600-899
+979
+ 10-10
+ 10 agency="France"
+ 00-19,200-699,7000-8999,90000-97599,976000-999999
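In isbn.dat the range lists drive the splitting. Each indented line gives the allowed ranges for the next component, and the number of digits in a range's bounds sets how many digits that component takes. Below is a minimal Python sketch of that idea, assuming the `low-high` syntax shown above. The helper names `parse_ranges` and `match_prefix` are hypothetical and are not part of stdnum. The sketch is applied to the registrant ranges listed for group 99921 (Qatar).

```python
def parse_ranges(spec):
    """Turn a range list such as '0-1,20-69' into (length, low, high) tuples."""
    result = []
    for rnge in spec.split(','):
        low, _, high = rnge.partition('-')
        high = high or low  # a single value such as '200' is its own range
        result.append((len(low), low, high))
    return result


def match_prefix(spec, digits):
    """Return the leading part of digits that falls inside one of the ranges,
    or None if no range matches."""
    for length, low, high in parse_ranges(spec):
        candidate = digits[:length]
        # bounds within one range have the same width, so string comparison
        # orders them numerically; unlike numdb, this sketch also rejects a
        # range when fewer digits remain than the range width
        if len(candidate) == length and low <= candidate <= high:
            return candidate
    return None


# registrant ranges listed for group 99921 (Qatar)
qatar = '0-1,20-69,700-799,8-8,90-99'
for rest in ('1234', '2512', '7051', '8123', '9512'):
    print(rest, '->', match_prefix(qatar, rest))
```

The ranges are tried in order, and the first range whose fixed-width bounds contain the leading digits decides the registrant length.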
Copied and modified: python-stdnum/stdnum/isbn.py (from r41,
python-stdnum/stdnum/isbn/__init__.py)
==============================================================================
--- python-stdnum/stdnum/isbn/__init__.py Sat Sep 11 11:13:47 2010 (r41, copy source)
+++ python-stdnum/stdnum/isbn.py Wed Nov 24 23:09:28 2010 (r42)
@@ -108,7 +108,7 @@
     """Split the specified ISBN into an EAN.UCC prefix, a group prefix, a
     registrant, an item number and a check-digit. If the number is in ISBN10
     format the returned EAN.UCC prefix is '978'."""
-    import ranges
+    from stdnum import numdb
     # clean up number
     number = compact(number)
     # get Bookland prefix if any
@@ -118,12 +118,13 @@
     else:
         oprefix = prefix = number[:3]
         number = number[3:]
-    # get group
-    group, number = ranges.lookup(prefix, number)
-    publisher, number = ranges.lookup('%s-%s' % (prefix, group), number)
-    itemnr = number[:-1]
-    check = number[-1]
-    return ( oprefix, group, publisher, itemnr, check )
+    # split the number
+    result = numdb.get('isbn').split(prefix+number[:-1])[1:]
+    itemnr = result.pop()
+    group = result.pop(0) if result else ''
+    publisher = result.pop(0) if result else ''
+    # return results
+    return ( oprefix, group, publisher, itemnr, number[-1] )
 
 def format(number, separator='-'):
     """Reformat the passed number to the standard format with the EAN.UCC
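The new split() follows these steps:

1. Strip the check digit.
2. Let numdb split the rest of the number.
3. Drop the EAN.UCC prefix.
4. Take the item number from the end, then the group and the publisher from the front.

When numdb cannot identify the group or the publisher, those fields default to empty strings. The hypothetical `unpack` helper below isolates that logic. The sample part lists are assumed numdb outputs. The second one is consistent with the '979-20-1234567-8' doctest in this commit, where no group is registered under 979-20.

```python
def unpack(parts):
    """Turn numdb split() output (still including the EAN.UCC prefix) into
    (group, publisher, itemnr), mirroring the pop() calls in isbn.split()."""
    result = list(parts[1:])  # drop the EAN.UCC prefix
    itemnr = result.pop()  # the last part is always the item number
    group = result.pop(0) if result else ''
    publisher = result.pop(0) if result else ''
    return group, publisher, itemnr


# fully recognised number: prefix, group, publisher and item number
print(unpack(['978', '90', '70002', '34']))
# unknown group: only the prefix and the unsplit remainder come back
print(unpack(['979', '201234567']))
```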
Added: python-stdnum/stdnum/numdb.py
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ python-stdnum/stdnum/numdb.py Wed Nov 24 23:09:28 2010 (r42)
@@ -0,0 +1,160 @@
+
+# numdb.py - module for handling hierarchically organised numbers
+#
+# Copyright (C) 2010 Arthur de Jong
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+# 02110-1301 USA
+
+"""This module contains functions for reading and querying a database for
+storing numbers that use a hierarchical format (e.g. ISBN, IBAN, phone
+numbers, etc).
+
+To read a database from a file:
+
+>>> dbfile = read(open('test.dat', 'r'))
+
+To split a number:
+
+>>> dbfile.split('01006')
+['0', '100', '6']
+>>> dbfile.split('902006')
+['90', '20', '06']
+>>> dbfile.split('909856')
+['90', '985', '6']
+
+To split the number and get properties for each part:
+
+>>> dbfile.info('01006')
+[('0', {'prop1': 'foo'}), ('100', {'prop2': 'bar'}), ('6', {})]
+>>> dbfile.info('02006')
+[('0', {'prop1': 'foo'}), ('200', {'prop2': 'bar', 'prop3': 'baz'}), ('6', {})]
+>>> dbfile.info('03456')
+[('0', {'prop1': 'foo'}), ('345', {'prop2': 'bar', 'prop3': 'baz'}), ('6', {})]
+>>> dbfile.info('902006')
+[('90', {'prop1': 'booz'}), ('20', {'prop2': 'foo'}), ('06', {})]
+>>> dbfile.info('909856')
+[('90', {'prop1': 'booz'}), ('985', {'prop2': 'fooz'}), ('6', {})]
+>>> dbfile.info('9889')
+[('98', {'prop1': 'booz'}), ('89', {'prop2': 'foo'})]
+"""
+
+import re
+from pkg_resources import resource_stream
+
+_line_re = re.compile('^(?P<indent> *)(?P<ranges>([0-9a-zA-Z]+(-[0-9a-zA-Z]+)?)(,[0-9a-zA-Z]+(-[0-9a-zA-Z]+)?)*) *(?P<props>.*)$')
+_prop_re = re.compile('(?P<prop>[0-9a-zA-Z-_]+)="(?P<value>[^"]*)"')
+
+# this is a cache of open databases
+_open_databases = {}
+
+# the prefixes attribute of NumDB is structured as follows:
+# prefixes = [
+# [ length, low, high, props, children ]
+# ...
+# ]
+# where children is a prefixes structure in its own right
+# (there is no expected ordering within the list)
+
+
+class NumDB(object):
+
+    def __init__(self):
+        self.prefixes = []
+
+    @staticmethod
+    def _merge(results):
+        """Merge the provided list of possible results into a single result
+        list (this is a generator)."""
+        results.append([])
+        for parts in map(None, *results):
+            # regroup parts into parts list and properties list
+            partlist, proplist = zip(*(x for x in parts if x))
+            part = min(partlist, key=len)
+            props = {}
+            for p in proplist:
+                props.update(p)
+            yield part, props
+
+    @staticmethod
+    def _find(number, prefixes):
+        """Look up the specified number in the list of prefixes. This
+        returns what info() returns, but works recursively."""
+        if not number:
+            return []
+        results = []
+        if prefixes:
+            for length, low, high, props, children in prefixes:
+                if low <= number[:length] <= high:
+                    results.append([ (number[:length], props) ] +
+                                   NumDB._find(number[length:], children))
+        # not-found fallback
+        if not results:
+            return [ ( number, {} ) ]
+        # merge the results into a single result
+        return list(NumDB._merge(results))
+
+    def info(self, number):
+        """Split the provided number into components and associate properties
+        with each component. This returns a list of tuples. Each tuple
+        consists of a string (a part of the number) and a dict of properties.
+        """
+        return NumDB._find(number, self.prefixes)
+
+    def split(self, number):
+        """Split the provided number into components. This returns a list of
+        the components that were identified."""
+        return [part for part, props in self.info(number)]
+
+
+def _parse(fp):
+    """Read lines of text from the file pointer and generate indent, length,
+    low, high, properties tuples."""
+    for line in fp.xreadlines():
+        # ignore comments
+        if line[0] == '#' or line.strip() == '':
+            continue
+        # any other line should parse
+        match = _line_re.search(line)
+        indent = len(match.group('indent'))
+        ranges = match.group('ranges')
+        props = dict(_prop_re.findall(match.group('props')))
+        for rnge in ranges.split(','):
+            if '-' in rnge:
+                low, high = rnge.split('-')
+            else:
+                low, high = rnge, rnge
+            yield ( indent, len(low), low, high, props )
+
+def read(fp):
+    """Return a new database with the data read from the specified file."""
+    last_indent = 0
+    db = NumDB()
+    stack = { 0: db.prefixes }
+    for indent, length, low, high, props in _parse(fp):
+        if indent > last_indent:
+            # populate the children field of the last indent
+            if stack[last_indent][-1][4] is None:
+                stack[last_indent][-1][4] = []
+            stack[indent] = stack[last_indent][-1][4]
+        stack[indent].append([length, low, high, props, None])
+        last_indent = indent
+    return db
+
+def get(name):
+    """Opens a database with the specified name to perform queries on."""
+    if name not in _open_databases:
+        _open_databases[name] = read(resource_stream(__name__, name + '.dat'))
+    return _open_databases[name]
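_merge relies on the Python 2 idiom `map(None, *results)`, which pads shorter sequences with None in the same way as `itertools.zip_longest`. The empty list appended first appears to be needed because `map(None, seq)` with a single sequence returns the bare items rather than 1-tuples. Below is a Python 3 sketch of `_find` and `_merge` under that reading. It runs against a hand-built prefixes list (`TEST_PREFIXES`, hypothetical). That list assumes test.dat's child lines are siblings one level below their parent, which is what the docstring's `'02006'` output requires.

```python
from itertools import zip_longest

# Hand-built prefixes structure for test.dat, in the
# [length, low, high, props, children] layout described in numdb.py.
TEST_PREFIXES = [
    [1, '0', '8', {'prop1': 'foo'}, [
        [3, '100', '999', {'prop2': 'bar'}, None],
        [3, '200', '200', {'prop3': 'baz'}, None],
        [3, '300', '399', {'prop3': 'baz'}, None],
    ]],
    [2, '90', '99', {'prop1': 'booz'}, [
        [2, '00', '89', {'prop2': 'foo'}, None],
        [3, '900', '999', {'prop2': 'fooz'}, None],
    ]],
]


def merge(results):
    """Combine alternative splits position by position: keep the shortest
    part and the union of all properties (zip_longest replaces map(None, ...))."""
    for parts in zip_longest(*results):
        partlist, proplist = zip(*(x for x in parts if x))
        props = {}
        for p in proplist:
            props.update(p)
        yield min(partlist, key=len), props


def find(number, prefixes):
    """Recursively split number, returning a list of (part, props) tuples."""
    if not number:
        return []
    results = []
    for length, low, high, props, children in prefixes or []:
        if low <= number[:length] <= high:
            results.append([(number[:length], props)] +
                           find(number[length:], children))
    if not results:
        # nothing matched: the rest of the number is a single part
        return [(number, {})]
    return list(merge(results))


print(find('02006', TEST_PREFIXES))
print([part for part, props in find('909856', TEST_PREFIXES)])
```

Both `100-999` and `200` match the digits `200`, so merging lets two sibling lines contribute properties to one part. Taking the shortest part at each position keeps the alternatives aligned.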
Added: python-stdnum/test.dat
==============================================================================
--- /dev/null 00:00:00 1970 (empty, because file is newly added)
+++ python-stdnum/test.dat Wed Nov 24 23:09:28 2010 (r42)
@@ -0,0 +1,7 @@
+# this is a comment line
+0-8 prop1="foo"
+ 100-999 prop2="bar"
+ 200,300-399 prop3="baz"
+90-99 prop1="booz"
+ 00-89 prop2="foo"
+ 900-999 prop2="fooz"
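read() turns indentation into nesting. A line indented deeper than the previous one becomes a child of the last entry added at the previous level. Below is a Python 3 sketch of `_parse` and `read`. It differs from the original in two ways: it iterates over the file object instead of calling the Python 2-only `xreadlines()`, and it uses `str.partition` instead of the explicit `'-' in rnge` test. The email appears to have collapsed test.dat's leading whitespace, so the sample assumes a single space of indent for child lines. Any consistent indent would work.

```python
import io
import re

# same line format as numdb.py: indentation, a range list, then properties
_line_re = re.compile(
    r'^(?P<indent> *)'
    r'(?P<ranges>[0-9a-zA-Z]+(-[0-9a-zA-Z]+)?(,[0-9a-zA-Z]+(-[0-9a-zA-Z]+)?)*)'
    r' *(?P<props>.*)$')
_prop_re = re.compile(r'(?P<prop>[0-9a-zA-Z_-]+)="(?P<value>[^"]*)"')


def parse(fp):
    """Yield (indent, length, low, high, props) for every range on every line."""
    for line in fp:
        if line.startswith('#') or not line.strip():
            continue  # skip comments and blank lines
        match = _line_re.search(line)
        indent = len(match.group('indent'))
        props = dict(_prop_re.findall(match.group('props')))
        for rnge in match.group('ranges').split(','):
            low, _, high = rnge.partition('-')
            yield indent, len(low), low, high or low, props


def read(fp):
    """Build the nested [length, low, high, props, children] lists."""
    prefixes = []
    stack = {0: prefixes}
    last_indent = 0
    for indent, length, low, high, props in parse(fp):
        if indent > last_indent:
            # a deeper line becomes a child of the previous entry
            parent = stack[last_indent][-1]
            if parent[4] is None:
                parent[4] = []
            stack[indent] = parent[4]
        stack[indent].append([length, low, high, props, None])
        last_indent = indent
    return prefixes


TEST_DAT = '''# this is a comment line
0-8 prop1="foo"
 100-999 prop2="bar"
 200,300-399 prop3="baz"
90-99 prop1="booz"
 00-89 prop2="foo"
 900-999 prop2="fooz"
'''

for entry in read(io.StringIO(TEST_DAT)):
    print(entry[:4], [child[1:3] for child in entry[4]])
```

As in numdb.py, the ranges from a single line (`200` and `300-399`) share one properties dict.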
Modified: python-stdnum/tests/test_isbn.doctest
==============================================================================
--- python-stdnum/tests/test_isbn.doctest Sat Sep 11 11:13:47 2010 (r41)
+++ python-stdnum/tests/test_isbn.doctest Wed Nov 24 23:09:28 2010 (r42)
@@ -77,82 +77,3 @@
('', '99996', '', '7827', '0')
>>> isbn.split('979-20-1234567-8')
('979', '', '', '201234567', '8')
-
-
-Some tests for the ranges module. This is more an internal module so
-tests here are not very critical.
-
->>> from stdnum.isbn import ranges
->>> list(ranges._wrap(2 * 'abc def ghijklmn opqr stuvwx yz', 40))[0]
-'abc def ghijklmn opqr stuvwx yzabc def'
-
-
-Test output function. Bit of a limited test but we see if the serialised
-form of the prefix/ranges list contains at least the same prefixes as the
-current _prefixes list.
-
->>> import StringIO
->>> output = StringIO.StringIO()
->>> ranges.output(output)
->>> k = set( x.split(' ')[0] for x in StringIO.StringIO(output.getvalue()).readlines() )
->>> k == set(ranges._prefixes.keys())
-True
-
-
-Make an XML file with some prefix definitions and load that into the
-ranges module.
-
-First save the current ranges so we can restore later.
-
->>> save_prefixes = ranges._prefixes
-
-Write the XML to a file.
-
->>> import tempfile
->>> xmlfile = tempfile.NamedTemporaryFile(delete=False)
->>> xmlfile.write("""<?xml version='1.0' encoding='utf-8'?>
-... <ISBNRangeMessage>
-... <MessageSerialNumber>0aad2b046ddd9b30e080cb2b24afc868</MessageSerialNumber>
-... <MessageDate>Thu, 20 May 2010 18:36:55 GMT</MessageDate>
-... <EAN.UCCPrefixes><EAN.UCC>
-... <Prefix>978</Prefix>
-... <Rules>
-... <Rule><Range>0000000-5999999</Range><Length>1</Length></Rule>
-... <Rule><Range>6000000-6499999</Range><Length>3</Length></Rule>
-... <Rule><Range>6500000-6999999</Range><Length>0</Length></Rule>
-... </Rules>
-... </EAN.UCC></EAN.UCCPrefixes>
-... <RegistrationGroups>
-... <Group>
-... <Prefix>978-0</Prefix>
-... <Rules>
-... <Rule><Range>0000000-1999999</Range><Length>2</Length></Rule>
-... <Rule><Range>2000000-6999999</Range><Length>3</Length></Rule>
-... </Rules>
-... </Group>
-... </RegistrationGroups>
-... </ISBNRangeMessage>
-... """)
->>> xmlfile.close()
-
-Load the XML file by URL and output it to another string. Check that the
-content of the XML has been loaded.
-
->>> import urllib
->>> ranges.download('file://' + urllib.pathname2url(xmlfile.name))
->>> import sys
->>> ranges.output()
-# generated from RangeMessage.xml, downloaded from
-# http://www.isbn-international.org/agency?rmxml=1
-# serial 0aad2b046ddd9b30e080cb2b24afc868
-# date Thu, 20 May 2010 18:36:55 GMT
-_prefixes = """
-978 0-5 600-649
-978-0 00-19 200-699
-"""
-
-Restore the original ranges and clean up.
-
->>> ranges._prefixes = save_prefixes
->>> import os
->>> os.unlink(xmlfile.name)