lists.arthurdejong.org

python-stdnum branch master updated. 0.7-62-g46a7996




This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "python-stdnum".

The branch, master has been updated
       via  46a7996904663a1a2e1544256bef46f04c3d14df (commit)
      from  999f2c38f6fdef19254e505db35c9ab722a9f1af (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://arthurdejong.org/git/python-stdnum/commit/?id=46a7996904663a1a2e1544256bef46f04c3d14df

commit 46a7996904663a1a2e1544256bef46f04c3d14df
Author: Arthur de Jong <arthur@arthurdejong.org>
Date:   Sat Jun 8 15:37:56 2013 +0200

    Add a Malaysian NRIC No. module
    
    NRIC No. (National Registration Identity Card Number) is the unique
    identifier issued to Malaysian citizens and permanent residents.
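The module docstring further down describes the number as 12 digits in three sections (6-digit birth date, 2-digit birth place code, 4 final digits), with gender given by the parity of the last digit. The sketch below only illustrates that layout; `split_nric` is a hypothetical helper, not part of stdnum.my.nric, and it does not check the date or the birth place code the way `validate()` does.

```python
def split_nric(number):
    """Split a 12-digit NRIC number into its three sections plus gender.

    Hypothetical helper for illustration only; not part of stdnum.my.nric.
    """
    # drop the same separators that compact() strips: space, '-' and '*'
    digits = ''.join(c for c in number if c not in ' -*')
    if len(digits) != 12 or not digits.isdigit():
        raise ValueError('expected 12 digits')
    birth_date = digits[:6]    # YYMMDD
    birth_place = digits[6:8]  # code looked up in stdnum/my/bp.dat
    serial = digits[8:]        # the final four digits
    # per the module docstring: odd last digit = male, even = female
    gender = 'M' if int(digits[-1]) % 2 else 'F'
    return birth_date, birth_place, serial, gender


print(split_nric('770305-02-1234'))  # ('770305', '02', '1234', 'F')
```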

diff --git a/getmybp.py b/getmybp.py
new file mode 100755
index 0000000..3f84924
--- /dev/null
+++ b/getmybp.py
@@ -0,0 +1,87 @@
+#!/usr/bin/env python
+
+# getmybp.py - script to download data from Malaysian government site
+#
+# Copyright (C) 2013 Arthur de Jong
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+# 02110-1301 USA
+
+from collections import defaultdict
+import re
+import urllib
+
+import BeautifulSoup
+
+
+# URLs that are downloaded
+state_list_url = 'http://www.jpn.gov.my/en/informasi/states-code'
+country_list_url = 'http://www.jpn.gov.my/en/informasi/country-code'
+
+
+spaces_re = re.compile(r'\s+', re.UNICODE)
+
+
+def clean(s):
+    """Cleans up the string removing unneeded stuff from it."""
+    return spaces_re.sub(' ', s.replace(u'\u0096', '')).strip().encode('utf-8')
+
+
+def parse(f):
+    """Parse the specified file."""
+    soup = BeautifulSoup.BeautifulSoup(f, convertEntities='html')
+    # find all table rows
+    for tr in soup.find('div', id='content').findAll('tr'):
+        # each row holds up to two (name, code) pairs, in columns 1-2 and 3-4
+        tds = [
+            clean(''.join(x.string for x in td.findAll(text=True)))
+            for td in tr.findAll('td')
+        ]
+        if len(tds) >= 2 and tds[0] and tds[1]:
+            yield tds[0], tds[1]
+        if len(tds) >= 4 and tds[2] and tds[3]:
+            yield tds[2], tds[3]
+
+
+if __name__ == '__main__':
+    results = defaultdict(lambda : defaultdict(list))
+    # read the states
+    #f = open('/tmp/states.html', 'r')
+    f = urllib.urlopen(state_list_url)
+    for state, bps in parse(f):
+        for bp in bps.split(','):
+            results[bp.strip()]['state'] = state
+            results[bp.strip()]['countries'].append('Malaysia')
+    # read the countries
+    #f = open('/tmp/countries.html', 'r')
+    f = urllib.urlopen(country_list_url)
+    for country, bp in parse(f):
+        results[bp]['countries'].append(country)
+    # print the results
+    print '# generated from National Registration Department of Malaysia, downloaded from'
+    print '# %s' % state_list_url
+    print '# %s' % country_list_url
+    print
+    for bp in sorted(results.iterkeys()):
+        res = bp
+        row = results[bp]
+        if 'state' in row:
+            res += ' state="%s"' % row['state']
+        countries = row['countries']
+        if len(countries) == 1:
+            res += ' country="%s"' % countries[0]
+        if len(countries) > 0:
+            res += ' countries="%s"' % (', '.join(countries))
+        print res
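The `__main__` block of getmybp.py merges the state table (one state per row, possibly with several comma-separated codes) and the country table into one line per code. The sketch below reproduces that merging step on already-parsed pairs so it can be run without network access or BeautifulSoup; `build_rows` and the sample input are illustrative assumptions, not part of the script.

```python
from collections import defaultdict


def build_rows(state_pairs, country_pairs):
    """Merge (state, codes) and (country, code) pairs into bp.dat-style lines.

    Simplified, hypothetical version of the __main__ block of getmybp.py.
    """
    results = defaultdict(lambda: defaultdict(list))
    for state, codes in state_pairs:
        # a state row may list several comma-separated codes
        for code in codes.split(','):
            results[code.strip()]['state'] = state
            results[code.strip()]['countries'].append('Malaysia')
    for country, code in country_pairs:
        results[code]['countries'].append(country)
    lines = []
    for code in sorted(results):
        row = results[code]
        line = code
        if 'state' in row:
            line += ' state="%s"' % row['state']
        countries = row['countries']
        # a single country is also exposed under the 'country' key
        if len(countries) == 1:
            line += ' country="%s"' % countries[0]
        if countries:
            line += ' countries="%s"' % ', '.join(countries)
        lines.append(line)
    return lines


for line in build_rows([('Johor', '01, 21')],
                       [('Brunei', '60'), ('Cambodia', '62'), ('Kampuchea', '62')]):
    print(line)
```

Codes shared by several countries (such as 62 above) end up with only a `countries` attribute, which matches the entries for 62 and 83 to 93 in bp.dat.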
diff --git a/stdnum/my/__init__.py b/stdnum/my/__init__.py
new file mode 100644
index 0000000..e20908e
--- /dev/null
+++ b/stdnum/my/__init__.py
@@ -0,0 +1,21 @@
+# __init__.py - collection of Malaysian numbers
+# coding: utf-8
+#
+# Copyright (C) 2013 Arthur de Jong
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+# 02110-1301 USA
+
+"""Collection of Malaysian numbers."""
diff --git a/stdnum/my/bp.dat b/stdnum/my/bp.dat
new file mode 100644
index 0000000..40231fd
--- /dev/null
+++ b/stdnum/my/bp.dat
@@ -0,0 +1,86 @@
+# generated from National Registration Department of Malaysia, downloaded from
+# http://www.jpn.gov.my/en/informasi/states-code
+# http://www.jpn.gov.my/en/informasi/country-code
+
+01 state="Johor" country="Malaysia" countries="Malaysia"
+02 state="Kedah" country="Malaysia" countries="Malaysia"
+03 state="Kelantan" country="Malaysia" countries="Malaysia"
+04 state="Melaka" country="Malaysia" countries="Malaysia"
+05 state="Negeri Sembilan" country="Malaysia" countries="Malaysia"
+06 state="Pahang" country="Malaysia" countries="Malaysia"
+07 state="Pulau Pinang" country="Malaysia" countries="Malaysia"
+08 state="Perak" country="Malaysia" countries="Malaysia"
+09 state="Perlis" country="Malaysia" countries="Malaysia"
+10 state="Selangor" country="Malaysia" countries="Malaysia"
+11 state="Terengganu" country="Malaysia" countries="Malaysia"
+12 state="Sabah" country="Malaysia" countries="Malaysia"
+13 state="Sarawak" country="Malaysia" countries="Malaysia"
+14 state="Wilayah Persekutuan (Kuala Lumpur)" country="Malaysia" countries="Malaysia"
+15 state="Wilayah Persekutuan (Labuan)" country="Malaysia" countries="Malaysia"
+16 state="Wilayah Persekutuan (Putrajaya)" country="Malaysia" countries="Malaysia"
+21 state="Johor" country="Malaysia" countries="Malaysia"
+22 state="Johor" country="Malaysia" countries="Malaysia"
+23 state="Johor" country="Malaysia" countries="Malaysia"
+24 state="Johor" country="Malaysia" countries="Malaysia"
+25 state="Kedah" country="Malaysia" countries="Malaysia"
+26 state="Kedah" country="Malaysia" countries="Malaysia"
+27 state="Kedah" country="Malaysia" countries="Malaysia"
+28 state="Kelantan" country="Malaysia" countries="Malaysia"
+29 state="Kelantan" country="Malaysia" countries="Malaysia"
+30 state="Melaka" country="Malaysia" countries="Malaysia"
+31 state="Negeri Sembilan" country="Malaysia" countries="Malaysia"
+32 state="Pahang" country="Malaysia" countries="Malaysia"
+33 state="Pahang" country="Malaysia" countries="Malaysia"
+34 state="Pulau Pinang" country="Malaysia" countries="Malaysia"
+35 state="Pulau Pinang" country="Malaysia" countries="Malaysia"
+36 state="Perak" country="Malaysia" countries="Malaysia"
+37 state="Perak" country="Malaysia" countries="Malaysia"
+38 state="Perak" country="Malaysia" countries="Malaysia"
+39 state="Perak" country="Malaysia" countries="Malaysia"
+40 state="Perlis" country="Malaysia" countries="Malaysia"
+41 state="Selangor" country="Malaysia" countries="Malaysia"
+42 state="Selangor" country="Malaysia" countries="Malaysia"
+43 state="Selangor" country="Malaysia" countries="Malaysia"
+44 state="Selangor" country="Malaysia" countries="Malaysia"
+45 state="Terengganu" country="Malaysia" countries="Malaysia"
+46 state="Terengganu" country="Malaysia" countries="Malaysia"
+47 state="Sabah" country="Malaysia" countries="Malaysia"
+48 state="Sabah" country="Malaysia" countries="Malaysia"
+49 state="Sabah" country="Malaysia" countries="Malaysia"
+50 state="Sarawak" country="Malaysia" countries="Malaysia"
+51 state="Sarawak" country="Malaysia" countries="Malaysia"
+52 state="Sarawak" country="Malaysia" countries="Malaysia"
+53 state="Sarawak" country="Malaysia" countries="Malaysia"
+54 state="Wilayah Persekutuan (Kuala Lumpur)" country="Malaysia" countries="Malaysia"
+55 state="Wilayah Persekutuan (Kuala Lumpur)" country="Malaysia" countries="Malaysia"
+56 state="Wilayah Persekutuan (Kuala Lumpur)" country="Malaysia" countries="Malaysia"
+57 state="Wilayah Persekutuan (Kuala Lumpur)" country="Malaysia" countries="Malaysia"
+58 state="Wilayah Persekutuan (Labuan)" country="Malaysia" countries="Malaysia"
+59 state="Negeri Sembilan" country="Malaysia" countries="Malaysia"
+60 country="Brunei" countries="Brunei"
+61 country="Indonesia" countries="Indonesia"
+62 countries="Cambodia, Kampuchea"
+63 country="Laos" countries="Laos"
+64 country="Mynmar" countries="Mynmar"
+65 country="Filipina" countries="Filipina"
+66 country="Singapura" countries="Singapura"
+67 country="Thailand" countries="Thailand"
+68 country="Vietnam" countries="Vietnam"
+74 country="China" countries="China"
+75 country="India" countries="India"
+76 country="Pakistan" countries="Pakistan"
+77 country="Arab Saudi" countries="Arab Saudi"
+78 country="Sri Lanka" countries="Sri Lanka"
+79 country="Bangladesh" countries="Bangladesh"
+82 state="Negeri Tidak Diketahui" country="Malaysia" countries="Malaysia"
+83 countries="Australia, American Samoa, Macedonia, New Zealand, New Caledonia, Papua New Gurney, Fiji, Timor Leste"
+84 countries="Argentina, Anguilla, Aruba, Bolivia, Brazil, Paraguay, Peru, Chile, Colombia, Equador, Uruguay, Venezuela"
+85 countries="Algeria, Angola, Kenya, Afrika Tengah, Liberia, Afrika Selatan, Mali, Mauritania, Morocco, Malawi, Botswana, Mozambique, Burundi, Nigeria, Namibia, Cameroon, Chad, Rwanda, Senegal, Sierra Leone, Somalia, Djibouti, Sudan, Egypt, Ethopia, Swaziland, Eritrea, Gambia, Ghana, Tunisia, Tanzania, Tonga, Togo, Uganda, Zaire, Zambia, Zimbabwe"
+86 countries="Austria, Luxembourg, Armenia, Malta, Monaco, Belgium, Nitherlands, Norway, Cyprus, Portugal, Denmark, Sweeden, Spain, Switzerland, France, Finland, Slovakia, Slovenia, Greece, Germany, Holy See (Vatican City), Italy"
+87 countries="Britain, Ireland"
+88 countries="Jordan, Kuwait, Lebanon, Bahrain, Oman, Qatar, Syria, Turkey, United Arab Emirate, Iran, Iraq, Israel, Yemen"
+89 countries="Japan, Korea Selatan, Korea Utara, Taiwan"
+90 countries="Jamaica, Bahamas, Barbados, Belize, Mexico, Nicaragua, Panama, Puerto Rico, Costa Rica, Cuba, Dominica, El Salvador, Grenada, Guatemala, Trinidad&Tobado, Haiti, Honduras"
+91 countries="Canada, Greenland, United State"
+92 countries="Albania, Albania, Latvia, Lithuania, Bulgaria, Byelorussia, Bosnia, Belarus, Poland, Romania, Russia, Czechoslovakia, Crotia, Esthonia, Serbia, Georgia, Hungary, Ukraine"
+93 countries="Afghanistan, Antigua & Barbuda, Kazakhstan, Andorra/Andora, Libya, Arzebaijan, Antartica, Maldives, Madagascar, Mauritius, Mongolia, Benin, Maghribi, Bhutan, Macau, Nepal, Bermuda, Burkina faso/Burkina, Bora-bora, Bouvet Island, Palestine, Cape Verde, Comoros, Seychelles, Soloman Islands, Samoa, San Marino, Guinea, Gibraltar, Tajikistan, Tukmenistan, Hong Kong, Uzbekistan, Ivory Coast, Vanuatu, Iceland, Yugoslavia"
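Each non-comment line of bp.dat is a two-digit code followed by `name="value"` attributes. The library reads this file through `stdnum.numdb` (see `get_birth_place()` below), and this sketch does not reproduce that parser; it is a hypothetical, standalone reader that only shows how one line maps to a code and an attribute dictionary, assuming values never contain a double quote.

```python
import re

# matches attribute assignments of the form name="value"
_attr_re = re.compile(r'(\w+)="([^"]*)"')


def parse_bp_line(line):
    """Parse one bp.dat line into (code, attributes).

    Returns None for comments and blank lines. Illustrative only; this is
    not how stdnum.numdb reads the file.
    """
    line = line.strip()
    if not line or line.startswith('#'):
        return None
    code, _, rest = line.partition(' ')
    return code, dict(_attr_re.findall(rest))


print(parse_bp_line('62 countries="Cambodia, Kampuchea"'))
# ('62', {'countries': 'Cambodia, Kampuchea'})
```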
diff --git a/stdnum/my/nric.py b/stdnum/my/nric.py
new file mode 100644
index 0000000..810d20e
--- /dev/null
+++ b/stdnum/my/nric.py
@@ -0,0 +1,110 @@
+# nric.py - functions for handling NRIC numbers
+#
+# Copyright (C) 2013 Arthur de Jong
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+# 02110-1301 USA
+
+"""NRIC No. (National Registration Identity Card Number) is the unique
+identifier issued to Malaysian citizens and permanent residents.
+
+The number consists of 12 digits in three sections. The first 6 digits
+represent the birth date, followed by two digits representing the birth
+place and finally four digits. The gender of a person can be derived from
+the last digit: odd numbers for males and even numbers for females.
+
+>>> validate('770305-02-1234')
+'770305021234'
+>>> validate('771305-02-1234')  # invalid date
+Traceback (most recent call last):
+    ...
+InvalidComponent: ...
+>>> validate('770305-99-1234')  # unknown birth place code
+Traceback (most recent call last):
+    ...
+InvalidComponent: ...
+>>> format('770305021234')
+'770305-02-1234'
+"""
+
+import datetime
+
+from stdnum.exceptions import *
+from stdnum.util import clean
+
+
+def compact(number):
+    """Convert the number to the minimal representation. This strips the
+    number of any valid separators and removes surrounding whitespace."""
+    return clean(number, ' -*').strip()
+
+
+def get_birth_date(number):
+    """Split the date parts from the number and return the birth date.
+    Note that in some cases it may return the registration date instead of
+    the birth date and it may be a century off."""
+    number = compact(number)
+    year = int(number[0:2])
+    month = int(number[2:4])
+    day = int(number[4:6])
+    # try 19xx first and fall back to 20xx; simple, but may pick the wrong century
+    try:
+        return datetime.date(year + 1900, month, day)
+    except ValueError:
+        pass
+    try:
+        return datetime.date(year + 2000, month, day)
+    except ValueError:
+        raise InvalidComponent()
+
+
+def get_birth_place(number):
+    """Use the number to look up the place of birth of the person. This can
+    either be a state or federal territory within Malaysia or a country
+    outside of Malaysia."""
+    from stdnum import numdb
+    number = compact(number)
+    results = numdb.get('my/bp').info(number[6:8])[0][1]
+    if not results:
+        raise InvalidComponent()
+    return results
+
+
+def validate(number):
+    """Checks to see if the number provided is a valid NRIC number. This
+    checks the length, formatting and birth date and place."""
+    number = compact(number)
+    if len(number) != 12:
+        raise InvalidLength()
+    if not number.isdigit():
+        raise InvalidFormat()
+    get_birth_date(number)
+    get_birth_place(number)
+    return number
+
+
+def is_valid(number):
+    """Checks to see if the number provided is a valid NRIC number. This
+    checks the length, formatting and birth date and place."""
+    try:
+        return bool(validate(number))
+    except ValidationError:
+        return False
+
+
+def format(number):
+    """Reformat the passed number to the standard format."""
+    number = compact(number)
+    return number[:6] + '-' + number[6:8] + '-' + number[8:]
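`get_birth_date()` above tries the 1900s first and only falls back to the 2000s when the 19xx date does not exist, which is why its docstring warns that the result may be a century off. The standalone sketch below isolates that fallback so its effect is easy to see; `guess_birth_date` is a hypothetical name, and it raises `ValueError` where the module raises `InvalidComponent`.

```python
import datetime


def guess_birth_date(yymmdd):
    """Interpret a YYMMDD string, trying 19YY first and then 20YY.

    Hypothetical standalone version of the fallback in get_birth_date();
    raises ValueError when neither century yields a valid date.
    """
    year, month, day = int(yymmdd[0:2]), int(yymmdd[2:4]), int(yymmdd[4:6])
    for century in (1900, 2000):
        try:
            return datetime.date(century + year, month, day)
        except ValueError:
            continue  # date does not exist in this century, try the next
    raise ValueError('no valid date for %r' % yymmdd)


print(guess_birth_date('880229'))  # 1988-02-29
print(guess_birth_date('000229'))  # 2000-02-29 (1900 was not a leap year)
print(guess_birth_date('050101'))  # 1905-01-01, even if 2005 was meant
```

Only an impossible 19xx date, such as 29 February 1900, triggers the fallback, so a person born in 2005 would be reported as born in 1905.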
diff --git a/tests/test_my_nric.doctest b/tests/test_my_nric.doctest
new file mode 100644
index 0000000..91aa2aa
--- /dev/null
+++ b/tests/test_my_nric.doctest
@@ -0,0 +1,126 @@
+test_my_nric.doctest - more detailed doctests for stdnum.my.nric module
+
+Copyright (C) 2013 Arthur de Jong
+
+This library is free software; you can redistribute it and/or
+modify it under the terms of the GNU Lesser General Public
+License as published by the Free Software Foundation; either
+version 2.1 of the License, or (at your option) any later version.
+
+This library is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+Lesser General Public License for more details.
+
+You should have received a copy of the GNU Lesser General Public
+License along with this library; if not, write to the Free Software
+Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+02110-1301 USA
+
+
+This file contains more detailed doctests for the stdnum.my.nric module. It
+tries to cover more corner cases and detailed functionality that is not
+really useful as module documentation.
+
+>>> from stdnum.my import nric
+>>> from stdnum.exceptions import *
+
+
+Normal values that should just work.
+
+>>> nric.validate('770305-02-1234')
+'770305021234'
+>>> nric.validate('890131-06-1224')
+'890131061224'
+>>> nric.validate('810909785542')
+'810909785542'
+>>> nric.validate('880229875542')
+'880229875542'
+
+
+Get the birth date:
+
+>>> nric.get_birth_date('770305-02-1234')
+datetime.date(1977, 3, 5)
+>>> nric.get_birth_date('890131-06-1224')
+datetime.date(1989, 1, 31)
+>>> nric.get_birth_date('810909785542')
+datetime.date(1981, 9, 9)
+>>> nric.get_birth_date('880229875542')
+datetime.date(1988, 2, 29)
+
+
+Get the birth place:
+
+>>> str(nric.get_birth_place('770305-02-1234')['state'])
+'Kedah'
+>>> str(nric.get_birth_place('890131-06-1224')['state'])
+'Pahang'
+>>> str(nric.get_birth_place('810909785542')['country'])
+'Sri Lanka'
+>>> str(nric.get_birth_place('880229875542')['countries'])
+'Britain, Ireland'
+
+
+Formatting:
+
+>>> nric.format('770305-02-1234')
+'770305-02-1234'
+>>> nric.format('890131-06-1224')
+'890131-06-1224'
+>>> nric.format('810909785542')
+'810909-78-5542'
+>>> nric.format('880229875542')
+'880229-87-5542'
+
+
+Invalid date:
+
+>>> nric.validate('771305-02-1234')
+Traceback (most recent call last):
+    ...
+InvalidComponent: ...
+>>> nric.validate('890132-06-1224')
+Traceback (most recent call last):
+    ...
+InvalidComponent: ...
+>>> nric.validate('870229875542')
+Traceback (most recent call last):
+    ...
+InvalidComponent: ...
+
+
+Invalid birth place:
+
+>>> nric.validate('770305-00-1234')
+Traceback (most recent call last):
+    ...
+InvalidComponent: ...
+>>> nric.validate('890131-17-1224')
+Traceback (most recent call last):
+    ...
+InvalidComponent: ...
+>>> nric.validate('810909805542')
+Traceback (most recent call last):
+    ...
+InvalidComponent: ...
+>>> nric.validate('880229995542')
+Traceback (most recent call last):
+    ...
+InvalidComponent: ...
+
+
+Just invalid numbers:
+
+>>> nric.validate('770305-00')
+Traceback (most recent call last):
+    ...
+InvalidLength: ...
+>>> nric.validate('890A31-17-1224')
+Traceback (most recent call last):
+    ...
+InvalidFormat: ...
+>>> nric.get_birth_place('8109098')
+Traceback (most recent call last):
+    ...
+InvalidComponent: ...

-----------------------------------------------------------------------

Summary of changes:
 getmybp.py                    |   87 ++++++++++++++++++++++++++++
 stdnum/{bg => my}/__init__.py |    6 +-
 stdnum/my/bp.dat              |   86 ++++++++++++++++++++++++++++
 stdnum/my/nric.py             |  110 +++++++++++++++++++++++++++++++++++
 tests/test_my_nric.doctest    |  126 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 412 insertions(+), 3 deletions(-)
 create mode 100755 getmybp.py
 copy stdnum/{bg => my}/__init__.py (85%)
 create mode 100644 stdnum/my/bp.dat
 create mode 100644 stdnum/my/nric.py
 create mode 100644 tests/test_my_nric.doctest


hooks/post-receive
-- 
python-stdnum