lists.arthurdejong.org

python-stdnum branch master updated. 1.13-27-gf3ce70c

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "python-stdnum".

The branch, master has been updated
       via  f3ce70c60f26c5a7e0c0e05985630e3136b130fa (commit)
      from  54e2e8fda313c1cb47c4cfdc71f42be272fe74e4 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://arthurdejong.org/git/python-stdnum/commit/?id=f3ce70c60f26c5a7e0c0e05985630e3136b130fa

commit f3ce70c60f26c5a7e0c0e05985630e3136b130fa
Author: Leandro Regueiro <leandro.regueiro@gmail.com>
Date:   Sat Apr 4 22:06:52 2020 +0200

    Add support for Chinese TIN number
    
    Closes https://github.com/arthurdejong/python-stdnum/issues/207
    Closes https://github.com/arthurdejong/python-stdnum/pull/210

diff --git a/stdnum/cn/__init__.py b/stdnum/cn/__init__.py
index 78e0bf6..93a6f5e 100644
--- a/stdnum/cn/__init__.py
+++ b/stdnum/cn/__init__.py
@@ -19,3 +19,6 @@
 # 02110-1301 USA
 
 """Collection of China (PRC) numbers."""
+
+# Provide vat as an alias.
+from stdnum.cn import uscc as vat  # noqa: F401
diff --git a/stdnum/cn/uscc.py b/stdnum/cn/uscc.py
new file mode 100644
index 0000000..a04a553
--- /dev/null
+++ b/stdnum/cn/uscc.py
@@ -0,0 +1,125 @@
+# uscc.py - functions for handling Chinese USCC numbers
+# coding: utf-8
+#
+# Copyright (C) 2020 Leandro Regueiro
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+# 02110-1301 USA
+
+"""USCC (Unified Social Credit Code, 统一社会信用代码, China tax number).
+
+This number consists of 18 digits or uppercase English letters (excluding the
+letters I, O, Z, S, V). The number is made up of several parts:
+
+* Digit 1 represents the registering authority,
+* Digit 2 represents the registered entity type,
+* Digits 3 through 8 represent the registering region code,
+* Digits 9 through 17 represent the organisation code,
+* Digit 18 is a check digit (either a number or letter).
+
+The registering authority digit is most often 9, which represents the State
+Administration for Industry and Commerce (SAIC) as the registering authority.
+
+The registered entity type indicates the type of business (or entity). The
+most common entity types in China are:
+
+* Wholly Foreign-Owned Enterprises (WFOE): 外商独资企业
+* Joint Ventures (JV): 合资
+* Representative Office: 代表处
+* State-Owned Enterprise (SOE): 国有企业
+* Private Enterprise: 民营企业
+* Individually-Owned: 个体户
+
+The registering region code, sometimes referred to as the "administrative
+division code", is a string of six digits that indicates where the company
+is registered. It roughly follows the organisation of the official Chinese
+regions.
+
+The organisation code comes directly from the China Organization Code
+certificate, an alternative document to the China Business License. It can
+contain letters or digits.
+
+More information:
+
+* https://zh.wikipedia.org/wiki/统一社会信用代码
+* https://zh.wikipedia.org/wiki/校验码
+* https://www.oecd.org/tax/automatic-exchange/crs-implementation-and-assistance/tax-identification-numbers/China-TIN.pdf
+
+>>> validate('91110000600037341L')
+'91110000600037341L'
+>>> validate('A1110000600037341L')
+Traceback (most recent call last):
+    ...
+InvalidFormat: ...
+>>> validate('12345')
+Traceback (most recent call last):
+    ...
+InvalidLength: ...
+>>> format('9 1 110000 600037341L')
+'91110000600037341L'
+"""
+
+from stdnum.exceptions import *
+from stdnum.util import clean, isdigits
+
+
+_alphabet = '0123456789ABCDEFGHJKLMNPQRTUWXY'
+
+
+def compact(number):
+    """Convert the number to the minimal representation.
+
+    This strips the number of any valid separators and removes surrounding
+    whitespace.
+    """
+    return clean(number, ' -').upper().strip()
+
+
+def calc_check_digit(number):
+    """Calculate the check digit for the number."""
+    weights = (1, 3, 9, 27, 19, 26, 16, 17, 20, 29, 25, 13, 8, 24, 10, 30, 28)
+    number = compact(number)
+    total = sum(_alphabet.index(n) * w for n, w in zip(number, weights))
+    return _alphabet[(31 - total) % 31]
+
+
+def validate(number):
+    """Check if the number is a valid USCC.
+
+    This checks the length, formatting and check digit.
+    """
+    number = compact(number)
+    if len(number) != 18:
+        raise InvalidLength()
+    if not isdigits(number[:8]):
+        raise InvalidFormat()
+    if any(c not in _alphabet for c in number[8:]):
+        raise InvalidFormat()
+    if number[-1] != calc_check_digit(number):
+        raise InvalidChecksum()
+    return number
+
+
+def is_valid(number):
+    """Check if the number is a valid USCC."""
+    try:
+        return bool(validate(number))
+    except ValidationError:
+        return False
+
+
+def format(number):
+    """Reformat the number to the standard presentation format."""
+    return compact(number)
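The check-character arithmetic in calc_check_digit() can be reproduced without python-stdnum. The following is a minimal standalone sketch, not the module's code: it rebuilds the 31-symbol alphabet from the excluded letters named in the docstring and derives the weights as successive powers of 3 modulo 31, which reproduce the hard-coded weights tuple exactly. The names ALPHABET, WEIGHTS, check_char and checksum_ok are illustrative only.

```python
import string

# Illustrative reconstruction (not part of python-stdnum): digits plus
# uppercase letters, minus I, O, Z, S and V, giving 31 symbols.
ALPHABET = string.digits + ''.join(
    c for c in string.ascii_uppercase if c not in 'IOZSV')

# Weights for characters 1-17: powers of 3 reduced modulo 31.
WEIGHTS = tuple(pow(3, i, 31) for i in range(17))


def check_char(body):
    """Return the check character for the first 17 characters of a USCC."""
    total = sum(ALPHABET.index(c) * w for c, w in zip(body, WEIGHTS))
    # Equivalent to the module's (31 - total) % 31.
    return ALPHABET[(-total) % 31]


def checksum_ok(number):
    """Return True if an 18-character number passes the weighted-sum rule.

    The check character enters the sum with weight 1, so a valid
    number has a full weighted sum that is divisible by 31.
    """
    if len(number) != 18:
        return False
    weights = WEIGHTS + (1,)
    total = sum(ALPHABET.index(c) * w for c, w in zip(number, weights))
    return total % 31 == 0


# The derived constants match the literals used in stdnum/cn/uscc.py.
assert ALPHABET == '0123456789ABCDEFGHJKLMNPQRTUWXY'
assert WEIGHTS == (1, 3, 9, 27, 19, 26, 16, 17, 20, 29, 25, 13, 8, 24, 10, 30, 28)

print(check_char('91110000600037341'))    # L
print(checksum_ok('91110000600037341L'))  # True
print(checksum_ok('91110000600037341N'))  # False
```

For the docstring example the weighted sum of the first 17 characters is 538; 538 mod 31 is 11, so the check value is 31 - 11 = 20, which is 'L' in the alphabet. Testing divisibility of the full sum is equivalent to the module's comparison against calc_check_digit(). Like the module, this sketch raises ValueError from str.index() on characters outside the alphabet, so format checks should run first.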
diff --git a/tests/test_cn_uscc.doctest b/tests/test_cn_uscc.doctest
new file mode 100644
index 0000000..9d34ebb
--- /dev/null
+++ b/tests/test_cn_uscc.doctest
@@ -0,0 +1,261 @@
+test_cn_uscc.doctest - more detailed doctests for stdnum.cn.uscc module
+
+Copyright (C) 2020 Leandro Regueiro
+
+This library is free software; you can redistribute it and/or
+modify it under the terms of the GNU Lesser General Public
+License as published by the Free Software Foundation; either
+version 2.1 of the License, or (at your option) any later version.
+
+This library is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+Lesser General Public License for more details.
+
+You should have received a copy of the GNU Lesser General Public
+License along with this library; if not, write to the Free Software
+Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+02110-1301 USA
+
+
+This file contains more detailed doctests for the stdnum.cn.uscc module. It
+tries to test more corner cases and detailed functionality that is not really
+useful as module documentation.
+
+>>> from stdnum.cn import uscc
+
+
+Tests for some corner cases.
+
+>>> uscc.validate('91110000600037341L')
+'91110000600037341L'
+>>> uscc.validate('91 110000 600037341L')
+'91110000600037341L'
+>>> uscc.format('91 110000 600037341L')
+'91110000600037341L'
+>>> uscc.validate('12345')
+Traceback (most recent call last):
+    ...
+InvalidLength: ...
+>>> uscc.validate('A1110000600037341L')
+Traceback (most recent call last):
+    ...
+InvalidFormat: ...
+>>> uscc.validate('9111000060003IOZSV')
+Traceback (most recent call last):
+    ...
+InvalidFormat: ...
+>>> uscc.validate('91110000600037341N')
+Traceback (most recent call last):
+    ...
+InvalidChecksum: ...
+
+
+These have been found online and should all be valid numbers.
+
+>>> numbers = '''
+...
+... 121200004013590816
+... 911522010783762860
+... 91152201078377449P
+... 9115220107837941X1
+... 91152201078382010M
+... 911522010783917502
+... 91152201078394644D
+... 91152201078395188Q
+... 91310112087994932F
+... 91310115MA1K3BTP2B
+... 91310116052958608G
+... 91310116076409427L
+... 913101203509708598
+... 913205056082348768
+... 91340600MA2PBM9HXD
+... 91340600MA2PBMA74B
+... 91340600MA2PBN188W
+... 91340600MA2PBN831R
+... 91340600MA2PBNE723
+... 91340600MA2PBP2120
+... 91340600MA2PBP6A1W
+... 91340600MA2PENCC7K
+... 91340600MA2PENWF9P
+... 91340600MA2PEP518T
+... 91340600MA2PEP7F5B
+... 91340600MA2PERLJ2D
+... 91340600MA2UJYG49B
+... 91340602MA2UJY3J9W
+... 91340602MA2UJYD90G
+... 91340603MA2UJYN78Q
+... 91340603MA2UJYNU67
+... 91340604MA2PBQBH3P
+... 91340604MA2PEL1X3M
+... 91340621MA2PBPLU08
+... 91340621MA2PEMUHX8
+... 91340621MA2PEN6Q3Y
+... 91340621MA2PEPJB9F
+... 91371082775260043P
+... 914112810713741814
+... 9144030071526726XG
+... 91620100719023721L
+... 91620105224521729E
+... 91620105556273987U
+... 91620105556284969Y
+... 91620105556287545F
+... 91620105571630591E
+... 91620105585923360D
+... 91620105665404165K
+... 9162010567592034XQ
+... 91620105750906995R
+... 91620105750935948X
+... 91620105756580826D
+... 91620105767728304H
+... 91620105784023285J
+... 91620105789620900B
+... 91620105794891756D
+... 91620105L015101280
+... 92340602MA2PBQAH7E
+... 92340602MA2PEL3T3X
+... 92340602MA2PEL605A
+... 92340602MA2UK2TH6F
+... 92340602MA2UK33127
+... 92340603MA2PBMCG0U
+... 92340603MA2PBMD304
+... 92340603MA2PBMQL04
+... 92340603MA2PBMUF7M
+... 92340603MA2PBN17X5
+... 92340603MA2PBN2W0N
+... 92340603MA2PBNK461
+... 92340603MA2PBNLR0R
+... 92340603MA2PBQ7J4K
+... 92340603MA2PBQ7X9E
+... 92340603MA2PBQFT7J
+... 92340603MA2PBQGB6M
+... 92340603MA2PBQHD9N
+... 92340603MA2PBQUC3U
+... 92340603MA2PBQX04X
+... 92340603MA2PBQY862
+... 92340603MA2PEL613K
+... 92340603MA2PELE65C
+... 92340603MA2PELFF5B
+... 92340603MA2PELHL72
+... 92340603MA2PEM480D
+... 92340603MA2PEN9F2E
+... 92340603MA2PENAP06
+... 92340603MA2PENCQ1M
+... 92340603MA2PEP42XF
+... 92340603MA2PEPBN6L
+... 92340603MA2PEPM83M
+... 92340603MA2UJXYJ3P
+... 92340603MA2UJY0W68
+... 92340603MA2UJY2M7F
+... 92340603MA2UJYUM57
+... 92340603MA2UK0KQ72
+... 92340603MA2UK0ME1G
+... 92340603MA2UK0T79E
+... 92340603MA2UK1NC4L
+... 92340603MA2UK1PX9B
+... 92340603MA2UK1TD00
+... 92340603MA2UK1UE56
+... 92340603MA2UK28G4G
+... 92340603MA2UK2P79U
+... 92340603MA2UK2WX6M
+... 92340603MA2UK2XD94
+... 92340603MA2UK31A3N
+... 92340603MA2UK3CP6U
+... 92340603MA2UK3D673
+... 92340603MA2UK3DF0N
+... 92340603MA2UK3DT5H
+... 92340604MA2PBME53K
+... 92340604MA2PBMQK2P
+... 92340604MA2PBMXF6C
+... 92340604MA2PBNDT64
+... 92340604MA2PBNKR4Y
+... 92340604MA2PBNQNXM
+... 92340604MA2PBNR83C
+... 92340604MA2PBNUL9N
+... 92340604MA2PBNWH99
+... 92340604MA2PBNXH5J
+... 92340604MA2PBP1U3Y
+... 92340604MA2PBP7U1T
+... 92340604MA2PBPHC8R
+... 92340604MA2PBPXJ71
+... 92340604MA2PEM4L7X
+... 92340604MA2PEM5Q4E
+... 92340604MA2PENA300
+... 92340604MA2UJYCE59
+... 92340604MA2UJYPN1K
+... 92340604MA2UK03G7D
+... 92340604MA2UK04L4X
+... 92340604MA2UK07529
+... 92340604MA2UK08Y65
+... 92340621MA2PBKRXXG
+... 92340621MA2PBL318W
+... 92340621MA2PBLW63G
+... 92340621MA2PBLWT1Y
+... 92340621MA2PBLX27E
+... 92340621MA2PBM6286
+... 92340621MA2PBMA318
+... 92340621MA2PBMKK45
+... 92340621MA2PBMNMXM
+... 92340621MA2PBMUH3D
+... 92340621MA2PBN0KXX
+... 92340621MA2PBN145R
+... 92340621MA2PBN1962
+... 92340621MA2PBN1H1T
+... 92340621MA2PBN559X
+... 92340621MA2PBNFH07
+... 92340621MA2PBNJ232
+... 92340621MA2PBNTY9Y
+... 92340621MA2PBP431L
+... 92340621MA2PELD34J
+... 92340621MA2PELFR3H
+... 92340621MA2UJRKE7N
+... 92340621MA2UJRL762
+... 92340621MA2UJT9WXP
+... 92340621MA2UJTCK0M
+... 92340621MA2UJTM785
+... 92340621MA2UJTQ22E
+... 92340621MA2UJTTF8R
+... 92340621MA2UJTY74U
+... 92340621MA2UJW4UXQ
+... 92340621MA2UJW690P
+... 92340621MA2UJW703D
+... 92340621MA2UJWA961
+... 92340621MA2UJWCT2H
+... 92340621MA2UJWEE2G
+... 92340621MA2UJWFB49
+... 92340621MA2UJWGB0J
+... 92340621MA2UJWYPX8
+... 92340621MA2UJX6C8T
+... 92340621MA2UJX968M
+... 92340621MA2UJXLD1M
+... 92340621MA2UJY477E
+... 92340621MA2UJYE11E
+... 92340621MA2UJYG57G
+... 92340621MA2UK02XXW
+... 92340621MA2UK08G9K
+... 92340621MA2UK0DA11
+... 92340621MA2UK0G256
+... 92340621MA2UK0J08Q
+... 92340621MA2UK0JX8N
+... 92340621MA2UK0KR5Y
+... 92340621MA2UK1JY92
+... 92340621MA2UK1L38E
+... 92340621MA2UK1MK35
+... 92340621MA2UK1MN8P
+... 92340621MA2UK1PRX7
+... 92340621MA2UK22G6N
+... 92340621MA2UK25H3F
+... 92340621MA2UK2BU8B
+... 92340621MA2UK2CD59
+... 92340621MA2UK2EP62
+... 92340621MA2UK2GJXF
+... 92340621MA2UK2Q24Y
+... 92340621MA2UK2XX21
+... 92340621MA2UK30C35
+... 92340621MA2UK338XJ
+... 92340621MA2UK3C369
+... 92340621MA2UK3CE6L
+...
+... '''
+>>> [x for x in numbers.splitlines() if x and not uscc.is_valid(x)]
+[]
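Apart from one number starting with 12, the test numbers above all start with 91 or 92, and many have organisation codes beginning with MA. A hypothetical helper, split_uscc (not part of python-stdnum), that slices a number according to the layout in the module docstring makes these components visible. It is a sketch that only splits the number; it performs no format or checksum validation.

```python
from collections import namedtuple

# Hypothetical helper, not part of python-stdnum: field names and slice
# positions follow the layout listed in the uscc module docstring.
UsccParts = namedtuple(
    'UsccParts', 'authority entity_type region organisation check')


def split_uscc(number):
    """Split an 18-character USCC into its documented components."""
    # Drop the same separators that uscc.compact() removes.
    number = number.replace(' ', '').replace('-', '').upper()
    if len(number) != 18:
        raise ValueError('expected 18 characters, got %d' % len(number))
    return UsccParts(
        authority=number[0],        # character 1
        entity_type=number[1],      # character 2
        region=number[2:8],         # characters 3-8
        organisation=number[8:17],  # characters 9-17
        check=number[17],           # character 18
    )


for n in ('91110000600037341L', '91310115MA1K3BTP2B', '92340603MA2PBMCG0U'):
    print(split_uscc(n))
```

For example, '91310115MA1K3BTP2B' splits into authority '9', entity type '1', region '310115', organisation code 'MA1K3BTP2' and check character 'B'.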

-----------------------------------------------------------------------

Summary of changes:
 stdnum/cn/__init__.py      |   3 +
 stdnum/cn/uscc.py          | 125 ++++++++++++++++++++++
 tests/test_cn_uscc.doctest | 261 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 389 insertions(+)
 create mode 100644 stdnum/cn/uscc.py
 create mode 100644 tests/test_cn_uscc.doctest


hooks/post-receive
-- 
python-stdnum