python-stdnum branch master updated. 1.11-17-g51e00da

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "python-stdnum".

The branch, master has been updated
       via  51e00da36a647589921d1ea1bf8356f467964ea1 (commit)
      from  5d0f288eb18793048fa10f9d9de82df087f7e71e (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email, so we list those
revisions in full below.

- Log -----------------------------------------------------------------

commit 51e00da36a647589921d1ea1bf8356f467964ea1
Author: Arthur de Jong <>
Date:   Fri Jun 14 14:26:01 2019 +0200

    Fix handelsregisternummer to not turn Hamburg into Homburg

    This changes the minimisation function that is used for comparison and
    canonicalisation so that it no longer reduces Hamburg and Homburg to the
    same string. As a result, the function is slightly stricter about which
    encoding differences it accepts.

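The effect of the change can be reproduced outside the library. This is a standalone Python 3 sketch of the old and new `_to_min` from the diff. It assumes `str` input, so the `stdnum.util.to_unicode` call (presumably there to turn byte strings into text) is dropped, and `to_min_old` / `to_min_new` are hypothetical names used only for the comparison.

```python
import unicodedata


def to_min_old(court):
    """Pre-change minimisation: keep ASCII letters except a, o and u."""
    # Dropping a, o and u sidesteps umlaut encoding problems, but it also
    # makes 'Hamburg' and 'Homburg' indistinguishable.
    return ''.join(x for x in court.lower() if x in 'bcdefghijklmnpqrstvwxyz')


def to_min_new(court):
    """Post-change minimisation: decompose accents, then keep ASCII letters."""
    # NFD splits e.g. 'ä' into 'a' + U+0308 (combining diaeresis); the filter
    # then drops the combining mark and keeps the base letter.
    return ''.join(
        x for x in unicodedata.normalize('NFD', court.lower())
        if x in 'abcdefghijklmnopqrstuvwxyz')


for name in ('Hamburg', 'Homburg', 'Köln', 'Kempten (Allgäu)'):
    print(name, to_min_old(name), to_min_new(name))
```

With the old function both `Hamburg` and `Homburg` minimise to `hmbrg`, so a lookup keyed on that string cannot tell them apart; the new function keeps them as `hamburg` and `homburg` while still mapping `Köln` to the accent-free `koln`.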
    This also adds a few aliases to the court names.
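The aliases matter because court lookups go through the minimised name. The following is a simplified sketch, not the module's actual table: `COURTS`, `ALIASES`, `courts` and `lookup` are hypothetical names, the court list is a small illustrative subset, and the dictionary mirrors the `(_to_min(...), court)` pairs built in the diff. It also shows one plausible reason for the odd-looking `'Kaln'` alias: UTF-8 `Köln` mis-decoded as Latin-1 becomes `KÃ¶ln`, which the new minimisation reduces to `kaln`.

```python
import unicodedata

# Illustrative subset of court names; the real GERMAN_COURTS tuple is much longer.
COURTS = (
    'Bad Homburg v.d.H.',
    'Hamburg',
    'Homburg',
    'Kempten (Allgäu)',
    'Köln',
    'St. Ingbert (St Ingbert)',
)

# A few of the (alias, canonical court) pairs added by this commit.
ALIASES = (
    ('Allgäu', 'Kempten (Allgäu)'),
    ('Kaln', 'Köln'),  # 'Köln' mis-decoded as 'KÃ¶ln' minimises to 'kaln'
    ('Kempten', 'Kempten (Allgäu)'),
    ('St. Ingbert', 'St. Ingbert (St Ingbert)'),
)


def to_min(court):
    """Minimise a court name: decompose accents, keep only ASCII letters."""
    return ''.join(
        x for x in unicodedata.normalize('NFD', court.lower())
        if x in 'abcdefghijklmnopqrstuvwxyz')


# Map each minimised name (canonical or alias) to the canonical court name.
courts = {to_min(court): court for court in COURTS}
courts.update((to_min(alias), court) for alias, court in ALIASES)


def lookup(name):
    """Return the canonical court name, or None if it is unknown."""
    return courts.get(to_min(name))


mojibake = 'Köln'.encode('utf-8').decode('latin-1')  # 'KÃ¶ln'
print(lookup(mojibake))       # Köln
print(lookup('St. Ingbert'))  # St. Ingbert (St Ingbert)
print(lookup('Homburg'))      # Homburg
```

`St. Ingbert` needs an explicit alias because the canonical name `St. Ingbert (St Ingbert)` minimises to `stingbertstingbert`, which the bare town name `stingbert` would never match.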

diff --git a/stdnum/de/ 
index 540d394..5d7de4e 100644
--- a/stdnum/de/
+++ b/stdnum/de/
@@ -50,9 +50,10 @@ InvalidComponent: ...
 import re
+import unicodedata
 from stdnum.exceptions import *
-from stdnum.util import clean
+from stdnum.util import clean, to_unicode
 # The known courts that have a Handelsregister
@@ -214,8 +215,8 @@ GERMAN_COURTS = (
 def _to_min(court):
     """Convert the court name for quick comparison without encoding issues."""
     return ''.join(
-        x for x in court.lower()
-        if x in 'bcdefghijklmnpqrstvwxyz')
+        x for x in unicodedata.normalize('NFD', to_unicode(court).lower())
+        if x in 'abcdefghijklmnopqrstuvwxyz')
 # Build a dictionary for lookup up courts
@@ -223,10 +224,20 @@ _courts = dict(
     (_to_min(court), court) for court in GERMAN_COURTS)
     (_to_min(alias), court) for alias, court in (
+        ('Allgäu', 'Kempten (Allgäu)'),
         ('Bad Homburg', 'Bad Homburg v.d.H.'),
         ('Berlin', 'Berlin (Charlottenburg)'),
         ('Charlottenburg', 'Berlin (Charlottenburg)'),
+        ('Kaln', 'Köln'),  # for encoding issues
+        ('Kempten', 'Kempten (Allgäu)'),
+        ('Ludwigshafen am Rhein (Ludwigshafen)', 'Ludwigshafen a.Rhein (Ludwigshafen)'),
+        ('Ludwigshafen am Rhein', 'Ludwigshafen a.Rhein (Ludwigshafen)'),
+        ('Ludwigshafen', 'Ludwigshafen a.Rhein (Ludwigshafen)'),
         ('Oldenburg', 'Oldenburg (Oldenburg)'),
+        ('St. Ingbert', 'St. Ingbert (St Ingbert)'),
+        ('St. Wendel', 'St. Wendel (St Wendel)'),
+        ('Weiden in der Oberpfalz', 'Weiden i. d. OPf.'),
+        ('Weiden', 'Weiden i. d. OPf.'),
diff --git a/tests/test_de_handelsregisternummer.doctest 
index e6143ea..b75a1db 100644
--- a/tests/test_de_handelsregisternummer.doctest
+++ b/tests/test_de_handelsregisternummer.doctest
@@ -1,6 +1,6 @@
 test_de_handelsregisternummer.doctest - tests for German register number
-Copyright (C) 2018 Arthur de Jong
+Copyright (C) 2018-2019 Arthur de Jong
 This library is free software; you can redistribute it and/or
 modify it under the terms of the GNU Lesser General Public
@@ -52,6 +52,8 @@ funky so they work both in Python 2 and Python 3.
 >>> handelsregisternummer.validate('Berlin HRB 11223 B')  # Charlottenburg missing
 'Berlin (Charlottenburg) HRB 11223 B'
+>>> handelsregisternummer.validate('St. Ingbert HRA 61755')
+'St. Ingbert (St Ingbert) HRA 61755'
 >>> number = u'K\xf6ln HRB 49263'  # Unicode
 >>> handelsregisternummer.validate(number) == number
@@ -68,6 +70,10 @@ True
 Traceback (most recent call last):
 InvalidComponent: ...
+>>> handelsregisternummer.validate('Hamburg HRA 61755')
+'Hamburg HRA 61755'
+>>> handelsregisternummer.validate('Homburg HRA 61755')
+'Homburg HRA 61755'
 The compact function does minimal validation.
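The new doctests exercise the whole validation path: split off the court, canonicalise it, and reassemble the number. A hypothetical, much reduced version of that path is sketched here. `validate_sketch`, `NUMBER_RE`, `COURT_LOOKUP` and the list of registry types are assumptions for illustration, not the module's real parser, and the sketch raises plain `ValueError` rather than the library's own exceptions.

```python
import re
import unicodedata

# Hypothetical subset of courts and aliases, just enough for the doctests above.
COURTS = ('Berlin (Charlottenburg)', 'Hamburg', 'Homburg', 'St. Ingbert (St Ingbert)')
ALIASES = (
    ('Berlin', 'Berlin (Charlottenburg)'),
    ('St. Ingbert', 'St. Ingbert (St Ingbert)'),
)

# Assumed layout: "<court> <registry> <number> [suffix]".
NUMBER_RE = re.compile(
    r'^(?P<court>.+?)\s+(?P<registry>HRA|HRB|PR|GnR|VR)\s+(?P<number>\d+)'
    r'(?:\s+(?P<suffix>[A-Z]{1,3}))?$')


def to_min(court):
    """Minimise a court name: decompose accents, keep only ASCII letters."""
    return ''.join(
        x for x in unicodedata.normalize('NFD', court.lower())
        if x in 'abcdefghijklmnopqrstuvwxyz')


COURT_LOOKUP = {to_min(c): c for c in COURTS}
COURT_LOOKUP.update((to_min(a), c) for a, c in ALIASES)


def validate_sketch(number):
    """Canonicalise the court part of a register number or raise ValueError."""
    match = NUMBER_RE.match(number.strip())
    if not match:
        raise ValueError('unrecognised format')
    court = COURT_LOOKUP.get(to_min(match.group('court')))
    if court is None:
        raise ValueError('unknown court')
    parts = [court, match.group('registry'), match.group('number')]
    if match.group('suffix'):
        parts.append(match.group('suffix'))
    return ' '.join(parts)


print(validate_sketch('St. Ingbert HRA 61755'))  # St. Ingbert (St Ingbert) HRA 61755
print(validate_sketch('Homburg HRA 61755'))      # Homburg HRA 61755
```

Because the minimised keys now keep every vowel, `Hamburg` and `Homburg` resolve to different entries instead of whichever one happened to be stored last under `hmbrg`.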


Summary of changes:
 stdnum/de/          | 17 ++++++++++++++---
 tests/test_de_handelsregisternummer.doctest |  8 +++++++-
 2 files changed, 21 insertions(+), 4 deletions(-)