lists.arthurdejong.org
RSS feed

python-stdnum branch master updated. 1.18-5-gb1dc313

[Date Prev][Date Next] [Thread Prev][Thread Next]

python-stdnum branch master updated. 1.18-5-gb1dc313



This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "python-stdnum".

The branch, master has been updated
       via  b1dc3137e8aa6cfd0435e7bf758588171f97cfa0 (commit)
      from  df894c37e9b28d639df5287eb98c6a01b47104d2 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://arthurdejong.org/git/python-stdnum/commit/?id=b1dc3137e8aa6cfd0435e7bf758588171f97cfa0

commit b1dc3137e8aa6cfd0435e7bf758588171f97cfa0
Author: Arthur de Jong <arthur@arthurdejong.org>
Date:   Fri Dec 30 16:39:36 2022 +0100

    Add initial CONTRIBUTING.md file
    
    Initial description of the information needed for adding new number
    formats and some coding and testing guidelines.

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..f6ba7a3
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,160 @@
+Contributing to python-stdnum
+=============================
+
+This document describes general guidelines for contributing new formats or
+other enhancement to python-stdnum.
+
+
+Adding number formats
+---------------------
+
+Basically any number or code that has some validation mechanism available or
+some common formatting is eligible for inclusion into this library. If the
+only specification of the number is "it consists of 6 digits" implementing
+validation may not be that useful.
+
+Contributions of new formats or requests to implement validation for a format
+should include the following:
+
+* The format name and short description.
+* References to (official) sources that describe the format.
+* A one or two paragraph description containing more details of the number
+  (e.g. purpose and issuer and possibly format information that might be
+  useful to end users).
+* If available, a link to an (official) validation service for the number,
+  reference implementations or similar sources that allow validating the
+  correctness of the implementation.
+* A set of around 20 to 100 "real" valid numbers for testing (more is better
+  during development but only around 100 will be retained for regression
+  testing).
+* If the validation depends on some (online) list of formats, structures or
+  parts of the identifier (e.g. a list of region codes that are part of the
+  number) a way to easily update the registry information should be
+  available.
+
+
+Code contributions
+------------------
+
+Improvements to python-stdnum are most welcome. Integrating contributions
+will be done on a best-effort basis and can be made easier if the following
+are considered:
+
+* Ideally contributions are made as GitHub pull requests, but contributions
+  by email (privately or through the python-stdnum-users mailing list) can
+  also be considered.
+* Submitted contributions will often be reformatted and sometimes
+  restructured for consistency with other parts.
+* Contributions will be acknowledged in the release notes.
+* Contributions should add or update a copyright statement if you feel the
+  contribution is significant.
+* All contribution should be made with compatible applicable copyright.
+* It is not needed to modify the NEWS, README.md or files under docs for new
+  formats; these files will be updated on release.
+* Marking valid numbers as invalid should be avoided and are much worse than
+  marking invalid numbers as valid. Since the primary use case for
+  python-stdnum is to validate entered data having an implementation that
+  results in "computer says no" should be avoided.
+* Number format implementations should include links to sources of
+  information: generally useful links (e.g. more details about the number
+  itself) should be in the module docstring, if it relates more to the
+  implementation (e.g. pointer to reference implementation, online API
+  documentation or similar) a comment in the code is better
+* Country-specific numbers and codes go in a country or region package (e.g.
+  stdnum.eu.vat or stdnum.nl.bsn) while global numbers go in the toplevel
+  name space (e.g. stdnum.isbn).
+* All code should be well tested and achieve 100% code coverage.
+* Existing code structure conventions (e.g. see README for interface) should
+  be followed.
+* Git commit messages should follow the usual 7 rules.
+* Declarative or functional constructs are preferred over an iterative
+  approach, e.g.::
+
+      s = sum(int(c) for c in number)
+
+  over::
+
+      s = 0
+      for c in number:
+          s += int(c)
+
+
+Testing
+-------
+
+Tests can be run with `tox`. Some basic code style tests can be run with `tox
+-e flake8` and most other targets run the test suite with various supported
+Python interpreters.
+
+Module implementations have a couple of smaller test cases that also serve as
+basic documentation of the happy flow.
+
+More extensive tests are available, per module, in the tests directory. These
+tests (also doctests) cover more corner cases and should include a set of
+valid numbers that demonstrate that the module works correctly for real
+numbers.
+
+The normal tests should never require online sources for execution. All
+functions that deal with online lookups (e.g. the EU VIES service for VAT
+validation) should only be tested using conditional unittests.
+
+
+Finding test numbers
+--------------------
+
+Some company numbers are commonly published on a company's website contact
+page (e.g. VAT or other registration numbers, bank account numbers). Doing a
+web search limited to a country and some key words generally turn up a lot of
+pages with this information.
+
+Another approach is to search for spreadsheet-type documents with some
+keywords that match the number. This sometimes turns up lists of companies
+(also occasionally works for personal identifiers).
+
+For information that is displayed on ID cards or passports it is sometimes
+useful to do an image search.
+
+For dealing with numbers that point to individuals it is important to:
+
+* Only keep the data that is needed to test the implementation.
+* Ensure that no actual other data relation to a person or other personal
+  information is kept or can be inferred from the kept data.
+* The presence of a number in the test set should not provide any information
+  about the person (other than that there is a person with the number or
+  information that is present in the number itself).
+
+Sometimes numbers are part of a data leak. If this data is used to pick a few
+sample numbers from the selection should be random and the leak should not be
+identifiable from the picked numbers. For example, if the leaked numbers
+pertain only to people with a certain medical condition, membership of some
+organisation or other specific property the leaked data should not be used.
+
+
+Reverse engineering
+-------------------
+
+Sometimes a number format clearly has a check digit but the algorithm is not
+publicly documented. It is sometimes possible to reverse engineer the used
+check digit algorithm from a large set of numbers.
+
+For example, given numbers that, apart from the check digit, only differ in
+one digit will often expose the weights used. This works reasonably well if
+the algorithm uses modulo 11 is over a weighted sums over the digits.
+
+See 
https://github.com/arthurdejong/python-stdnum/pull/203#issuecomment-623188812
+
+
+Registries
+----------
+
+Some numbers or parts of numbers use validation base on a registry of known
+good prefixes, ranges or formats. It is only useful to fully base validation
+on these registries if the update frequency to these registries is very low.
+
+If there is a registry that is used (a list of known values, ranges or
+otherwise) the downloaded information should be stored in a data file (see
+the stdnum.numdb module). Only the minimal amount of data should be kept (for
+validation or identification).
+
+The data files should be able to be created and updated using a script in the
+`update` directory.
diff --git a/docs/contributing.rst b/docs/contributing.rst
new file mode 100644
index 0000000..58977a8
--- /dev/null
+++ b/docs/contributing.rst
@@ -0,0 +1 @@
+.. include:: ../CONTRIBUTING.md
diff --git a/docs/index.rst b/docs/index.rst
index e48b450..c7e5c09 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -332,3 +332,12 @@ Changes in python-stdnum
    :maxdepth: 2
 
    changes
+
+
+Contributing to python-stdnum
+-----------------------------
+
+.. toctree::
+   :maxdepth: 2
+
+   contributing

-----------------------------------------------------------------------

Summary of changes:
 CONTRIBUTING.md       | 160 ++++++++++++++++++++++++++++++++++++++++++++++++++
 docs/contributing.rst |   1 +
 docs/index.rst        |   9 +++
 3 files changed, 170 insertions(+)
 create mode 100644 CONTRIBUTING.md
 create mode 100644 docs/contributing.rst


hooks/post-receive
-- 
python-stdnum