python-stdnum branch master updated. 1.18-5-gb1dc313
- From: Commits of the python-stdnum project <python-stdnum-commits [at] lists.arthurdejong.org>
- To: python-stdnum-commits [at] lists.arthurdejong.org
- Reply-to: python-stdnum-users [at] lists.arthurdejong.org, python-stdnum-commits [at] lists.arthurdejong.org
- Subject: python-stdnum branch master updated. 1.18-5-gb1dc313
- Date: Fri, 30 Dec 2022 17:08:16 +0100 (CET)
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "python-stdnum".
The branch, master has been updated
via b1dc3137e8aa6cfd0435e7bf758588171f97cfa0 (commit)
from df894c37e9b28d639df5287eb98c6a01b47104d2 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
https://arthurdejong.org/git/python-stdnum/commit/?id=b1dc3137e8aa6cfd0435e7bf758588171f97cfa0
commit b1dc3137e8aa6cfd0435e7bf758588171f97cfa0
Author: Arthur de Jong <arthur@arthurdejong.org>
Date: Fri Dec 30 16:39:36 2022 +0100
Add initial CONTRIBUTING.md file
Initial description of the information needed for adding new number
formats and some coding and testing guidelines.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..f6ba7a3
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,160 @@
+Contributing to python-stdnum
+=============================
+
+This document describes general guidelines for contributing new formats or
+other enhancements to python-stdnum.
+
+
+Adding number formats
+---------------------
+
+Basically any number or code that has some validation mechanism available or
+some common formatting is eligible for inclusion into this library. If the
+only specification of the number is "it consists of 6 digits", implementing
+validation may not be that useful.
+
+Contributions of new formats or requests to implement validation for a format
+should include the following:
+
+* The format name and short description.
+* References to (official) sources that describe the format.
+* A one or two paragraph description containing more details of the number
+ (e.g. purpose and issuer and possibly format information that might be
+ useful to end users).
+* If available, a link to an (official) validation service for the number,
+ reference implementations or similar sources that allow validating the
+ correctness of the implementation.
+* A set of around 20 to 100 "real" valid numbers for testing (more is better
+ during development but only around 100 will be retained for regression
+ testing).
+* If the validation depends on some (online) list of formats, structures or
+ parts of the identifier (e.g. a list of region codes that are part of the
+ number) a way to easily update the registry information should be
+ available.
+
+
+Code contributions
+------------------
+
+Improvements to python-stdnum are most welcome. Integrating contributions
+will be done on a best-effort basis and can be made easier if the following
+are considered:
+
+* Ideally contributions are made as GitHub pull requests, but contributions
+ by email (privately or through the python-stdnum-users mailing list) can
+ also be considered.
+* Submitted contributions will often be reformatted and sometimes
+ restructured for consistency with other parts.
+* Contributions will be acknowledged in the release notes.
+* Contributions should add or update a copyright statement if you feel the
+ contribution is significant.
+* All contributions should be made under compatible copyright terms.
+* There is no need to modify the NEWS, README.md or files under docs for new
+ formats; these files will be updated on release.
+* Marking valid numbers as invalid should be avoided and is much worse than
+ marking invalid numbers as valid. Since the primary use case for
+ python-stdnum is to validate entered data, an implementation that results
+ in "computer says no" should be avoided.
+* Number format implementations should include links to sources of
+ information: generally useful links (e.g. more details about the number
+ itself) should be in the module docstring, if it relates more to the
+ implementation (e.g. pointer to reference implementation, online API
+ documentation or similar) a comment in the code is better.
+* Country-specific numbers and codes go in a country or region package (e.g.
+ stdnum.eu.vat or stdnum.nl.bsn) while global numbers go in the toplevel
+ name space (e.g. stdnum.isbn).
+* All code should be well tested and achieve 100% code coverage.
+* Existing code structure conventions (e.g. see README for interface) should
+ be followed.
+* Git commit messages should follow the usual 7 rules.
+* Declarative or functional constructs are preferred over an iterative
+ approach, e.g.::
+
+ s = sum(int(c) for c in number)
+
+ over::
+
+ s = 0
+ for c in number:
+ s += int(c)
+
+
+Testing
+-------
+
+Tests can be run with `tox`. Some basic code style tests can be run with `tox
+-e flake8` and most other targets run the test suite with various supported
+Python interpreters.
+
+Module implementations have a couple of smaller test cases that also serve as
+basic documentation of the happy flow.
+
+More extensive tests are available, per module, in the tests directory. These
+tests (also doctests) cover more corner cases and should include a set of
+valid numbers that demonstrate that the module works correctly for real
+numbers.
+
+The normal tests should never require online sources for execution. All
+functions that deal with online lookups (e.g. the EU VIES service for VAT
+validation) should only be tested using conditional unittests.
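A minimal sketch of what such a conditional test could look like. The environment variable name `ONLINE_TESTS` and the helper `fake_online_lookup` are assumptions made for illustration only; the text above does not specify how the condition is expressed, and a real test would call the actual online lookup function instead of the stand-in.

```python
import os
import unittest


def fake_online_lookup(number):
    """Hypothetical stand-in for a function that queries an online service."""
    return {'number': number, 'valid': True}


# Assumption: online tests are opted into with an ONLINE_TESTS environment
# variable; without it the whole test case is skipped.
@unittest.skipIf(
    not os.environ.get('ONLINE_TESTS'),
    'online tests are disabled by default')
class TestOnlineLookup(unittest.TestCase):

    def test_lookup(self):
        # Only runs when ONLINE_TESTS is set in the environment.
        result = fake_online_lookup('123456')
        self.assertTrue(result['valid'])
```

With this shape, a plain test run never touches the network, while something like `ONLINE_TESTS=1 tox` (again assuming that variable name) would include the online checks.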
+
+
+Finding test numbers
+--------------------
+
+Some company numbers are commonly published on a company's website contact
+page (e.g. VAT or other registration numbers, bank account numbers). Doing a
+web search limited to a country and some key words generally turns up a lot of
+pages with this information.
+
+Another approach is to search for spreadsheet-type documents with some
+keywords that match the number. This sometimes turns up lists of companies
+(also occasionally works for personal identifiers).
+
+For information that is displayed on ID cards or passports it is sometimes
+useful to do an image search.
+
+For dealing with numbers that point to individuals it is important to:
+
+* Only keep the data that is needed to test the implementation.
+* Ensure that no other data relating to a person or other personal
+ information is kept or can be inferred from the kept data.
+* The presence of a number in the test set should not provide any information
+ about the person (other than that there is a person with the number or
+ information that is present in the number itself).
+
+Sometimes numbers are part of a data leak. If this data is used to pick a few
+sample numbers, the selection should be random and the leak should not be
+identifiable from the picked numbers. For example, if the leaked numbers
+pertain only to people with a certain medical condition, membership of some
+organisation or other specific property the leaked data should not be used.
+
+
+Reverse engineering
+-------------------
+
+Sometimes a number format clearly has a check digit but the algorithm is not
+publicly documented. It is sometimes possible to reverse engineer the used
+check digit algorithm from a large set of numbers.
+
+For example, numbers that, apart from the check digit, only differ in one
+digit will often expose the weights used. This works reasonably well if the
+algorithm uses modulo 11 over a weighted sum of the digits.
+
+See https://github.com/arthurdejong/python-stdnum/pull/203#issuecomment-623188812
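The idea of exposing a weight from two numbers that differ in a single digit can be sketched as follows. This assumes a hypothetical scheme in which the check value is simply the weighted sum of the digits modulo 11; real formats may subtract from 11, map 10 to a letter, or differ in other ways. The function names and the sample weights are made up for illustration.

```python
def weighted_check(digits, weights):
    """Hypothetical scheme: check value = weighted digit sum modulo 11."""
    return sum(w * int(d) for w, d in zip(weights, digits)) % 11


def recover_weight(a, b):
    """Return (position, weight mod 11) for two (payload, check) pairs.

    The payloads must have equal length and differ in exactly one digit.
    """
    (pa, ca), (pb, cb) = a, b
    diffs = [i for i, (x, y) in enumerate(zip(pa, pb)) if x != y]
    if len(diffs) != 1:
        raise ValueError('payloads must differ in exactly one position')
    i = diffs[0]
    delta_digit = (int(pa[i]) - int(pb[i])) % 11
    delta_check = (ca - cb) % 11
    # 11 is prime, so x ** 9 is the inverse of x modulo 11 (Fermat).
    return i, delta_check * pow(delta_digit, 9, 11) % 11


# Made-up weights, used only to generate two sample numbers.
weights = (2, 7, 6, 5, 4, 3)
first = ('123456', weighted_check('123456', weights))
second = ('123956', weighted_check('123956', weights))
print(recover_weight(first, second))  # → (3, 5)
```

Repeating this for pairs that differ in other positions recovers the remaining weights, provided the assumed scheme matches the real one.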
+
+
+Registries
+----------
+
+Some numbers or parts of numbers use validation based on a registry of known
+good prefixes, ranges or formats. It is only useful to fully base validation
+on these registries if the update frequency to these registries is very low.
+
+If there is a registry that is used (a list of known values, ranges or
+otherwise) the downloaded information should be stored in a data file (see
+the stdnum.numdb module). Only the minimal amount of data should be kept (for
+validation or identification).
+
+It should be possible to create and update the data files using a script in
+the `update` directory.
diff --git a/docs/contributing.rst b/docs/contributing.rst
new file mode 100644
index 0000000..58977a8
--- /dev/null
+++ b/docs/contributing.rst
@@ -0,0 +1 @@
+.. include:: ../CONTRIBUTING.md
diff --git a/docs/index.rst b/docs/index.rst
index e48b450..c7e5c09 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -332,3 +332,12 @@ Changes in python-stdnum
:maxdepth: 2
changes
+
+
+Contributing to python-stdnum
+-----------------------------
+
+.. toctree::
+ :maxdepth: 2
+
+ contributing
-----------------------------------------------------------------------
Summary of changes:
CONTRIBUTING.md | 160 ++++++++++++++++++++++++++++++++++++++++++++++++++
docs/contributing.rst | 1 +
docs/index.rst | 9 +++
3 files changed, 170 insertions(+)
create mode 100644 CONTRIBUTING.md
create mode 100644 docs/contributing.rst
hooks/post-receive
--
python-stdnum