Re: Query Regarding Source of Validation Rules
- From: Arthur de Jong <arthur [at] arthurdejong.org>
- To: Mohammad Mukadam <mohammad.mukadam [at] gmail.com>, python-stdnum-users [at] lists.arthurdejong.org
- Subject: Re: Query Regarding Source of Validation Rules
- Date: Sun, 25 Feb 2024 17:09:37 +0100
Hi Mohammad,
Sorry for not responding sooner,
On Mon, 2024-02-05 at 16:28 +0400, Mohammad Mukadam wrote:
> Firstly, I would like to express my appreciation to you for
> developing this library.
Thanks. It is really interesting to do the research on these numbers
and find an implementation.
> I am working on a project that requires validation of TINs and I
> wanted to leverage the different modules within the stdnum library.
> However, I wanted to understand how have you come up with the
> validation rules around the length, structure, checksum etc. for the
> different modules. Are there official government sources that provide
> this information or it is based on research from around the internet?
It really depends on the particular numbers. In most cases the
structure and length are publicly known, but this is not always the case
for the check digit calculation.
For some numbers a public specification, and often a reference
implementation, is available. Those are the easy ones (they are also
generally the best thought-out numbers). For some modules, links to
specifications or related resources can be found in the docstring or in
code comments.
Some organisations (countries) are much more secretive about the
format of numbers and seem to believe that it is risky to have the
format publicly available. However, if you set up a system where merely
knowing a particular number is enough to break your security model, the
security model is probably not that good to begin with. Some of the
implementations were indirectly based on unofficial documents or
documents that were leaked. For example, I think one of the
implementations was originally based on a specification that was leaked
via a freedom of information response that was intended not to contain
any useful information.
In some cases it is possible to derive the algorithm from another
implementation, or from a significant number of known valid numbers.
Since most of these algorithms take a weighted sum of the digits modulo
11, it is often just a case of finding the correct weights.
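As an illustration of the mod 11 idea, here is a minimal sketch (not python-stdnum's own code). It uses a simplified version of the Dutch BSN "elfproef" (weights 9 down to 2 on the first eight digits, -1 on the check digit) as the example rule. It then shows how, assuming a scheme where the check digit equals the weighted sum of the other digits modulo 11, unknown weights could be recovered from enough known valid numbers by solving a linear system modulo 11. All helper names are hypothetical and the sample numbers are synthetic. Real schemes often add twists (a remainder of 10 mapped to 0 or a letter, other moduli, leading-zero padding) that this does not handle.

```python
import random

P = 11  # the modulus; 11 is prime, so integers mod 11 form a field


def weighted_sum_mod11(digits, weights):
    """Return the weighted sum of the digits modulo 11."""
    return sum(w * d for w, d in zip(weights, digits)) % P


def is_valid_bsn(number):
    """Simplified Dutch BSN "elfproef": weights 9..2 for the first eight
    digits and -1 for the check digit; the total must be 0 mod 11."""
    if len(number) != 9 or not number.isdigit():
        return False
    digits = [int(c) for c in number]
    return weighted_sum_mod11(digits, [9, 8, 7, 6, 5, 4, 3, 2, -1]) == 0


def solve_mod_p(rows, rhs, p=P):
    """Solve rows * x = rhs (mod p) by Gauss-Jordan elimination.

    Returns None when the samples do not determine a unique solution or
    are inconsistent with the assumed model."""
    n = len(rows[0])
    m = [[v % p for v in row] + [b % p] for row, b in zip(rows, rhs)]
    pivot = 0
    for col in range(n):
        # Find a row at or below `pivot` with a nonzero entry in this column.
        r = next((i for i in range(pivot, len(m)) if m[i][col]), None)
        if r is None:
            return None  # not enough independent samples for this weight
        m[pivot], m[r] = m[r], m[pivot]
        inv = pow(m[pivot][col], -1, p)  # modular inverse (Python 3.8+)
        m[pivot] = [(v * inv) % p for v in m[pivot]]
        # Eliminate this column from every other row.
        for i in range(len(m)):
            if i != pivot and m[i][col]:
                f = m[i][col]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[pivot])]
        pivot += 1
    # Leftover rows now have all-zero coefficients; a nonzero right-hand
    # side means no weight vector fits every sample.
    if any(m[i][-1] for i in range(pivot, len(m))):
        return None
    return [m[i][-1] for i in range(n)]


def recover_weights(valid_numbers):
    """Assumed model: the check digit equals the weighted sum of the other
    digits mod 11, so each valid number is one linear equation in the weights."""
    rows = [[int(c) for c in num[:-1]] for num in valid_numbers]
    rhs = [int(num[-1]) for num in valid_numbers]
    return solve_mod_p(rows, rhs)


def make_samples(weights, count, seed=0):
    """Generate synthetic valid numbers for known weights (demo data only)."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < count:
        payload = [rng.randrange(10) for _ in weights]
        check = weighted_sum_mod11(payload, weights)
        if check < 10:  # a remainder of 10 has no single-digit check digit
            samples.append("".join(map(str, payload)) + str(check))
    return samples


if __name__ == "__main__":
    samples = make_samples([9, 8, 7, 6, 5, 4, 3, 2], 30)
    print(all(is_valid_bsn(s) for s in samples))
    print(recover_weights(samples))
```

Because 11 is prime, every nonzero coefficient has an inverse, so ordinary elimination works. With only a handful of samples the system is underdetermined and the sketch returns None rather than guessing.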
Another thing to consider, though, is that these definitions are not
constant. Specifications, formats and check digit calculations change,
so there is always a risk that this validation marks valid
numbers as invalid.
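One way to limit that risk is to reject a number outright only when its structure is wrong, and to flag a checksum mismatch for manual review instead of rejecting it. The following is a minimal, self-contained sketch of such a policy. The names `CheckResult` and `classify`, the accept/review/reject labels and the simplified BSN rule used as the example checksum are all hypothetical. With python-stdnum, a similar split could be made by catching `stdnum.exceptions.InvalidChecksum` separately from the other `ValidationError` exceptions raised by a module's `validate()` function.

```python
from dataclasses import dataclass


def bsn_checksum_ok(number):
    """Simplified Dutch BSN "elfproef" (weights 9..2, then -1), as an example rule."""
    weights = [9, 8, 7, 6, 5, 4, 3, 2, -1]
    return sum(w * int(c) for w, c in zip(weights, number)) % 11 == 0


@dataclass
class CheckResult:
    format_ok: bool    # length and character set match the known structure
    checksum_ok: bool  # check digit matches the current understanding of the rule

    @property
    def verdict(self):
        if not self.format_ok:
            return "reject"  # structurally impossible number
        return "accept" if self.checksum_ok else "review"


def classify(number, length=9, checksum_ok=bsn_checksum_ok):
    """Hypothetical policy: hard-fail on structure, soft-fail on the checksum."""
    number = number.replace(" ", "").replace(".", "")
    format_ok = len(number) == length and number.isdigit()
    return CheckResult(format_ok, format_ok and checksum_ok(number))
```

For example, `classify("111222333").verdict` is `"accept"`, while `classify("111222334").verdict` is `"review"` because only the check digit disagrees with the rule.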
Hope this helps,
--
-- arthur - arthur@arthurdejong.org - https://arthurdejong.org/ --