python-stdnum branch master updated. 0.9-12-g123e9cb

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "python-stdnum".

The branch, master has been updated
       via  123e9cbce5ba219e183799dcc3ea8d08e64213f3 (commit)
      from  86f60a2acb592a6ac6260867f8c9d423fc25d9d8 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://arthurdejong.org/git/python-stdnum/commit/?id=123e9cbce5ba219e183799dcc3ea8d08e64213f3

commit 123e9cbce5ba219e183799dcc3ea8d08e64213f3
Author: Arthur de Jong <arthur@arthurdejong.org>
Date:   Sun Oct 5 22:46:56 2014 +0200

    Update URLs for Malaysian code lists
    
    This updates the URLs for the state and country codes as published by
    the National Registration Department of Malaysia and changes the parsing
    to the new page layout.
    
    This also updates the data file.
    
    https://github.com/arthurdejong/python-stdnum/issues/14

diff --git a/getmybp.py b/getmybp.py
index 3f84924..53da015 100755
--- a/getmybp.py
+++ b/getmybp.py
@@ -27,8 +27,8 @@ import BeautifulSoup
 
 
 # URLs that are downloaded
-state_list_url = 'http://www.jpn.gov.my/en/informasi/states-code'
-country_list_url = 'http://www.jpn.gov.my/en/informasi/country-code'
+state_list_url = 'http://www.jpn.gov.my/informasi/kod-negeri/'
+country_list_url = 'http://www.jpn.gov.my/en/informasi/kod-negara/'
 
 
 spaces_re = re.compile('\s+', re.UNICODE)
@@ -43,7 +43,7 @@ def parse(f):
     """Parse the specified file."""
     soup = BeautifulSoup.BeautifulSoup(f, convertEntities='html')
     # find all table rows
-    for tr in soup.find('div', id='content').findAll('tr'):
+    for tr in soup.find('div', id='inner-main').findAll('tr'):
         # find the rows with four columns of text
         tds = [
             clean(''.join(x.string for x in td.findAll(text=True)))
@@ -56,19 +56,19 @@ def parse(f):
 
 
 if __name__ == '__main__':
-    results = defaultdict(lambda : defaultdict(list))
+    results = defaultdict(lambda : defaultdict(set))
     # read the states
     #f = open('/tmp/states.html', 'r')
     f = urllib.urlopen(state_list_url)
     for state, bps in parse(f):
         for bp in bps.split(','):
             results[bp.strip()]['state'] = state
-            results[bp.strip()]['countries'].append('Malaysia')
+            results[bp.strip()]['countries'].add('Malaysia')
     # read the countries
     #f = open('/tmp/countries.html', 'r')
     f = urllib.urlopen(country_list_url)
     for country, bp in parse(f):
-        results[bp]['countries'].append(country)
+        results[bp]['countries'].add(country)
     # print the results
     print '# generated from National Registration Department of Malaysia, downloaded from'
     print '# %s' % state_list_url
@@ -79,7 +79,8 @@ if __name__ == '__main__':
         row = results[bp]
         if 'state' in row:
             res += ' state="%s"' % row['state']
-        countries = row['countries']
+        countries = list(row['countries'])
+        countries.sort()
         if len(countries) == 1:
             res += ' country="%s"' % countries[0]
         if len(countries) > 0:
diff --git a/stdnum/my/bp.dat b/stdnum/my/bp.dat
index 40231fd..6492947 100644
--- a/stdnum/my/bp.dat
+++ b/stdnum/my/bp.dat
@@ -1,6 +1,6 @@
 # generated from National Registration Department of Malaysia, downloaded from
-# http://www.jpn.gov.my/en/informasi/states-code
-# http://www.jpn.gov.my/en/informasi/country-code
+# http://www.jpn.gov.my/informasi/kod-negeri/
+# http://www.jpn.gov.my/en/informasi/kod-negara/
 
 01 state="Johor" country="Malaysia" countries="Malaysia"
 02 state="Kedah" country="Malaysia" countries="Malaysia"
@@ -57,30 +57,33 @@
 57 state="Wilayah Persekutuan (Kuala Lumpur)" country="Malaysia" countries="Malaysia"
 58 state="Wilayah Persekutuan (Labuan)" country="Malaysia" countries="Malaysia"
 59 state="Negeri Sembilan" country="Malaysia" countries="Malaysia"
-60 country="Brunei" countries="Brunei"
-61 country="Indonesia" countries="Indonesia"
-62 countries="Cambodia, Kampuchea"
-63 country="Laos" countries="Laos"
-64 country="Mynmar" countries="Mynmar"
-65 country="Filipina" countries="Filipina"
-66 country="Singapura" countries="Singapura"
-67 country="Thailand" countries="Thailand"
-68 country="Vietnam" countries="Vietnam"
-74 country="China" countries="China"
-75 country="India" countries="India"
-76 country="Pakistan" countries="Pakistan"
-77 country="Arab Saudi" countries="Arab Saudi"
-78 country="Sri Lanka" countries="Sri Lanka"
-79 country="Bangladesh" countries="Bangladesh"
+60 country="BRUNEI" countries="BRUNEI"
+61 country="INDONESIA" countries="INDONESIA"
+62 countries="CAMBODIA, DEMOCRATIC KAMPUCHE, KAMPUCHEA"
+63 country="LAOS" countries="LAOS"
+64 countries="BURMA, MYANMAR"
+65 country="PHILIPPINES" countries="PHILIPPINES"
+66 country="SINGAPURA" countries="SINGAPURA"
+67 country="THAILAND" countries="THAILAND"
+68 country="VIETNAM" countries="VIETNAM"
+71,72 country="LUAR NEGARA" countries="LUAR NEGARA"
+74 country="CHINA" countries="CHINA"
+75 country="INDIA" countries="INDIA"
+76 country="PAKISTAN" countries="PAKISTAN"
+77 countries="ARAB SAUDI, SAUDI ARABIA"
+78 country="SRI LANKA" countries="SRI LANKA"
+79 country="BANGLADESH" countries="BANGLADESH"
 82 state="Negeri Tidak Diketahui" country="Malaysia" countries="Malaysia"
-83 countries="Australia, American Samoa, Macedonia, New Zealand, New Caledonia, Papua New Gurney, Fiji, Timor Leste"
-84 countries="Argentina, Anguilla, Aruba, Bolivia, Brazil, Paraguay, Peru, Chile, Colombia, Equador, Uruguay, Venezuela"
-85 countries="Algeria, Angola, Kenya, Afrika Tengah, Liberia, Afrika Selatan, Mali, Mauritania, Morocco, Malawi, Botswana, Mozambique, Burundi, Nigeria, Namibia, Cameroon, Chad, Rwanda, Senegal, Sierra Leone, Somalia, Djibouti, Sudan, Egypt, Ethopia, Swaziland, Eritrea, Gambia, Ghana, Tunisia, Tanzania, Tonga, Togo, Uganda, Zaire, Zambia, Zimbabwe"
-86 countries="Austria, Luxembourg, Armenia, Malta, Monaco, Belgium, Nitherlands, Norway, Cyprus, Portugal, Denmark, Sweeden, Spain, Switzerland, France, Finland, Slovakia, Slovenia, Greece, Germany, Holy See (Vatican City), Italy"
-87 countries="Britain, Ireland"
-88 countries="Jordan, Kuwait, Lebanon, Bahrain, Oman, Qatar, Syria, Turkey, United Arab Emirate, Iran, Iraq, Israel, Yemen"
-89 countries="Japan, Korea Selatan, Korea Utara, Taiwan"
-90 countries="Jamaica, Bahamas, Barbados, Belize, Mexico, Nicaragua, Panama, Puerto Rico, Costa Rica, Cuba, Dominica, El Salvador, Grenada, Guatemala, Trinidad&Tobado, Haiti, Honduras"
-91 countries="Canada, Greenland, United State"
-92 countries="Albania, Albania, Latvia, Lithuania, Bulgaria, Byelorussia, Bosnia, Belarus, Poland, Romania, Russia, Czechoslovakia, Crotia, Esthonia, Serbia, Georgia, Hungary, Ukraine"
-93 countries="Afghanistan, Antigua & Barbuda, Kazakhstan, Andorra/Andora, Libya, Arzebaijan, Antartica, Maldives, Madagascar, Mauritius, Mongolia, Benin, Maghribi, Bhutan, Macau, Nepal, Bermuda, Burkina faso/Burkina, Bora-bora, Bouvet Island, Palestine, Cape Verde, Comoros, Seychelles, Soloman Islands, Samoa, San Marino, Guinea, Gibraltar, Tajikistan, Tukmenistan, Hong Kong, Uzbekistan, Ivory Coast, Vanuatu, Iceland, Yugoslavia"
+83 countries="AMERICAN SAMOA, ASIA PASIFIK, AUSTRALIA, CHRISTMAS ISLAND, COCOS(KEELING) ISLANDS, COOK ISLANDS, FIJI, FRENCH POLYNESIA, GUAM, HEARD AND MCDONALD ISLANDS, MARSHALL ISLANDS, MICRONESIA, NEW CALEDONIA, NEW ZEALAND, NIUE, NORFOLK ISLANDS, PAPUA NEW GUINEA, TIMOR LESTE, TOKELAU, UNITED STATES MINOR OUTLYING ISLANDS, WALLIS AND FUTUNA ISLANDS"
+84 countries="AMERIKA SELATAN, ANGUILLA, ARGENTINA, ARUBA, BOLIVIA, BRAZIL, CHILE, COLOMBIA, ECUADOR, FRENCH GUIANA, GUADELOUPE, GUYANA, PARAGUAY, PERU, SOUTH GEORGIA & THE SOUTH SANDWICH ISLANDS, SURINAME, URUGUAY, VENEZUELA"
+85 countries="AFRIKA, AFRIKA SELATAN, ALGERIA, ANGOLA, BOTSWANA, BURUNDI, CAMEROON, CENTRAL AFRICAN REPUBLIC, CHAD, CONGO-BRAZZAVILLE, CONGO-KINSHASA, DJIBOUTI, EGYPT, ERITREA, ETHIOPIA, GABON, GAMBIA, GHANA, GUINEA, KENYA, LIBERIA, MALAWI, MALI, MAURITANIA, MAYOTTE, MOROCCO, MOZAMBIQUE, NAMIBIA, NIGER, NIGERIA, REUNION, RWANDA, SENEGAL, SIERRA LEONE, SOMALIA, SUDAN, SWAZILAND, TANZANIA, TOGO, TONGA, TUNISIA, UGANDA, WESTERN SAHARA, ZAIRE, ZAMBIA, ZIMBABWE"
+86 countries="ARMENIA, AUSTRIA, BELGIUM, CYPRUS, DENMARK, EUROPAH, FAEROE ISLANDS, FINLAND, FINLAND, METROPOLITAN, FRANCE, GERMANY, GERMANY (DEM.REP), GERMANY (FED.REP), GREECE, HOLY SEE (VATICAN CITY), ITALY, LUXEMBOURG, MACEDONIA, MALTA, MEDITERANEAN, MONACO, NETHERLANDS, NORWAY, PORTUGAL, REP. OF MOLDOVA, SLOVAKIA, SLOVENIA, SPAIN, SWEDEN, SWITZERLAND, UK-DEPENDENT TERRITORIES CIT, UK-NATIONAL OVERSEAS, UK-OVERSEAS CITIZEN, UK-PROTECTED PERSON, UK-SUBJECT"
+87 countries="BRITAIN, GREAT BRITAIN, IRELAND"
+88 countries="BAHRAIN, IRAN, IRAQ, ISRAEL, JORDAN, KUWAIT, LEBANON, OMAN, QATAR, REPUBLIC OF YEMEN, SYRIA, TIMUR TENGAH, TURKEY, UNITED ARAB EMIRATE, YEMEN ARAB REP., YEMEN PEOPLE DEM.RE"
+89 countries="JAPAN, KOREA (SELATAN), KOREA (UTARA), TAIWAN, TIMUR JAUH"
+90 countries="BAHAMAS, BARBADOS, BELIZE, CARIBBEAN, COSTA RICA, CUBA, DOMINICA, DOMINICAN REPUBLIC, EL SALVADOR, GRENADA, GUATEMALA, HAITI, HONDURAS, JAMAICA, MARTINIQUE, MEXICO, NICARAGUA, PANAMA, PUERTO RICO, ST. KITTS AND NEVIS, ST. LUCIA, ST. VINCENT AND THE GRENADINES, TRINIDAD & TOBAGO, TURKS AND CAICOS ISLANDS, VIRGIN ISLANDS (USA)"
+91 countries="AMERIKA UTARA, CANADA, GREENLAND, NETHERLANDS ANTILLES, ST. PIERRE AND MIQUELON, UNITED STATES OF AMERICA"
+92 countries="ALBANIA, BELARUS, BOSNIA HERZEGOVINA, BULGARIA, BYELORUSSIA, CROATIA, CZECH REPUBLIC, CZECHOSLOVAKIA, ESTONIA, GEORGIA, HUNGARY, LATVIA, LITHUANIA, MONTENEGRO, POLAND, REPUBLIC OF KOSOVO, ROMANIA, RUSSIAN FEDERATION, SERBIA, U.S.S.R, UKRAINE"
+93 countries="AFGHANISTAN, ANDORRA/ANDORA, ANTARTICA, ANTIGUA & BARBUDA, AZERBAIJAN, BENIN, BERMUDA, BHUTAN, BORA-BORA, BOUVET ISLAND, BRITISH INDIAN OCEAN TERRITORY, BURKINA FASO/BURKINA, CAPE VERDE, CAYMAN ISLANDS, COMOROS, DAHOMEY, EQUATORIAL GUINEA, FALKLAND ISLANDS, FRENCH SOUTHERN TERRITORIES, GIBRALTAR, GUINEA-BISSAU, HONG KONG, ICELAND, IVORY COAST, KAZAKHSTAN, KIRIBATI, KYRGYZSTAN, LESOTHO, LIBYA, LIECHTENSTEIN, MACAU, MADAGASCAR, MAGHRIBI, MALAGASY, MALDIVES, MAURITIUS, MONGOLIA, MONTSERRAT, NAURU, NEPAL, NORTHERN MARIANAS ISLANDS, OUTER MONGOLIA, PALAU, PALESTINE, PITCAIRN ISLANDS, SAMAO BARAT, SAMOA, SAN MARINO, SAO TOME & PRINCIPE, SEYCHELLES, SOLOMON ISLANDS, ST. HELENA, ST.LUCIA, ST.VICENT, SVALBARD AND JAN MAYEN ISLANDS, SWAPO, TAJIKISTAN, TURKMENISTAN, TUVALU, UPPER VOLTA, UZBEKISTAN, VANUATU, VATICAN CITY, VIRGIN ISLANDS (BRITISH), WESTERN SAMOA, YUGOSLAVIA"
+98 countries="STATELESS PERSON ARTICLE 1/1954, Tanpa Negara"
+99 countries="MAKLUMAT TIADA, MEKAH, NEUTRAL ZONE, REFUGEE, REFUGEE ARTICLE 1/1951, UN SPECIALIZED AGENCY, UNITED NATIONS ORGANIZATION, UNSPECIFIED NATIONALITY"

-----------------------------------------------------------------------
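
For readers unfamiliar with the layout of stdnum/my/bp.dat as it appears
in the diff above, the following is a minimal sketch of how one data line
could be split into its birthplace codes and its key="value" attributes.
It is written in Python 3 and is not the reader that python-stdnum itself
uses; the names parse_line and _attr_re are hypothetical. It also assumes
that a comma in the first field (as in "71,72") lists several codes that
share the same attributes.

```python
import re

# Hypothetical helper: matches key="value" pairs such as state="Johor".
_attr_re = re.compile(r'(\w+)="([^"]*)"')


def parse_line(line):
    """Split one bp.dat-style line into (codes, attributes).

    Assumption: the first whitespace-separated field holds one or more
    comma-separated codes; the rest of the line holds key="value" pairs.
    """
    codes, _, rest = line.strip().partition(' ')
    attrs = dict(_attr_re.findall(rest))
    return codes.split(','), attrs


codes, attrs = parse_line('71,72 country="LUAR NEGARA" countries="LUAR NEGARA"')
print(codes, attrs)
```

The countries value is deliberately left as a single string. Some names in
the new data appear to contain a comma themselves (for example the entry
for code 86), so splitting on ", " would not reliably recover the
individual names.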

Summary of changes:
 getmybp.py       |   15 +++++++-------
 stdnum/my/bp.dat |   59 ++++++++++++++++++++++++++++--------------------------
 2 files changed, 39 insertions(+), 35 deletions(-)
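
The getmybp.py change above switches the per-code country collection from
a list to a set and sorts it before printing. A minimal sketch of the
effect, in Python 3 rather than the Python 2 of the original script, is
given below. The function names collect and format_line are hypothetical,
the input rows are made up for illustration, and the ', ' join for the
countries attribute is inferred from the data file rather than from code
shown in the diff.

```python
from collections import defaultdict


def collect(state_rows, country_rows):
    """Accumulate names per birthplace code; a set drops repeated names."""
    results = defaultdict(lambda: defaultdict(set))
    for state, bps in state_rows:
        for bp in bps.split(','):
            results[bp.strip()]['state'] = state
            results[bp.strip()]['countries'].add('Malaysia')
    for country, bp in country_rows:
        results[bp]['countries'].add(country)
    return results


def format_line(bp, row):
    """Render one output line with the countries in sorted order."""
    res = bp
    if 'state' in row:
        res += ' state="%s"' % row['state']
    countries = sorted(row['countries'])
    if len(countries) == 1:
        res += ' country="%s"' % countries[0]
    if len(countries) > 0:
        # Assumption: names are joined with ', ' as seen in bp.dat.
        res += ' countries="%s"' % ', '.join(countries)
    return res


# Illustrative input: the same country listed twice for one code.
rows = collect([('Johor', '01, 21')],
               [('BELARUS', '92'), ('ALBANIA', '92'), ('ALBANIA', '92')])
for bp in sorted(rows):
    print(format_line(bp, rows[bp]))
```

With a list, the repeated name would be written out twice, as in the old
"Albania, Albania" entry for code 92. Sorting the set also makes the
output independent of the order in which the source page lists the names,
which should keep later regenerations of the data file easier to compare.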


hooks/post-receive
-- 
python-stdnum