[Pkg-isocodes-devel] [pkg-isocodes-Bugs][313628] codes in ISO 3166-2 fail uniqueness guarantee
pkg-isocodes-bugs at alioth.debian.org
Sun Sep 17 13:40:12 UTC 2017
pkg-isocodes-Bugs item #313628 was changed at 2017-09-17 15:40 by Tobias Quathamer
You can respond by visiting:
https://alioth.debian.org/tracker/?func=detail&atid=413077&aid=313628&group_id=30316
>Status: Closed
Priority: 3
Submitted By: Ben Finney (bignose-guest)
>Assigned to: Tobias Quathamer (toddy)
Summary: codes in ISO 3166-2 fail uniqueness guarantee
Part: ISO 3166-2
Initial Comment:
I don't have access to the source documents for ISO standards, but at
<URL:http://www.iso.org/iso/country_codes/background_on_iso_3166/iso_3166-2.htm>
the information page for ISO 3166-2 says:
Note that the characters after the separator are only unique within
the subdivision list of one particular country. They can be (and
have been) reused in the list of subdivision names of other
countries e. g. ID-RI (Riau province of Indonesia) and NG-RI (Rivers
province in Nigeria). So only a complete code element i.e. with the
alpha-2 country code in front guarantees uniqueness.
The guarantee at the end – “a complete code element i.e. with the
alpha-2 country code in front guarantees uniqueness” – is violated by
‘/usr/share/xml/iso-codes/iso_3166_2.xml’ as of version 3.30.
A Python 3 session shows the discrepancy:
>>> from xml.etree import ElementTree
>>> iso_3166_2 = ElementTree.parse("/usr/share/xml/iso-codes/iso_3166_2.xml")
>>> entry_xpath = "./iso_3166_country/iso_3166_subset/iso_3166_2_entry"
>>> entry_codes = [node.get('code') for node in iso_3166_2.iterfind(entry_xpath)]
>>> # Count of codes from every entry:
>>> len(entry_codes)
4691
>>> # Count of *unique* codes from every entry:
>>> len(set(entry_codes))
4688
>>> import collections
>>> code_counts = collections.Counter(entry_codes)
>>> print({(code, count) for (code, count) in code_counts.items() if count > 1})
{('ID-MA', 2), ('CV-SL', 2), ('IE-C', 2)}
So the complete codes ‘ID-MA’, ‘CV-SL’, and ‘IE-C’ are each assigned to
two entries, violating the guarantee of uniqueness described on the ISO page.
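The same check can be sketched as a standalone script. This is only an illustration: the inline sample below is a hypothetical, heavily trimmed document that mimics the element layout used by the XPath above, and the function name `duplicate_codes` is made up for this example. To check a real installation, parse the installed file instead of the sample.

```python
import collections
from xml.etree import ElementTree

# Same path expression as in the interactive session.
ENTRY_XPATH = "./iso_3166_country/iso_3166_subset/iso_3166_2_entry"

# Hypothetical sample data (an assumption, not the real file): two entries
# share the complete code IE-C, as in the data reported here.
SAMPLE_XML = """\
<iso_3166_2_entries>
  <iso_3166_country code="IE">
    <iso_3166_subset type="Province">
      <iso_3166_2_entry code="IE-C" name="Connaught"/>
    </iso_3166_subset>
    <iso_3166_subset type="County">
      <iso_3166_2_entry code="IE-C" name="Cork"/>
      <iso_3166_2_entry code="IE-D" name="Dublin"/>
    </iso_3166_subset>
  </iso_3166_country>
</iso_3166_2_entries>
"""


def duplicate_codes(root):
    """Return a dict mapping each repeated complete code to its count."""
    codes = [node.get("code") for node in root.iterfind(ENTRY_XPATH)]
    counts = collections.Counter(codes)
    return {code: count for code, count in counts.items() if count > 1}


root = ElementTree.fromstring(SAMPLE_XML)
duplicates = duplicate_codes(root)
print(duplicates)
```

An empty result means every complete code in the parsed data is unique.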
If this is the case in upstream's data, please communicate with upstream
about the violated guarantee, and work with them to resolve it. The best
resolution would be to keep the guarantee described above (to assign a
unique complete code to every entry).
(This was reported as Debian bug#649581.)
----------------------------------------------------------------------
>Comment By: Tobias Quathamer (toddy)
Date: 2017-09-17 15:40
Message:
Hi,
this bug has been fixed since April 2012, starting with version 3.35.
----------------------------------------------------------------------
Comment By: Ben Finney (bignose-guest)
Date: 2012-05-13 12:10
Message:
This bug persisted until version 3.34. As of version 3.35 it has been fixed by upstream:
>>> from xml.etree import ElementTree
>>> iso_3166_2 = ElementTree.parse("/usr/share/xml/iso-codes/iso_3166_2.xml")
>>> entry_xpath = "./iso_3166_country/iso_3166_subset/iso_3166_2_entry"
>>> entry_codes = [node.get('code') for node in iso_3166_2.iterfind(entry_xpath)]
>>> # Count of codes from every entry:
... len(entry_codes)
4850
>>> # Count of *unique* codes from every entry:
... len(set(entry_codes))
4850
>>> import collections
>>> code_counts = collections.Counter(entry_codes)
>>> print({(code, count) for (code, count) in code_counts.items() if count > 1})
set([])
As reported in ISO 3166-2 Newsletter II-3 <URL:http://www.iso.org/iso/iso_3166-2_newsletter_ii-3_2011-12-13.pdf> the following code changes fixed the duplicates:
* CV-SL now uniquely identifies “Sal” the municipality; CV-SO is the new code for “São Lourenço dos Órgãos” the municipality.
* ID-MA now uniquely identifies “Maluku” the province; ID-ML is the new code for “Maluku” the geographical unit.
* IE-C now uniquely identifies “Connaught” the province; IE-CO is the new code for “Cork” the county.
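The effect of these three changes can be sketched in a few lines. The codes, names, and subdivision kinds are taken from the list above; the data structures and variable names are assumptions made for this illustration, not anything from the iso-codes package.

```python
import collections

# The six affected entries with the codes they had before the fix:
# (code, name, kind), taken from the list above.
entries_before = [
    ("CV-SL", "Sal", "municipality"),
    ("CV-SL", "São Lourenço dos Órgãos", "municipality"),
    ("ID-MA", "Maluku", "province"),
    ("ID-MA", "Maluku", "geographical unit"),
    ("IE-C", "Connaught", "province"),
    ("IE-C", "Cork", "county"),
]

# New codes from the newsletter, keyed by (name, kind) because the two
# "Maluku" entries differ only in kind.
new_codes = {
    ("São Lourenço dos Órgãos", "municipality"): "CV-SO",
    ("Maluku", "geographical unit"): "ID-ML",
    ("Cork", "county"): "IE-CO",
}

codes_before = [code for (code, name, kind) in entries_before]
codes_after = [
    new_codes.get((name, kind), code) for (code, name, kind) in entries_before
]


def repeated(codes):
    """Return the set of codes that occur more than once."""
    counts = collections.Counter(codes)
    return {code for code, count in counts.items() if count > 1}


print(sorted(repeated(codes_before)))
print(sorted(repeated(codes_after)))
```

The first line printed lists the three previously shared codes; the second is an empty list, since each of the six entries then has its own code.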
It would be helpful to mention this bug and its resolution in a changelog entry (perhaps retroactively in the version 3.35 changelog entry).
----------------------------------------------------------------------