[Pkg-isocodes-devel] [pkg-isocodes-Bugs][313628] codes in ISO 3166-2 fail uniqueness guarantee

pkg-isocodes-bugs at alioth.debian.org pkg-isocodes-bugs at alioth.debian.org
Sun May 13 10:10:34 UTC 2012


Bugs item #313628, was changed at 2012-05-13 19:56 by Ben Finney
You can respond by visiting: 
https://alioth.debian.org/tracker/?func=detail&atid=413077&aid=313628&group_id=30316

Status: Open
Priority: 3
Submitted By: Ben Finney  (bignose-guest)
Assigned to: Nobody (None)
Summary: codes in ISO 3166-2 fail uniqueness guarantee 
Part: ISO 3166-2


Initial Comment:
I don't have access to the source documents for ISO standards, but at
<URL:http://www.iso.org/iso/country_codes/background_on_iso_3166/iso_3166-2.htm>
the information page for ISO 3166-2 says:

    Note that the characters after the separator are only unique within
    the subdivision list of one particular country. They can be (and
    have been) reused in the list of subdivision names of other
    countries e. g. ID-RI (Riau province of Indonesia) and NG-RI (Rivers
    province in Nigeria). So only a complete code element i.e. with the
    alpha-2 country code in front guarantees uniqueness.

The guarantee at the end (“a complete code element i.e. with the
alpha-2 country code in front guarantees uniqueness”) is violated by
‘/usr/share/xml/iso-codes/iso_3166_2.xml’ as of version 3.30.

A Python 3 session shows the discrepancy:

    >>> from xml.etree import ElementTree
    >>> iso_3166_2 = ElementTree.parse("/usr/share/xml/iso-codes/iso_3166_2.xml")
    >>> entry_xpath = "./iso_3166_country/iso_3166_subset/iso_3166_2_entry"
    >>> entry_codes = [node.get('code') for node in iso_3166_2.iterfind(entry_xpath)]

    >>> # Count of codes from every entry:
    >>> len(entry_codes)
    4691
    >>> # Count of *unique* codes from every entry:
    >>> len(set(entry_codes))
    4688

    >>> import collections
    >>> code_counts = collections.Counter(entry_codes)
    >>> print({(code, count) for (code, count) in code_counts.items() if count > 1})
    {('ID-MA', 2), ('CV-SL', 2), ('IE-C', 2)}

So the complete codes ‘ID-MA’, ‘CV-SL’, and ‘IE-C’ all violate the
guarantee of uniqueness described on the ISO page.
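
To see which entries collide, not just which codes, the same traversal
can collect the name of every entry per code. The following is only a
sketch: the sample XML is a hypothetical miniature of the file, and the
root element name, the `type` attribute on `iso_3166_subset` and the
`name` attribute on `iso_3166_2_entry` are assumptions about the file
layout, as is the helper name `find_duplicate_codes`.

```python
import collections
from xml.etree import ElementTree

# Hypothetical miniature of iso_3166_2.xml; element and attribute names
# other than 'code' are assumptions about the real file's layout.
SAMPLE_XML = """\
<iso_3166_2_entries>
  <iso_3166_country code="ID">
    <iso_3166_subset type="Province">
      <iso_3166_2_entry code="ID-MA" name="Maluku"/>
      <iso_3166_2_entry code="ID-RI" name="Riau"/>
    </iso_3166_subset>
    <iso_3166_subset type="Geographical unit">
      <iso_3166_2_entry code="ID-MA" name="Maluku"/>
    </iso_3166_subset>
  </iso_3166_country>
  <iso_3166_country code="NG">
    <iso_3166_subset type="State">
      <iso_3166_2_entry code="NG-RI" name="Rivers"/>
    </iso_3166_subset>
  </iso_3166_country>
</iso_3166_2_entries>
"""


def find_duplicate_codes(root):
    """Map each code used by more than one entry to its (subset type, name) pairs."""
    entries_by_code = collections.defaultdict(list)
    for subset in root.iterfind("./iso_3166_country/iso_3166_subset"):
        for entry in subset.iterfind("./iso_3166_2_entry"):
            entries_by_code[entry.get("code")].append(
                (subset.get("type"), entry.get("name")))
    # Keep only the codes that are assigned to more than one entry.
    return {code: entries
            for (code, entries) in entries_by_code.items()
            if len(entries) > 1}


root = ElementTree.fromstring(SAMPLE_XML)
duplicates = find_duplicate_codes(root)
for (code, entries) in sorted(duplicates.items()):
    print(code, entries)
```

Against the installed file, the root would instead come from
`ElementTree.parse(path).getroot()`. Note that ‘ID-RI’ and ‘NG-RI’ are
not reported, since they differ as complete codes.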

If this is the case in upstream's data, please communicate with upstream
about the violated guarantee, and work with them to resolve it. The best
resolution would be to keep the guarantee described above (to assign a
unique complete code to every entry).


(This was reported as Debian bug#649581.)


----------------------------------------------------------------------

Comment By: Ben Finney  (bignose-guest)
Date: 2012-05-13 20:10

Message:
This bug persisted until version 3.34. As of version 3.35 it has been
fixed by upstream:

    >>> from xml.etree import ElementTree
    >>> iso_3166_2 = ElementTree.parse("/usr/share/xml/iso-codes/iso_3166_2.xml")
    >>> entry_xpath = "./iso_3166_country/iso_3166_subset/iso_3166_2_entry"
    >>> entry_codes = [node.get('code') for node in iso_3166_2.iterfind(entry_xpath)]
    >>> # Count of codes from every entry:
    ... len(entry_codes)
    4850
    >>> # Count of *unique* codes from every entry:
    ... len(set(entry_codes))
    4850
    >>> import collections
    >>> code_counts = collections.Counter(entry_codes)
    >>> print({(code, count) for (code, count) in code_counts.items() if count > 1})
    set([])

As reported in ISO 3166-2 Newsletter II-3 <URL:http://www.iso.org/iso/iso_3166-2_newsletter_ii-3_2011-12-13.pdf> the following code changes fixed the duplicates:

 * CV-SL now uniquely identifies “Sal” the municipality; CV-SO is the new code for “São Lourenço dos Órgãos” the municipality.

 * ID-MA now uniquely identifies “Maluku” the province; ID-ML is the new code for “Maluku” the geographical unit.

 * IE-C now uniquely identifies “Connaught” the province; IE-CO is the new code for “Cork” the county.
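
The reassignments listed above could be checked mechanically along these
lines. This is a sketch under assumptions: the sample XML is a
hypothetical miniature of the fixed file, the `name` attribute and the
exact spelling of each name in the real data are assumed, and
`check_assignments` is an illustrative helper, not part of iso-codes.

```python
from xml.etree import ElementTree

ENTRY_XPATH = "./iso_3166_country/iso_3166_subset/iso_3166_2_entry"

# Code assignments after the fix, taken from the list above.
EXPECTED_NAMES = {
    "CV-SL": "Sal",
    "CV-SO": "São Lourenço dos Órgãos",
    "ID-MA": "Maluku",
    "ID-ML": "Maluku",
    "IE-C": "Connaught",
    "IE-CO": "Cork",
}

# Hypothetical miniature of the fixed file; the layout is an assumption.
SAMPLE_XML = """\
<iso_3166_2_entries>
  <iso_3166_country code="CV">
    <iso_3166_subset type="Municipality">
      <iso_3166_2_entry code="CV-SL" name="Sal"/>
      <iso_3166_2_entry code="CV-SO" name="São Lourenço dos Órgãos"/>
    </iso_3166_subset>
  </iso_3166_country>
  <iso_3166_country code="ID">
    <iso_3166_subset type="Province">
      <iso_3166_2_entry code="ID-MA" name="Maluku"/>
    </iso_3166_subset>
    <iso_3166_subset type="Geographical unit">
      <iso_3166_2_entry code="ID-ML" name="Maluku"/>
    </iso_3166_subset>
  </iso_3166_country>
  <iso_3166_country code="IE">
    <iso_3166_subset type="Province">
      <iso_3166_2_entry code="IE-C" name="Connaught"/>
    </iso_3166_subset>
    <iso_3166_subset type="County">
      <iso_3166_2_entry code="IE-CO" name="Cork"/>
    </iso_3166_subset>
  </iso_3166_country>
</iso_3166_2_entries>
"""


def check_assignments(root, expected_names):
    """Return (code, expected name, names found) for every code that is
    missing, duplicated, or attached to a different name."""
    names_by_code = {}
    for entry in root.iterfind(ENTRY_XPATH):
        names_by_code.setdefault(entry.get("code"), []).append(entry.get("name"))
    problems = []
    for (code, expected) in sorted(expected_names.items()):
        found = names_by_code.get(code, [])
        # Exactly one entry with the expected name is the only passing case.
        if found != [expected]:
            problems.append((code, expected, found))
    return problems


root = ElementTree.fromstring(SAMPLE_XML)
problems = check_assignments(root, EXPECTED_NAMES)
print(problems)
```

An empty list means every listed code maps to exactly one entry with the
expected name; a duplicated or missing code shows up with the names that
were actually found.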


It would be helpful to mention this bug and its resolution in a changelog entry (perhaps retroactively in the version 3.35 changelog entry).


----------------------------------------------------------------------
