Google is not using those sub-national divisions (region), that the EU or OECD is using for statistical purposes. This means that any comparison of Google’s Mobility Reports with population, transport, public health, or economic variable requires a sub-national division vocabulary for translation, or in the EU parlance a correspondence table.
Google appears to be using, at least in most of the cases, the ISO-3166-2
sub-national divisions from the ISO 3166 Codes for the representation of names of countries and their subdivisions with non-standard naming (labels). When we want to analyse the Google Mobility Report together with national statistics, this causes several problems:
ISO-3166-2
is not a hierarchical typology, while Europe’s statistical typologies, the NUTS1
-NUTS2
-NUTS3
are hierarchical, so we had to create many hundred lines of R
code to create the necessary descriptive metadata for joining the ISO
-like Google and the NUTS
typologies of Europe.
The ISO-3166-2
is changing very fast, within Europe there are many changes every year; however, these changes are not so easy to trace as the changes within the European statistical nomenclature (NUTS). We are unsure how Google handles changes in ISO-3166-2
over the coverage of the Mobility Reports.
Google is not using either the machine-readable, alphanumeric ISO-3166-2
codes or the official (Latin) labels, instead it uses a quasi-English unofficial labelling, which requires manual identification of Google’s typology items.
A bit exotic example, Réunion, or, as France calls it, La Réunion, shows some of the statistical impracticalities of the non-hierarchical ISO-3166
codes. Google used the RE ISO-3166-1
country code and the corresponding label Réunion to identify the small island in the Indian ocean. However, as a part of France (and the European Union), it is also described in France’s ISO-3166-2
as FR-LRE
, labelled La Réunion as an overseas region, and as FR-RE
, as an overseas department. The distinction mirrors France’s administrative laws, and matches the rows in Google’s reports with three potential ISO-3166
codes. If we want to join Google’s data with regional statistical data from French or EU official tables, we have to use code FRY4
from the Frech NUTS3
typology.
There are some very small sovereign states that do not have any NUTS
divisions. Luxembourg is not divided (LU
= LU0
= LU00
= LU000
). Our package can project the national data given by Google to any NUTS
level, so that these part of Europe fall into the right place on country, NUTS1
, NUTS2
and NUTS3
level data tables, too.
However, in most cases, the NUTS
typology is hierarchical. If we take the example of Malta, which is the smallest member state with the least number of possible divisions (exactly two: MT001
refers to the main island of Malta and MT002
refers to the smaller islands of Gozo and Comino.) We know that in the hierarchy MT002
belongs to MT00
, which belongs to MT0
, which belongs to the country Malta (MT
). Therefore, if we have a bit of ambiguity with a territory, we can still roughly place it, if we at least know at what level would it fit to Europe’s map. In Malta’s case, Google did not divide the country, so we know that Google’s data refers to MT
= MT0
= MT00
, which makes matching with any national (NUTS0
), NUTS1
or NUTS2
level data table possible, although we cannot directly match with NUTS3
tables which separate MT001
and MT002
. In this case, we have to use impute_down_nuts()
to impute (project) the MT
= MT0
= MT00
data to MT001
and MT002
.
Google provided extremely detailed data for some small countries like Estonia and Latvia, because the ISO-3166-2
subdivisons of these relatively small countries are very small, i.e. usually smaller than the NUTS3
statistical regions. These countries, due to their size, are not divided in NUTS1
and NUTS2
levels (EE
= EE0
= EE00
), but they have statistical subdivisions on NUTS3
level. The ISO-3166-2
used by Google tend to be on a lower level (quasi-NUTS4
). The NUTS
earlier contained a NUTS4
typology, but it was very impractical to use, because divisions at this level tend to change very quickly, and the creation of statistical aggregates is not always possible. For example, it would clearly not be possible disaggregating the GDP in a meaningful to such small territorial units.
Because most Eurostat data is available only on NUTS2
level, we can simply use the EE
and LV
data from Google and project it to the technical NUT2
regions EE00
and LV00
(both identical to the country itself.) If we would want to match Estonia’s and Latvias data with NUTS3
levels statistical tables, we would have to created weighted averages from Googles sub-NUTS3 regions for these countries.
In the case of small non-EU member states we applied the same logic, although these countries are at the moment not part of the official NUTS nomenclature. For example, we made Andorra AD
= AD0
= AD00
= AD00
. Eurostat’s regional data products usually do not contain data from Andorra, but the national data tables sometimes do, and this data can be safely projected down to the identical technical NUTS1
“region” of AD0
or the technical NUTS2
region of AD00
.
Some non-EU member states, such as Liechtenstein, Norway, Iceland (the European Economic Area), or (potential) EU member canidates on the Balkans, i.e. Albania, Montengro, North Macedonia, Serbia are becoming part of the EU NUTS2021
, which is already defined but not yet used, and currently they have NUTS
equivalent codes.
Cyprus is unfortunately not present in the Google Mobility Reports.
In many cases the ISO-3166-2
subdivisions used by Google correspond to some NUTS
typology elements. After figuring out the correct NUTS
typology for the Google rows, we can aggregate up NUTS3
level data or project down NUTS1
data to the NUTS2
level, which is the most likely level for practical statistical analysis. In some cases, the ISO-3166-2
correspond to earlier definitions of NUTS
. We could have chosen to try to match currently non-matched ISO-3166-2
with NUTS2010
or NUTS2003
definitions, and then try to use a time-wise correspondence among NUTS
definitions. If we did not find an equivalence with any elements of the NUTS2016
definitions, we probably could have found it in the historical NUTS2003
, NUTS2010
or other typologies, and could have tried to use our timewise-correspondence to find an equivalent. Even if there is a formula that connectes a NUTS2016
typological element to various NUTS2003
elements, and thus via ISO-3166-2
, it would require an almost case-by-case programming to exploit this connection, given that there are many possibilities in time-wise correspondents (see vignette: Working With Regional, Sub-National Statistical Products). Instead we used some simplifications when the ISO-3166-2
and the currently used NUTS1016
typology do not match.
In some cases, Google merged certain statistical regions of Europe. For example, following the ISO-3166-2- subdivisions of Italy the culturally autonomous, partly German speaking part of Italy was merged into a single unit (ISO-3166-2
: IT-32
, with Italian labelling Trentino-Alto Adige and with German labelling Trentino-Südtirol, but Google used an unofficial English labelling), even though these are two undivided NUTS2
regions, i.e. Trentino ITD2
=ITD20
and Alto Adige / Südtirol (South Tyrol for Google) ITD1
= ITD10
. In this case, comparison is possible, but requires addition or weighting between EU statistical units for joining with Google data. We gave the pseudo-NUTS
code ITDX
to Trentino-South Tyrol, which clearly identifies the region as part of ITD
for Northeastern Italy, and of course as part of IT
or Italy.
For simplicity, we treated some of these historical regions identical to a current one, if the difference was very small. For example, Bragança district in Portugal (in ISO-3166-2
: PT-04
) was coded as PT11E
, because it is almost identical to the NUTS3
region Terras de Trás-os-Montes.
In other cases, when Google’s typology cuts across current European statistical region lines, we again chose the creation of pseudo-NUTS
codes. For example, we created the irregular pseudo-NUTS
code PT11X
for the Braga district of Portugal (in ISO-3166-2
: PT-03
), because it is certainly part of PT11
Continente, PT11
Norte, and the technical NUTS0
PT
for Portugal, but it does not correspond to any NUTS3
units of Portugal in the NUTS2016
definition. This coding will not pass the validate_nuts_code()
function, but it certainly gives a strong starting point for data imputation.
We faced many such problems with Portugal and Wales within Great Britain in the United Kingdom.