Over this time, it developed a tool based on a database of original ICT activity indicators,
4.1 Normalization and rescaling of data
4.2 European ICT Poles of Excellence Composite Indicator (EIPE CI)
4.3 Sensitivity analysis
5 Data Sources
5.1 QS World University Rankings by QS
5.2 FP7 database by EC DG Connect
5.3 Bibliometrics: Web of Science by Thomson Reuters
5.4 ICT R&D centre location: Design Activity Tool by IHS iSuppli
5.5 European Investment Monitor by Ernst & Young
5.6 Patent data: REGPAT by OECD
5.7 Company-level information: ORBIS by Bureau van Dijk
5.8 Venture capital:
6.2 Patent data and patent-based internationalisation measures
References

1 Introduction
ICT-related innovation is considered at the core of economic recovery, growth and productivity.
An additional challenge of the EIPE project was that this identification process had to be based only on the analysis of quantitative data,
The present report documents the methodologies and data sources used for this purpose.
by determining the best available data sources, indicators and measurements that will help us to identify
Existing data sources also allow us to derive science and technology indicators such as the technology balance, bibliometrics or the technology intensity of the products or industries concerned (OECD, 2002).
Subject to the availability of data sources and their compliance with the EIPE project needs (see Section 2.3),
In particular, the EIPE project builds up a measurement of ICT R&D activity by observing the actual presence of ICT technology producers (universities, companies, R&D facilities), their R&D expenditures and bibliometric data.
Other ways of measuring innovation that make use of the existing data sources rely on the fact that high economic dynamics are a key source and channel of technological and non-technological innovation.
Subject to the availability of data sources and their compliance with the EIPE project needs (see Section 2.3),
More disaggregated data on business activity show business indicators such as the number of firms, employment, capital, turnover, value added, profits,
The second is data on small businesses and their owners produced by a wide range of non-official organizations that aim to capture the dynamism in the economy.
Subject to the availability of data sources and their compliance with the EIPE project needs (see Section 2.3),
the EIPE project builds this measurement by observing the actual presence and development of ICT firms (headquarters and affiliates, employment data, turnover and investments).
and therefore is the most adequate for the EIPE project (see also Section 2.2). Subject to the availability of data sources
the EIPE project builds up this measurement by observing the level of agglomeration of technology producers (universities, companies, R&D facilities), R&D expenditures and bibliometric data.
Subject to the availability of data sources and their compliance with the EIPE project needs (see Section 2.3),
and their role, e.g. broker-gatekeeper. Subject to the availability of data sources and their compliance with the EIPE project needs (see Section 2.3), the EIPE project takes into account the above-mentioned approaches to identify measure
the selection criteria varying between data sources.

Choosing the spatial unit of observation
One of the central problems in the quantitative analysis of the geography of economic activity is the lack of data at regional level with a satisfactory level of granularity (Koschatzky & Lo, 2007). A region is defined as a tract of land with more or less definitely marked boundaries, which often serves as an administrative unit below the level of the nation state.
For practical reasons connected with data availability and regional policy implementation, the NUTS classification is based
NUTS 3: minimum population 150 000, maximum population 800 000, number of regions 1303.
The standard level of regional data availability provided by, for example,
For some statistics and some countries only NUTS level 1 data are available. For the purposes of this study, the NUTS 3 level was chosen as the unit of analysis,
and compare data in a harmonised and standardized way across the entire European Union. This unit of analysis gives us the (theoretical) opportunity to observe over 1300 spatially standardised areas across the EU,
However, because different data providers use different data formats in reporting the names of organisations, the categorisation of data, the location and geographic information (e.g. city, ZIP code,
which aims to map ICT-related activities and the lack therefore of a number of data at the general level,

Selecting and processing data sources
The choice of the spatial unit of observation, i.e. NUTS 3,
and the policy-driven focus on ICT create a double constraint on data. It implies that in most cases,
there are no official data available to illustrate the activities and characteristics as defined for the purpose of the EIPE project.
Hence, some of the data selected for the EIPE project come from non-official data sources, e.g. private databases.
a range of the most reliable and recognized data providers were tested carefully and selected, such as Thomson Reuters for bibliometrics, Bureau van Dijk for company-level information, Dow Jones for venture capital data, etc.
The eight primary data sources used in EIPE are the following: FP7 data on FP participation from EC DG Connect, REGPAT by OECD, QS World University Rankings by QS, Web of Science by Thomson Reuters, Design Activity Tool by IHS iSuppli, European Investment Monitor by Ernst & Young, ORBIS by Bureau van Dijk, and VentureSource by Dow Jones. More details about these data sources can be found in Chapter 5.

Selecting indicators
A list of indicators for the EIPE project was selected carefully on the basis of the above-described framework of activities and their characteristics and the discussion on their empirical measurements.
In this selection process the following additional criteria were applied: Validity: an indicator must be able to capture a relevant dimension of the issues at stake.
or surveys. Some secondary data sources were also used, such as the (ICT) Industrial Scoreboard (JRC-IPTS).
They are not listed here as they were used as secondary tools to support the processing and extraction of data from the primary ones.

3 EIPE indicators
Table 2 offers a first schematic presentation of the organisation of the nine
However, due to the limited availability of data and potential indicators meeting the requirements of this study,
and described in Table 4. They are presented together with a first indication of the data sources used and their time coverage.
ORBIS by Bureau van Dijk (see Section 5.7) FP7 database by EC DG Connect (see Section 5.2) Reference year(s) considered 2011 2005-2011 2007-2011 10
. 4) Computer science as defined by Web of Science classification of Research Areas Unit of observation NUTS 3 Source FP7 database by EC DG Connect (see Section 5.2) ICT
Web of Science by Thomson Reuters (see Section 5.3) Reference year(s) considered 2007-2011 2012 2000-2012 Data on the agglomeration of ICT R&D is extracted from information
For a detailed description of the data source, see Section 5.1. Information about the funding
For a detailed description of the data source, see Section 5.2. The location and ownership of over 2,800 ICT R&D centres belonging to more than 170 multinational ICT companies across the world
For a detailed description of the data source, see Section 5.4. The scientific output
of the research institutions in Europe for the period 2000-2012 from the Web of Science by Thomson Reuters. For a detailed description of the data source, see Section 5.3. Company-level
information on R&D expenditures in the ICT sector for the period 2005-2011 in Europe stemming from the ORBIS database by Bureau van Dijk. For a detailed description of the data source,
and described in Table 5. They are presented together with a first indication of the data sources used and their time coverage.
Another way of addressing the issue of ICT R&D internationalisation would be to look at the FP7 data.
However, due to its focus, this type of data would not allow us to take into account the global dimension of ICT R&D activity.
Thus, the information contained in FP data is used to construct other indicators, e.g. R&D agglomeration or ICT R&D networking.
Design Activity Tool by IHS iSuppli (see Section 5.4) Reference year(s) considered 2012 Data on the internationalisation of ICT R&D is extracted from information available about the location
For a detailed description of the data source, see Section 5.4.

3.1.3 Networking in ICT R&D (Netrd)
Networking measures addressing the ICT R&D activity rely on the network analysis of the locations of FP7
and described in Table 6. They are presented together with a first indication of the data sources used and their time coverage. Table 6:
Source FP7 database by EC DG Connect (see Section 5.2) Reference year(s) considered 2007-2011 Data on the networking of ICT R&D is extracted from information available about the funding
For a detailed description of the data source, see Section 5.2.

3.2 ICT innovation activities indicators
3.2.1 Agglomeration of ICT innovation (Agin
and described in Table 7. It offers a first indication of the data sources used and their time coverage.
To the extent allowed by the availability of indicators and data, the proposed indicators capture the input (investment in intangibles,
With venture capital data we aim to capture indirectly the dynamics of emerging new innovative companies:
at the time of publication of this report, there was no serious European-wide collection of data on these dynamics.
Similarly, patent counting and analysis has become one of the main acknowledged sources of information on innovation output across the world, particularly since the creation and divulgation of the EPO's PATSTAT database.
VentureSource by Dow Jones (Section 5.8) Patent data: REGPAT by OECD (Section 5.6) Reference year(s) considered 2005-2012 2000-2012 2000-2009 Data on the agglomeration of ICT innovation is extracted from information available about:
Company-level information on investments in intangibles by over 1,200 ICT firms located across Europe in the period between 2005 and 2012, provided by ORBIS by Bureau van Dijk. For a detailed description of the data source,
see Section 5.7. Over 26,000 venture capital deals executed in Europe in the ICT sector between 2000 and 2012,
data collected by Dow Jones. For a detailed description of the data source, see Section 5.8. Patenting activities for over 5,000 regions in the period between 2000 and 2009.
For a detailed description of the data source, REGPAT by OECD, see Section 5.6.

3.2.2 Internationalisation of ICT innovation (Intin)
The indicator characterising the internationalisation of ICT innovation activities is described in Table 8.
This table offers a first indication of the data sources used and their time coverage.
Unit of observation NUTS 3 Source Patent data: REGPAT by OECD (Section 5.6) Reference year(s) considered 2000-2009 Data on the internationalisation of ICT innovation is extracted from the information available about patenting activities for over 5,000 regions
for the period between 2000 and 2009. For a detailed description of the data source, REGPAT by OECD, see Section 5.6.

3.2.3 Networking in ICT innovation (Netin)
The 4 indicators characterising the networking of ICT innovation
activity are listed and described in Table 9. They are presented together with a first indication of the data sources used and their time coverage.
Networking measures addressing ICT R&D activity rely on network analysis of the locations of co-inventors who are based in different locations
) Unit of observation NUTS 3 for EU and TL3 for the remaining OECD countries Source Patent data:
REGPAT by OECD (Section 5.6). Reference year(s) considered 2000-2009 Data on ICT innovation networking is extracted from the information available about global patenting activities for over 5,000 regions
For a detailed description of the data source, REGPAT by OECD, see Section 5.6.

3.3 ICT business activities indicators
3.3.1 Agglomeration of business activities (Agbuss
It offers a first indication of the data sources used and their time coverage. To the extent allowed by the availability of indicators and data,
a mix of measures capturing business activities is proposed that, in addition, acknowledges the importance given to the business activities deployed by ICT multinationals
2005-2011 2000-2011 Data on the agglomeration of ICT business activities is extracted from information available about:
Company-level information on investments in intangibles by over 1,200 ICT firms located across Europe for the period between 2005 and 2012, provided by ORBIS by Bureau van Dijk. For a detailed description of the data source,
data collected by Ernst & Young. For a detailed description of the data source, see Section 5.5.

3.3.2 Internationalisation of ICT business activities (Intbuss)
The 2 indicators characterising the internationalisation of ICT business activities are listed
and described in Table 11, which offers a first indication of the data sources used and their time coverage.
The measurement of the internationalisation of business activity is proxied in EIPE by the information on the location of business affiliates owned by companies belonging to the (ICT) Industrial Scoreboard and the location of their respective headquarters.
ORBIS by Bureau van Dijk (see Section 5.7) Reference year(s) considered 2008 Data on the internationalisation of ICT business activity is extracted from company-level information provided by ORBIS by Bureau van Dijk. For a detailed description of the data source, see Section 5.7.

3.3.3 Networking in ICT business activities (Netbuss)
The 4 indicators characterising the networking of ICT business activity are listed
They are presented together with a first indication of the data sources used and their time coverage.
ORBIS by Bureau van Dijk (see Section 5.7) Reference year(s) considered 2008 Data on the networking of ICT business activity is extracted from company-level information provided by ORBIS by Bureau van Dijk.
For a detailed description of the data source, see Section 5.7. We focus our attention on bilateral relationships between regions
and rescaling of data
Most indicators are incommensurate with one another and have different measurement units.

Normalization process
To normalise the data used in this study, a standardization method, i.e. z-scores, is used.
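
As a minimal sketch of the z-score standardisation mentioned above, followed by a min-max rescaling to a 0-100 range of the kind used to present the composite indicator, the two steps might look as follows. The function names, the sample values and the use of the population standard deviation are assumptions made for illustration; the report does not specify these details.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardise values to zero mean and unit standard deviation.

    Assumption: the population standard deviation is used; the source
    does not say which variant was applied.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: no variation to standardise.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def min_max_0_100(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 100."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]

# Hypothetical indicator values for five regions (illustrative only).
raw = [1.0, 2.0, 3.0, 4.0, 5.0]
z = z_scores(raw)
scaled = min_max_0_100(z)
```

Because both steps are linear transformations, the ordering of regions is preserved; only the scale changes.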
in order to present EIPE CI on a scale from 0 to 100, the values are standardized with the min-max procedure.

4.3 Sensitivity analysis
An important issue related to the construction of composite indicators is weighting.
a sensitivity analysis is applied. Sensitivity analysis is the study of how the uncertainty in the output of a model can be apportioned to different sources of uncertainty in the model input (Saltelli, Tarantola, & Campolongo, 2000). The weight allocated to each sub-indicator is varied between the three sub-indices in the following way:
and its results were shown not to affect the final ranking in any significant way.

5 Data Sources
The following eight databases have been the primary data sources used to elaborate the indicators and measurements of EIPE:
1. QS World University Rankings by QS,
2. FP7 database by EC DG Connect,
3. Bibliometrics: Web of Science by Thomson Reuters,
4. ICT R&D centre locations: Design Activity Tool by IHS iSuppli,
5. European Investment Monitor by Ernst & Young,
6. Patent data:
each of the data sources is described.

5.1 QS World University Rankings by QS
The rankings of Universities
It was formed in 2008 to meet the increasing public interest in comparative data on universities and organisations,
the EIPE study used QS proprietary datasets to investigate its subject area at three levels, namely academic and employer reputation surveys and the Scopus data for the Citations per Faculty indicator.
The data for citations originate from Scopus by Elsevier B.V. Papers in Scopus are tagged with an ASJC (All Science Journal Classification) code
The main reason why this data source was selected for EIPE is that in addition to the university ranking, it also offers the rankings described above by teaching subject,
This data source, though carefully selected from a range of data sources pursuing similar purposes, shows some limitations.
The main constraint is that it offers only a limited number of universities, which does not allow us to cover the entire population of European higher education institutions.

5.2 FP7 database by EC DG Connect
The Framework Programmes for Research and Technological Development,
also called Framework Programmes or abbreviated to FP1 through to FP7, are funding programmes created by the European Union
The analysis of the Framework Programme 7 programmes and participants is based on the database provided by DG Connect in November 2011
The main reason why this data source was selected for EIPE is that it offers a proxy for public R&D expenditures in ICT
This data source, though carefully selected, shows some limitations. The main constraint is that it offers only a limited snapshot of EU-level publicly-financed ICT R&D in Europe.

Web of Science by Thomson Reuters
The Web of Science is an online academic citation index provided by Thomson Reuters. It is designed to provide access to multiple databases, cross-disciplinary research,
The Web of Science has indexing coverage from 1900 to the present.
Coverage includes the sciences, social sciences, arts, and humanities, and it is also cross-disciplinary. For the purpose of the EIPE exercise, journals classified in the Computer Science research area are considered.
The main reason why this data source was selected for EIPE is that it offers a comprehensive overview of scientific output throughout the world divided into individual research areas
This data source, though carefully selected from a range of data sources pursuing similar purposes,
has some limitations. The main constraint is that it offers only limited possibilities with respect to the extraction of information at the level of, for example, authors.

Design Activity Tool by IHS iSuppli
The data used to identify ICT R&D centre locations originate from the 2011 IHS iSuppli database,
The data on R&D locations are collected by IHS iSuppli, an industry consultancy, to map R&D locations
The main reason why this data source was selected for EIPE is that it offers relatively detailed unique information on the location and ownership of ICT R&D centres worldwide.
This data source, though carefully selected from a range of data sources pursuing similar purposes, shows some limitations.
For example, the characteristics of the dataset do not allow the building of time series. Also, the information available from this data source concentrates on the number of R&D centres, their ownership and location,
as detailed information on employment or R&D expenditures in those centres is not available at this level of granularity.

5.5 European Investment Monitor by Ernst & Young
The European Investment Monitor (EIM) is a unique
Since 1997, data has been collected from all European countries and is published on a quarterly basis. As of 2011,
The basic description of each investment project described by the EIM data includes the name of the firm, the parent company name
The data collected by the EIM make it possible to: review developments and movements in the inward investment marketplace, identify emerging sectors, industries and clusters,
The main reason why this data source was selected for EIPE is that it offers relatively detailed unique information on new investments in Europe and,
This data source, though carefully selected from a range of data sources pursuing similar purposes,
has some limitations. For example, as the EIM relies on data collection from the media, the main advantage of this source of information,
i.e. being up to date and the speed of information provision, can also be a disadvantage.
This is related to the fact that not all investments are reported by the media and, hence, they will not be available from this source to the EIM.

5.6 Patent data: REGPAT by OECD
The OECD REGPAT database stores patent data, based on patent applications to the EPO and PCT filings, linked to more than 5 500 regions using the inventors' and applicants' addresses.
This information has been linked to NUTS 3 regions according to the addresses of the applicants and inventors. The data have been regionalised at a very detailed level
so that more than 2 000 regions are covered across OECD countries. The selection of ICT patents follows the definition by OECD (OECD, 2008b).
The data from the REGPAT database are constructed along the following principles: Inventor v. owner region:
Patent data can be regionalised on the basis of the address of either the inventor or the holder.
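
To illustrate how the choice between inventor and holder addresses changes regional patent counts, a minimal sketch is given below. The records, field names and region codes are hypothetical placeholders rather than REGPAT fields, and each patent is counted once per distinct region (whole counting), which is an assumption; the source does not state the counting rule.

```python
from collections import Counter

# Hypothetical patent records; "R1".."R3" are placeholder region codes,
# not real NUTS 3 or TL3 identifiers.
patents = [
    {"id": "P1", "inventor_regions": ["R1", "R2"], "applicant_regions": ["R3"]},
    {"id": "P2", "inventor_regions": ["R1"], "applicant_regions": ["R1"]},
    {"id": "P3", "inventor_regions": ["R2", "R2"], "applicant_regions": ["R3"]},
]

def count_patents_by_region(records, address_field):
    """Count patents per region using the given address field.

    Assumption: whole counting, i.e. a patent adds 1 to every distinct
    region appearing in the chosen field, however many times it appears.
    """
    counts = Counter()
    for record in records:
        for region in set(record[address_field]):
            counts[region] += 1
    return counts

by_inventor = count_patents_by_region(patents, "inventor_regions")
by_applicant = count_patents_by_region(patents, "applicant_regions")
```

In this toy data, region R3 holds two patents by applicant address but none by inventor address, which is the kind of divergence that has to be kept in mind when interpreting regionalised patent data.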
when interpreting the data. The methodology developed to identify regions on the basis of the addresses of patent inventors consists of an iterative procedure that matches postal codes and/or town names
The main reason why this data source was selected for EIPE is that it offers unique information on patenting activity at regional level across a number of countries,
This data source, though carefully selected, shows some limitations, which, if not taken into account, can affect the results of the EIPE project or their interpretation.

ORBIS by Bureau van Dijk
Company-level information is taken from the ORBIS database by Bureau van Dijk. It contains comprehensive information on companies worldwide.
Geographic coverage: EU 27; The ICT industry was defined according to the NACE Rev. 2 definition of the ICT sector (OECD, 2007;
Time coverage between 2005 and 2011, the last available date. Besides providing the company-level information that was used to count the number of firms or the employment,
and the location of their respective headquarters originates from the Orbis database. The analysis presented in this report is based on company data from the 2009 EU Industrial R&D Scoreboard (henceforth the Scoreboard) in which R&D investment data,
and economic and financial data from the last four financial years are presented for the 1,000 largest EU and 1,000 largest non-EU R&D investors in 2008.
The Scoreboard covers about 80% of all company R&D investments worldwide. From the Scoreboard we have extracted the sub-set of ICT sector companies,
and then it was merged with the BvD Orbis database. The R&D Scoreboard collects information on R&D investment
The merge with the Orbis database was done in order to collect the information on the individual shareholders that have relevant participations in group headquarters.
As a result, in our database, the individual observation is a group, for which we have the R&D Scoreboard information together with information on up to a potential maximum of five shareholders, with their legal entity and details of the amount of shares.
The main reason why this data source was selected for EIPE is that it offers unique and standardized company-level information for the ICT sector that can be presented regionalised
This data source, though carefully selected from a range of data sources pursuing similar purposes,
has some limitations. The most important limitation is the geographical coverage and the incompleteness of the data collected.
In addition, there are significant problems concerning the extraction of detailed information, e.g. on a firm's ownership structure.

5.8 Venture capital: VentureSource by Dow Jones
Dow Jones VentureSource provides comprehensive data on venture capital-backed and private equity-backed companies including their investors and executives in every region, industry sector and stage of development
2002), who provide a detailed overview of this database and compare it with Venture Economics (an alternative source of information),
the VentureSource data are generally more reliable, more complete, and less biased. This database contains information on venture capital transactions, the financed companies and the financing firms.
The data are largely self-reported by venture capital firms, but the database provider conducted several plausibility checks.
The selection of ICT companies was based on Dow Jones classifications and includes companies belonging to the following industry segments:
Communications & Networks, Electronics & Computers, Information Services, Semiconductors, Software and Other IT. This data source was selected for EIPE
because it offers unique and standardized information on venture capital deals with all the detailed information concerning the financed
This data source, though carefully selected from a range of data sources pursuing similar purposes,
has some limitations. VentureSource relies on the voluntary provision of information by venture capital funds and companies.
based on importance of its neighbours expressed by the quality of their connections.

6.2 Patent data
The internationalisation of technology analysed with patent data. Research Policy, 30 (8), 1253-1266. Hagedoorn, J,
How Well do Venture Capital Databases Reflect Actual Investments? Kominers, S. 2013. Measuring agglomeration. Boston: Harvard University.
Guidelines for Collecting and Interpreting Innovation Data: OECD Publishing. OECD. 2007. Information Economy - Sector Definitions Based on the International Standard Industry Classification (ISIC 4). Paris:
A gravity model using patent data. Research Policy, 39 (8), 1070-1081. Puga, D. 2010.
Uncertainty and sensitivity analysis techniques as tools for the quality assessment of composite indicators. Journal of the Royal Statistical Society Series A, 168 (2), 307-323.
Sensitivity analysis as an Ingredient of Modeling. Statistical Science, 15 (4), 377-395. Spizzirri, L. 2011.
Both of those identification processes are based on quantitative data, built on a set of relevant criteria leading to measurable indicators.
and data sources used in the study. As the Commission's in-house science service,