Synopsis: ICT


2015 Ireland Action Plan for Jobs.pdf

for Big data 53 3.4 Winning Abroad 55 3.5 Integrated Licensing Application Service 56 3.6 Local Enterprise Offices 57 3

Developments in Financial services 133 11.4 Internet of things 133 11.5 Innovative/Advanced Manufacturing 134 11.6 Green Economy 135 11.7 National Institute for Bioprocessing Research

80,000: the latest CSO data shows that 80,000 additional people are at work compared to Q1 2012

Sources: IBM 2014 Global Location Trends Report; Global Innovation Index 2014; IMD World Competitiveness Yearbook 2014 (Ireland is at 11). APJ STRATEGIC AMBITION 3:

To build world-class clusters in key sectors of opportunity: 9 of the 10 top global software companies; 9 of the 10 top global pharmaceutical companies; top global medical technologies companies

All 10 of the top 10 born-on-the-internet companies; 3 of the top 5 US games publishers; 15 of the top 20; 14 of the top 21

and to increase the levels of research, innovation and technology development for the benefit of enterprise. (1) IMD World Competitiveness Yearbook 2014; (2) IMF, World Economic Outlook,

making it easier in particular to undertake business-to-consumer ecommerce. We are making €39 million available in Energy Saving Supports for business in 2015

launch of an Open Data Portal for access to public sector databases; launch of an Energy Efficiency Fund and other measures such as raising standards in the retrofitting of homes;

and Key sectoral initiatives in high growth sectors such as food, manufacturing, software, internationally traded services and construction resulting in a more diversified trade portfolio for Enterprise Ireland clients,

Benchmark metrics are supplemented with key data from the annual surveys of the Enterprise Agencies, such as employment, expenditure, sales,

with the CSO data indicating that some of the largest employment increases have been in domestic economy areas. (5) Framework for the Evaluation of Enterprise Supports (2011), Forfás

and further investment is planned in the provision of high speed fixed line and mobile broadband services.

across a range of areas including ICT, data analytics, international sales, engineering and entrepreneurship in initiatives such as the ICT Skills Action Plan, Springboard and Momentum.

The Department of Education and Skills and the Higher Education Authority are co-funding a promotional campaign centred on a new website, www.ictworks.ie

.ie website in 2014, and the Expert Group on Future Skills Needs has reported shortages in some niche skill areas.

DES, HEA, HEIs) 5 Devise and implement a programme around a single website portal, through industry and agencies working together,

SFI) 9 Provide support to institutions in delivering Summer Computing Camps to encourage second-level students to consider ICT careers.

Significant energy-related developments such as the substantial number of data centres that are major users of energy with green credentials

and our recognised success in Big data and data analytics; grid integration of renewables, with associated 'smart grid' components.

DJEI data indicates that currently 7 per cent of sales of indigenous firms is from new products and services,

including fit-for-purpose data infrastructure (see Section 3.3 below). In 2014 Knowledge Transfer Ireland was launched

The key service offered is a web portal that enables companies to identify experts, research centres and technology-licensing opportunities to benefit their business.

Manufacturing Step Change, National Health Innovation Hub, Competitive Ecosystem for Big data, Winning Abroad, Integrated Licensing Application Service, Local Enterprise Offices, Trading Online,

DJEI, D/Health, EI, Joint Agency Project Team, Oversight Group) 3.3 Competitive Ecosystem for Big data

Building world class data management infrastructure. The overall ambition of the Disruptive Reform is to build on existing enterprise strengths to make Ireland a leading country in Europe in the area of Big data and Data Analytics.

A number of significant initiatives and investments were progressed in 2013 and 2014 in partnership with the enterprise sector

On behalf of the Task force on Big data, DJEI commissioned a review of Ireland's progress towards achieving this goal

and in 2014 the Government launched the Open Data Portal to act as the primary source of public sector datasets.

However, in the face of strong European and international competition in this area, the Task force has identified a number of new actions that will harness Big data for employment growth.

and develop a specific Big data agenda clarifying its leadership goals; 2. Building on our research strengths consolidate Ireland's leadership position in Big data/Data Analytics within Horizon 2020

and continue to promote engagement by enterprise in Ireland; 3. Continue to implement the recommendations of the EGFSN's report Assessing the demand for Big data and Analytics Skills;

4. Develop a coherent ecosystem to bridge the gap between R&D and innovation and take-up;

and 5. Develop an internationally competitive Data infrastructure. In light of the new policy actions required in 2015 the mission

and focus of the Big data Task force will be renewed to provide effective overarching coordination and monitoring to ensure that the strategic goals are achieved.

The Big data market is in an emerging phase of development and in order to achieve the benefits of data-driven innovation,

policy must take into account the full data value cycle and the role of all stakeholders. By focusing on developing a coherent ecosystem Ireland can bridge the gap between R&D

innovation and adoption and take the lead in developing concrete solutions and applications. An important element of the ecosystem for data-intensive companies operating in Ireland is the system of data protection

and the arrangements in place to ensure a robust approach to data protection consistent with EU law and international treaties.

To ensure that Ireland has a best-in-class system in place, the Department of the Taoiseach will establish an interdepartmental committee on data protection issues and related structures,

as well as a forum for dialogue with industry/civil society on related issues, and will progress a range of actions in 2015 (as set out below) in this regard. 2015 Actions Big data 86 Renew the mission

and focus of the Big Data Taskforce with the goal to oversee progress towards the strategic goals of the Disruptive Reform.

DJEI) 87 Identify and adopt specific targets for the Disruptive Reform including measurable KPIs. Task force on Big data, DJEI) 88 Monitor progress annually, based on the KPIs,

and produce a report updating/revising the main actions. Task force on Big data, DJEI) 89 Oversee the implementation of the actions arising from the IDC review

which sought to identify additional or revised policy actions in Ireland. Task force on Big data, DJEI) 90 The Task force on Big data will review the opportunities for Ireland arising from the Internet of things

and develop specific policy actions to develop those opportunities. Task force on Big data, DJEI, IDA) 91 Establish interdepartmental committee on data protection issues and related structures.

D/Taoiseach) 92 Establish a forum for dialogue with industry/civil society on issues arising from the continuing growth in personal data usage and technology.

D/Taoiseach) 93 Strengthen the resources of the Office of the Data Protection Commissioner (ODPC). D/Justice and Equality, DPER) 94 Establish a Dublin office for the ODPC.

D/Justice and Equality, OPW) 95 Engage intensively with EU partners and stakeholders in relation to ongoing negotiations on the Data Protection Regulation.

D/Justice and Equality) 3.4 Winning Abroad 2015 Action:

The RFT reflected the requirements of licensing authorities including managing licence application forms, registration of licensees, managing licence applications and renewals, remittance of licence fees, transmission and security of data

and the underpinning legislation for the 29 core licences across 40 licensing authorities that are being considered in the first phase of this project was undertaken and formed part of the Request for Tender documentation.

and commitment of additional staff in 2014, the focus has been on development of enhanced customer service (training, website, protocols with State bodies), seamless continuity of services (project supports/job creation, training

More information on the Scheme can be found at www.dcenr.gov.ie/nds and on the website of each LEO.

The digital economy represents 5 per cent of GDP, is growing at approximately 20 per cent per year

and examined, on a monthly basis, granular data from both Bank of Ireland and AIB.

It is important to recognise that in relation to commercial entities the type of data available varies in terms of both quality

and quantity, and as such part of the work of the SME State Bodies Group will involve engaging with commercial entities around the issue of improving the provision of data on SME lending.

and monitor data, including Central Bank data, on lending to SMEs from both bank and non-bank sources, including the full range of state-sponsored initiatives, and report on this issue to the Cabinet Committee on Economic Recovery and Jobs twice yearly.

SME State Bodies Group) 105 Detailed data from AIB and Bank of Ireland will be collated and examined on a monthly basis, ensuring a more informed understanding of the SME bank lending environment,

with a particular focus on new lending. D/Finance, Credit Review Office) 106 Following the passing of the appropriate primary legislation implement

This data suggests that 70 per cent of Irish SMEs and large corporates use trade credit.


The latest available data shows that exports of Irish goods and services have risen to a record €184 billion in 2013

EI) 143 Build on Phase One of the Pilot Industry-led clustering initiative involving fifty companies by implementing the recommendations of the clustering Review carried out in 2014.

7. Entrepreneurship: Central Bank research shows that startup companies in the first five years of existence account for two thirds of all new jobs

website resources for entrepreneurs and supporters and the Startup Gathering annual survey of Ireland's startup sector) are planned by Startup Ireland in conjunction with the Startup Gathering initiative to maximise the impact of the Startup

The Startup Gathering website will provide a signposting of resources and supports available to entrepreneurs in Ireland thereby improving accessibility to Ireland's innovation system;


Improved efficiency and accuracy of internal business processes as a result of improved accuracy and consistency of databases across public and private sectors;

which will improve logistical efficiency, the accuracy of databases across both the public and private sector and planning and analysis capabilities in both sectors.

DAFM) 242 A Memorandum of Understanding covering enhanced data cooperation between Revenue and the CSO to produce wider and deeper statistical analyses will reduce the administrative burden on businesses arising from CSO surveys.

The NCC highlight in particular the importance of investment in telecommunications, transport, energy and waste management.

The fiscal outlook for 2015 is better than in previous years. Ireland's economic recovery is well under way,


An independent panel was convened at end 2014 to review progress and will report in early 2015 on next steps to sustain progress

policy has been focused mainly around four core strategic policies: (a) prioritisation of public funds into areas of research that offer most potential for economic recovery and social progress; (b) consolidation of resources in units of scale and scientific excellence; (c) increased

in light of a more optimistic economic outlook, there is now an opportunity to set them in context

and complete the review of the Independent Panel established to review progress. DJEI, Research Prioritisation Action Group) 267 Develop a successor to the Strategy for Science,

targeted support for the beef sector via the highly innovative Beef Data and Genomics Programme;

Arts Council of Ireland, DAHG) 319 Continue to develop cultural digitisation initiatives in order to enhance Ireland's roots tourism offering.

and cultural and artistic collections either on Europeana or other websites in order to enhance our tourism offering to visitors from both home and abroad.

The initial phase of the project will advance the digitisation of a significant part of the Schools' Collection in time for the centenary of the 1916 Rising.


a vacant site levy; and 'use-it-or-lose-it' planning permissions; A dedicated taskforce on Housing Supply in Dublin which has examined housing demand

enabling of a vacant site levy; and a 'use-it-or-lose-it' approach for planning permissions.

D/Defence)


Specific opportunity areas identified include Smart Ageing, Design, Financial services, the Internet of things, Additive Manufacturing, the Green Economy,

IDA, EI, DFAT) 11.4 Internet of things The Internet of things (IoT) will be critical to Ireland's future economic competitiveness, both in attracting FDI and in building export-oriented new enterprises.

IoT has the potential to provide the same transformation as the personal computer did for business in the 1990s

and the internet has done for business in the last decade. IoT is the next evolution of the internet

and companies in the next ten years will be using this technology to improve the competitiveness of their manufacturing,

IoT will generate huge volumes of data through smart connected objects, leading to challenges around quality and interoperability that will have to be addressed.

In 2015 the Taskforce on Big data will assess the most appropriate policy response to this new and emerging opportunity

in order to stimulate the development of the Iot in a way that best supports enterprise and job creation. 2015 Actions Internet of things 352 The Task force on Big data will review the opportunities for Ireland arising from the Internet of things

and develop specific policy actions to develop those opportunities. Task force on Big data, DJEI, IDA) 11.5 Innovative/Advanced Manufacturing By 2020 manufacturing will be different from

what it is today. New materials (e.g. ceramics, metals and alloys, powder, polymers, graphene, 'smart' materials) and associated new processing methods have the potential to revolutionise existing industries as well as to create new ones.

and speckled computing (wireless sensor networks). Additive manufacture (also known as 3D printing) is enabling development of extremely complex products without the normal stresses

and defects found in traditionally manufactured objects. It also offers the scope to customise at no incremental cost and produce fewer items at lower cost.

and encouraging new technology intensive and software start-ups that are focused on addressing the needs of the manufacturing sector.

DJEI, Consultative Committee) 11.7 National Institute for Bioprocessing Research and Training In December 2014, the IDA Ireland Board approved €7.5 million in core RD&I

and Big data in manufacturing: better use of data derived from process and product analytics. In addition, the existing Principal Investigators in NIBRT are developing a research plan to get started in ADC manufacturing in collaboration with other centres such as SSPC (Synthesis

& Solid State Pharmaceutical Centre) and PMTC (Pharmaceutical Manufacturing Technology Centre) and this (combined with new recruitment) should position them well to win competitive grants from SFI,

DCENR) 369 Monitor public sector energy usage and publish an annual report on energy usage in the public sector.

Innovation Hub; 2.6 Intellectual Property in Enterprise; 3.3 Competitive Ecosystem for Big data; 10 RD&I; 11 New Sources of Growth; 7.2

[Chart: services indices, 2011=100, for Warehousing, Computer services and Legal and Accounting Services; Q3 2014 values include 110.7 and 126.0.]

Data Analytics Research; CER Comprehensive Expenditure Review; CRFS Clinical Research Facilities; CRO Credit Review Office; CSO Central Statistics Office; CSSO

Direct Investment; FET Further Education and Training; FH2020 Food Harvest 2020; GEDI Global Entrepreneurship Development Index; GEM Global Entrepreneurship Monitor; GDP

INIS Irish Naturalisation and Immigration Service; IoT Internet of things; IP Intellectual Property; IRC Irish Research Council; ISIF

NSS National Skills Strategy; NTD National Talent Drive; ODPC Office of the Data Protection Commissioner; OECD Organisation for Economic


2015-April-Social_Innovation_in_Europe.pdf

The second chapter provides an outlook on the policy activities that are happening in the European Union.

1.1 Core elements and common features 6; 1.2 Social innovation and social entrepreneurship 8; 2 EU initiatives and activities on social innovation

1.1.1 Core elements and common features. Starting from their literature review, Caulier-Grice et al.

(2012) suggest a number of common features and core elements of social innovation, which can be visualized in Fig. 1.1. Fig. 1.1 Core elements and common features of social innovation. Source:

Caulier-Grice et al. (2012). Five core elements should be present to define a social innovation:

1) Novelty: Social innovations need to be new in some way, either new to the field, sector, region, market or user,

(i) New products: assistive technologies developed for people with disabilities; (ii) New services: mobile banking; (iii) New processes: peer-to-peer collaboration

and crowdsourcing; (iv) New markets: Fair Trade or time banking; (v) New platforms: new legal

The site provides the latest information on European social innovation. This first concrete action was launched in 2011 as a virtual hub connecting social innovators and providing an overview of actions throughout Europe.

In the 2014-2020 programming period, social innovation is going to be mainstreamed. In the new regulation on the European Social Fund

personalised public services, using open data and services, enhancing transparency and decision-making processes of public administrations,

Against the update of structural data, the project will test these hypotheses on the qualitative impacts of the Third Sector in terms of capital building (e.g. social networks,

and creating the core elements of the Transition Model. The Transition Town concept applied permaculture principles to develop a 12-step approach

2) the Transition Network's website: www.transitionnetwork.org. 10 The idea of 'Peak oil' is one of the main motives of the transition towns movement. 11 Permaculture can be defined as 'consciously designed landscapes which mimic the patterns

which is a small-scale workshop equipped with an array of flexible, computer-controlled tools that help to transform ideas into real products through digital fabrication.

http://espas.eu/orbis/sites/default/files/generated/document/en/social innovation decade of changes.pdf Caulier-Grice, J., Davies, A., Patrick, R., Norman, W. (2012)

http://csi.gsb.stanford.edu/sites/csi.gsb.stanford.edu/files/Themeaningofsocialentrepreneurship.pdf Dees, J. G. (2006) Taking Social Entrepreneurship Seriously.


42495745.pdf

download or print OECD content for your own use, and you can include excerpts from OECD publications, databases and multimedia products in your own documents, presentations, blogs,

websites and teaching materials, provided that suitable acknowledgment of OECD as source and copyright owner is given.

All requests for public or commercial use and translation rights should be submitted to rights@oecd.org. Requests for permission to photocopy portions of this material for public

and on other issues related to composite indicators can be found on the web page http://composite-indicators.jrc.ec.europa.eu/. The research was funded partly by the European Commission, Research Directorate, under the project KEI (Knowledge Economy Indicators), Contract FP6 No. 502529.

23 1.3 Imputation of missing data...24 1.4 Multivariate analysis...25 1.5 Normalisation of data...

27 1.6 Weighting and aggregation...31 1.7 Robustness and sensitivity...34 1.8 Back to the details...

44 2.2 Quality dimensions for basic data...46 2.3 Quality dimensions for procedures to build

51 Step 3. Imputation of missing data...55 3.1 Single imputation...55 3.2 Unconditional mean imputation...

63 4.1 Principal components analysis...63 4.2 Factor analysis...69 4.3 Cronbach Coefficient Alpha...

72 4.4 Cluster analysis...73 4.5 Other methods for multivariate analysis...79

89 6.1 Weights based on principal components analysis or factor analysis...89 6.2 Data envelopment analysis (DEA)...

91 6.3 Benefit of the doubt approach (BOD)...92 6.4 Unobserved components model (UCM)...

115 Step 7. Uncertainty and sensitivity analysis...117 7.1 General framework...118 7.2 Uncertainty analysis (UA)...

118 7.3 Sensitivity analysis using variance-based techniques...121 7.3.1 Analysis 1...124 7.3.2 Analysis 2...129 Step 8. Back to the details...

K-means for clustering TAI countries...78 Table 14. Normalisation based on interval scales...83 Table 15.

Examples of normalisation techniques using TAI data...87 Table 16. Eigenvalues of TAI data set...

90 Table 17. Factor loadings of TAI based on principal components...90 Table 18. Weights for the TAI indicators based on maximum likelihood (ML) or principal components (PC) method for the extraction of the common factors...

91 Table 19. Benefit of the doubt (BOD) approach applied to TAI...94 Table 20.

Data envelopment analysis (DEA) performance frontier...92 Figure 17. Analytical hierarchy process (AHP) weighting of the TAI indicators...

-Young-Levenglick; CLA Cluster analysis; DEA Data Envelopment Analysis; DFA Discriminant Function Analysis; DQAF Data Quality Framework; EC European Commission; EM Expected

Maximisation; EU European Union; EW Equal weighting; FA Factor analysis; GCI Growth Competitiveness Index; GDP Gross domestic product; GME Geometric aggregation; HDI Human Development Index; ICT Information

Non-compensatory multi-criteria analysis; NMAR Not Missing At Random (in the context of imputation methods); OECD Organisation for Economic Co-operation and Development; PC Principal Component; PCA Principal

whereby a lot of work in data collection and editing is wasted or hidden behind a single number of dubious significance.

and use of composite indicators in order to avoid data manipulation and misrepresentation. In particular, to guide constructors

Data selection. Indicators should be selected on the basis of their analytical soundness, measurability, country coverage,

relevance to the phenomenon being measured and relationship to each other. The use of proxy variables should be considered

when data are scarce. Imputation of missing data. Consideration should be given to different approaches for imputing missing values.

Extreme values should be examined as they can become unintended benchmarks. Multivariate analysis. An exploratory analysis should investigate the overall structure of the indicators

assess the suitability of the data set and explain the methodological choices, e.g. weighting, aggregation.

Skewed data should also be identified and accounted for. Weighting and aggregation. Indicators should be aggregated and weighted according to the underlying theoretical framework.

and Sensitivity analysis should be undertaken to assess the robustness of the composite indicator in terms of, e.g., the mechanism for including

or excluding single indicators, the normalisation scheme, the imputation of missing data, the choice of weights and the aggregation method.

Back to the real data. Composite indicators should be transparent and fit to be decomposed into their underlying indicators or values.

and the underlying data are freely available on the Internet. For the sake of simplicity, only the first 23 of the 72 original countries measured by the TAI are considered here.

dimensions of technological capacity (data given in Table A.1). Creation of technology. Two individual indicators are used to capture the level of innovation in a society: (

i) diffusion of the Internet (indispensable to participation), and (ii) exports of high- and medium-technology products as a share of all exports.

telephones and electricity. These are needed to use newer technologies and have wide-ranging applications. Both indicators are expressed as logarithms,

The quality of a composite indicator as well as the soundness of the messages it conveys depend not only on the methodology used in its construction but primarily on the quality of the framework and the data used.

A composite based on a weak theoretical background or on soft data containing large measurement errors can lead to disputable policy messages

especially as far as methodologies and basic data are concerned. To avoid these risks, the Handbook puts special emphasis on documentation and metadata.

process. 2. Data selection Should be based on the analytical soundness, measurability, country coverage, and relevance of the indicators to the phenomenon being measured and relationship to each other.

when data are scarce (involvement of experts and stakeholders is envisaged at this step). To check the quality of the available indicators.

To create a summary table on data characteristics, e.g. availability (across country, time), source, type (hard,

3. Imputation of missing data is needed in order to provide a complete dataset (e.g. by means of single or multiple imputation).

To check the underlying structure of the data along the two main dimensions, namely individual indicators and countries (by means of suitable multivariate methods, e.g.

principal components analysis, cluster analysis). To identify groups of indicators or groups of countries that are statistically similar

To compare the statistically determined structure of the data set to the theoretical framework and discuss possible differences. 5. Normalisation should be carried out to render the variables comparable. To select suitable normalisation procedure(s) that respect both the theoretical framework and the data properties.

To discuss the presence of outliers in the dataset as they may become unintended benchmarks.

To select appropriate weighting and aggregation procedure(s) that respect both the theoretical framework and the data properties.

and sensitivity analysis should be undertaken to assess the robustness of the composite indicator in terms of, e.g., the mechanism for including

the normalisation scheme, the imputation of missing data, the choice of weights, the aggregation method.

To conduct sensitivity analysis of the inference (assumptions) and determine what sources of uncertainty are more influential in the scores

and/or ranks. 8. Back to the data is needed to reveal the main drivers for an overall good or bad performance.

To correlate the composite indicator with other relevant measures, taking into consideration the results of sensitivity analysis.

To develop data-driven narratives based on the results. 10. Visualisation of the results Should receive proper attention,

Criteria for assuring the quality of the basic data set for composite indicators are discussed in detail in Section 2:

the data selection process can be quite subjective as there may be no single definitive set of indicators.

A lack of relevant data may also limit the developer's ability to build sound composite indicators.

Given a scarcity of internationally comparable quantitative (hard) data, composite indicators often include qualitative (soft) data from surveys or policy reviews.

Proxy measures can be used when the desired data are unavailable or when cross-country comparability is limited.

For example, data on the number of employees that use computers might not be available. Instead, the number of employees who have access to computers could be used as a proxy.

As in the case of soft data, caution must be taken in the utilisation of proxy indicators.

To the extent that data permit, the accuracy of proxy measures should be checked through correlation and sensitivity analysis.
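As an illustration of this check, here is a minimal sketch assuming a hypothetical pandas DataFrame in which the desired indicator ("computer_use") is patchy and a candidate proxy ("computer_access") is complete; both column names are invented for the example, not taken from the Handbook.

```python
# Hedged sketch: correlating a proxy indicator with the target series where both exist.
import pandas as pd

df = pd.DataFrame({
    "computer_use":    [55.0, 62.0, None, 71.0, 48.0, None, 80.0],   # desired but incomplete
    "computer_access": [60.0, 66.0, 58.0, 75.0, 52.0, 61.0, 83.0],   # candidate proxy
})

# Correlate only over the countries where the target series is observed.
both = df.dropna()
corr = both["computer_use"].corr(both["computer_access"])   # Pearson by default
print(f"proxy vs. target correlation: {corr:.2f}")
```

If the correlation is high and stable across subsets, the proxy is a reasonable stand-in; a follow-up sensitivity check would recompute the composite with and without the proxy substitution.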

The builder should also pay close attention to whether the indicator in question is dependent on GDP or other size-related factors.

To have an objective comparison across small and large countries, scaling of variables by an appropriate size measure, e.g. population, income, trade volume,

The quality and accuracy of composite indicators should evolve in parallel with improvements in data collection and indicator development.

The current trend towards constructing composite indicators of country performance in a range of policy areas may provide further impetus to improving data collection,

identifying new data sources and enhancing the international comparability of statistics. On the other hand, we do not marry the idea that using

poor data will produce poor results in a garbage-in, garbage-out logic. From a pragmatic point of view,

Created a summary table on data characteristics, e.g. availability (across country, time), source, type (hard,

1.3. Imputation of missing data. The idea of imputation could be both seductive and dangerous. Missing data often hinder the development of robust composite indicators.

Data can be missing in a random or nonrandom fashion. The missing patterns could be:

Missing values do not depend on the variable of interest or on any other observed variable in the data set.

and if (ii) each of the other variables in the data set would have to be the same, on average,

but are conditional on other variables in the data set. For example, the missing values in income would be MAR

if the probability of missing data on income depends on marital status but, within each category of marital status,

whether data are missing at random or systematically, while most of the methods that impute missing values require a missing at random mechanism, i.e.

There are three general methods for dealing with missing data: (i) case deletion, (ii) single imputation or (iii) multiple imputation.

The other two approaches consider the missing data as part of the analysis and try to impute values through either single imputation, e.g. mean/median/mode substitution, regression imputation,

Markov Chain Monte Carlo algorithm. Data imputation could lead to the minimisation of bias and the use of 'expensive to collect' data that would otherwise be discarded by case deletion.
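To make the single-imputation options above concrete, here is a minimal sketch of mean/median substitution and regression imputation, assuming simulated data and scikit-learn; the variable names are illustrative, not the TAI indicators.

```python
# Hedged sketch of single imputation: mean/median substitution and regression imputation.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
data = pd.DataFrame({
    "gdp_per_capita": rng.normal(30, 8, 20),
    "internet_users": rng.normal(60, 15, 20),
})
data.loc[[3, 7, 12], "internet_users"] = np.nan   # introduce missing values

# (a) unconditional mean / median substitution
mean_imputed = data["internet_users"].fillna(data["internet_users"].mean())
median_imputed = data["internet_users"].fillna(data["internet_users"].median())

# (b) regression imputation: predict the missing values from an observed covariate
obs = data.dropna()
model = LinearRegression().fit(obs[["gdp_per_capita"]], obs["internet_users"])
missing = data["internet_users"].isna()
reg_imputed = data["internet_users"].copy()
reg_imputed[missing] = model.predict(data.loc[missing, ["gdp_per_capita"]])

# Caveat: every single-imputation variant treats the filled-in values as if they
# were observed, so downstream variance estimates will be too small.
```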

However, it can also allow data to influence the type of imputation. In the words of Dempster & Rubin (1983:

The idea of imputation is both seductive and dangerous. It is seductive because it can lull the user into the pleasurable state of believing that the data are complete after all,

and it is dangerous because it lumps together situations where the problem is sufficiently minor that it can be handled legitimately in this way

and situations where standard estimators applied to real and imputed data have substantial bias. The uncertainty in the imputed data should be reflected by variance estimates.

This makes it possible to take into account the effects of imputation in the course of the analysis. However

single imputation is known to underestimate the variance, because it only partially reflects the imputation uncertainty. The multiple imputation method,

A complete data set without missing values. A measure of the reliability of each imputed value

and the results. 1.4. Multivariate analysis. Analysing the underlying structure of the data is still an art. Over the last few decades,

The underlying nature of the data needs to be analysed carefully before the construction of a composite indicator.

This preliminary step is helpful in assessing the suitability of the data set and will provide an understanding of the implications of the methodological choices, e.g. weighting and aggregation,

and analysed along at least two dimensions of the data set: individual indicators and countries. Grouping information on individual indicators.

or appropriate to describe the phenomenon (see Step 2). This decision can be based on expert opinion and the statistical structure of the data set.

Factor analysis (FA) is similar to PCA, but is based on a particular statistical model. An alternative way to investigate the degree of correlation among a set of variables is to use the Cronbach coefficient alpha (c-alpha),

These multivariate analysis techniques are useful for gaining insight into the structure of the data set of the composite.
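As an illustration of the c-alpha measure just mentioned, here is a minimal sketch computed directly from its textbook definition on a simulated country-by-indicator matrix; no dedicated library routine is assumed and the data are invented for the example.

```python
# Hedged sketch: Cronbach's coefficient alpha from its definition.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: 2-D array, rows = countries, columns = individual indicators."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the row sums
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

rng = np.random.default_rng(1)
base = rng.normal(size=(25, 1))
items = base + rng.normal(scale=0.5, size=(25, 6))   # six moderately correlated indicators
print(round(cronbach_alpha(items), 2))                # values near 1 indicate high internal consistency
```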

Cluster analysis is another tool for classifying large amounts of information into manageable sets. It has been applied to a wide variety of research problems and fields, from medicine to psychiatry and archaeology.

Cluster analysis is used also in the development of composite indicators to group information on countries based on their similarity on different individual indicators.

Cluster analysis serves as: (i) a purely statistical method of aggregation of the indicators, (ii) a diagnostic tool for exploring the impact of the methodological choices made during the construction phase of the composite indicator,

and (iv) a method for selecting groups of countries for the imputation of missing data with a view to decreasing the variance of the imputed values.
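The following sketch combines the two ideas discussed here, PCA to inspect the structure of the indicator set and k-means to group statistically similar countries on the leading components; the data are simulated, and the choice of three components and four clusters is arbitrary, purely for illustration.

```python
# Hedged sketch: PCA followed by k-means clustering of countries on component scores.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
X = rng.normal(size=(23, 8))            # 23 countries x 8 individual indicators (simulated)

Z = StandardScaler().fit_transform(X)   # indicators must be on comparable scales first
pca = PCA(n_components=3).fit(Z)
print("explained variance ratio:", pca.explained_variance_ratio_.round(2))

scores = pca.transform(Z)               # country scores on the first components
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores)
print("cluster sizes:", np.bincount(labels))
```

As noted in the text, running the clustering on component scores can mask the true taxonomy when the retained components do not carry the clustering structure, which is why factorial k-means fits both models simultaneously.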

or when it is believed that some of them do not contribute to identifying the clustering structure in the data set,

and then apply a clustering algorithm on the object scores on the first few components,

as PCA or FA may identify dimensions that do not necessarily help to reveal the clustering structure in the data

Strengths and weaknesses of multivariate analysis. Principal components/factor analysis. Strengths: can summarise a set of individual indicators while preserving the maximum possible proportion of the total variation in the original data set; minimises the contribution of individual indicators which do not move with other individual indicators. Weaknesses: sensitive to modifications in the basic data (data revisions and updates, e.g. new countries); sensitive to the presence of outliers, which may introduce a spurious variability in the data; sensitive to small-sample problems, which are particularly relevant when the focus is limited to a set of countries. Cluster analysis. Strengths: offers a different way to group countries; gives some insight into the structure of the data set. Weaknesses: purely a descriptive tool; may not be transparent if the methodological choices made during the analysis are not motivated

Various alternative methods combining cluster analysis and the search for a low-dimensional representation have been proposed, focusing on multidimensional scaling or unfolding analysis. Factorial k-means analysis combines k-means

cluster analysis with aspects of FA and PCA. A discrete clustering model together with a continuous factorial model are fitted simultaneously to two-way data to identify the best partition of the objects, described by the best orthogonal linear combinations of the variables (factors) according to the least-squares criterion.

This has a wide range of applications since it achieves a double objective: data reduction and synthesis, simultaneously in the direction of objects and variables.

Originally applied to short-term macroeconomic data, factorial k-means analysis has a fast alternating least-squares algorithm that extends its application to large data sets.

This methodology can be recommended as an alternative to the widely-used tandem analysis. By the end of Step 4 the constructor should have:

Checked the underlying structure of the data along various dimensions, i.e. individual indicators, countries. Applied the suitable multivariate methodology, e.g.

PCA, FA, cluster analysis. Identified subgroups of indicators or groups of countries that are statistically similar.

Analysed the structure of the data set and compared this to the theoretical framework. Documented the results of the multivariate analysis

and the interpretation of the components and factors. 1.5. Normalisation of data. Avoid adding up apples

and oranges. Normalisation is required prior to any data aggregation as the indicators in a data set often have different measurement units.

A number of normalisation methods exist (Table 3) (Freudenberg, 2003; Jacobs et al., 2004): 1. Ranking is the simplest normalisation technique.

the percentile bands force the categorisation on the data, irrespective of the underlying distribution. A possible solution is to adjust the percentile brackets across the individual indicators

The normalisation method should take into account the data properties, as well as the objectives of the composite indicator.
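A minimal sketch of three of the normalisation options referred to above (ranking, standardisation to z-scores, and min-max rescaling), applied to one hypothetical indicator vector using NumPy and SciPy; the values are invented for the example.

```python
# Hedged sketch: three common normalisation methods for one indicator across countries.
import numpy as np
from scipy.stats import rankdata

x = np.array([3.2, 7.8, 5.1, 9.4, 2.6, 6.0])

ranks = rankdata(x)                                  # 1 = lowest value
z_scores = (x - x.mean()) / x.std(ddof=1)            # mean 0, standard deviation 1
min_max = (x - x.min()) / (x.max() - x.min())        # rescaled to the [0, 1] interval

print(ranks)
print(z_scores.round(2))
print(min_max.round(2))
```

Note how min-max is driven entirely by the two extreme values, which is why outliers can become unintended benchmarks.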

Selected the appropriate normalisation procedure(s) with reference to the theoretical framework and to the properties of the data.

A number of weighting techniques exist (Table 4). Some are derived from statistical models, such as factor analysis, data envelopment analysis and unobserved components models (UCM

Weights may also be chosen to reflect the statistical quality of the data.

Higher weights could be assigned to statistically reliable data with broad coverage. However, this method could be biased towards the readily available indicators,

in the CI of E-business Readiness the indicator I1 'Percentage of firms using Internet' and indicator I2

'Percentage of enterprises that have a website' display a correlation of 0.88 in 2003:

and cons. Statistical models such as principal components analysis (PCA) or factor analysis (FA) could be used to group individual indicators according to their degree of correlation.

such as the benefit of the doubt (BOD) approach, are extremely parsimonious about weighting assumptions as they allow the data to decide on the weights

as in the case of environmental indices that include physical, social and economic data. If the analyst decides that an increase in economic performance cannot compensate for a loss in social cohesion or a worsening in environmental sustainability,
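As a simple illustration of the compensability issue raised here, the sketch below contrasts a fully compensatory linear (weighted arithmetic) aggregation with a geometric aggregation of the same normalised scores under the same weights; the weights and country scores are invented for the example.

```python
# Hedged sketch: linear vs. geometric aggregation of normalised indicator scores.
import numpy as np

weights = np.array([0.4, 0.3, 0.3])          # must sum to 1
scores = np.array([
    [0.9, 0.2, 0.8],                         # country A: strong overall but one weak dimension
    [0.6, 0.6, 0.6],                         # country B: uniformly average
])

linear = scores @ weights                            # weighted arithmetic mean
geometric = np.prod(scores ** weights, axis=1)       # weighted geometric mean

# Under geometric aggregation the weak dimension of country A is penalised more
# heavily, so the ranking of A and B reverses relative to the linear case.
print("linear:   ", linear.round(3))
print("geometric:", geometric.round(3))
```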

and sensitivity. Sensitivity analysis can be used to assess the robustness of composite indicators. Several judgements have to be made

when constructing composite indicators, e.g. on the selection of indicators, data normalisation, weights and aggregation methods, etc.

A combination of uncertainty and sensitivity analysis can help gauge the robustness of the composite indicator

Sensitivity analysis assesses the contribution of the individual source of uncertainty to the output variance. While uncertainty analysis is used more often than sensitivity analysis

and is treated almost always separately, the iterative use of uncertainty and sensitivity analysis during the development of a composite indicator could improve its structure (Saisana et al.,

2005a; Tarantola et al., 2000; Gall, 2007). Ideally, all potential sources of uncertainty should be addressed: selection of individual indicators, data quality, normalisation, weighting, aggregation method, etc.

The approach taken to assess uncertainties could include the following steps: 1. Inclusion and exclusion of individual indicators. 2. Modelling data error based on the available information on variance estimation. 3. Using alternative editing schemes,

e.g. single or multiple imputation. 4. Using alternative data normalisation schemes, such as Min-Max, standardisation,

use of rankings. 5. Using different weighting schemes, e.g. methods from the participatory family (budget allocation,

No index can be better than the data it uses. But this is an argument for improving the data,

not abandoning the index (UN, 1992). The results of the robustness analysis are generally reported as country rankings with their related uncertainty bounds,

The sensitivity analysis results are shown generally in terms of the sensitivity measure for each input source of uncertainty.
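As a concrete illustration of an uncertainty analysis of the kind listed in the steps above, the sketch below re-computes country ranks many times while randomly perturbing a single assumption, the weight vector, and reports the resulting rank ranges; the data are simulated and only this one source of uncertainty is varied.

```python
# Hedged sketch: rank uncertainty under random perturbation of the weights.
import numpy as np

rng = np.random.default_rng(7)
scores = rng.uniform(size=(10, 5))            # 10 countries x 5 normalised indicators (simulated)

ranks = []
for _ in range(1000):
    w = rng.dirichlet(np.ones(5))             # random weights summing to 1
    ci = scores @ w                           # linear composite under these weights
    ranks.append((-ci).argsort().argsort() + 1)   # rank 1 = best
ranks = np.array(ranks)

for country in range(scores.shape[0]):
    med = int(np.median(ranks[:, country]))
    lo, hi = ranks[:, country].min(), ranks[:, country].max()
    print(f"country {country}: median rank {med}, range {lo}-{hi}")
```

Countries whose rank range is narrow are robust to the weighting assumption; wide ranges flag rankings that depend heavily on that single modelling choice.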

The results of a sensitivity analysis are often also shown as scatter plots with the values of the composite indicator for a country on the vertical axis

does the derived theoretical model provide a good fit to the data? What does the lack of fit tell us about the conceptual definition of the composite or the indicators chosen for it?

Conducted sensitivity analysis of the inference, e g. to show what sources of uncertainty are more influential in determining the relative ranking of two entities.

While they can be used as summary indicators to guide policy and data work, they can also be decomposed such that the contribution of subcomponents

the laggard and the average performance (Figure 2). Finland's top ranking is primarily based on having the highest values for the indicators relating to the Internet and university,

[Figure: decomposition of the Technology Achievement Index (TAI) into its individual indicators (Patents, Royalties, Internet, Tech exports, Telephones, Electricity, Schooling, University), showing the performance range for Finland, the United States and Japan against the Top 3 average.]

Correlation analysis should not be mistaken for causality analysis. Correlation simply indicates that the variation in the two data sets is similar.

Tested the links with variations of the composite indicator as determined through sensitivity analysis. Developed data-driven narratives on the results. Documented

and explained the correlations and the results. 1.10. Presentation and dissemination. A well-designed graph can speak louder than words. The way composite indicators are presented is not a trivial issue.

[Figure: JRC elaboration; data source: Eurostat, 2007 (http://ec.europa.eu/eurostat).]

related both to the quality of elementary data used to build the indicator and the soundness of the procedures used in its construction.

when quality was equated with accuracy. It is now generally recognised that there are other important dimensions. Even if data are accurate,

they cannot be said to be of good quality if they are produced too late to be useful,

or appear to conflict with other data. Thus, quality is a multifaceted concept. The most important quality characteristics depend on user perspectives, needs and priorities,

With the adoption of the European Statistics Code of Practice in 2005, the Eurostat quality framework is now quite similar to the IMF's Data Quality Framework (DQAF),

in the sense that both frameworks provide a comprehensive approach to quality, through coverage of governance, statistical processes and observable features of the outputs.

3. Accuracy and reliability: Are the source data, statistical techniques, etc. adequate to portray the reality to be captured?

4. Serviceability: How are users' needs met in terms of timeliness of the statistical products, their frequency, consistency,

Are effective data and metadata easily available to data users and is there assistance to users?

Given the institutional setup of the European Statistical System,

2. Accuracy refers to the closeness of computations or estimates to the exact or true values;

Punctuality refers to the time lag between the target delivery date and the actual date of the release of the data;

availability of micro or macro data, media (paper, CD-ROM, Internet, etc.). Clarity refers to the statistics' information environment:

6. Coherence refers to the adequacy of the data to be combined reliably in different ways and for various uses.

and (ii) the quality of internal processes for collection, processing, analysis and dissemination of data and metadata.

(i) the quality of basic data, and (ii) the quality of procedures used to build

the application of the most advanced approaches to the development of composite indicators based on inaccurate or incoherent data would not produce good quality results,

In the following section each is considered separately. 2.2. Quality dimensions for basic data. The selection of basic data should maximise the overall quality of the final result.

In particular, in selecting the data the following dimensions (drawing on the IMF, Eurostat and OECD) are to be considered:

Relevance The relevance of data is a qualitative assessment of the value contributed by these data.

It depends upon both the coverage of the required topics and the use of appropriate concepts.

Careful evaluation and selection of basic data have to be carried out to ensure that the right range of domains is covered in a balanced way.

Given the actual availability of data, proxy series are often used, but in this case some evidence of their relationships with target series should be produced whenever possible.

Accuracy The accuracy of basic data is the degree to which they correctly estimate or describe the quantities

Accuracy refers to the closeness between the values provided and the (unknown) true values. Accuracy has many attributes,

and in practical terms it has no single aggregate or overall measure. Of necessity, these attributes are measured typically

In the case of sample survey-based estimates, the major sources of error include coverage, sampling, non-response,

and censuses that provide source data; from the fact that source data do not fully meet the requirements of the accounts in terms of coverage, timing,

and valuation and that the techniques used to compensate can only partially succeed; from seasonal adjustment;

An aspect of accuracy is the closeness of the initially released value(s) to the subsequent value(s) of estimates.

which include (i) replacement of preliminary source data with later data, (ii) replacement of judgemental projections with source data,

however, the absence of revisions does not necessarily mean that the data are accurate.

accuracy of basic data is extremely important. Here the issue of credibility of the source becomes crucial.

The credibility of data products refers to confidence that users place in those products based simply on their image of the data producer, i.e.,

One important aspect is trust in the objectivity of the data. This implies that the data are perceived to be produced professionally in accordance with appropriate statistical standards

and policies and that practices are transparent (for example, data are not manipulated, nor is their release timed in response to political pressure).

Other things being equal, data produced by official sources (e.g. national statistical offices or other public bodies working under national statistical regulations

or codes of conduct) should be preferred to other sources. Timeliness The timeliness of data products reflects the length of time between their availability

and the event or phenomenon they describe, but considered in the context of the time period that permits the information to be of value

The concept applies equally to short-term or structural data; the only difference is the timeframe.

Closely related to the dimension of timeliness, the punctuality of data products is also very important, both for national and international data providers.

and reflects the degree to which data are released in accordance with it. In the context of composite indicators, timeliness is especially important to minimise the need for the estimation of missing data or for revisions of previously published data.

As individual basic data sources establish their optimal trade-off between accuracy and timeliness, taking into account institutional, organisational and resource constraints

data covering different domains are often released at different points in time. Therefore special attention must be paid to the overall coherence of the vintages of data used to build composite indicators (see also coherence).

Accessibility The accessibility of data products reflects how readily the data can be located and accessed from original sources,

i.e. the conditions in which users can access statistics (such as distribution channels, pricing policy, copyright, etc.).

The range of different users leads to considerations such as multiple dissemination formats and selective presentation of metadata.

which the data are available, the media of dissemination, and the availability of metadata and user support services.

It also includes the affordability of the data to users in relation to its value to them

and whether the user has a reasonable opportunity to know that the data are available

In the context of composite indicators, accessibility of basic data can affect the overall cost of production and updating of the indicator over time.

if poor accessibility of basic data makes it difficult for third parties to replicate the results of the composite indicators.

In this respect, given improvements in electronic access to databases released by various sources, the issue of coherence across data sets can become relevant.

Therefore, the selection of the source should not always give preference to the most accessible source,

Interpretability The interpretability of data products reflects the ease with which the user may understand

and analyse the data. The adequacy of the definitions of concepts, target populations, variables and terminology underlying the data

and of the information describing the limitations of the data, if any, largely determines the degree of interpretability.

The range of different users leads to considerations such as the presentation of metadata in layers of increasing detail.

the wide range of data used to build them and the difficulties due to the aggregation procedure require the full interpretability of basic data.

The availability of definitions and classifications used to produce basic data is essential to assess the comparability of data over time

and across countries (see coherence): for example, series breaks need to be assessed when composite indicators are built to compare performances over time.

Therefore the availability of adequate metadata is an important element in the assessment of the overall quality of basic data.

Coherence The coherence of data products reflects the degree to which they are connected logically and mutually consistent,

i.e. the adequacy of the data to be combined reliably in different ways and for various uses.

Coherence implies that the same term should not be used without explanation for different concepts or data items;

that different terms should not be used for the same concept or data item without explanation;

and that variations in methodology that might affect data values should not be made without explanation.

Coherence in its loosest sense implies the data are at least reconcilable. For example, if two data series purporting to cover the same phenomena differ, the differences in time of recording, valuation,

and coverage should be identified so that the series can be reconciled. In the context of composite indicators, two aspects of coherence are especially important:

coherence over time and across countries. Coherence over time implies that the data are based on common concepts, definitions and methodology over time,

or that any differences are explained and can be allowed for. Incoherence over time refers to breaks in a series resulting from changes in concepts, definitions, or methodology.

Coherence across countries implies that from country to country the data are based on common concepts, definitions, classifications and methodology,

the imputation of missing data, as well as the normalisation and the aggregation, can affect its accuracy, etc.

In the following matrix, the most important links between each phase of the building process and quality dimensions are identified,

The imputation of missing data affects the accuracy of the composite indicator and its credibility.

The normalisation phase is crucial both for the accuracy and the coherence of final results.

The quality of basic data chosen to build the composite indicator strongly affects its accuracy and credibility.

Timeliness can also be influenced greatly by the choice of appropriate data. The use of multivariate analysis to identify the data structure can increase both the accuracy and the interpretability of final results.

This step is also very important to identify redundancies among selected phenomena and to evaluate possible gaps in basic data.

One of the key issues in the construction of composite indicators is the choice of the weighting and aggregation model.

Almost all quality dimensions are affected by this choice, especially accuracy, coherence and interpretability. This is also one of the most criticised characteristics of composite indicators:

Analysis of this type can improve the accuracy, credibility and interpretability of the final results.

which are highly correlated with the reference data. The presentation of composite indicators and their visualisation affects both the relevance and the interpretability of the results.

The OECD has recently developed the Data and Metadata Reporting and Presentation Handbook (OECD, 2007), which describes practices useful to improve the dissemination of statistical products.

Table 5. Quality dimensions of composite indicators: a matrix of construction phases (theoretical framework; data selection; imputation of missing data; multivariate analysis; normalisation; weighting and aggregation; back to the data; robustness and sensitivity; links to other variables; visualisation; dissemination) against quality dimensions (relevance, accuracy, credibility, timeliness, accessibility, interpretability, coherence).

The problem of missing data is discussed first. The need for multivariate analysis prior to the aggregation of the individual indicators is stressed.

as well as the need to test the robustness of the composite indicator using uncertainty and sensitivity analysis.

x_qj P x_qk if x_qj > x_qk + c, and x_qj I x_qk if |x_qj - x_qk| <= c, where P and I indicate a preference

STEP 3. IMPUTATION OF MISSING DATA. The literature on the analysis of missing data

The predictive distribution must be generated by employing the observed data either through implicit or explicit modelling:

The focus is on an algorithm, with implicit underlying assumptions which need to be verified in terms of

The danger of this type of modelling of missing data is the tendency to consider the resulting data set as complete,

Filling in blanks cells with individual data, drawn from similar responding units. For example, missing values for individual income may be replaced with the income of another respondent with similar characteristics, e g. age, sex, race, place of residence, family relationships, job, etc.

and the time to convergence depends on the proportion of missing data and the flatness of the likelihood function.

where the bias depends on the algorithm used to estimate the variance). Therefore, this method does not fully assess the implications of imputation

or the robustness of the composite index derived from the imputed data set. 3.2. Unconditional mean imputation. Let Xq be the random variable associated with the individual indicator q, with q = 1, ...,

Hence, the inference based on the entire data set, including the imputed data, does not fully account for imputation uncertainty.
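A minimal numerical sketch of this point (hypothetical data, numpy only, not the Handbook's own code): filling the gaps of an indicator with the unconditional mean leaves the mean unchanged but shrinks the variance of the completed series, which is why inference on the imputed data set understates imputation uncertainty.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=50.0, scale=10.0, size=200)       # hypothetical indicator values
x[rng.choice(200, size=40, replace=False)] = np.nan  # 20% missing completely at random

observed = x[~np.isnan(x)]
imputed = np.where(np.isnan(x), observed.mean(), x)  # unconditional mean imputation

print("mean  (observed only):", round(observed.mean(), 2))
print("mean  (after imputation):", round(imputed.mean(), 2))       # unchanged
print("stdev (observed only):", round(observed.std(ddof=1), 2))
print("stdev (after imputation):", round(imputed.std(ddof=1), 2))  # understated
```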

For nominal variables, frequency statistics such as the mode, or hot- and cold-deck imputation methods, might be more appropriate. 3.4. Expected maximisation imputation. Suppose that X denotes the matrix of data.

In likelihood-based estimation, the data are assumed to be generated by a model described by a probability or density function f(X, θ),

which depends on the unknown parameter θ and describes the probability of observing a data set for a given θ. Since θ is unknown,

The Expected maximisation (EM) algorithm is one of these iterative methods. 13 The issue is that X contains both observable and missing values, i e.

Assuming that missing data are MAR or MCAR14, the EM consists of two components: the expectation (E) and maximisation (M) steps.

Each step is completed once within each algorithm cycle. Cycles are repeated until a suitable convergence criterion is satisfied.

as if there were no missing data, and second (E), the expected values of the missing variables are calculated,

Effectively, this process maximises the expectation of the complete data log-likelihood in each cycle, conditional on the observed data and parameter vector.

however, an initial estimate of the missing data is needed. This is obtained by running the first M-step on the non-missing observations only

It can be used for a broad range of problems, e g. variance component estimation or factor analysis.

An EM algorithm is also often easy to construct conceptually and practically. Besides each step has a statistical interpretation

To test this, different initial starting values for θ can be used. 3.5. Multiple imputation. Multiple imputation (MI) is a general approach that does not require a specification of a parameterised likelihood for all data (Figure 10).

The imputation of missing data is performed with a random process that reflects uncertainty. Imputation is done N times,

The parameters of interest are estimated on each data set, together with their standard errors. Average (mean or median) estimates are combined using the N sets

Figure 10. Logic of multiple imputation: a data set with missing values is imputed N times (Set 1, Set 2, ..., Set N); each completed set is analysed separately (Result 1, Result 2, ..., Result N) and the N results are then combined.
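As an illustrative sketch of this combining step (hypothetical data and a deliberately simple regression-with-noise imputation model, not the Handbook's own code): each missing value is imputed N times with a random draw, the quantity of interest is estimated on each completed data set, and the results are pooled with Rubin's rules, where the total variance is the within-imputation variance plus (1 + 1/N) times the between-imputation variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 300, 10
x1 = rng.normal(0, 1, n)                       # fully observed covariate
x2 = 2.0 + 0.8 * x1 + rng.normal(0, 0.5, n)    # indicator with gaps
miss = rng.random(n) < 0.25
obs = ~miss

# Regression of x2 on x1 estimated on the observed cases
beta1, beta0 = np.polyfit(x1[obs], x2[obs], 1)
resid_sd = np.std(x2[obs] - (beta0 + beta1 * x1[obs]), ddof=2)

estimates, variances = [], []
for _ in range(N):                             # N completed data sets
    x2_imp = x2.copy()
    x2_imp[miss] = beta0 + beta1 * x1[miss] + rng.normal(0, resid_sd, miss.sum())
    estimates.append(x2_imp.mean())
    variances.append(x2_imp.var(ddof=1) / n)   # sampling variance of the mean

Q_bar = np.mean(estimates)                     # pooled point estimate
W = np.mean(variances)                         # within-imputation variance
B = np.var(estimates, ddof=1)                  # between-imputation variance
T = W + (1 + 1 / N) * B                        # total variance (Rubin's rules)
print("pooled estimate:", round(Q_bar, 3), " pooled std. error:", round(T ** 0.5, 3))
```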

It assumes that data are drawn from a multivariate normal distribution and requires MAR or MCAR assumptions. The theory of MCMC (Markov Chain Monte Carlo) is understood most easily using Bayesian methodology (Figure 11).

The observed data are denoted Xobs, and the complete data set, X=(Xobs, Xmis), where Xmis is to be filled in via multiple imputation.

If the distribution of Xmis, with parameter vector θ, were known, then Xmis could be imputed by drawing from the conditional distribution f(Xmis | Xobs, θ).

Since θ is unknown, it must be estimated from the data, yielding θ̂, and the distribution f(Xmis | Xobs, θ̂) used instead. Since θ̂ is itself a random variable,

The missing-data generating process may also depend on additional parameters φ, but if φ and θ are independent,

the process is ignorable and the analyst may concentrate on modelling the missing data given the observed data and θ. If the two processes are not independent,

then a non-ignorable missing-data generating process pertains, which cannot be solved adequately without making assumptions on the functional form of the interdependency. 60 HANDBOOK ON CONSTRUCTING COMPOSITE INDICATORS:

of which depends on the data. The first step in its estimation is to obtain the posterior distribution of θ from the data.

Usually this posterior is approximated by a normal distribution. After formulating the posterior distribution of θ, the following imputation algorithm can be used:

Draw θ* from the posterior distribution of θ, f(θ | Y, Xobs), where Y denotes exogenous variables that may influence θ.

Use the completed data X and the model to estimate the parameter of interest (e.g. the mean) θ* and its variance V(θ*) (the within-imputation variance).

Figure 11. Markov Chain Monte Carlo (MCMC) imputation method: estimate the mean vector and covariance matrix from the data that do not have missing values and use them as the prior distribution; in the imputation step, simulate values for missing data items by randomly selecting values from their conditional distribution; in the posterior step, update the posterior distribution of the parameters; when enough iterations have been run (i.e. the mean vector and covariance matrix are unchanged as we iterate), use the imputation from the final iteration to form a data set without missing values, otherwise run more iterations.

This combination will be the value that fills in the blank space in the data set.

The Multiple Imputation method imputes several values (N) for each missing value (from the predictive distribution of the missing data),

The N versions of completed data sets are analysed by standard complete data methods and the results combined using simple rules to yield single combined estimates (e g.

which formally incorporate missing data uncertainty. The pooling of the results of the analyses performed on the multiple imputed data sets implies that the resulting point estimates are averaged over the N completed sample points,

and the resulting standard errors and p-values are adjusted according to the variance of the corresponding N completed sample point estimates.

Thus, the between-imputation variance provides a measure of the extra inferential uncertainty due to missing data

which method he/she has to use to fill in empty data spaces. To the best of our knowledge there is no definitive answer to this question, but a number of rules of thumb (and a lot of common sense) apply.

The choice principally depends on the dataset available (e g. data expressed on a continuous scale or ordinal data where methods like MCMC cannot be used),

the number of missing data as compared to the dimension of the dataset (few missing data in a large dataset probably do not require sophisticated imputation methods),

and the identity of the country and the indicator for which the data is missing.

Therefore there is no single method we advise using; rather, the method should be fitted to the characteristics of the missing information.

eliminate some of the data (for the same countries and in the same proportion of the complete dataset),

Let $\bar{O}$ ($\bar{P}$) be the average of the observed (imputed) data, and $\sigma_O$ ($\sigma_P$) the standard deviation of the observed (imputed) data.

As noted by Willmott et al. (1985), the value of $R^2$ can be unrelated to the size of the differences between the predicted and the observed values.

$$\mathrm{RMSE} = \left(\frac{1}{N}\sum_{i=1}^{N}(P_i - O_i)^2\right)^{1/2}, \qquad \mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\lvert P_i - O_i \rvert$$

Finally, a complementary measure of accuracy
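A direct translation of these accuracy measures, using hypothetical observed and imputed vectors:

```python
import numpy as np

O = np.array([4.2, 5.1, 3.8, 6.0, 5.5])   # observed (withheld) values
P = np.array([4.0, 5.4, 3.9, 5.6, 5.8])   # imputed values for the same cells

rmse = np.sqrt(np.mean((P - O) ** 2))
mae = np.mean(np.abs(P - O))
r2 = np.corrcoef(O, P)[0, 1] ** 2          # squared correlation; cf. the Willmott et al. (1985) caveat
print(round(rmse, 3), round(mae, 3), round(r2, 3))
```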

STEP 4. MULTIVARIATE ANALYSIS Multivariate data analysis techniques

The majority of methods in this section are intended for data expressed on an interval or ratio scale,

although some of the methods have been used with ordinal data (for example principal components and factor analysis, see Vermunt

& Magidson 2005). 4.1. Principal components analysis. The objective is to explain the variance of the observed data through a few linear combinations of the original data.15

Even though there are Q variables, $x_1, x_2, \ldots, x_Q$, much of the data's variation can often be accounted for by a small number of variables, the principal components,

or linear combinations of the original data, $Z_1, Z_2, \ldots, Z_Q$, that are uncorrelated.

At this point there are still Q principal components, i.e. as many as there are variables. The next step is to select the first, e.g.

P < Q, principal components that preserve a high amount of the cumulative variance of the original data.

It indicates that the principal components are measuring different statistical dimensions in the data. When the objective of the analysis is to present a huge data set using a few variables,

some degree of economy can be achieved by applying Principal Components Analysis (PCA) if the variation in the Q original x variables can be accounted for by a small number of Z variables.
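A minimal sketch of this step on a standardised data matrix (hypothetical data in place of the TAI values): the principal components are obtained from the eigen-decomposition of the correlation matrix, and the share of variance preserved by the first P components is the ratio of the corresponding eigenvalues to Q.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(23, 8))                       # 23 countries x Q = 8 indicators (hypothetical)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardise each indicator

R = np.corrcoef(Z, rowvar=False)                   # Q x Q correlation matrix
eigval, eigvec = np.linalg.eigh(R)                 # eigh: symmetric matrix
order = np.argsort(eigval)[::-1]                   # sort eigenvalues in descending order
eigval, eigvec = eigval[order], eigvec[:, order]

explained = eigval / eigval.sum()                  # share of variance per component
print("eigenvalues:", np.round(eigval, 2))
print("cumulative variance:", np.round(np.cumsum(explained), 2))

scores = Z @ eigvec[:, :2]                         # country scores on the first two components
```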

the highest correlation is found between the individual indicators electricity and Internet, with a coefficient of 0. 84.

Table 6. Correlation matrix for individual TAI indicators: PATENTS, ROYALTIES, INTERNET, EXPORTS, TELEPHONES, ELECTRICITY, SCHOOLING, UNIVERSITY.

with a variance of 1.7. The third and fourth principal components have an eigenvalue close to 1. The last four principal components explain the remaining 12.8% of the variance in the data set.

Table 7. Eigenvalues of individual TAI indicators
PC   Eigenvalue   % of variance   Cumulative %
1    3.3          41.9            41.9
2    1.7          21.8            63.7

Bootstrap refers to the process of randomly resampling the original data set to generate new data sets.

but the computation may be cumbersome. Various values have been suggested, ranging from 25 (Efron & Tibshirani, 1991) to as high as 1000 (Efron,
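A sketch of the bootstrap just described, again on hypothetical standardised data: countries are resampled with replacement and the eigenvalues recomputed on each replicate, giving an empirical distribution for every eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(3)
Z = rng.normal(size=(23, 8))        # standardised country x indicator matrix (hypothetical)
B = 1000                            # number of bootstrap replicates

boot_eig = np.empty((B, Z.shape[1]))
for b in range(B):
    idx = rng.integers(0, Z.shape[0], size=Z.shape[0])   # resample countries with replacement
    Rb = np.corrcoef(Z[idx], rowvar=False)
    boot_eig[b] = np.sort(np.linalg.eigvalsh(Rb))[::-1]

# 5th and 95th percentiles of each eigenvalue across replicates
print(np.percentile(boot_eig, [5, 95], axis=0).round(2))
```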

Although social scientists may be attracted to factor analysis as a way of exploring data whose structure is unknown,

Assumption of interval data. Kim & Mueller (1978) note that ordinal data may be used if it is thought that the assignment of ordinal categories to the data will not seriously distort the underlying metric scaling.

Likewise, the use of dichotomous data is allowed if the underlying metric correlation between the variables is thought to be moderate (0.7) or lower.

The result of using ordinal data is that the factors may be much harder to interpret.

Note that categorical variables with similar splits will necessarily tend to correlate with each other, regardless of their content (see Gorsuch, 1983).

This is particularly apt to occur when dichotomies are used. The correlation will reflect similarity of"difficulty"for items in a testing context;

Principal component factor analysis (PFA), which is the most common variant of FA, is a linear procedure.

The smaller the sample size, the more important it is to screen data for linearity.

Multivariate normality of data is required for related significance tests.

Note, however, that a variant of factor analysis, maximum likelihood factor analysis, does assume multivariate normality. The smaller the sample size, the more important it is to screen data for normality.

Moreover, as factor analysis is based on correlation (or sometimes covariance), both correlation and covariance will be attenuated when variables come from different underlying distributions

(e.g. a normal vs. a bimodal variable will correlate less than 1.0 even when both series are perfectly co-ordered).

Underlying dimensions shared by clusters of individual indicators are assumed. If this assumption is not met, the "garbage in, garbage out" principle applies:

Factor analysis cannot create valid dimensions (factors) if none exist in the input data. In such cases, factors generated by the factor analysis algorithm will not be comprehensible.

Likewise, the inclusion of multiple definitionally-similar individual indicators representing essentially the same data will lead to tautological results.

Strong intercorrelations are not mathematically required, but applying factor analysis to a correlation matrix with only low intercorrelations will require nearly as many factors as there are original variables,

thereby defeating the data reduction purposes of factor analysis. On the other hand, too high intercorrelations may indicate a multicollinearity problem

and collinear terms should be combined or otherwise eliminated prior to factor analysis. Notice also that PCA and Factor analysis (as well as Cronbach's alpha) assume uncorrelated measurement errors.

a) The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is a statistic for comparing the magnitudes of the observed correlation coefficients to the magnitudes of the partial correlation coefficients.

The concept is that the partial correlations should not be very large if distinct factors are expected to emerge from factor analysis (Hutcheson & Sofroniou, 1999).

A KMO statistic is computed for each individual indicator, and their sum is the KMO overall statistic.

KMO varies from 0 to 1.0. The overall KMO should be 0.60 or higher to proceed with factor analysis (Kaiser & Rice, 1974),

though realistically it should exceed 0.80 if the results of the principal components analysis are to be reliable.

b) A variance inflation factor (VIF) greater than 10 is an arbitrary but common cut-off criterion for suggesting that there is a multicollinearity problem. Some researchers use the more lenient cut-off VIF value of 5.0. c) Bartlett's test of sphericity is used to test the null hypothesis that the individual indicators in a correlation matrix are uncorrelated,
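These checks can be computed directly from the correlation matrix. A minimal sketch on hypothetical data, using the usual formulas: Bartlett's statistic is -(n - 1 - (2Q + 5)/6) ln|R| with Q(Q - 1)/2 degrees of freedom, and the overall KMO compares squared correlations with squared partial correlations obtained from the inverse of R.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
X = rng.normal(size=(23, 8))        # hypothetical country x indicator data
n, Q = X.shape
R = np.corrcoef(X, rowvar=False)

# Bartlett's test of sphericity: H0 = indicators are uncorrelated
stat = -(n - 1 - (2 * Q + 5) / 6) * np.log(np.linalg.det(R))
df = Q * (Q - 1) / 2
p_value = chi2.sf(stat, df)

# Overall KMO: partial correlations from the inverse correlation matrix
R_inv = np.linalg.inv(R)
d = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
partial = -R_inv / d                # partial correlation matrix (off-diagonal elements)
off = ~np.eye(Q, dtype=bool)
kmo = (R[off] ** 2).sum() / ((R[off] ** 2).sum() + (partial[off] ** 2).sum())

print("Bartlett chi2 =", round(stat, 1), " p =", round(p_value, 4), " KMO =", round(kmo, 2))
```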

however, is whether the TAI data set for the 23 countries can be viewed as a random sample of the entire population, as required by the bootstrap procedures (Efron, 1987;

Several points can be made regarding the issues of randomness and representativeness of the data. First, it is often difficult to obtain complete information for a data set in the social sciences,

as controlled experiments are not always possible, unlike in the natural sciences. As Efron and Tibshirani (1993) state, 'in practice the selection process is seldom this neat,

A third point on data quality is that a certain amount of measurement error is likely to pertain.

While such measurement error can only be controlled at the data collection stage, rather than at the analytical stage

it is argued that the data represent the best estimates currently available (UN, 2001). Figure 12 (right graph) demonstrates graphically the relationship between the eigenvalues from the deterministic PCA,

0. 10 INTERNET-0. 92 0. 21 0. 02-0. 10 0. 04 0. 11-0. 27-0. 13 EXPORTS

0. 35-0. 85 0. 01-0. 13 0. 11 0. 35 0. 06-0. 08 TELEPHONES-0. 76-0

and of how the interpretation of the components might be improved are addressed in the following section on Factor analysis. 4. 2. Factor analysis Factor analysis (FA) is similar to PCA.

However, while PCA is based simply on linear data combinations, FA is based on a rather special model.

Contrary to the PCA, the FA model assumes that the data is based on the underlying factors of the model,

and that the data variance can be decomposed into that accounted for by common and unique factors.

Principal components factor analysis is preferred most in the development of composite indicators, e g. in the Product Market Regulation Index (Nicoletti et al.

and are not sorted into descending order according to how much of the original data set's variance is explained.

The first factor has high positive coefficients (loadings) with Internet (0. 79), electricity (0. 82) and schooling (0. 88).

Factor 4 is formed by royalties and telephones. Yet, despite the rotation of factors, the individual indicator exports has sizeable loadings in both Factor 1 (negative loading) and Factor 2 (positive loading.

ROYALTIES 0. 13 0. 07-0. 07 0. 93 0. 89 INTERNET 0. 79-0. 21 0. 21 0. 42

0. 89 EXPORTS-0. 64 0. 56-0. 04 0. 36 0. 86 TELEPHONES 0. 37 0. 17 0. 38

which indicates that university does not move with the other individual indicators in the data set,

This conclusion does not depend on the factor analysis method, as it has been confirmed by different methods (centroid method, principal axis method).

. 11 0. 88 0. 13 0. 80 ROYALTIES 0. 96 0. 14 0. 09 0. 18 0. 99 INTERNET 0

TELEPHONES 0. 41 0. 13 0. 18 0. 73 0. 75 ELECTRICITY 0. 13 0. 57-0. 13 0. 73

it is unlikely that they share common factors. 2. Identify the number of factors necessary to represent the data

The c-alpha is 0.70 for the data set of the 23 countries, which is equal to Nunnally's cut-off value.

Telephones has the highest variable-total correlation and, if deleted, the coefficient alpha would be as low as 0.60.

Note also that the factor analysis in the previous section had indicated university as the individual indicator that shared the least amount of common variance with the other individual indicators.

Although both factor analysis and the Cronbach coefficient alpha are based on correlations among individual indicators, their conceptual framework is different.
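A minimal sketch of the coefficient alpha computation, including the "alpha if item deleted" diagnostic reported in the table below (hypothetical standardised data in place of the TAI values):

```python
import numpy as np

def cronbach_alpha(X):
    """Coefficient alpha for an n x k matrix of indicator values."""
    k = X.shape[1]
    item_var = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

rng = np.random.default_rng(5)
common = rng.normal(size=(23, 1))
X = common + 0.8 * rng.normal(size=(23, 8))    # 8 indicators sharing one latent dimension

print("alpha:", round(cronbach_alpha(X), 2))
for q in range(X.shape[1]):                    # alpha if indicator q is deleted
    reduced = np.delete(X, q, axis=1)
    print(f"  without indicator {q}: {cronbach_alpha(reduced):.2f}")
```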

0. 527 0. 645 INTERNET 0. 566 0. 636 EXPORTS-0. 108 0. 774 TELEPHONES 0. 701 0. 603 ELECTRICITY

Cronbach coefficient alpha results for the 23 countries after deleting one individual indicator (standardised values) at a time. 4. 4. Cluster analysis Cluster analysis (CLA) is a collection of algorithms to classify objects such as countries, species,

The classification aims to reduce the dimensionality of a data set by exploiting the similarities/dissimilarities between cases.

if the classification has an increasing number of nested classes, e g. tree clustering; or nonhierarchical when the number of clusters is decided ex ante,

e g. k-means clustering. However, care should be taken that classes are meaningful and not arbitrary or artificial.

including Euclidean and non-Euclidean distances. 18 The next step is to choose the clustering algorithm,

and hence different classifications may be obtained for the same data, even using the same distance measure.

if the data are categorical in nature. Figure 13 shows the country clusters based on the individual Technology Achievement Index

Standardised Data type: Hierarchical, single linkage, squared Euclidean distances. Sudden jumps in the level of similarity (abscissa) could indicate that dissimilar groups

which indicates that the data are represented best by ten clusters: Finland; Sweden and the USA;

A nonhierarchical method of clustering,

is k-means clustering (Hartigan, 1975). This method is useful when the aim is to divide the sample into k clusters of the greatest possible distinction.

This algorithm can be applied with continuous variables, yet it can be modified also to accommodate other types of variables.

The algorithm starts with k random clusters and moves the objects in and out of the clusters with the aim of (i) minimising the variance of elements within the clusters,

At the same time, the dynamic adopters are lagging behind the potential leaders due to their lower performance on Internet, electricity and schooling.
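The clusters discussed here were produced with k-means. A compact sketch of the algorithm (Lloyd's algorithm, on hypothetical standardised data) that alternates between assigning countries to the nearest centroid and recomputing the centroids:

```python
import numpy as np

def kmeans(Z, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)]   # k random starting clusters
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)                            # assign each country to its nearest centroid
        new_centroids = centroids.copy()
        for j in range(k):
            members = Z[labels == j]
            if len(members):                                    # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

rng = np.random.default_rng(6)
Z = rng.normal(size=(23, 8))          # standardised country x indicator matrix (hypothetical)
labels, centroids = kmeans(Z, k=3)
print(labels)                          # cluster membership of each country
```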

Means plot for TAI clusters, k-means clustering of the standardised data (indicators plotted: receipts, Internet, exports, telephones, electricity).

Table 13. K-means clustering of TAI countries into Group 1 (leaders), Group 2 (potential leaders) and Group 3 (dynamic adopters): Finland, Netherlands, Sweden, USA, Australia,

Canada, New Zealand, Norway, Austria, Belgium, Czech Rep., France, Germany, Hungary, Ireland, Israel, Italy, Japan, Korea, Singapore, Slovenia, Spain, UK. Finally, expectation

EM estimates mean and standard deviation of each cluster so as to maximise the overall likelihood of the data

Second, unlike k-means, EM can be applied to both continuous and categorical data.

Various alternative methods combining cluster analysis and the search for a low-dimensional representation have been proposed and focus on multidimensional scaling or unfolding analysis (e g.

A method that combines k-means cluster analysis with aspects of Factor analysis and PCA is offered by Vichi & Kiers (2001).

A discrete clustering model and a continuous factorial model are fitted simultaneously to two-way data with the aim of identifying the best partition of the objects, described by the best orthogonal linear combinations of the variables (factors) according to the least-squares criterion.

This methodology, known as factorial k-means analysis, has a wide range of applications, since it achieves a double objective:

data reduction and synthesis simultaneously in the direction of objects and variables. Originally applied to short-term macroeconomic data,

factorial k-means analysis has a fast alternating least-squares algorithm that extends its application to large data sets, i.e.

multivariate data sets with more than two variables. The methodology can therefore be recommended as an alternative to the widely used tandem analysis that sequentially performs PCA

and CLA. 4. 5. Other methods for multivariate analysis Other methods can be used for multivariate analysis of the data set.

The characteristics of some of these methods are sketched below citing textbooks where the reader may find additional information and references.

(or relevant relationships between rows and columns of the table) by reducing the dimensionality of the data set.

unlike factor analysis. This technique finds scores for the rows and columns on a small number of dimensions which account for the greatest proportion of the χ² (chi-square) measure of association between the rows and columns,

Correspondence analysis starts with tabular data, e g. a multidimensional time series describing the variable number of doctorates in 12 scientific disciplines (categories) given in the USA between 1960 and 1975 (Greenacre

The correspondence analysis of this data would show, for example, whether anthropology and engineering degrees are at a distance from each other (based on the

However, while conventional factor analysis determines which variables cluster together (parametric approach), correspondence analysis determines which category values are close together (nonparametric approach).

As in PCA, CCA implies the extraction of the eigenvalues and eigenvectors of the data matrix.

When the dependent variable has more than two categories then it is a case of multiple Discriminant analysis (or also Discriminant Factor analysis or Canonical Discriminant analysis), e g. to discriminate countries on the basis of employment patterns in nine industries (predictors.

This is the main difference from Cluster analysis, in which groups are not predetermined. There are also conceptual similarities with Principal Components and Factor analysis

but while PCA maximises the variance in all the variables accounted for by a factor, DFA maximises the differences between values of the dependent.

and their robustness against possible outliers in the data (Ebert & Welsch, 2004). Different normalisation methods will produce different results for the composite indicator.

Using Celsius data normalised based on distance to the best performer, the level of Country A has increased over time.

Normalised data in Celsius: Country A 0.949 and 0.949; Country B 0.833 and 0.821.

Normalised data in Fahrenheit: Country A 0.965 and 0.965; Country B 0.833 and 0.821. The example illustrated above is a case of an interval scale (Box 3

Another transformation of the data, often used to reduce the skewness of (positive) data, is the logarithmic transformation f:

bearing in mind that the normalised data will be affected by the log transformation. In some circumstances outliers21 can reflect the presence of unwanted information.

as well as to avoid having extreme values overly dominate the aggregation algorithm. That is, any observed value greater than the 97.5 percentile is lowered to match the 97.5 percentile.

when data for a new time point become available. This implies an adjustment of the analysis period T,

I. To maintain comparability between the existing and the new data, the composite indicator for the existing data must be recalculated. 5.4. Distance to a reference. This method takes the ratio of the indicator $x_{qc}^{t}$ for a generic country c at time t to the value of the same indicator for the reference country $\bar{c}$ at the initial time $t_0$:

$$I_{qc}^{t} = \frac{x_{qc}^{t}}{x_{q\bar{c}}^{t_0}}$$

Table 15. Examples of normalisation techniques using TAI data (mean years of school, age 15 and above): rank*, z-score, min-max, distance to reference country (c), above/below the mean**, percentile.

(*) High value = top of the list. (**) p = 20%. Examples of the above normalisation methods are shown in Table 15 using the TAI data.
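A sketch of several of the normalisation methods listed above, applied to one hypothetical indicator across countries, together with the trimming of extreme values at the 97.5th percentile discussed earlier (all values are illustrative):

```python
import numpy as np

x = np.array([11.2, 9.5, 12.0, 8.7, 10.1, 25.0])   # hypothetical indicator, one outlier
ref = 0                                              # index of the reference country

rank = x.argsort().argsort() + 1                     # rank (1 = lowest value)
z_score = (x - x.mean()) / x.std(ddof=1)
min_max = (x - x.min()) / (x.max() - x.min())
dist_to_ref = x / x[ref]                             # distance to a reference country
above_mean = np.where(x > x.mean(), 1, -1)           # above/below the mean

# Trimming: values above the 97.5th percentile are lowered to that percentile
cap = np.percentile(x, 97.5)
trimmed = np.minimum(x, cap)

print(np.round(np.c_[rank, z_score, min_max, dist_to_ref, above_mean, trimmed], 3))
```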

The data are sensitive to the choice of the transformation and this might cause problems in terms of loss of the interval level of the information,

thus poor data availability may hamper its use. In formal terms, let $x_{qi}$ be, as usual, the level of indicator q for country i.

WEIGHTING AND AGGREGATION. WEIGHTING METHODS. 6.1. Weights based on principal components analysis or factor analysis. Principal components analysis,

and more specifically factor analysis, groups together individual indicators which are collinear to form a composite indicator that captures as much as possible of the information common to individual indicators.

the composite no longer depends upon the dimensionality of the data set but rather is based on the statistical dimensions of the data.

According to PCA/FA, weighting intervenes only to correct for overlapping information between two or more correlated indicators and is not a measure of the theoretical importance of the associated indicator.

The first step in FA is to check the correlation structure of the data, as explained in the section on multivariate analysis.

The second step is the identification of a certain number of latent factors (fewer than the number of individual indicators) representing the data.

For a factor analysis only a subset of the principal components is retained (m, i.e. those that account for the largest amount of the variance).

With the reduced data set in TAI (23 countries) the factors with eigenvalues close to unity are the first four,

Eigenvalues of the TAI data set
Factor   Eigenvalue   Variance (%)   Cumulative variance (%)
1        3.3          41.9           41.9
2        1.7          21.8           63.7
3        1

Rotation is a standard step in factor analysis; it changes the factor loadings and hence the interpretation of the factors,

0. 49 Internet 0. 79-0. 21 0. 21 0. 42 0. 24 0. 03 0. 04 0. 10 Tech

exports-0. 64 0 56-0. 04 0. 36 0. 16 0. 23 0. 00 0. 07 Telephones 0. 37 0

With the TAI data set there are four intermediate composites (Table 17). The first includes Internet (with a weight of 0.24),

electricity (weight 0.25) and schooling (weight 0.29).24 Likewise the second intermediate is formed by patents

the third only by university (0.77) and the fourth by royalties and telephones (weighted with 0.49 and 0.26).

The four intermediate composites are aggregated by assigning a weight to each one of them equal to the proportion of the explained variance in the data set:

If Maximum Likelihood (ML) were to be used instead of Principal Components (PC),

Weights for the TAI indicators based on the maximum likelihood (ML) or principal components (PC) method for the extraction of the common factors:
                 ML     PC
Patents          0.19   0.17
Royalties        0.20   0.15
Internet         0.07   0.11
Tech exports     0.07   0.07
Telephones       0.15   0.08
Electricity      0.11   0.12
Schooling        0.19   0.14
University       0.02   0.16

6.2. Data envelopment

analysis (DEA) Data Envelopment Analysis (DEA) employs linear programming tools to estimate an efficiency frontier that would be used as a benchmark to measure the relative performance of countries. 26 This requires construction of a benchmark (the frontier) and the measurement

Data envelopment analysis (DEA) performance frontier: Indicator 1 vs. Indicator 2, with countries a, b, c, d and the projection d'. Source: rearranged from Mahlberg

and may then be solved using optimisation algorithms:

$$CI_c = \max_{w_{qc}} \sum_{q=1}^{Q} w_{qc}\, I_{qc} \quad \text{s.t.} \quad \sum_{q=1}^{Q} w_{qc}\, I_{qk} \le 1 \;\; \forall\, k = 1, \ldots, M, \qquad w_{qc} \ge 0
$$

Benefit of the doubt (BOD) approach applied to TAI. Columns: weights for Patents, Royalties, Internet, Tech exports, Telephones, Electricity, Schooling, University, followed by the CI score.

Finland: 0.15, 0.17, 0.17, 0.16, 0.19, 0.17, 0.17, 0.19; CI = 1. United States: 0.20, 0.20, 0.17, 0.21, 0
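The benefit-of-the-doubt weights can be obtained, country by country, from the linear programme written above. A sketch using scipy's linear-programming routine on hypothetical normalised scores (linprog minimises, so the objective is negated):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(7)
I = rng.uniform(0.2, 1.0, size=(6, 4))   # M = 6 countries x Q = 4 normalised indicators (hypothetical)
M, Q = I.shape

scores = []
for j in range(M):
    # max sum_q w_q * I_qj   s.t.  sum_q w_q * I_qk <= 1 for every country k,  w >= 0
    res = linprog(c=-I[j], A_ub=I, b_ub=np.ones(M), bounds=[(0, None)] * Q, method="highs")
    scores.append(-res.fun)              # CI_j equals 1 for the best-performing countries

print(np.round(scores, 3))
```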

the percentage of firms using internet in country j depends upon the (unknown) propensity to adopt new information

The observed data consist of a cluster of q = 1, ..., Q(c) indicators, each measuring an aspect of ph(c). Let c = 1,

However, since not all countries have data on all individual indicators, the denominator of w c,

and $\sigma^2(q)$ (hence at least 3 indicators per country are needed for an exactly identified model), so the likelihood function of the observed data based on equation (25) will be maximised with respect to the $\alpha(q)$'s, $\beta(q)$'s,

AHP allows for the application of data, experience, insight, and intuition in a logical and thorough way within a hierarchy as a whole.

) The core of AHP is an ordinal pairwise comparison of attributes. For a given objective, the comparisons are made between pairs of individual indicators,

Comparison matrix of eight individual TAI indicators Objective Patents Royalties Internet Tech exports Telephone Electricity Schooling University Patents 1 2 3

2 5 5 1 3 Royalties 1/2 1 2 1/2 4 4 3 Internet 1/3 1 1

/4 2 2 1/5 1/2 Tech. exports 1/2 2 4 1 4 4 1/2 3 Telephones 1

patents is three times more important than Internet. Each judgement reflects the perception of the relative contributions (weights) of the two individual indicators to the overall objective (Table 21).

Comparison matrix of three individual TAI indicators
Objective    Patents     Royalties   Internet
Patents      1           wp/wroy     wp/wi
Royalties    wroy/wp     1           wroy/wi
Internet     wi/wp       wi/wroy     1

The relative weights of the individual indicators are calculated using an eigenvector.
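A sketch of that eigenvector calculation on a small, hypothetical 3 x 3 comparison matrix of the kind shown above (patents vs. royalties vs. Internet), including the consistency ratio commonly reported alongside AHP weights:

```python
import numpy as np

# Hypothetical pairwise judgements: patents rated 2x royalties, 3x Internet, etc.
A = np.array([[1.0, 2.0, 3.0],
              [1/2, 1.0, 2.0],
              [1/3, 1/2, 1.0]])

eigval, eigvec = np.linalg.eig(A)
k = np.argmax(eigval.real)                    # principal eigenvalue/eigenvector
w = np.abs(eigvec[:, k].real)
w = w / w.sum()                               # normalised weights

n = A.shape[0]
ci = (eigval[k].real - n) / (n - 1)           # consistency index
ri = 0.58                                     # random index for n = 3 (Saaty's table)
print("weights:", np.round(w, 3), " consistency ratio:", round(ci / ri, 3))
```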

) Conjoint analysis is a decompositional multivariate data analysis technique frequently used in marketing (Mcdaniel & Gates,

Although this methodology uses statistical analysis to treat the data, it relies on the opinion of people (e g. experts, politicians, citizens),

when dealing with environmental issues. 6. 9. Performance of the different weighting methods The weights for the TAI example are calculated using different weighting methods equal weighting (EW), factor analysis (FA), budget allocation (BAP

The role of the variability in the weights and their influence on the value of the composite are discussed in the section on sensitivity analysis. 100 HANDBOOK ON CONSTRUCTING COMPOSITE INDICATORS:

TAI weights based on different methods: equal weighting (EW), factor analysis (FA), budget allocation (BAP), analytic hierarchy process (AHP). Weights for the indicators (fixed for all countries), in the order Patents, Royalties, Internet, Tech exports, Telephones, Electricity, Schooling, University:

EW: 0.13 for each of the eight indicators. FA: 0.17, 0.15,

Reliability and robustness of results depend on the availability of sufficient data. With highly correlated individual indicators there could be identification problems.

Employment Outlook (OECD, 1999); Composite Indicator on E-business Readiness (EC, 2004b); National Health Care System Performance (King's Fund).

) Can be used both for qualitative and quantitative data. Transparency of the composite is higher. Weighting is based on expert opinion and not on technical manipulations.

is it possible to find a ranking algorithm consistent with some desirable properties?;and conversely, is it possible to ensure that no essential property is lost?

Let us then apply Borda's rule to the data presented in Table 25. To begin, the information can be presented in a frequency matrix fashion,

let us apply it to the data presented in Table 25. The outranking matrix is shown in Table 27;

the derivation of a Condorcet ranking may sometimes be a long and complex computation process.

the algorithm per se is very simple. The maximum likelihood ranking of countries is the ranking supported by the maximum number of individual indicators for each pairwise comparison,
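A sketch of both ideas on a tiny hypothetical example (three countries, three indicators, treated ordinally): the Borda score of a country is the sum, over indicators, of the number of countries it beats, and the outranking (Condorcet) matrix counts, for each ordered pair, the number of indicators ranking the first country above the second.

```python
import numpy as np

# Rows = countries a, b, c; columns = individual indicators (hypothetical values)
X = np.array([[0.9, 0.2, 0.7],
              [0.6, 0.8, 0.5],
              [0.3, 0.6, 0.9]])
M, Q = X.shape

borda = np.zeros(M)
outrank = np.zeros((M, M))
for i in range(M):
    for j in range(M):
        if i != j:
            wins = (X[i] > X[j]).sum()     # indicators on which country i beats country j
            outrank[i, j] = wins
            borda[i] += wins

print("Borda scores:", borda)              # higher = better
print("Outranking matrix:\n", outrank)
```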

To solve this problem it is necessary to use numerical algorithms. To conclude, consider the numerical example in Table 34 with three countries

For explanatory purposes, consider only five of the countries included in the TAI data set39

Impact matrix for TAI (five countries) Patents Royalties Internet Tech exports Telephones Electricity Schooling University Finland 187 125.6 200.2 50.7 3. 080

which enters into the computation of the overall importance of country a, in a manner consistent with the definition of weights as importance measures. 114 HANDBOOK ON CONSTRUCTING COMPOSITE INDICATORS:

the pairwise comparison of countries such as Finland and the United States shows that Finland has better scores for the individual indicators Internet (weight 1/8),

telephones (weight 1/8), electricity (weight 1/8) and university (weight 1/8). Thus the score for Finland is 4 × 1/8 = 0.5,

only if data are all expressed on a partially comparable interval scale (i.e. temperature in Celsius or Fahrenheit) of type $x \mapsto f_i(x) = \alpha_i + \beta x$ with $\beta > 0$.

Non-comparable data measured on a ratio scale (i.e. kilograms and pounds), where $x \mapsto f_i(x) = \beta_i x$ with $\beta_i > 0$ (i.e. $\beta_i$ varying across individual indicators), can only be aggregated meaningfully by using geometric functions,
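A small sketch contrasting the two aggregation rules on the same normalised, equally weighted indicators (hypothetical values): the geometric mean rewards balanced profiles and limits compensability, so a country with one very weak indicator loses more ground than under the linear rule.

```python
import numpy as np

w = np.full(4, 0.25)                          # equal weights, Q = 4 indicators
country_a = np.array([0.8, 0.8, 0.8, 0.8])    # balanced profile
country_b = np.array([1.0, 1.0, 1.0, 0.2])    # same linear score, one weak indicator

def linear(x, w):
    return float(np.sum(w * x))

def geometric(x, w):
    return float(np.prod(x ** w))

for name, x in [("A", country_a), ("B", country_b)]:
    print(name, "linear:", round(linear(x, w), 3), " geometric:", round(geometric(x, w), 3))
```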

technique for the TAI data set with 23 countries. Although in all cases equal weighting is used,

AND SENSITIVITY ANALYSIS Sensitivity analysis is considered a necessary requirement in econometric practice (Kennedy, 2003) and has been defined as the modeller's equivalent of orthopaedists'X-rays.

This is what sensitivity analysis does: it performs the‘X-rays'of the model by studying the relationship between information flowing in and out of the model.

More formally, sensitivity analysis is the study of how the variation in the output can be apportioned, qualitatively or quantitatively, to different sources of variation in the assumptions,

Sensitivity analysis is thus closely related to uncertainty analysis, which aims to quantify the overall uncertainty in country rankings as a result of the uncertainties in the model input.

A combination of uncertainty and sensitivity analysis can help to gauge the robustness of the composite indicator ranking

Below is described how to apply uncertainty and sensitivity analysis to composite indicators. Our synergistic use of uncertainty and sensitivity analysis has recently been applied for the robustness assessment of composite indicators (Saisana et al.

2005a; Saltelli et al. 2008) and has proven to be useful in dissipating some of the controversy surrounding composite indicators such as the Environmental Sustainability Index (Saisana et al.

and sensitivity analysis discussed below in relation to the TAI case study is only illustrative. In practice the setup of the analysis will depend upon which sources of uncertainty and

inclusion/exclusion of one indicator at a time, imputation of missing data, different normalisation methods, different weighting schemes and different aggregation schemes.

$Rank(CI_c)$ is an output of the uncertainty/sensitivity analysis. The average shift in country rankings is also explored.

The investigation of $Rank(CI_c)$ and of the average shift in ranks $\bar{R}_S$ is the scope of the uncertainty and sensitivity analysis.41 7.1. General framework. The analysis is conducted as a single Monte Carlo experiment,

The first input factor, $X_1$, is the estimation of missing data: alternative 1, use bivariate correlation to impute missing data; alternative 2, assign zero to the missing datum. The second input factor,

and applying the so-called Russian roulette algorithm, e.g. for $X_1$, select alternative 1 if $\zeta \in [0, 0.5)$ and alternative 2 if $\zeta \in [0.5, 1]$,
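A schematic Monte Carlo loop of the kind described here (all names, triggers and choices are illustrative, not the Handbook's actual setup): each run draws one uniform number per trigger, maps it to one of the alternative assumptions with the Russian roulette rule, rebuilds the composite under those assumptions and stores the resulting country ranks.

```python
import numpy as np

rng = np.random.default_rng(8)
raw = rng.uniform(1, 100, size=(10, 5))          # 10 countries x 5 indicators (hypothetical)

def build_ci(X, norm_choice, agg_choice):
    Z = ((X - X.mean(0)) / X.std(0, ddof=1) if norm_choice == 1
         else (X - X.min(0)) / (X.max(0) - X.min(0)))      # z-score vs. min-max
    Z = Z - Z.min(0) + 0.01                                 # keep values positive for the geometric rule
    w = np.full(X.shape[1], 1 / X.shape[1])
    return (Z * w).sum(1) if agg_choice == 1 else np.prod(Z ** w, axis=1)

n_runs = 2000
ranks = np.empty((n_runs, raw.shape[0]), dtype=int)
for r in range(n_runs):
    zeta = rng.random(2)                                    # one uniform draw per trigger
    norm_choice = 1 if zeta[0] < 0.5 else 2                 # Russian roulette selection
    agg_choice = 1 if zeta[1] < 0.5 else 2
    ci = build_ci(raw, norm_choice, agg_choice)
    ranks[r] = (-ci).argsort().argsort() + 1                # rank 1 = best

print("median rank per country:", np.median(ranks, axis=0))
print("5th-95th percentile of rank, country 0:", np.percentile(ranks[:, 0], [5, 95]))
```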

A scatter plot based sensitivity analysis would be used to track which indicator affects the output the most

When BOD is selected the exclusion of individual indicators leads to a re-execution of the optimisation algorithm.

(either for the BAP or AHP schemes) are assigned to the data. Clearly the selection of the expert has no bearing

they will nevertheless be generated by the random sample generation algorithm. The constructive dimension of the Monte Carlo experiment, the number of random

such as the variance and higher order moments, can be estimated with an arbitrary level of precision related to the size of the simulation N. 7. 3. Sensitivity analysis using variance-based techniques A necessary step

when designing a sensitivity analysis is to identify the output variables of interest. Ideally these should be relevant to the issue addressed by the model.

2008), with nonlinear models, robust, model-free techniques should be used for sensitivity analysis. Sensitivity analysis using variance-based techniques is model-free

and display additional properties convenient in the present analysis, such as the following: They allow an exploration of the whole range of variation of the input factors, instead of just sampling factors over a limited number of values, e g. in fractional factorial design (Box et al.

They allow for a sensitivity analysis whereby uncertain input factors are treated in groups instead of individually; They can be justified in terms of rigorous settings for sensitivity analysis.

To compute a variance-based sensitivity measure for a given input factor $X_i$, start from its fractional contribution to the model output variance, i.e. the first-order sensitivity index $S_i = V\big(E(Y \mid X_i)\big)/V(Y)$.

The $S_i$ and $S_{Ti}$, in the case of non-independent input factors, could also be interpreted as settings for sensitivity analysis.
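For discrete, independently sampled triggers such as those used here, the first-order index of an input factor can be estimated directly from the Monte Carlo sample by conditioning on each level of the factor: $S_i$ is the variance of the conditional means of the output divided by the unconditional variance. A sketch on a hypothetical output sample y and factor levels x:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 10000
x1 = rng.integers(1, 3, n)                      # trigger with two levels (e.g. imputation choice)
x2 = rng.integers(1, 4, n)                      # trigger with three levels (e.g. weighting scheme)
y = 2.0 * x1 + 0.5 * x2 + rng.normal(0, 1, n)   # stand-in for a country's rank

def first_order_index(y, x):
    # S_i = V( E(Y | X_i) ) / V(Y), estimated by grouping the sample on the levels of X_i
    levels = np.unique(x)
    cond_means = np.array([y[x == v].mean() for v in levels])
    weights = np.array([(x == v).mean() for v in levels])
    v_cond = np.sum(weights * (cond_means - y.mean()) ** 2)
    return v_cond / y.var()

print("S_1 =", round(first_order_index(y, x1), 3))
print("S_2 =", round(first_order_index(y, x2), 3))
```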

'for both dependent and independent input factors, are implemented in the freely distributed software SIMLAB (Saltelli et al.,

i e. by censoring all countries with missing data. As a result, only 34 countries, in theory, may be analysed.

Hong Kong, as this is the first country with missing data. The analysis is restricted to the set of countries

Figure 19 shows the sensitivity analysis based on the first-order indices. The total variance in each country's rank is presented

This underlines the necessity of computing higher-order sensitivity indices that capture the interaction effects among the input factors.

The sensitivity analysis results for the average shift in rank output variable (equation (38)) are shown in Table 40.

Practically, however, in the absence of a genuine theory on what causes what, the correlation structure of the data set can be of some help in at least excluding causal relationships between variables (but not necessarily between the theoretical constructs

However, a distinction should be made between spatial data (as in the case of TAI) and data

The case of spatial data is more complicated, but tools such as Path Analysis and Bayesian networks (the probabilistic version of path analysis) could be of some help in studying the many possible causal structures

whereas a low value would point to the absence of a linear relationship (at least as far the data analysed are concerned).

However, the resulting path coefficients or correlations only reflect the pattern of correlation found in the data.

Simple example of path analysis: a diagram with nodes A, B, C and D linked by the correlation r_BA and the path coefficients p_AD, p_BC, p_BD and p_CD. The standardised regression coefficients (beta values) for the TAI example reveal that Internet

Two indicators, telephones and electricity, appear not to be influential on the variance in the TAI scores.

Standardised regression coefficients for the TAI (bar chart over the indicators: log telephones, log electricity, schooling, exports, university, receipts, patents, Internet).
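A sketch of how such beta values are obtained: regress the composite scores on the standardised individual indicators, so that the absolute size of each coefficient measures the indicator's influence on the variance of the scores (hypothetical data in place of the TAI values).

```python
import numpy as np

rng = np.random.default_rng(10)
X = rng.normal(size=(23, 8))                      # individual indicators (hypothetical)
Z = (X - X.mean(0)) / X.std(0, ddof=1)            # standardised indicators
ci = Z.mean(axis=1)                               # equally weighted composite score

y = (ci - ci.mean()) / ci.std(ddof=1)             # standardise the dependent variable too
betas, *_ = np.linalg.lstsq(Z, y, rcond=None)     # standardised regression coefficients

for q, b in enumerate(betas):
    print(f"indicator {q}: beta = {b:+.3f}")
```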

TAI (8 indicators), TAI6 (excluding telephones and electricity) and absolute difference (TAI - TAI6): Finland 1, 1, 0; United States 2, 2, 0; Sweden 3, 3, 0; Japan 4, 4,

exclude the two least influential indicators from the TAI data set. Even in this scenario, TAI6, the indicators are weighted equally.

which excludes telephones and electricity, produces a ranking that is identical to the original TAI for 18 of the 23 countries.

Thereafter, in a parsimonious approach, the indicators on telephones and electricity would be excluded, as they do not have an impact on either the variance of the TAI scores or on the TAI ranking.

which is reflected in the minor influence of the indicators telephones and electricity, while others (e.g. Internet and patents) are highly influential on the variance of the TAI scores.

The path analysis results based on the standardised regression coefficients and the bivariate correlations between the indicators as explained above,

Total effect impact of the indicators on the TAI scores: Patents 13.5%, Royalties 16.2%, Internet 13.9%, Tech exports 6.1%, Telephones (logarithm) 15.8%, Electricity (logarithm) 10.6%

In the 1970s factor analysis and latent variables enriched path analysis giving rise to the field of Structural Equation Modelling (SEM, see Kline, 1998.

Measurement techniques such as factor analysis and item response theory are used to relate latent variables to the observed indicators (the measurement model),

(i) determine that the pattern of covariances in the data is consistent with the model specified;(

of which path diagram is more likely to be supported by the data. Bayesian networks are graphical models encoding probabilistic relationships between variables.

when combined with the data via Bayes' theorem, produces a posterior distribution. This posterior, in short a weighted average of the prior density and of the conditional density (conditional on the data), is the output of

Bayesian analysis. Note that the output of classical analysis is rather a point estimate.

ii) to see how the different evidence (the data available) modifies the probability of each given node;(

the use of Bayesian networks is becoming increasingly common in bioinformatics, artificial intelligence and decision-support systems; 45 however, their theoretical complexity and the amount of computer power required to perform relatively simple graph searches make them difficult to implement in a convenient manner.

CONCLUDING REMARKS. COMPOSITE INDICATORS:

and the testing of the robustness of the composite using uncertainty and sensitivity analysis. The present work is perhaps timely,

The table below shows the number of hits obtained by searching for composite indicators through Google (taken here as a proxy of overall diffusion of the concept)

and Scholar Google (taken as a proxy of academic interest). Google / Scholar Google hits: October 2005, 35,500 / 992; June 2006, 80,800 / 1,440; September 2007, 2 million / 167,000. We alluded in the introduction to the controversy surrounding the use

of these measures, pitting aggregators against non-aggregators. The authors of this Handbook believe that individual variables

When questioned on the data reliability/quality of the HDI, Haq said that it should be used to improve data quality,

rather than to abandon the exercise. In fact, all the debate on development and public policy and media attention would not have been possible

where inter-subjectivity may be at the core of the exercise, such as when participative approaches to weight negotiations are adopted.

Similarly, Nicoletti and others make use of factor analysis in the analysis of, for example, product market regulation in OECD countries (Nicoletti et al.,

The media coverage of events such as the publishing of the World Economic Forum's World Competitiveness Index and Environmental Sustainability Index

REFERENCES Anderberg M.R. (1973), Cluster Analysis for Applications, New York:

Binder D. A. 1978), Bayesian Cluster analysis, Biometrika, 65:31-38. Borda J. C. de (1784), Mémoire sur les élections au scrutin, in Histoire de l'Académie Royale des Sciences, Paris. Boscarino

. and Yarnold P. R. 1995), Principal components analysis and exploratory and confirmatory factor analysis. In Grimm and Yarnold, Reading and understanding multivariate analysis.

In Sensitivity analysis (eds, Saltelli A.,Chan K.,Scott M.)167-197. New york: John Wiley & Sons.

Chantala K, Suchindran C.,(2003) Multiple Imputation for Missing Data. SAS Onlinedoctm: Version 8. Charnes A.,Cooper W. W.,Lewin A y,

and Seiford L.M. (1995), Data Envelopment Analysis: Theory, Methodology and Applications. Boston: Kluwer.

Cherchye L. 2001), Using data envelopment analysis to assess macroeconomic policy performance, Applied Economics, 33: 407-416 Cherchye L. and Kuosmanen T. 2002), Benchmarking sustainable development:

Davis J. 1986), Statistics and Data analysis in Geology, John Wiley & Sons, Toronto. Debreu G. 1960), Topological methods in cardinal utility theory, in Arrow K. J.,Karlin S. and Suppes P. eds.

Dempster A p. and Rubin D. B. 1983), Introduction (pp. 3-10), in Incomplete Data in Sample Surveys (vol. 2:

Efron, B.,Tibshirani, R. 1991), Statistical data analysis in the computer age. Science 253,390-395. HANDBOOK ON CONSTRUCTING COMPOSITE INDICATORS:

(2004b), Composite Indicator on e-business readiness, DG JRC, Brussels. Everitt B. S. 1979), Unresolved Problems in Cluster analysis, Biometrics, 35: 169-181.

Forman E. H. 1983), The analytic hierarchy process as a decision support system, Proceedings of the IEEE Computer society.

Funtowicz S. O.,Munda G.,Paruccini M. 1990), The aggregation of environmental data using multicriteria methods, Environmetrics, Vol. 1 (4): 353-36.

Golub G.H. and van der Vorst H.A. (2000), Eigenvalue computation

in the 20th century, Journal of Computational and Applied Mathematics, Vol. 123 (1-2). Gorsuch R.L. (1983), Factor Analysis.

issues and outlook, Journal of Consumer Research 5: 103-123. Greenacre, M. J.,(1984), Theory and applications of correspondence analysis.

and Black W c. 1995), Multivariate data analysis with readings, fourth ed. Prentice hall, Englewood Cliffs, NJ. Hair J. F.,Black W c.,B. J.,Babin, Anderson R. E. and R. L.,Tatham (2006), Multivariate data analysis, sixth edition, Pearson

Prentice hall, Upper Saddle River NJ. Haq M. 1995), Reflections on Human Development, Oxford university Press, New york. Hartigan J. A. 1975), Clustering Algorithms, New york:

John Wiley & Sons, Inc. Hatcher L. 1994), A step-by-step approach to using the SAS system for factor analysis and structural equation modeling.

Cary, NC: SAS INSTITUTE. Focus on the CALIS procedure. Hattie J. 1985), Methodology Review: Assessing unidimensionality of tests and items, Applied Psychological Measurement, 9, 2: 139-164.

Heiser, W. J. 1993), Clustering in low-dimensional space. In: Opitz, O.,Lausen, B. and Klar, R.,Editors, 1993.

Information and Classification. Springer, Berlin, 162-173. Homma T. and Saltelli A. 1996), Importance measures in global sensitivity analysis of model output, Reliability Engineering and System Safety, 52 (1), 1-17.

Hutcheson G. and Sofroniou N. 1999), The multivariate social scientist: Introductory statistics using generalized linear models, Thousand Oaks, CA:

Karlsson J. 1998), A systematic approach for prioritizing software requirements, Phd. Dissertation n. 526, Linkoping, Sverige.

Kim, J.,Mueller, C. W. 1978), Factor analysis: Statistical methods and practical issues, Sage Publications, Beverly hills, California, pp. 88.

Covers confirmatory factor analysis using SEM techniques. See esp. Ch. 7. Knapp, T. R.,Swoyer, V. H. 1967), Some empirical results concerning the power of Bartlett's test of the significance of a correlation matrix.

Factor analysis as a statistical method, London: Butterworth and Co.

Little R. J. A. and Schenker N. 1994), Missing Data, in Arminger

Little R. J. A (1997), Biostatistical Analysis with Missing Data, in Armitage P. and Colton T. eds.

Little R. J. A. and Rubin D. B. 2002), Statistical analysis with Missing Data, Wiley Interscience, J. Wiley & Sons, Hoboken, New jersey.

Mahlberg B. and Obersteiner M. 2001), Remeasuring the HDI by data Envelopment analysis, Interim report IR-01-069, International Institute for Applied System Analysis, Laxenburg, Austria.

Massart D. L. and Kaufman L. 1983), The Interpretation of Analytical Chemical Data by the Use of Cluster analysis, New york:

Milligan G. W. and Cooper M. C. 1985), An Examination of Procedures for Determining the Number of Clusters in a Data Set, Psychometrika, 50: 159-179.

McGraw-Hill. OECD (1999), Employment Outlook, Paris. OECD (2003), Quality Framework and Guidelines for OECD Statistical Activities, www.oecd.org/statistics

/60/34002216.pdf. OECD (2007), Data and Metadata Reporting and Presentation Handbook, available at http://www.oecd.org/dataoecd/46/17/37671574.pdf. Parker

Saisana M.,Nardo M. and Saltelli A. 2005b), Uncertainty and Sensitivity analysis of the 2005 Environmental Sustainability Index, in Esty D.,Levy M.,Srebotnjak T. and de Sherbinin

. and Tarantola S. 2008), Global Sensitivity analysis. The Primer, John Wiley & Sons. Saltelli A. 2007) Composite indicators between analysis and advocacy, Social Indicators Research, 81:65-77.

Saltelli A.,Tarantola S.,Campolongo F. and Ratto M. 2004), Sensitivity analysis in practice, a guide to assessing scientific models, New york:

Software for sensitivity analysis is available at http://www.jrc.ec.europa.eu/uasa/prj-sa-soft.asp.

Saltelli A. 2002), Making best use of model valuations to compute sensitivity indices, Computer Physics Communications, 145: 280-297.

11-30 Sobol'I. M. 1993), Sensitivity analysis for nonlinear mathematical models, Mathematical Modelling & Computational Experiment 1: 407-414.

Spath H. 1980), Cluster analysis Algorithms, Chichester, England: Ellis Horwood. Storrie D. and Bjurek H. 1999), Benchmarking European labour market performance with efficiency frontier technique, Discussion Paper FS I 00-2011.

Tarantola S.,Jesinghaus J. and Puolamaa M. 2000), Global sensitivity analysis: a quality assurance tool in environmental policy modelling.

Sensitivity analysis, pp. 385-397. New york: John Wiley & Sons. Tarantola S.,Saisana M.,Saltelli A.,Schmiedel F. and Leapman N. 2002), Statistical techniques and participatory approaches for the composition of the European Internal Market Index 1992

Transparency International's Corruption Index, http://www.transparency.org/cpi/2004/cpi2004.en.html#cpi2004. Tufte E.R. (2001), The Visual Display

Vermunt J. K. and Magidson J. 2005), Factor analysis with categorical indicators:

In A. Van der Ark, M. A. Croon and K. Sijtsma (eds), New Developments in Categorical Data analysis for the Social and Behavioral Sciences, 41-62.

Available at http://spitswww.uvt.nl/vermunt/vanderark2004.pdf. Vichi M. and Kiers H. (2001), Factorial k-means analysis for two-way data, Computational Statistics

and Data analysis, 37 (1): 49-64. Vincke P. 1992), Multicriteria decision aid, Wiley, New york. Ward, J. H (1963), Hierarchical Grouping to optimize an objective function.

Watanabe S.,(1960), Information theoretical analysis of multivariate correlation, IBM Journal of Research and development 4: 66-82.

Decisions and evaluations by hierarchical aggregation of information, Fuzzy Sets and Systems, 10: 243-260.

and hence have market value (1999) DIFFUSION OF RECENT INNOVATIONS INTERNET Internet hosts per 1 000 people Diffusion of the Internet,

which is indispensable to participation in the network age (2000) EXPORTS%Exports of high and medium technology products as a share of total goods exports (1999) DIFFUSION OF OLD INNOVATIONS TELEPHONES Telephone lines

INTERNET EXPORTS TELEPHONES (log) ELECTRICITY (log) SCHOOLING UNIVERSITY 1 Finland 187 125.6 200.2 50.7 3. 08 4. 15 10 27.4 2 United states

METHODOLOGY AND USER GUIDE ISBN 978-92-64-04345-9-OECD 2008 153 PATENTS ROYALTIES INTERNET EXPORTS TELEPHONES (log) ELECTRICITY (log) SCHOOLING UNIVERSITY 42

Economists have long been hostile to subjective data. Caution is prudent, but hostility is not warranted.

Another common method (called imputing means within adjustment cells) is to classify the data for the individual indicator with some missing values in classes

and MSE is computed using all available regressors. 13 Other iterative methods include the Newton-Raphson algorithm

which, for complex patterns of incomplete data, can be a very complicated function of the parameters. As a result these algorithms often require algebraic manipulations and complex programming.

Numerical estimation of this matrix is also possible but careful computation is needed. 14 For NMAR mechanisms one needs to make assumptions on the missing-data mechanism

and to include them in the model (see Little & Rubin, 2002, Ch. 15). 15 The technique of PCA was described first by Karl Pearson in 1901.

A description of practical computing methods came much later from Hotelling in 1933. For a detailed discussion of PCA, see Jolliffe (1986), Jackson (1991) and Manly (1994).

Eigenvalue computation in the 20th century, Journal of Computational and Applied mathematics, Vol. 123, Iss. 1-2. and Gentle, James E.;

as in most American cities it is not possible to go directly between two points. 20 The Bartlett test is valid under the assumption that data are a random sample from a multivariate normal distribution.

which is the portion of the variance of the first factor explained by the variable Internet. 25 To preserve comparability final weights could be rescaled to sum up to one. 26 DEA has also been used in production theory (for a review see Charnes et al.,

z Y C, see the note above. 35 Compensability of aggregations is studied widely in fuzzy set theory, for example Zimmermann & Zysno (1983) use the geometric operator

in the multiplicative aggregation it is proportional to the relative score of the indicator with respect to the others. 39 Data are not normalised.

whenever it does not change the ordinal information of the data matrix. 158 HANDBOOK ON CONSTRUCTING COMPOSITE INDICATORS:
