Accuracy (138)
Application programming interface (143)
Artificial intelligence (117)
Artificial neural network (5)
Association rule (3)
Back propagation (1)
Cardinality (9)
CART (2)
Cluster analysis (70)
Clustering (199)
Collinearity (6)
Conditional probability (3)
Coverage (1061)
Customer relationship management (22)
Data mining (37)
Decision trees (2)
Entropy (4)
Error rate (8)
Exploratory data analysis (1)
Factor analysis (112)
Front office (10)
Fuzzy logic (6)
Fuzzy set (4)
Fuzzy system (1)
Genetic algorithm (3)
Intelligent agent (6)
Knowledge discovery (9)
Nearest neighbor (1)
Occam's razor (1)
Overfitting (1)
Sensitivity analysis (101)
Structured query language (1)
Targeted marketing (3)
Text mining (12)
Time-series forecasting (1)
Web mining (1)
1.2.2. Artificial intelligence; 1.3. Medicine and biotechnologies; 1.3.1. Human spare parts and augmented human…
the founder of machine learning as a science. Foreword: Innovation Biosphere is a very interesting title for a new book, intended to raise thoughts beyond the ordinary.
For example, the French service in artificial intelligence was the best in the world in the early 1990s.
We wish to popularize the use of artificial intelligence approaches and techniques, with the aim of conceiving user-friendly and useful applications that can really help humans in their work instead of replacing them.
Other techniques of knowledge discovery, such as neural networks, genetic algorithms, induction or other multistrategy machine learning hybrid tools [PIA 91].
Google also developed a machine learning algorithm, a form of artificial intelligence (AI), that learns from operational data to model plant performance.
and scalable, Lego-like devices with the aim of reducing environmental impact.

1.2.2. Artificial intelligence

For many, AI means robots.
whereas artificial intelligence enables us to learn how to think about knowledge (problem-solving). In this frame of mind, Allen Newell [NEW 82] proposed a new way of modeling knowledge to make it comprehensible to computers:
Equipped with artificial intelligence techniques, computers can think, solve problems, become experts and accumulate a collective experience, provided that we transfer to them the relevant knowledge and the necessary reasoning and learning techniques.
Artificial intelligence (AI) is also singled out for destroying jobs: robots are replacing humans [FOR 13]. According to Ford, advances in AI and robotics will have significant implications for evolving economic systems.
Machine learning, one of the primary techniques used in the development of IBM's Watson, is in essence a way to use statistical analysis of historical data to transform seemingly non-routine tasks into routine operations that can be computerized.
Artificial intelligence techniques may provide efficient help without, however, switching off the users' brains.
Computer vision, medical imaging and machine learning deal with high-dimensional, noisy and heterogeneous datasets that are inherently non-Euclidean.
On the machine learning side, this is called formalization (intension, concept acquisition) or classification (extension, class construction).
and learning in the future internet, in MERCIER-LAURENT E., BOULANGER D. (eds), Artificial Intelligence for Knowledge Management, Revised Selected Papers, Springer, IFIP AICT 422, pp. 170-188, 2012.
Could artificial intelligence create an unemployment crisis?, Communications of the Association for Computing Machinery (ACM), vol. 56, no. 7, pp. 37-39, 2013, available at http://cacm.acm.org/.
[NEW 82] NEWELL A., The knowledge level, Artificial Intelligence, vol. 18, 1982. [OEC 05a] OECD, Definition of biotechnology, available at http://www.oecd.org/sti/biotech/statisticaldefinitionofbiotechnology.htm, 2005.
[PIA 91] PIATETSKY-SHAPIRO G., FRAWLEY W., Knowledge Discovery in Databases, MIT Press, Cambridge, MA, 1991. [PLA 14] PLANETOSCOPE, available at http://www.planetoscope.com/Avion/109-nombre-de-vols-d-avions-dans-le-monde.html, 2014.
Technically, it is possible to achieve comprehensive network coverage that enables Internet access around the globe.
for example, meanwhile allow for real-time mining of business processes based on the digital traces that single process steps leave, or based on text mining possibilities (Günther, Rinderle-Ma, Reichert, Van der
Another field of application is the implementation of an enterprise systems extension, such as customer relationship management software, or the establishment of a common integration platform along a supply chain.
insurers can improve pricing accuracy and sophistication, as well as attract favourable risks. As a result, claims costs will be reduced,
Figure 2 also shows cardinality constraints. These are not part of the unconstrained class model. Later we will define constrained class models (Definition 4). However,
The cardinality constraints in Fig. 2 impose restrictions on object models. For example, a ticket corresponds to precisely one concert
Instead, we assume a given set VOM of valid object models satisfying all requirements (including cardinality constraints).
OM ∈ VOM satisfies all (cardinality) constraints, including the following general requirement: for any (r, map_k1, map_k2) ∈ Rel there exist c1, c2, map_a1,
Moreover, all cardinality constraints are satisfied if OM ∈ VOM. Definition 4 abstracts from the concrete realization of object
Note that events may have varying cardinalities, e.g., one event may create five objects of the same class.
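To make the "exactly one" kind of constraint concrete, here is a minimal Python sketch; the relation layout and helper function are illustrative assumptions, not the paper's formal machinery:

```python
# Minimal sketch: checking an "exactly one" cardinality constraint,
# e.g. every ticket corresponds to precisely one concert. The data
# layout below is an illustrative assumption, not the formalization.
from collections import defaultdict

# Relation instances: (relation name, source object, target object).
relations = [
    ("belongs_to", "ticket1", "concert1"),
    ("belongs_to", "ticket2", "concert1"),
    ("belongs_to", "ticket3", "concert2"),
]

def satisfies_exactly_one(relations, rel_name, sources):
    """True iff every source object occurs in rel_name exactly once."""
    counts = defaultdict(int)
    for name, src, _tgt in relations:
        if name == rel_name:
            counts[src] += 1
    return all(counts[s] == 1 for s in sources)

tickets = ["ticket1", "ticket2", "ticket3"]
print(satisfies_exactly_one(relations, "belongs_to", tickets))  # True
```

A violating object model (a ticket linked to zero or two concerts) would make the check return False, which is exactly what membership in VOM rules out.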
Replaying history on process models for conformance checking and performance analysis. WIREs Data Mining and Knowledge Discovery, 2(2), 182-192.
A two-step approach to balance between underfitting and overfitting. Software and Systems Modeling, 9(1), 87-111.
Data Mining and Knowledge Discovery, 14(2), 245-304. Barros, A., Decker, G., Dumas, M., & Weber, F. (2007).
In J. Balcazar (Ed.), ECML/PKDD 2010 (Lecture Notes in Artificial Intelligence, Vol. 6321, pp. 184-199).
Journal of Machine Learning Research, 10, 1305-1340. Günther, C., & Aalst, W. van der (2006). A generic import framework for process event logs.
IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2013) (pp. 127-134). Singapore: IEEE. Montahari-Nezhad, H., Saint-Paul, R., Casati, F.,
IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2011) (pp. 184-191). Paris: IEEE. OMG.
Expert Systems with Applications, 38(6), 7029-7040. Rozinat, A., & Aalst, W. van der (2008). Conformance checking of processes based on monitoring real behavior.
IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2011) (pp. 148-155). Paris: IEEE. Weijters, A.,
Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679-688. Leavy, B. (2005).
Association rule mining was then applied to extract rules characterizing deviant cases. It was found that a total of ten rules could explain almost all deviant cases.
The above case studies show that delta-analysis, in combination with association rule and sequence mining (particularly discriminative sequence mining), provides a basis for discovering patterns of activities that distinguish negative deviance from normal cases.
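A minimal sketch of this style of analysis in Python, using the mlxtend library; the event data, column names and thresholds below are invented for illustration, and the cited studies do not prescribe this particular tool:

```python
# Hedged sketch: mine association rules whose consequent is "deviant",
# mirroring the delta-analysis setting. All data is invented toy input.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One row per case; True if the activity occurred / the case was deviant.
cases = pd.DataFrame({
    "skip_check":      [True, True, False, False],
    "manual_override": [True, False, False, True],
    "deviant":         [True, True, False, False],
})

frequent = apriori(cases, min_support=0.3, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.8)

# Keep only rules that characterize deviant cases.
deviance_rules = rules[rules["consequents"].apply(
    lambda c: c == frozenset({"deviant"}))]
print(deviance_rules[["antecedents", "support", "confidence"]])
```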
and Shan (2005) present a business operations management platform equipped with time-series forecasting functionalities. This platform allows for predictions of metric values on running process instances as well as for predictions of aggregated metric values of future instances (e.g.,
and Pontieri (2012) propose a predictive clustering approach in which context-related execution scenarios are discovered
and van der Aalst (2013), for example, present a technique to support process participants in making risk-informed decisions by traversing decision trees generated from the logs of past process executions.
In Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining (CIDM) (pp. 111-118).
Expert Systems with Applications, 39(5), 6061-6068. Lakshmanan, G. T., Rozsnyai, S., & Wang, F. (2013).
In Proceedings of the SIAM International Conference on Data Mining (SDM) (pp. 644-655). SIAM. Identification of Business Process Models in a Digital World. Peter Loos, Peter Fettke, Jürgen Walter, Tom Thaler,
P. Loos (*), P. Fettke, J. Walter, T. Thaler, P. Ardalani, German Research Center for Artificial Intelligence (DFKI), Saarland University, Stuhlsatzenhausweg 3, 66123 Saarbrücken
Clustering: In a clustering step, the individual models are grouped such that models within one group are similar
and models belonging to different groups are different. Here, typical techniques of cluster analysis or multivariate statistics can be used.
The model synset created in phase 3 can support the grouping. Known similarity measures for enterprise models can also be applied (Dijkman et al.
The cardinality describes the cardinal number of the node sets which are matched to each other. A sample of a node matching with both 1:1 and M:
In a first step, clustering techniques are used to identify and reconstruct the given model groups. Since the model repository consists of 80 single models with 8 different processes and 10 variants each,
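One way such a clustering step could look in code: the sketch below groups 80 models into 8 clusters via hierarchical clustering over a pairwise similarity matrix. The random similarities are stand-ins for a real model similarity measure (in the spirit of Dijkman et al.), and scikit-learn with metric="precomputed" (scikit-learn 1.2+) is an assumed tool choice, not the authors' implementation:

```python
# Illustrative sketch: group process models by pairwise similarity.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
n_models = 80
similarity = rng.random((n_models, n_models))     # stand-in similarities
similarity = (similarity + similarity.T) / 2      # make symmetric
np.fill_diagonal(similarity, 1.0)                 # self-similarity = 1

distance = 1.0 - similarity                       # similarity -> distance
labels = AgglomerativeClustering(
    n_clusters=8, metric="precomputed", linkage="average",
).fit_predict(distance)

print(np.bincount(labels))                        # sizes of the 8 groups
```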
To reach the full effects of this innovation, it was also seen as important to develop an Application Programming Interface (API) enabling external D2d actors to include information from the D2d management dashboard in their internal governance structures.
Strategic coverage: which recommendation covers the largest number of goals in the company's strategy?
Expert Systems with Applications, 37(4), 3274-3283. Kaplan, R. S., & Norton, D. P. (1992).
Using the API for ad hoc-deviations (Rinderle, 2004), the control process is integrated into the workflow instance.
Exploring features of a full-coverage integrated solution for business process compliance. In C. Salinesi & O. Pastor (Eds.
Social customer relationship management (social CRM) is the ultimate key domain to illustrate how social media can affect new and existing business processes.
Due to the clustering of process instances, this person can work full-time as a Process Manager, and the professionalism (i.e.,
Peyman Ardalani has been doing his academic research as a PhD student since 2012 at the Institute for Information Systems (IWI) at the German Research Center for Artificial Intelligence (DFKI).
a collaborative research center that gathers ten Estonian IT organizations with the aim of conducting industry-driven research in service engineering and data mining. From 2000 to 2007,
Currently, he is the deputy chair of the Institute for Information Systems (IWI) at the German Research Center for Artificial Intelligence (DFKI), Saarbrücken,
Germany. Peter Loos is Director of the Institute for Information Systems (IWI) at the German Research Center for Artificial Intelligence (DFKI)
and research project lead at Saarland University. His research activities include business process management, process mining, software development as well as the implementation of information systems.
(Table: effects on business, R&D and international activities of SMEs, by share of clusters with many successful co-operations and with high media coverage; median values.)
and media coverage (see Figure 10). Apparently, larger and more mature clusters provide a much better environment for results and impacts as an effect of the activities of a cluster management organization.
(Table: relevance of cluster size and age for the effect on cluster participants — share of clusters having initiated many successful co-operations, share of clusters with high media coverage, effect on R&D.)
strategic objectives in terms of the number of clusters that are funded, restrictions on thematic areas, and coverage of the most important business sectors.
Corresponding instruments should be developed by programme owners to provide need-based support for cluster managements. THE AUTHORS: THOMAS LÄMMER-GAMP is Director of the European Secretariat for Cluster Analysis (ESCA) at VDI/VDE
awards, lectures at conferences, press coverage, and other inexpensive means. 4. How do SMEs build new business models through open innovation?
but no representation or warranty, express or implied, is made as to their accuracy, completeness or correctness.
Entrepreneurial Networks & Mentoring; 6. Access to Markets; 6.1 First-time Exporters; 6.2 Clustering Programme; 6.3 Public
and awareness events, and are supported in the development of a market plan for their priority target market.

6.2 Clustering Programme

Enterprise Ireland's pilot clustering programme was established in 2012 to encourage groups of businesses to collaborate to achieve specific business objectives,
On one hand, expanding the number of Americans who have health insurance coverage may increase the market for medicines and treatments
Others established local production facilities to benefit from the gradually emerging clustering tendencies and from the availability of skilled workforce.
However, measures that promoted clustering and the improvement of clusters' services portfolio were announced last, in 2012, in WT.
Past policy instruments included clustering and supplier development (support for indigenous companies' investment in new technology to make them capable of becoming suppliers to multinational subsidiaries)
Although the region was among the first ones where bottom-up clustering tendencies were identified by regional economics researchers,
policy-makers consider it important to enhance regional clustering tendencies (intensify collaboration among stakeholders) and facilitate existing clusters' accreditation process.
and the activity it has carried out in the framework of SEE IDWOOD programme (Clustering, knowledge, innovation and design in SEE wood sector).
(EI) Build on Phase One of the pilot industry-led clustering initiative involving fifty companies by implementing the recommendations of the Clustering Review carried out in 2014.
Improved efficiency and accuracy of internal business processes as a result of improved accuracy and consistency of databases across public and private sectors;
which will improve logistical efficiency, the accuracy of databases across both the public and private sectors, and planning and analysis capabilities in both sectors.
4.1 Principal components analysis
4.2 Factor analysis
4.3 Cronbach coefficient alpha
4.4 Cluster analysis
4.5 Other methods for multivariate analysis
6.1 Weights based on principal components analysis or factor analysis
6.2 Data envelopment analysis (DEA)
Step 7. Uncertainty and sensitivity analysis
7.1 General framework
7.2 Uncertainty analysis (UA)
7.3 Sensitivity analysis using variance-based techniques
7.3.1 Analysis 1
7.3.2 Analysis 2
Step 8. Back to the details
Table 13. K-means for clustering TAI countries; Table 14. Normalisation based on interval scales; Table 15.
-Young-Levenglick; CLA Cluster Analysis; DEA Data Envelopment Analysis; DFA Discriminant Function Analysis; DQAF Data Quality Framework; EC European Commission; EM Expectation Maximisation; EU European Union; EW Equal Weighting; FA Factor Analysis; GCI Growth Competitiveness Index; GDP Gross Domestic Product; GME Geometric aggregation; HDI Human Development Index; ICT Information and Communication Technologies
Indicators should be selected on the basis of their analytical soundness, measurability, country coverage, relevance to the phenomenon being measured and relationship to each other.
and sensitivity analysis should be undertaken to assess the robustness of the composite indicator in terms of, e.g., the mechanism for including
process. 2. Data selection: should be based on the analytical soundness, measurability, country coverage and relevance of the indicators to the phenomenon being measured, and on their relationship to each other.
, principal components analysis, cluster analysis) to identify groups of indicators or groups of countries that are statistically similar
and sensitivity analysis should be undertaken to assess the robustness of the composite indicator in terms of, e.g., the mechanism for including
To conduct sensitivity analysis of the inference (assumptions) and determine what sources of uncertainty are more influential in the scores
To correlate the composite indicator with other relevant measures, taking into consideration the results of sensitivity analysis.
To the extent that data permit, the accuracy of proxy measures should be checked through correlation and sensitivity analysis.
The quality and accuracy of composite indicators should evolve in parallel with improvements in data collection and indicator development.
Factor analysis (FA) is similar to PCA, but is based on a particular statistical model. An alternative way to investigate the degree of correlation among a set of variables is to use the Cronbach coefficient alpha (c-alpha),
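For reference, the c-alpha can be computed directly from its definition, α = k/(k−1) · (1 − Σᵢ var(xᵢ)/var(Σᵢ xᵢ)); the sketch below uses invented data and is not an excerpt from the handbook:

```python
# Sketch: Cronbach coefficient alpha for a countries-by-indicators matrix.
import numpy as np

def cronbach_alpha(X):
    """X: (n_countries, k_indicators) array; returns the c-alpha estimate."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)        # variance of each indicator
    total_var = X.sum(axis=1).var(ddof=1)    # variance of the sum score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(1)
shared = rng.normal(size=(23, 1))            # common component
X = shared + 0.5 * rng.normal(size=(23, 8))  # 8 correlated indicators
print(round(cronbach_alpha(X), 2))           # high alpha expected here
```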
Cluster analysis is another tool for classifying large amounts of information into manageable sets. It has been applied to a wide variety of research problems and fields, from medicine to psychiatry and archaeology.
Cluster analysis is used also in the development of composite indicators to group information on countries based on their similarity on different individual indicators.
Cluster analysis serves as: (i) a purely statistical method of aggregation of the indicators; (ii) a diagnostic tool for exploring the impact of the methodological choices made during the construction phase of the composite indicator,
or when it is believed that some of them do not contribute to identifying the clustering structure in the data set,
and then apply a clustering algorithm on the object scores on the first few components,
as PCA or FA may identify dimensions that do not necessarily help to reveal the clustering structure in the data
Strengths and weaknesses of multivariate analysis. Principal components/factor analysis — strength: can summarise a set of individual indicators while preserving the maximum possible proportion of the total variation in the original data set.
Cluster analysis — strength: offers a different way to group countries; gives some insight into the structure of the data set.
Various alternative methods combining cluster analysis and the search for a low-dimensional representation have been proposed, focusing on multidimensional scaling or unfolding analysis. Factorial k-means analysis combines k-means
cluster analysis with aspects of FA and PCA. A discrete clustering model together with a continuous factorial model are fitted simultaneously to two-way data to identify the best partition of the objects, described by the best orthogonal linear combinations of the variables (factors) according to the least-squares criterion.
This has a wide range of applications since it achieves a double objective: data reduction and synthesis, simultaneously in the direction of objects and variables.
PCA, FA, cluster analysis. Identified subgroups of indicators or groups of countries that are statistically similar.
A number of weighting techniques exist (Table 4). Some are derived from statistical models, such as factor analysis, data envelopment analysis and unobserved components models (UCM).
Higher weights could be assigned to statistically reliable data with broad coverage. However, this method could be biased towards the readily available indicators,
and cons. Statistical models such as principal components analysis (PCA) or factor analysis (FA) could be used to group individual indicators according to their degree of correlation.
and sensitivity. Sensitivity analysis can be used to assess the robustness of composite indicators. Several judgements have to be made
A combination of uncertainty and sensitivity analysis can help gauge the robustness of the composite indicator
Sensitivity analysis assesses the contribution of the individual sources of uncertainty to the output variance. While uncertainty analysis is used more often than sensitivity analysis
and is almost always treated separately, the iterative use of uncertainty and sensitivity analysis during the development of a composite indicator could improve its structure (Saisana et al.,
2005a; Tarantola et al., 2000; Gall, 2007). Ideally, all potential sources of uncertainty should be addressed: selection of individual indicators, data quality, normalisation, weighting, aggregation method, etc.
The sensitivity analysis results are generally shown in terms of the sensitivity measure for each input source of uncertainty.
The results of a sensitivity analysis are often also shown as scatter plots with the values of the composite indicator for a country on the vertical axis
Conducted sensitivity analysis of the inference, e.g. to show which sources of uncertainty are more influential in determining the relative ranking of two entities.
Tested the links with variations of the composite indicator as determined through sensitivity analysis. Developed data-driven narratives on the results Documented
when quality was equated with accuracy. It is now generally recognised that there are other important dimensions. Even if data are accurate,
in the sense that both frameworks provide a comprehensive approach to quality, through coverage of governance, statistical processes and observable features of the outputs.
3. Accuracy and reliability: Are the source data, statistical techniques, etc. adequate to portray the reality to be captured?
2. Accuracy refers to the closeness of computations or estimates to the exact or true values;
It depends upon both the coverage of the required topics and the use of appropriate concepts.
Accuracy The accuracy of basic data is the degree to which they correctly estimate or describe the quantities
Accuracy refers to the closeness between the values provided and the (unknown) true values. Accuracy has many attributes,
and in practical terms it has no single aggregate or overall measure. Of necessity, these attributes are typically measured
In the case of sample survey-based estimates, the major sources of error include coverage, sampling, non-response,
from the fact that source data do not fully meet the requirements of the accounts in terms of coverage, timing,
An aspect of accuracy is the closeness of the initially released value(s) to the subsequent value(s) of estimates.
accuracy of basic data is extremely important. Here the issue of credibility of the source becomes crucial.
As individual basic data sources establish their optimal trade-off between accuracy and timeliness, taking into account institutional, organisational and resource constraints
and coverage should be identified so that the series can be reconciled. In the context of composite indicators, two aspects of coherence are especially important:
as well as the normalisation and the aggregation, can affect its accuracy, etc. In the following matrix, the most important links between each phase of the building process and quality dimensions are identified,
The imputation of missing data affects the accuracy of the composite indicator and its credibility.
The normalisation phase is crucial both for the accuracy and the coherence of final results.
The quality of basic data chosen to build the composite indicator strongly affects its accuracy and credibility.
The use of multivariate analysis to identify the data structure can increase both the accuracy and the interpretability of final results.
Almost all quality dimensions are affected by this choice, especially accuracy, coherence and interpretability. This is also one of the most criticised characteristics of composite indicators:
Analysis of this type can improve the accuracy, credibility and interpretability of the final results.
Table 5. Quality dimensions of composite indicators — each construction phase (theoretical framework, data …) is assessed against the quality dimensions: relevance, accuracy, credibility, timeliness, accessibility, interpretability and coherence.
as well as the need to test the robustness of the composite indicator using uncertainty and sensitivity analysis.
It can be used for a broad range of problems, e g. variance component estimation or factor analysis.
Denoting by $P_i$ the predicted and by $O_i$ the observed values, $\mathrm{RMSE} = \left[\frac{1}{N}\sum_{i=1}^{N}(P_i - O_i)^2\right]^{1/2}$ and $\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\lvert P_i - O_i\rvert$. Finally, a complementary measure of accuracy
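A direct translation of the two formulas into code (toy numbers, for illustration only):

```python
# MAE and RMSE for predicted values P against observed values O.
import numpy as np

P = np.array([1.0, 2.0, 3.0])   # predicted (toy data)
O = np.array([1.5, 1.5, 3.5])   # observed (toy data)

mae = np.mean(np.abs(P - O))
rmse = np.sqrt(np.mean((P - O) ** 2))
print(mae, rmse)                # 0.5 0.5
```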
and factor analysis, see Vermunt & Magidson (2005).

4.1. Principal components analysis

The objective is to explain the variance of the observed data through a few linear combinations of the original data.
Although social scientists may be attracted to factor analysis as a way of exploring data whose structure is unknown,
Principal component factor analysis (PFA), which is the most common variant of FA, is a linear procedure.
Note, however, that a variant of factor analysis, maximum likelihood factor analysis, does assume multivariate normality. The smaller the sample size, the more important it is to screen data for normality.
Moreover, as factor analysis is based on correlation (or sometimes covariance), both correlation and covariance will be attenuated when variables come from different underlying distributions (e.g.,
a normal vs. a bimodal variable will correlate less than 1.0 even when both series are perfectly co-ordered).
Factor analysis cannot create valid dimensions (factors) if none exist in the input data. In such cases, factors generated by the factor analysis algorithm will not be comprehensible.
Likewise, the inclusion of multiple definitionally-similar individual indicators representing essentially the same data will lead to tautological results.
but applying factor analysis to a correlation matrix with only low intercorrelations will require nearly as many factors as there are original variables,
thereby defeating the data reduction purposes of factor analysis. On the other hand, too high intercorrelations may indicate a multicollinearity problem
or otherwise eliminated prior to factor analysis. Notice also that PCA and Factor analysis (as well as Cronbach's alpha) assume uncorrelated measurement errors.
a) The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is a statistic for comparing the magnitudes of the observed correlation coefficients to the magnitudes of the partial correlation coefficients.
if distinct factors are expected to emerge from factor analysis (Hutcheson & Sofroniou, 1999). A KMO statistic is computed for each individual indicator,
or higher to proceed with factor analysis (Kaiser & Rice, 1974), though realistically it should exceed 0.80
but common cut-off criterion for suggesting that there is a multicollinearity problem. Some researchers use the more lenient cut-off VIF value of 5.0. (c) Bartlett's test of sphericity is used to test the null hypothesis that the individual indicators in a correlation matrix are uncorrelated,
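For illustration, both diagnostics can be computed with the factor_analyzer Python package; the tool choice and the toy indicator matrix are assumptions, not part of the handbook:

```python
# Hedged sketch: KMO measure and Bartlett's test on a toy indicator matrix.
import numpy as np
import pandas as pd
from factor_analyzer.factor_analyzer import (
    calculate_kmo, calculate_bartlett_sphericity)

rng = np.random.default_rng(2)
common = rng.normal(size=(100, 1))                    # shared factor
data = pd.DataFrame(common + 0.7 * rng.normal(size=(100, 5)),
                    columns=[f"ind{i}" for i in range(5)])

kmo_per_indicator, kmo_overall = calculate_kmo(data)
chi2, p_value = calculate_bartlett_sphericity(data)
# Rule of thumb from the text: overall KMO should exceed about 0.5
# (ideally 0.80) before proceeding with factor analysis.
print(f"KMO = {kmo_overall:.2f}, chi2 = {chi2:.1f}, p = {p_value:.3g}")
```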
and of how the interpretation of the components might be improved are addressed in the following section on factor analysis.

4.2. Factor analysis

Factor analysis (FA) is similar to PCA.
Principal components factor analysis is the variant most preferred in the development of composite indicators, e.g. in the Product Market Regulation Index (Nicoletti et al.
This conclusion does not depend on the factor analysis method, as it has been confirmed by different methods (centroid method, principal axis method).
Note also that the factor analysis in the previous section had indicated university as the individual indicator that shared the least amount of common variance with the other individual indicators.
Although both factor analysis and the Cronbach coefficient alpha are based on correlations among individual indicators, their conceptual framework is different.
Cronbach coefficient alpha results for the 23 countries after deleting one individual indicator (standardised values) at a time.

4.4. Cluster analysis

Cluster analysis (CLA) is a collection of algorithms to classify objects such as countries, species,
if the classification has an increasing number of nested classes, e g. tree clustering; or nonhierarchical when the number of clusters is decided ex ante,
e g. k-means clustering. However, care should be taken that classes are meaningful and not arbitrary or artificial.
including Euclidean and non-Euclidean distances. The next step is to choose the clustering algorithm,
A nonhierarchical method of clustering
is k-means clustering (Hartigan, 1975). This method is useful when the aim is to divide the sample into k clusters of the greatest possible distinction.
k-means clustering (standardised data). Table 13. K-means for clustering TAI countries:
Group 1 (leaders): Finland, Netherlands, Sweden, USA.
Group 2 (potential leaders): Australia, Canada, New Zealand, Norway.
Group 3 (dynamic adopters): Austria, Belgium, Czech Rep., France, Germany, Hungary, Ireland, Israel, Italy, Japan, Korea, Singapore, Slovenia, Spain, UK.
Finally, expectation
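A grouping of this kind can be reproduced along the following lines; the sketch uses scikit-learn's KMeans on random stand-in data rather than the actual TAI indicator values:

```python
# Illustrative k-means grouping of 23 countries on 8 indicators.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
indicators = rng.random((23, 8))                 # toy stand-in data

X = StandardScaler().fit_transform(indicators)   # standardise first
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)                                    # cluster id per country
```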
Various alternative methods combining cluster analysis and the search for a low-dimensional representation have been proposed, focusing on multidimensional scaling or unfolding analysis (e.g.
A method that combines k-means cluster analysis with aspects of factor analysis and PCA is offered by Vichi & Kiers (2001).
A discrete clustering model and a continuous factorial model are fitted simultaneously to two-way data with the aim of identifying the best partition of the objects, described by the best orthogonal linear combinations of the variables (factors) according to the least-squares criterion.
unlike factor analysis. This technique finds scores for the rows and columns on a small number of dimensions which account for the greatest proportion of the χ² association between the rows and columns,
However, while conventional factor analysis determines which variables cluster together (parametric approach), correspondence analysis determines which category values are close together (nonparametric approach).
When the dependent variable has more than two categories, it is a case of multiple discriminant analysis (also called discriminant factor analysis or canonical discriminant analysis), e.g. to discriminate countries on the basis of employment patterns in nine industries (the predictors).
This is the main difference from cluster analysis, in which groups are not predetermined. There are also conceptual similarities with principal components and factor analysis,
but while PCA maximises the variance in all the variables accounted for by a factor, DFA maximises the differences between values of the dependent variable.
WEIGHTING AND AGGREGATION — WEIGHTING METHODS

6.1. Weights based on principal components analysis or factor analysis

Principal components analysis,
and more specifically factor analysis, groups together individual indicators which are collinear to form a composite indicator that captures as much as possible of the information common to individual indicators.
For a factor analysis, only a subset of principal components is retained (m, i.e., those that account for the largest amount of the variance).
Rotation is a standard step in factor analysis; it changes the factor loadings and hence the interpretation of the factors,
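The sketch below compresses the PCA/FA weighting idea into its core steps — retain a few components, square the loadings, scale by explained variance, normalise. It is a simplified illustration on invented data (using unrotated PCA loadings), not the handbook's full multi-step procedure:

```python
# Simplified sketch of PCA/FA-based weights for individual indicators.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
X = rng.normal(size=(23, 8))                 # countries x indicators (toy)

pca = PCA(n_components=2).fit(X)             # retain m = 2 components
loadings = pca.components_.T                 # shape: indicators x components
contrib = (loadings ** 2) * pca.explained_variance_ratio_
weights = contrib.sum(axis=1)
weights /= weights.sum()                     # weights sum to one
print(np.round(weights, 3))
```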
when dealing with environmental issues.

6.9. Performance of the different weighting methods

The weights for the TAI example are calculated using different weighting methods: equal weighting (EW), factor analysis (FA), budget allocation (BAP) and analytic hierarchy process (AHP).
The role of the variability in the weights and their influence on the value of the composite are discussed in the section on sensitivity analysis.
Table: TAI weights based on different methods — equal weighting (EW), factor analysis (FA), budget allocation (BAP), analytic hierarchy process (AHP) — with one weight per indicator (fixed for all countries): patents, royalties, Internet, tech exports, telephones, electricity, schooling, university.
UNCERTAINTY AND SENSITIVITY ANALYSIS

Sensitivity analysis is considered a necessary requirement in econometric practice (Kennedy, 2003) and has been defined as the modeller's equivalent of orthopaedists' X-rays.
This is what sensitivity analysis does: it performs the 'X-rays' of the model by studying the relationship between information flowing in and out of the model.
More formally, sensitivity analysis is the study of how the variation in the output can be apportioned, qualitatively or quantitatively, to different sources of variation in the assumptions,
Sensitivity analysis is thus closely related to uncertainty analysis, which aims to quantify the overall uncertainty in country rankings as a result of the uncertainties in the model input.
A combination of uncertainty and sensitivity analysis can help to gauge the robustness of the composite indicator ranking
Below we describe how to apply uncertainty and sensitivity analysis to composite indicators. Our synergistic use of uncertainty and sensitivity analysis has recently been applied for the robustness assessment of composite indicators (Saisana et al.,
2005a; Saltelli et al., 2008) and has proven useful in dissipating some of the controversy surrounding composite indicators such as the Environmental Sustainability Index (Saisana et al.
and sensitivity analysis discussed below in relation to the TAI case study is only illustrative. In practice the setup of the analysis will depend upon which sources of uncertainty and
$\text{Rank}(CI_c)$ is an output of the uncertainty/sensitivity analysis. The average shift in country rankings is also explored.
The investigation of $\text{Rank}(CI_c)$ and the average shift in rankings $\bar{R}_S$ is the scope of the uncertainty and sensitivity analysis.

7.1. General framework

The analysis is conducted as a single Monte Carlo experiment,
A scatter plot based sensitivity analysis would be used to track which indicator affects the output the most
such as the variance and higher-order moments, can be estimated with an arbitrary level of precision related to the size of the simulation N.

7.3. Sensitivity analysis using variance-based techniques

A necessary step
when designing a sensitivity analysis is to identify the output variables of interest. Ideally these should be relevant to the issue addressed by the model.
2008); with nonlinear models, robust, model-free techniques should be used for sensitivity analysis. Variance-based techniques for sensitivity analysis are model-free
and display additional properties convenient in the present analysis, such as the following: they allow an exploration of the whole range of variation of the input factors, instead of just sampling factors over a limited number of values, e.g. as in fractional factorial design (Box et al.
They allow for a sensitivity analysis whereby uncertain input factors are treated in groups instead of individually; and they can be justified in terms of rigorous settings for sensitivity analysis.
To compute a variance-based sensitivity measure for a given input factor $X_i$, start from the fractional contribution to the model output variance, i.e. the first-order sensitivity index $S_i = V_{X_i}\left[E_{X_{\sim i}}(Y \mid X_i)\right]/V(Y)$.
The $S_i$ and $S_{T_i}$, in the case of non-independent input factors, could also be interpreted as settings for sensitivity analysis.
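As a concrete starting point, first-order and total-effect indices can be estimated with the SALib Python package; the tool choice and the toy weighted-sum model standing in for a composite indicator are assumptions, not the handbook's setup:

```python
# Hedged sketch: variance-based (Sobol) sensitivity indices with SALib.
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

problem = {
    "num_vars": 3,
    "names": ["x1", "x2", "x3"],
    "bounds": [[0.0, 1.0]] * 3,
}

X = saltelli.sample(problem, 1024)        # quasi-random input sample
Y = X @ np.array([0.5, 0.3, 0.2])         # toy model: weighted sum

Si = sobol.analyze(problem, Y)
print(Si["S1"])                           # first-order indices S_i
print(Si["ST"])                           # total-effect indices S_Ti
```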
Figure 19 shows the sensitivity analysis based on the first-order indices. The total variance in each country's rank is presented
The sensitivity analysis results for the average shift in rank output variable (equation (38)) are shown in Table 40.
In the 1970s, factor analysis and latent variables enriched path analysis, giving rise to the field of Structural Equation Modelling (SEM; see Kline, 1998).
Measurement techniques such as factor analysis and item response theory are used to relate latent variables to the observed indicators (the measurement model),
the use of Bayesian networks is becoming increasingly common in bioinformatics, artificial intelligence and decision-support systems; however, their theoretical complexity and the amount of computer power required to perform relatively simple graph searches make them difficult to implement in a convenient manner.
and the testing of the robustness of the composite using uncertainty and sensitivity analysis. The present work is perhaps timely,
Similarly, Nicoletti and others make use of factor analysis in the analysis of, for example, product market regulation in OECD countries (Nicoletti et al.,
The media coverage of events such as the publishing of the World Economic Forum's World Competitiveness Index and Environmental Sustainability Index
REFERENCES
Anderberg M.R. (1973), Cluster Analysis for Applications, New York:
Binder D.A. (1978), Bayesian Cluster Analysis, Biometrika, 65: 31-38. Borda J.C. de (1784), Mémoire sur les élections au scrutin, in Histoire de l'Académie Royale des Sciences, Paris. Boscarino
and Yarnold P.R. (1995), Principal components analysis and exploratory and confirmatory factor analysis, in Grimm and Yarnold, Reading and Understanding Multivariate Analysis.
In Sensitivity Analysis (eds Saltelli A., Chan K., Scott M.), 167-197. New York: John Wiley & Sons.
(2004b), Composite Indicator on e-business readiness, DG JRC, Brussels. Everitt B.S. (1979), Unresolved problems in cluster analysis, Biometrics, 35: 169-181.
in the 20th century, Journal of Computational and Applied Mathematics, Vol. 123(1-2). Gorsuch R.L. (1983), Factor Analysis.
Haq M. (1995), Reflections on Human Development, Oxford University Press, New York. Hartigan J.A. (1975), Clustering Algorithms, New York:
John Wiley & Sons, Inc. Hatcher L. (1994), A Step-by-Step Approach to Using the SAS System for Factor Analysis and Structural Equation Modeling.
Heiser W.J. (1993), Clustering in low-dimensional space, in Opitz O., Lausen B. and Klar R. (eds), 1993.
Homma T. and Saltelli A. (1996), Importance measures in global sensitivity analysis of model output, Reliability Engineering and System Safety, 52(1), 1-17.
Kim J. and Mueller C.W. (1978), Factor Analysis: Statistical Methods and Practical Issues, Sage Publications, Beverly Hills, California, pp. 88.
Covers confirmatory factor analysis using SEM techniques; see esp. Ch. 7. Knapp T.R. and Swoyer V.H. (1967), Some empirical results concerning the power of Bartlett's test of the significance of a correlation matrix.
Factor Analysis as a Statistical Method, London: Butterworth and Co.
Massart D.L. and Kaufman L. (1983), The Interpretation of Analytical Chemical Data by the Use of Cluster Analysis, New York:
Saisana M., Nardo M. and Saltelli A. (2005b), Uncertainty and sensitivity analysis of the 2005 Environmental Sustainability Index, in Esty D., Levy M., Srebotnjak T. and de Sherbinin
and Tarantola S. (2008), Global Sensitivity Analysis: The Primer, John Wiley & Sons. Saltelli A. (2007), Composite indicators between analysis and advocacy, Social Indicators Research, 81: 65-77.
Saltelli A., Tarantola S., Campolongo F. and Ratto M. (2004), Sensitivity Analysis in Practice: A Guide to Assessing Scientific Models, New York:
Software for sensitivity analysis is available at http://www.jrc.ec.europa.eu/uasa/prj-sa-soft.asp.
11-30. Sobol' I.M. (1993), Sensitivity analysis for nonlinear mathematical models, Mathematical Modelling & Computational Experiment, 1: 407-414.
Spath H. (1980), Cluster Analysis Algorithms, Chichester, England: Ellis Horwood. Storrie D. and Bjurek H. (1999), Benchmarking European labour market performance with efficiency frontier technique, Discussion Paper FS I 00-2011.
Tarantola S., Jesinghaus J. and Puolamaa M. (2000), Global sensitivity analysis: a quality assurance tool in environmental policy modelling.
Sensitivity Analysis, pp. 385-397, New York: John Wiley & Sons. Tarantola S., Saisana M., Saltelli A., Schmiedel F. and Leapman N. (2002), Statistical techniques and participatory approaches for the composition of the European Internal Market Index 1992
Vermunt J.K. and Magidson J. (2005), Factor analysis with categorical indicators:
Decisions and evaluations by hierarchical aggregation of information, Fuzzy Sets and Systems, 10: 243-260.
Compensability of aggregations is widely studied in fuzzy set theory; for example, Zimmermann & Zysno (1983) use the geometric operator