Synopsis: ICT


ES-Flipping to Digital Leadership 2015.pdf

Charles Nicholson, Ultimate Software (U.S.); and Klas Bendrik, Volvo Cars (Sweden). Other Gartner colleagues: Frank Buytendijk, Carolyn Damon, Cameron Haight, Heather Keltz, Nick Jones, Kathy Kenny, Poh-Ling Lee, Talmor Margalit, Pierluigi Piva, Paul

but they must flip their information technology, value and people leadership practices to deliver on the digital promise.

The 2015 Gartner CIO Survey gathered data from 2,810 CIO respondents in 84 countries and all major industries, representing approximately $12.1 trillion in revenue/public-sector budgets.

For this report, we analyzed this data and supplemented it with interviews of 11 CIOs.

Survey coverage is based on 2,810 CIO respondents from 84 countries, representing $12.1 trillion in revenue/budgets and $397 billion in IT spending.

social and big data are already central to business thinking, and the next set of digital technologies, trends, opportunities and threats is creating yet another competitive frontier.

However, existing business processes, business models, information technology and talent suffer from legacy inertia and bad complexity.

and capabilities. Digital business success requires starting with a digital information and technology mindset and working backward. Measurement is short-term and input-centric,

From legacy-first to digital-first: beyond simplification, cloud and mobile are now valuable options, if not necessities.

businesses need forward-looking predictive analytics combined with data-led experimentation (see figure below).

Information and technology flip 3: from the Nexus to the next horizon. Cloud, mobile, social and information (the Nexus of Forces) are no longer exotic options;

From: backward-looking reporting; passive analysis of data; structured information; separate analytics. To: forward-looking predictive analytics; active experimentation informed by data; new types of information,

From aligning with corporate culture to building a digital culture: a traditional, risk-averse corporate culture that views IT only as an infrastructural enabler of transactions will devour even the most innovative digital business strategy like a small snack!

The CIO's Front-Office Toolkit; 2013, No. 6: Succeeding in Tomorrow's Technology Labor Market; 2013, No. 5: Turbocharging the CIO With Software Tools; 2013, No. 4: The Psychology of Serial Innovation



EUR 21682 EN.pdf

A great deal of information on the European Union is available on the Internet.

It can be accessed through the Europa server (http://europa.eu.int). The report is available online at http://farmweb.jrc.cec.eu.int/ci/bibliography.htm (EUR 21682 EN, European Communities).

Contents (excerpt):
3.1.1 Principal Components Analysis
3.1.2 Factor Analysis
3.1.3 Cronbach Coefficient Alpha
3.2 Grouping information on countries
3.2.1 Cluster analysis
3.2.2 Factorial k-means analysis
3.3 Conclusions
4. Imputation of missing data
4.1 Single imputation
4.1.1 Unconditional mean imputation
4.1.2 Regression imputation
4.1.3 Expected maximization imputation
4.2 Multiple imputation
5. Normalisation of data
6.1.1 Principal components and factor analysis
6.1.2 Data envelopment analysis and benefit of the doubt
6.1.3 Regression approach
7. Uncertainty and sensitivity analysis
7.1 Set-up of the analysis
7.1.1 Output variables of interest
7.1.2 General framework for the analysis
7.1.3 Inclusion/exclusion of individual sub-indicators
7.1.4 Data quality
7.1.5 Normalisation
7.1.6 Uncertainty analysis
7.1.7 Sensitivity analysis using variance-based techniques

We deal with the problem of missing data and with the techniques used to bring into a common unit the indicators that are of very different nature.

and aggregating indicators into a composite and test the robustness of the composite using uncertainty and sensitivity analysis.

whereby a lot of work in data collection and editing is wasted or hidden behind a single number of dubious significance.

or imprecise assessment, and use uncertainty and sensitivity analysis to gain useful insights during the process of composite indicator building, including a contribution to the definition of the indicators' quality and an appraisal of the reliability of countries' rankings.

multivariate analysis, imputation of missing data and normalization techniques aim at supplying a sound and defensible dataset.

and sensitivity analysis to increase transparency and make policy inference more defensible. Section 8 shows how different visualization strategies of the same composite indicator can convey different policy messages.

Factor analysis and reliability/item analysis (e.g. the Cronbach coefficient alpha) can be used to group the information on the indicators.

Cluster analysis can be applied to group the information on constituencies (e.g. countries) in terms of their similarity with respect to the different sub-indicators.

(d) a method for selecting groups of countries to impute missing data with a view to decrease the variance of the imputed values.

Clearly the ability of a composite to represent multidimensional concepts largely depends on the quality and accuracy of its components.

Missing data are present in almost all composite indicators and they can be missing either in a random or in a nonrandom fashion.

whether data are missing at random or systematically, whilst most of the methods of imputation require a missing at random mechanism.

Three generic approaches for dealing with missing data can be distinguished, i.e. case deletion, single imputation or multiple imputation.

The other two approaches see the missing data as part of the analysis and therefore try to impute values through either Single Imputation (e.g.

Markov Chain Monte Carlo algorithm. The advantages of imputation include the minimisation of bias and the use of 'expensive to collect' data that would

otherwise be discarded. In the words of Dempster and Rubin (1983): "The idea of imputation is both seductive and dangerous,"

because it can lull the user into the pleasurable state of believing that the data are complete after all,

and imputed data have substantial bias. Whenever indicators in a dataset are incommensurate with each other,

The normalization method should take into account the data properties and the objectives of the composite indicator.

whether hard or soft data are available, whether exceptional behaviour needs to be rewarded/penalised, whether information on absolute levels matters,

partially, to correct for data quality problems in such extreme cases. The functional transformation is applied to the raw data to represent the significance of marginal changes in its level.

Different weights may be assigned to indicators to reflect their economic significance (collection costs, coverage, reliability and economic reason), statistical adequacy, cyclical conformity, speed of available data, etc.

such as weighting schemes based on statistical models (e.g. factor analysis, data envelopment analysis, unobserved components models), or on participatory methods (e.g. budget allocation, analytic hierarchy processes).

Weights may also reflect the statistical quality of the data, thus higher weight could be assigned to statistically reliable data (data with low percentages of missing values, large coverage, sound values).

In this case the concern is to reward only sub-indicators easy to measure and readily available, punishing the information that is more problematic to identify and measure.

Uncertainty analysis and sensitivity analysis is a powerful combination of techniques to gain useful insights during the process of composite indicators building,

selection of data, data quality, data editing (e g. imputation), data normalisation, weighting scheme/weights, weights'values and aggregation method.

A combination of uncertainty and sensitivity analysis can help to gauge the robustness of the composite indicator

Sensitivity analysis (SA) studies how much each individual source of uncertainty contributes to the output variance. In the field of building composite indicators, UA is adopted more often than SA (Jamison and Sandbu, 2001;

The composite indicator is no longer a magic number corresponding to crisp data treatment, weighting set or aggregation method,

The iterative use of uncertainty and sensitivity analysis during the development of a composite indicator can contribute to its well-structuring

if composite indicators could be made available via the web, along with the data, the weights and the documentation of the methodology.

Given that composite indicators can be decomposed or disaggregated so as to introduce alternative data, weighting, normalisation approaches etc.

the components of composites should be available electronically as to allow users to change variables, weights,

etc. and to replicate sensitivity tests.

2.1 Requirements for quality control. As mentioned above, the quality of a composite indicator is not only a function of the quality of its underlying data (in terms of relevance, accuracy, credibility, etc.)

Table 2.1. The Pedigree Matrix for Statistical Information, with columns Grade, Definitions & Standards, Data Collection & Analysis, Institutional Culture and Review (e.g. grade 4: negotiation, task force, dialogue).

Factor analysis and reliability/item analysis can be used complementarily to explore whether the different dimensions of the phenomenon are well balanced, from a statistical viewpoint, in the composite indicator.

The use of cluster analysis to group countries in terms of similarity between different sub-indicators can serve as:

(h) a method for selecting groups of countries to impute missing data with a view to decrease the variance of the imputed values.

Cluster analysis could, thereafter, be useful in different sections of this document. The notation adopted throughout this document is the following: $x_{qc}^{t}$ denotes the raw value of sub-indicator q for country c at time t.

A description of practical computing methods came much later, from Hotelling in 1933. The objective of the analysis is to take Q variables $x_1, x_2, \dots, x_Q$ and retain, say, P < Q principal components that preserve a high amount of the cumulative variance of the original data.

because it means that the principal components are measuring different statistical dimensions in the data.

When the objective of the analysis is to present a huge data set using a few variables, then in applying PCA there is the hope that some degree of economy can be achieved.
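A brief sketch of this reduction in code, using a hypothetical stand-in for the TAI data matrix (scikit-learn's PCA is one of several tools that could be used; the data and the 90% threshold below are illustrative, not taken from the report):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for a countries x sub-indicators matrix (e.g. the TAI data).
rng = np.random.default_rng(0)
X = rng.normal(size=(23, 8))

# Standardise the columns so that PCA effectively works on the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

pca = PCA()
scores = pca.fit_transform(Z)

# Eigenvalues and cumulative share of variance preserved by the components.
eigenvalues = pca.explained_variance_
cum_var = np.cumsum(pca.explained_variance_ratio_)

# Keep the first P < Q components that preserve, say, 90% of the variance.
P = int(np.searchsorted(cum_var, 0.90) + 1)
print(f"retain {P} components, eigenvalues: {np.round(eigenvalues, 2)}")
```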

the highest correlation is found between the sub-indicators ELECTRICITY and INTERNET, with a coefficient of 0.84.

[Table: correlation matrix of the eight TAI sub-indicators (PATENTS, ROYALTIES, INTERNET, EXPORTS, TELEPHONES, ELECTRICITY, SCHOOLING, ENROLMENT).]

Bootstrap refers to the process of randomly resampling the original data set to generate new data sets.

but the computation may be cumbersome. Various values have been suggested, ranging from 25 (Efron and Tibshirani, 1991) to as high as 1000 (Efron, 1987;
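A minimal sketch of the resampling idea on hypothetical data; the statistic bootstrapped here (the first eigenvalue of the correlation matrix) is chosen only as an example:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(23, 8))          # hypothetical countries x sub-indicators data

def first_eigenvalue(data):
    """Largest eigenvalue of the correlation matrix of the data."""
    corr = np.corrcoef(data, rowvar=False)
    return np.linalg.eigvalsh(corr)[-1]

n_boot = 1000                          # number of bootstrap replicates
n = X.shape[0]
stats = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)   # resample rows (countries) with replacement
    stats[b] = first_eigenvalue(X[idx])

# 90% bootstrap confidence interval for the statistic
lo, hi = np.percentile(stats, [5, 95])
print(f"first eigenvalue: {first_eigenvalue(X):.2f}, 90% CI: [{lo:.2f}, {hi:.2f}]")
```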

whether the TAI data set for the 23 countries can be viewed as a 'random' sample of the entire population, as required by the bootstrap procedures (Efron 1987;

Several points can be made regarding the issues of randomness and representativeness of the data. First, it is often difficult to obtain complete information for a data set in the social sciences because, unlike the natural sciences,

controlled experiments are not always possible, as is the case here. As Efron and Tibshirani (1993) state:

A third point on data quality is that a certain amount of measurement error is likely to exist.

While such measurement error can only be controlled at the data collection stage rather than at the analytical stage, it is argued that the data represent the best estimates currently available (United Nations, 2001, p. 46).

Figure 3. 1 (right) demonstrates graphically the relationship between the eigenvalues from the deterministic PCA,

[Table: PCA component loadings for the TAI sub-indicators (PATENTS, ROYALTIES, INTERNET, EXPORTS, TELEPHONES, ELECTRICITY, SCHOOLING, ENROLMENT).]

and how the interpretation of the components might be improved are addressed in the following section on factor analysis.

3.1.2 Factor analysis. Factor analysis (FA) has similar aims to PCA.

Principal components factor analysis is most often preferred in the development of composite indicators (see Section 6), e.g.

The first factor has high positive coefficients (loadings) with INTERNET (0.79), ELECTRICITY (0.82) and SCHOOLING (0.88).

Factor 4 is formed by ROYALTIES and TELEPHONES. Yet, despite the rotation of factors, the sub-indicator EXPORTS has sizeable loadings on both Factor 1 (negative loading) and Factor 2 (positive loading).

[Tables: rotated factor loadings and squared loadings for the TAI sub-indicators.]

it is unlikely that they share common factors. 2. Identify the number of factors that are necessary to represent the data

Assumptions in principal components analysis and factor analysis. 1. A sufficient number of cases. The question of how many cases (or countries) are necessary to do PCA/FA has no scientific answer

Although social scientists may be attracted to factor analysis as a way of exploring data whose structure is unknown,

which variables are associated most with the outlier cases. 4. Assumption of interval data. Kim and Mueller (1978b

pp. 74-75) note that ordinal data may be used if it is thought that the assignment of ordinal categories to the data does not seriously distort the underlying metric scaling.

Likewise, these authors allow the use of dichotomous data if the underlying metric correlations between the variables are thought to be moderate (0.7) or lower. The result of using ordinal data is that the factors may be much harder to interpret.

Note that categorical variables with similar splits will necessarily tend to correlate with each other, regardless of their content (see Gorsuch, 1983).

Principal components factor analysis (PFA), which is the most common variant of FA, is a linear procedure.

the more important it is to screen data for linearity. 6. Multivariate normality of data is required for related significance tests.

Note, however, that a variant of factor analysis, maximum likelihood factor analysis, does assume multivariate normality. The smaller the sample size, the more important it is to screen data for normality.

Moreover, as factor analysis is based on correlation (or sometimes covariance), both correlation and covariance will be attenuated when variables come from different underlying distributions (e.g.,

a normal vs. a bimodal variable will correlate less than 1.0 even when both series are perfectly co-ordered).

7. Underlying dimensions shared by clusters of sub-indicators are assumed. If this assumption is not met, the "garbage in,

Factor analysis cannot create valid dimensions (factors) if none exist in the input data. In such cases, factors generated by the factor analysis algorithm will not be comprehensible.

Likewise, the inclusion of multiple definitionally similar sub-indicators representing essentially the same data will lead to tautological results. 8. Strong intercorrelations are not required mathematically,

but applying factor analysis to a correlation matrix with only low intercorrelations will require for solution nearly as many factors as there are original variables,

thereby defeating the data reduction purposes of factor analysis. On the other hand, too high inter-correlations may indicate a multi-collinearity problem

and collinear terms should be combined or otherwise eliminated prior to factor analysis. (a) The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is a statistic for comparing the magnitudes of the observed correlation coefficients to the magnitudes of the partial correlation coefficients.

The concept is that the partial correlations should not be very large if one is to expect distinct factors to emerge from factor analysis (see Hutcheson and Sofroniou, 1999, p. 224).

A KMO statistic is computed for each individual sub-indicator, and their sum is the KMO overall statistic.

KMO varies from 0 to 1.0. The overall KMO should be 0.60 or higher to proceed with factor analysis (Kaiser and Rice, 1974),

though realistically it should exceed 0.80 if the results of the principal components analysis are to be reliable.

If not, it is recommended to drop the sub-indicators with the lowest individual KMO statistic values,

but a common cut-off criterion for suggesting that there is a multicollinearity problem. Some researchers use the more lenient cutoff VIF value of 5.0. (c) Bartlett's test of sphericity is used to test the null hypothesis that the sub-indicators in a correlation matrix are uncorrelated,
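A sketch of how the overall and per-indicator KMO statistics mentioned in (a) might be computed, with the partial correlations obtained from the inverse of the correlation matrix (the data below are hypothetical):

```python
import numpy as np

def kmo(X):
    """Kaiser-Meyer-Olkin measure of sampling adequacy.

    Returns (overall KMO, per-indicator KMO) for a cases x indicators array.
    """
    corr = np.corrcoef(X, rowvar=False)
    # Partial correlations from the inverse (precision) matrix.
    prec = np.linalg.pinv(corr)
    d = np.sqrt(np.outer(np.diag(prec), np.diag(prec)))
    partial = -prec / d

    r2 = corr ** 2
    p2 = partial ** 2
    np.fill_diagonal(r2, 0.0)          # exclude the diagonal from the sums
    np.fill_diagonal(p2, 0.0)

    kmo_per_indicator = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    kmo_overall = r2.sum() / (r2.sum() + p2.sum())
    return kmo_overall, kmo_per_indicator

# Example with hypothetical data; the overall KMO should exceed 0.60 (ideally 0.80).
rng = np.random.default_rng(1)
X = rng.normal(size=(23, 8))
overall, per_indicator = kmo(X)
print(round(overall, 2), np.round(per_indicator, 2))
```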

Sensitive to modifications in the basic data: data revisions and updates (e.g. new countries). Sensitive to the presence of outliers, which may introduce a spurious variability in the data. Sensitive to small-sample problems, which are particularly relevant when the focus is limited to a small set of countries. Minimisation of the contribution of sub-indicators which do not move with the other sub-indicators.

TELEPHONES has the highest variable-total correlation, and if deleted the coefficient alpha would be as low as 0.60.

Note also, that the factor analysis in the previous section had indicated ENROLMENT as the sub-indicator that shares the least amount of common variance with the other sub-indicators.

Although both factor analysis and the Cronbach coefficient alpha are based on correlations among sub-indicators, their conceptual framework is different.

Table 3.6. Cronbach coefficient alpha results for the 23 countries after deleting one sub-indicator (standardised values) at a time.

Deleted sub-indicator   Correlation with total   Cronbach coefficient alpha
PATENTS                 0.261                    0.704
ROYALTIES               0.527                    0.645
INTERNET                0.566                    0.636
EXPORTS                -0.108                    0.774
TELEPHONES              0.701                    0.603
ELECTRICITY             0.614                    0.624
SCHOOLING               0.451                    0.662
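A small sketch of the computation on hypothetical standardised data; the leave-one-out loop reproduces the kind of "deleted sub-indicator" table shown above:

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach coefficient alpha for a cases x items (countries x sub-indicators) array."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

# Hypothetical data set (23 countries, 8 sub-indicators), standardised column by column.
rng = np.random.default_rng(2)
X = rng.normal(size=(23, 8))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

print("alpha, all items:", round(cronbach_alpha(Z), 3))
# Leave-one-out: alpha after deleting each sub-indicator in turn.
for q in range(Z.shape[1]):
    reduced = np.delete(Z, q, axis=1)
    print(f"without item {q}: alpha = {cronbach_alpha(reduced):.3f}")
```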

3.2 Grouping information on countries. 3.2.1 Cluster analysis. Cluster analysis (CLA) is the name given to a collection of algorithms used to classify objects

Cluster analysis has been applied in a wide variety of research problems, from medicine and psychiatry to archeology.

cluster analysis is of great utility. CLA techniques can be hierarchical (for example tree clustering),

or nonhierarchical when the number of clusters is decided ex ante (for example the k-means clustering).

To do so, the clustering techniques attempt to make the members of each group have more in common with their own group than with the other groups, through minimisation of internal variation while maximising variation between groups.

Useful if the data are categorical in nature. Having decided how to measure similarity (the distance measure),

the next step is to choose the clustering algorithm, i.e. the rules which govern how distances are measured between clusters.

and hence different classifications may be obtained for the same data, even using the same distance measure.

Single linkage (nearest neighbour): the distance between two clusters is determined by the distance between the two closest elements in the different clusters.

which indicates that the data are best represented by ten clusters: Finland alone; Sweden and the USA; the group of countries located between the Netherlands and Hungary; and then, each alone, Canada, Singapore, Australia, New Zealand, Korea, Norway and Japan.

Figure 3.2. Country clusters for the sub-indicators of technology achievement (standardised data).

Figure 3.3. Linkage distance versus fusion step in the hierarchical clustering for the technology achievement example.
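A sketch of hierarchical (tree) clustering with SciPy on hypothetical standardised data, including the linkage-distance-versus-fusion-step information used in the figure above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical standardised data: 23 countries x 8 sub-indicators.
rng = np.random.default_rng(3)
Z_data = rng.normal(size=(23, 8))

# Single linkage (nearest neighbour); 'complete', 'average' or 'ward' are alternatives.
link = linkage(Z_data, method="single", metric="euclidean")

# Column 2 of the linkage matrix holds the fusion (linkage) distance at each step;
# a sharp jump suggests where to cut the tree.
fusion_distances = link[:, 2]
print(np.round(fusion_distances, 2))

# Cut the dendrogram into, say, 10 clusters, as suggested by the jump in distances.
labels = fcluster(link, t=10, criterion="maxclust")
print(labels)
```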

A nonhierarchical method of clustering, different from the joining (or tree) clustering shown above, is k-means clustering (Hartigan, 1975). This method is useful when the aim is to divide the sample into k clusters of greatest possible distinction.

Thus, this algorithm can be applied with continuous variables (yet it can also be modified to accommodate other types of variables).

The algorithm starts with k random clusters and moves the objects in and out of the clusters with the aim of (i) minimising the variance of elements within the clusters and (ii) maximising the variance between the clusters.
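A minimal k-means sketch with scikit-learn on hypothetical standardised data; the three-group split mirrors the leaders / potential leaders / dynamic adopters example discussed below:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical standardised data: 23 countries x 8 sub-indicators.
rng = np.random.default_rng(4)
Z = rng.normal(size=(23, 8))

# k-means with k=3 groups; n_init restarts guard against poor random initialisations.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(Z)

# Group membership and the cluster centroids (mean profile of each group).
print(labels)
print(np.round(km.cluster_centers_, 2))
```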

the dynamic adopters are lagging behind the potential leaders due to their lower performance on INTERNET, ELECTRICITY and SCHOOLING.

Table 3.8. K-means clustering for the 23 countries in the technology achievement case study: Group 1 (leaders), Group 2 (potential leaders), Group 3 (dynamic adopters).

[Figure: mean profiles of the three groups over the eight sub-indicators (PATENTS, ROYALTIES, INTERNET, EXPORTS, TELEPHONES, ELECTRICITY, SCHOOLING, ENROLMENT) from the k-means clustering (standardised data).]

Finally, expectation maximization (EM) clustering extends the simple k-means clustering in two ways:

so as to maximise the overall likelihood of the data, given the final clusters (Binder, 1981). 2. Unlike k-means,

EM can be applied both to continuous and categorical data. Ordinary significance tests are not valid for testing differences between clusters.

Principal component analysis or Factor analysis) that summarize the common information in the data set by detecting non-observable dimensions.

On the other hand, the relationships within a set of objects (e g. countries) are explored often by fitting discrete classification models as partitions, n-trees, hierarchies, via nonparametric techniques of clustering.

or when it is believed that some of these do not contribute much to identifying the clustering structure in the data set,

frequently carrying out a PCA and then applying a clustering algorithm on the object scores on the first few components.

because PCA or FA may identify dimensions that do not necessarily contribute much to perceive the clustering structure in the data and that,

Various alternative methods combining cluster analysis and the search for a low-dimensional representation have been proposed, and focus on multidimensional scaling or unfolding analysis (e g.,

A method that combines k-means cluster analysis with aspects of Factor analysis and PCA is presented by Vichi and Kiers (2001.

A discrete clustering model together with a continuous factorial one are fitted simultaneously to two-way data,

the data reduction and synthesis, simultaneously in the direction of objects and variables. Originally applied to short-term macroeconomic data,

factorial k-means analysis has a fast alternating least-squares algorithm that extends its application to large data sets.

The methodology can therefore be recommended as an alternative to the widely used tandem analysis. 3. 3 Conclusions Application of multivariate statistics,

including factor analysis, the Cronbach coefficient alpha and cluster analysis, is something of an art, and it is certainly not as objective as most statistical methods.

Available software packages (e g. STATISTICA, SAS, SPSS) allow for different variations of these techniques. The different variations of each technique can be expected to give somewhat different results

and can therefore confuse the developers of composite indicators. On the other hand, multivariate statistics are widely used to analyse the information inherent in a set of sub-indicators

then it must take its place as one of the important steps during the development of composite indicators.

4. Imputation of missing data. Missing data are present in almost all the case studies of composite indicators.

Data can be missing either in a random or in a nonrandom fashion. They can be missing at random because of malfunctioning equipment, weather issues, lack of personnel,

but there is no particular reason to consider that the collected data are substantially different from the data that could not be collected.

On the other hand, data are often missing in a nonrandom fashion. For example, if studying school performance as a function of social interactions in the home, it is reasonable to expect that data from students in particular types of home environments would be more likely to be missing than data from people in other types of environments.

More formally, the missing patterns could be: MCAR (Missing Completely At Random): missing values do not depend on the variable of interest or on any other observed variable in the data set.

For example the missing values in variable income would be of MCAR type if (i) people who do not report their income have on average,

but they are conditional on some other variables in the data set. For example the missing values in income would be MAR

if the probability of missing data on income depends on marital status but, within each category of marital status,

One of the problems with missing data is that there is no statistical test for NMAR and often no basis upon

whether data are missing at random or systematically, whilst most of the methods that impute (i.e. fill in) missing values require an MCAR or at least an MAR mechanism.

Three generic approaches for dealing with missing data can be distinguished, i.e. case deletion, single imputation or multiple imputation.

The other two approaches see the missing data as part of the analysis and therefore try to impute values through either Single Imputation (e.g.

Markov Chain Monte Carlo algorithm. The advantages of imputation include the minimisation of bias and the use of 'expensive to collect' data that would

otherwise be discarded. The main disadvantage of imputation is that it can allow data to influence the type of imputation.

In the words of Dempster and Rubin (1983): "The idea of imputation is both seductive and dangerous,"

because it can lull the user into the pleasurable state of believing that the data are complete after all,

and imputed data have substantial bias. The uncertainty in the imputed data should be reflected by variance estimates.

This allows taking into account the effects of imputation in the course of the analysis. However

The literature on the analysis of missing data is extensive and in rapid development. Therefore

The predictive distribution must be created by employing the observed data. There are, in general, two approaches to generate this predictive distribution:

the focus is on an algorithm, with implicit underlying assumptions that should be assessed. Besides the need to carefully verify

the danger of this type of modelling of missing data is that the resulting data set may be treated as complete

fill in blank cells with individual data drawn from similar responding units, e.g. missing values for individual income may be replaced with the income of another respondent with similar characteristics (age, sex, race, place of residence, family relationships, job, etc.).

and the time to converge depends on the proportion of missing data and the flatness of the likelihood function.

an important limitation of the single imputation methods is that they systematically underestimate the variance of the estimates (with some exceptions for the EM method where the bias depends on the algorithm used to estimate the variance).

Another common method (called imputing means within adjustment cells) is to classify the data for the sub-indicator with some missing values in classes

thus the inference based on the entire dataset (including the imputed data) does not fully account for imputation uncertainty.
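As an illustration of "imputing means within adjustment cells", the sketch below fills missing values with group means; the data, group labels and column names are hypothetical, and the cells could, for instance, be country groups from the cluster analysis of Section 3:

```python
import numpy as np
import pandas as pd

# Hypothetical sub-indicator with missing values and a grouping of countries (the cells).
df = pd.DataFrame({
    "group": ["leaders", "leaders", "adopters", "adopters", "laggards", "laggards"],
    "internet": [200.2, 179.1, np.nan, 48.7, np.nan, 14.8],
})

# Unconditional mean imputation: one grand mean for every missing value.
grand_mean = df["internet"].mean()

# Imputing means within adjustment cells: the mean of the country's own group,
# which should reduce the variance of the imputed values.
cell_mean = df.groupby("group")["internet"].transform("mean")

df["internet_imputed"] = df["internet"].fillna(cell_mean)
print(df, f"grand mean: {grand_mean:.1f}", sep="\n")
```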

For nominal variables, frequency statistics such as the mode or hot- and cold-deck imputation methods might be more appropriate.

4.1.3 Expected maximization imputation. Suppose that X denotes the data.

In the likelihood based estimation the data are assumed to be generated by a model described by a probability

The probability function captures the relationship between the data set and the parameter of the data model.

If the observed variables are dummies for a categorical variable, then the predictions (4.2) are respondent means within classes defined by the variable

while the data set is known, it makes sense to reverse the argument and look for the probability of observing a certain θ given the data set X:

this is the likelihood function. Therefore, given X, the likelihood function L(θ|X) is any function of θ proportional to f(X|θ).

The EM algorithm is one of these iterative methods (footnote 7). The issue is that X contains both observable and missing values, i.e.

Assuming that the missing data are MAR or MCAR (footnote 8), the EM algorithm consists of two components, the expectation (E) and maximization (M) steps.

Each step is completed once within each algorithm cycle. Cycles are repeated until a suitable convergence criterion is satisfied.

just as if there were no missing data (thus missing values are replaced by estimated values, i e. initial conditions in the first round of maximization).

In the E step the missing data are estimated by their expectations given the observed data and current estimated parameter values.

In the following maximization (M) step, the parameters in θ are re-estimated using maximum likelihood applied to the observed data augmented by the estimates of the unobserved data (coming from the previous round). (Footnote 7: other iterative methods include the Newton-Raphson algorithm which, for complex patterns of incomplete data, can involve a very complicated function of θ; as a result these algorithms often require algebraic manipulations and complex programming. Numerical estimation of this matrix is also possible, but careful computation is needed. Footnote 8: for NMAR mechanisms one needs to make assumptions on the missing-data mechanism and include them in the model; see Little and Rubin, 2002, Ch. 15.)

The whole procedure is iterated until convergence (absence of changes in the estimates and in the variance-covariance matrix).

Effectively, this process maximizes, in each cycle, the expectation of the complete data log likelihood.
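A compact, illustrative implementation of the E and M steps under a multivariate-normal data model with MAR missingness (a sketch, not the report's own code):

```python
import numpy as np

def em_impute(X, n_iter=100, tol=1e-6):
    """EM imputation under a multivariate-normal data model (MAR assumed).

    X: 2-D array with np.nan marking the missing entries.
    Returns the completed data together with the ML mean and covariance.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)

    # Initial conditions: fill missing entries with the column means.
    mu = np.nanmean(X, axis=0)
    Xf = np.where(miss, mu, X)
    sigma = np.cov(Xf, rowvar=False, bias=True)

    for _ in range(n_iter):
        sigma_old = sigma.copy()
        correction = np.zeros((p, p))
        # E step: conditional expectations of the missing entries given the
        # observed entries and the current parameter estimates.
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():                      # whole row missing
                Xf[i, m] = mu
                correction += sigma
                continue
            Soo = sigma[np.ix_(o, o)]
            Smo = sigma[np.ix_(m, o)]
            Smm = sigma[np.ix_(m, m)]
            reg = Smo @ np.linalg.pinv(Soo)
            Xf[i, m] = mu[m] + reg @ (Xf[i, o] - mu[o])
            correction[np.ix_(m, m)] += Smm - reg @ Smo.T
        # M step: maximum likelihood estimates on the augmented data.
        mu = Xf.mean(axis=0)
        sigma = np.cov(Xf, rowvar=False, bias=True) + correction / n
        if np.linalg.norm(sigma - sigma_old) < tol:
            break
    return Xf, mu, sigma

# Example: hypothetical data with roughly 20% of the entries missing at random.
rng = np.random.default_rng(5)
full = rng.multivariate_normal([0, 0, 0], [[1, .6, .3], [.6, 1, .4], [.3, .4, 1]], size=200)
X = full.copy()
X[rng.random(full.shape) < 0.2] = np.nan
completed, mu_hat, sigma_hat = em_impute(X)
print(np.round(mu_hat, 2), np.round(sigma_hat, 2), sep="\n")
```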

The advantage of the EM is its broadness (it can be used for a broad range of problems, e g. variance component estimation or factor analysis),

its simplicity (EM algorithms are often easy to construct conceptually and practically), and the fact that each step has a statistical interpretation

To test this, different initial starting values for θ can be used.

4.2 Multiple imputation. Multiple imputation (MI) is a general approach that does not require a specification of a parametrized likelihood for all data.

The idea of MI is depicted in Figure 4. 1. The imputation of missing data is performed with a random process that reflects uncertainty.

[Figure 4.1: data set with missing values → Set 1, Set 2, …, Set N → Result 1, Result 2, …, Result N → combine results.]

It assumes that data are drawn from a multivariate Normal distribution and requires MAR or MCAR assumptions.

The theory of MCMC is most easily understood using Bayesian methodology (see Figure 4.2). Let us denote the observed data as Xobs and the complete data set as X = (Xobs, Xmis);

we shall estimate it from the data, yielding an estimate of θ, and use the distribution f(Xmis | Xobs, θ).

[Figure 4.2. Functioning of MCMC: estimate the mean vector and covariance matrix from the data without missing values and use them as a prior distribution; imputation step: simulate values for the missing data items by randomly selecting a value from the available distribution of values; posterior step: re-estimate the mean vector and covariance matrix; iterate until the distribution is stationary (mean vector and covariance matrix unchanged), then use the imputation from the final iteration to form a data set without missing values.]

whose distribution depends on the data. So the first step for its estimation is to obtain the posterior distribution of θ from the data.

Usually this posterior is approximated by a normal distribution. After formulating the posterior distribution of θ, the following imputation algorithm can be used:

Draw θ* from the posterior distribution of θ, f(θ | Y, Xobs), where Y denotes exogenous variables that may influence θ.

(Footnote 9: the missing-data generating process may depend on additional parameters φ, but if φ and θ are independent,

the process is called ignorable and the analyst can concentrate on modelling the missing data, given the observed data and θ;

then we have a non-ignorable missing-data generating process, which cannot be solved adequately without making assumptions on the functional form of the interdependency. Footnote 10: rearranged from K. Chantala and C. Suchindran,

http://www.cpc.unc.edu/services/computer/presentations/mi presentation2.pdf.) Use the completed data X and the model to estimate the parameter of interest (e.g. the mean) β.

In conclusion, the multiple imputation method imputes several values (N) for each missing value (from the predictive distribution of the missing data),

The N versions of completed data sets are analyzed by standard complete data methods and the results are combined using simple rules to yield single combined estimates (e g.,

p-values, that formally incorporate missing data uncertainty. The pooling of the results of the analyses performed on the multiply imputed data sets,

implies that the resulting point estimates are averaged over the N completed sample points, and the resulting standard errors and p-values are adjusted according to the variance of the corresponding N completed sample point estimates.

Thus, the 'between-imputation variance' provides a measure of the extra inferential uncertainty due to missing data.
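A sketch of the pooling step under the usual combination rules (pooled estimate, within-imputation and between-imputation variance); the numbers in the example are invented:

```python
import numpy as np

def pool_estimates(estimates, variances):
    """Combine N point estimates and their variances from multiply imputed data sets.

    estimates: per-imputation point estimates of the quantity of interest (e.g. a mean)
    variances: per-imputation squared standard errors of those estimates
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    N = len(estimates)

    q_bar = estimates.mean()                 # pooled point estimate
    within = variances.mean()                # within-imputation variance
    between = estimates.var(ddof=1)          # between-imputation variance
    total = within + (1.0 + 1.0 / N) * between
    return q_bar, np.sqrt(total)

# Example: five imputed data sets gave these estimates of a mean and their variances.
q_hat, se = pool_estimates([10.2, 9.8, 10.5, 10.1, 9.9], [0.25, 0.24, 0.27, 0.26, 0.25])
print(round(q_hat, 2), round(se, 2))
```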

5. Normalisation of data. The indicators selected for aggregation convey at this stage quantitative information of different kinds.

and their robustness to possible outliers in the data. Different normalization methods will supply different results for the composite indicator.

Another transformation which is often used to reduce the skewness of (positive) data varying across many orders of magnitude is the logarithmic transformation:

yet s/he has to beware that the normalized data will surely be affected by the log transformation.

Therefore, data have to be processed via specific treatment. An example is offered in the Environmental Sustainability Index

where the variable distributions outside the 2.5 and 97.5 percentile scores are trimmed to partially correct for outliers as well as to avoid having extreme values overly dominate the aggregation algorithm.

when data for a new time point become available. This implies an adjustment of the analysis period T,

In such cases, to maintain comparability between the existing and the new data, the composite indicator would have to be recalculated for the existing data.

5.2.4 Distance to a reference country. This method takes the ratio of the indicator $x_{qc}^{t}$ for a generic country c at time t to the value $x_{q\bar{c}}^{t_0}$ for the reference country $\bar{c}$ at the initial time $t_0$:

$I_{qc}^{t} = x_{qc}^{t} / x_{q\bar{c}}^{t_0}$
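A sketch of several of the normalisation options discussed in this section (z-score, min-max rescaling, ranking, distance to a reference country, log transform), applied to a hypothetical sub-indicator vector:

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical raw values of one sub-indicator for a set of countries;
# the first entry plays the role of the reference country.
x = np.array([200.2, 125.6, 50.7, 27.4, 10.0, 3.1])

z_score  = (x - x.mean()) / x.std(ddof=1)        # standardisation
rescaled = (x - x.min()) / (x.max() - x.min())   # min-max rescaling to [0, 1]
ranks    = rankdata(-x)                          # 1 = best-performing country
distance = x / x[0]                              # distance to the reference country
log_vals = np.log10(x)                           # log transform for skewed positive data

for name, v in [("z-score", z_score), ("rescaled", rescaled), ("rank", ranks),
                ("distance to reference", distance), ("log10", log_vals)]:
    print(f"{name:>22}: {np.round(v, 2)}")
```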

if there is little variation within the original scores, the percentile banding forces the categorization on the data, irrespective of the distribution of the underlying data.

2003), an OECD report describing the construction of summary indicators from a large OECD database of economic and administrative product market regulations and employment protection legislation.

The summary indicators are obtained by means of factor analysis, in which each component of the regulatory framework is weighted according to its contribution to the overall variance in the data.

Data have been gathered basically from Member countries responses to the OECD Regulatory Indicators Questionnaire, which include both qualitative and quantitative information.

Qualitative information is coded by assigning a numerical value to each of its possible modalities (e g. ranging from a negative to an affirmative answer)

while the quantitative information (such as data on ownership shares or notice periods for individual dismissals) is subdivided into classes.

Examples of the above transformations are shown in Table 5. 6 using the TAI data. The data are sensitive to the choice of the transformation

and this might cause problems in terms of loss of the 52 interval level of the information, sensitivity to outliers, arbitrary choice of categorical scores and sensitivity to weighting.

[Table 5.6: normalisation techniques applied to the TAI sub-indicator "mean years of school (age 15 and above)": rank, z-score, rescaling, distance to reference country, log10, percentile, categorical.]

coverage, reliability and economic reason), statistical adequacy, cyclical conformity, speed of available data, etc. In this section a number of techniques are presented ranging from weighting schemes based on statistical models (such as factor analysis, data envelopment analysis, unobserved components models),

to participatory methods (e g. budget allocation or analytic hierarchy processes). Weights usually have an important impact on the value of the composite

Weights may also reflect the statistical quality of the data, thus higher weight could be assigned to statistically reliable data (data with low percentages of missing values, large coverage, sound values).

In this case the concern is to reward only easy to measure and readily available baseindicators, punishing the information that is more problematic to identify and measure.

For example, in the CI of e-business readiness the indicator I1 Percentage of firms using Internet

6.1.1 Principal component analysis and factor analysis. Principal component analysis (PCA) and more specifically factor analysis (FA) (see Section 3) group together sub-indicators that are collinear to form a composite indicator capable of capturing as much of the common information of those sub-indicators as possible.

but it is rather based on the statistical dimensions of the data. According to PCA/FA, weighting only intervenes to correct for the overlapping information of two or more correlated indicators,

Methodology The first step in FA is to check the correlation structure of the data: if the correlation between the indicators is low then it is unlikely that they share common factors.

smaller than the number of sub-indicators, representing the data. Summarising briefly what has been explained in Section 3,

For a factor analysis only a subset of principal components are retained (let's say m), the ones that account for the largest amount of the variance.

Rotation is a standard step in factor analysis, it changes the factor loadings and hence the interpretation of the factors leaving unchanged the analytical solutions obtained ex-ante and ex-post the rotation.

[Table: factor loadings and squared loadings of the TAI sub-indicators.]

With the TAI dataset there are four intermediate composites (Table 6.2). The first includes Internet (with a weight of 0.24),

the third only by University (0.77) and the fourth by Royalties and Telephones (weighted with 0.49 and 0.26). (Footnote 15: weights are the normalised squared factor loadings, e.g. 0.24 = (0.79)^2 divided by the variance explained by the first factor,

which is the portion of the variance of the first factor explained by the variable Internet.) Then the four intermediate composites are aggregated by weighting each composite by the proportion of explained variance in the dataset:

For example if Maximum Likelihood (ML) were to be used instead of Principal Component (PC) the weights obtained would be:

              ML     PCA
Patents       0.19   0.17
Royalties     0.20   0.20
Internet      0.07   0.08
Tech exports  0.07   0.06
Telephones    0.15   0.11
Electricity   0.11   0.09
Schooling     0.19   0.10
University    …      …
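A sketch of the weighting procedure described above: squared loadings are normalised within each retained factor, and the intermediate composites are then weighted by their share of explained variance. The rotation step is omitted for brevity and the data are hypothetical:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical standardised data: countries x sub-indicators.
rng = np.random.default_rng(6)
Z = rng.normal(size=(23, 8))

# Retain the factors (components) with eigenvalue > 1, a common rule of thumb.
pca = PCA().fit(Z)
keep = pca.explained_variance_ > 1.0
loadings = (pca.components_.T * np.sqrt(pca.explained_variance_))[:, keep]

# Squared loadings, normalised column-wise: the weight of each sub-indicator
# in the intermediate composite associated with each factor.
sq = loadings ** 2
weights_within_factor = sq / sq.sum(axis=0)

# Each intermediate composite is then weighted by the share of variance
# explained by its factor (relative to the retained factors).
expl = pca.explained_variance_[keep]
factor_weights = expl / expl.sum()

# Final weight of each sub-indicator in the overall composite (sums to one).
final_weights = weights_within_factor @ factor_weights
print(np.round(final_weights, 3))
```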

Sensitive to modifications of basic data: data revisions and updates (e.g. new observations and new countries) may change the set of weights

(i.e. the estimated loadings) used in the composite. Sensitive to the presence of outliers, which may introduce spurious variability in the data. Sensitive to small-sample problems

and data shortage that may make the statistical identification or the economic interpretation difficult (in general a relation between data and unknown parameters of 3: 1 is required for a stable solution).

Minimize the contribution of indicators, which do not move with other indicators. Sensitive to the factor extraction and to the rotation methods.

Examples of use Indicators of product market regulation (Nicoletti et al. OECD, 2000) Internal Market Index (EC-DG MARKT, 2001b) Business Climate Indicator (EC-DG ECFIN, 2000) General Indicator of S&t (NISTEP

, 1995); Success of software process improvement (Emam et al., 1998). (Footnote 16: to preserve comparability, final weights could be rescaled to sum up to one.)

6.1.2 Data envelopment analysis and benefit of the doubt. Data envelopment analysis (DEA) employs linear programming tools (popular in operations research) to retrieve an efficiency frontier

Figure 6.1. Performance frontier determined with data envelopment analysis (Indicator 1 vs. Indicator 2, countries a, b, c, d). Rearranged from Mahlberg and Obersteiner (2001).

and solved using optimisation algorithms:

$CI_c = \max_{w_{qc}} \sum_{q=1}^{Q} I_{qc}\, w_{qc}$  s.t.  $\sum_{q=1}^{Q} I_{qk}\, w_{qk} \le 1$ for $k = 1, \dots, M$, and $w_{qc} \ge 0$

as calculated by the above algorithm, that does not sum up to one, making the comparison with other methods (like FA

the weights as originally produced by the algorithm can always be normalized afterwards so as to sum up to one,

Columns 1 to 8 contain the weights, column 9 the country's composite indicator.

               Patents  Royalties  Internet  Tech exports  Telephones  Electricity  Schooling  University  CI
Finland        0.15     0.17       0.17      0.16          0.19        0.17         0.17       0.19        1
United States  0.20     0.20       0.17      0.21          0.15        0.15         0.21       …           …
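A sketch of the benefit-of-the-doubt weights for a single country, solved as a linear programme with SciPy (linprog minimises, so the objective is negated; the normalised scores are hypothetical):

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical normalised sub-indicator scores: countries (rows) x sub-indicators (columns).
rng = np.random.default_rng(7)
I = rng.uniform(0.2, 1.0, size=(23, 8))

def bod_score(scores, country):
    """Benefit-of-the-doubt composite for one country: choose the weights most
    favourable to it, subject to no country's composite exceeding 1."""
    n_countries, n_ind = scores.shape
    res = linprog(
        c=-scores[country],           # maximise sum_q I_qc * w_qc
        A_ub=scores,                  # sum_q I_qk * w_qk <= 1 for every country k
        b_ub=np.ones(n_countries),
        bounds=[(0, None)] * n_ind,
        method="highs",
    )
    return -res.fun, res.x            # composite score and country-specific weights

score, weights = bod_score(I, country=0)
print(round(score, 3), np.round(weights, 3))
```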

In the extreme case of perfect collinearity among regressors the model will not even be identified. It is argued further that

It requires a large amount of data to produce estimates with known statistical properties. Examples of use: Composite Economic Sentiment Indicator (ESIN, http://europa.eu.int/comm/economy finance); National Innovation Capacity Index (Porter and Stern, 1999

e g. the percentage of firms using internet in country j depends upon the (unknown) propensity to adopt new information and communication technologies plus an error term accounting,

The observed data consist of a cluster of q = 1, …, Q(c) indicators, each measuring an aspect of ph(c). Let c = 1,

since it would imply separating the correlation due to the collinearity of indicators from the correlation of error terms

However, since not all countries have data on all sub-indicators, the denominator of w c,

The likelihood function of the observed data is maximised with respect to the unknown parameters, the α(q)'s and β(q)'s,

Reliability and robustness of results depend on the availability of enough data. With highly correlated sub-indicators there could be identification problems.

Examples of use Employment Outlook (OECD, 1999) Composite Indicator on E-business Readiness (EC-JRC, 2004b.

AHP allows for the application of data, experience, insight, and intuition in a logical and thorough way within a 69 hierarchy as a whole.

Methodology The core of AHP is an ordinal pair-wise comparison of attributes, sub-indicators in this context, in

[Table 6.4. Comparison matrix A of the eight sub-indicators (semantic scale). For example, the Internet row reads 1/3, 1/2, 1, 1/4, 2, 2, 1/5, 1/2 against Patents, Royalties, Internet, Tech exports, Telephones, Electricity, Schooling and University.]

Patents is three times more important than Internet, and consequently Internet has one-third the importance of Patents.

Each judgment reflects, in reality, the perception of the ratio of the relative contributions (weights) of the two sub-indicators to the overall objective being assessed as shown in Table 6. 5 for the first three sub-indicators.

Table 6.5. Comparison matrix A for three sub-indicators:

Objective   Patents       Royalties       Internet
Patents     w_P/w_P       w_P/w_Roy       w_P/w_I
Royalties   w_Roy/w_P     w_Roy/w_Roy     w_Roy/w_I
Internet    w_I/w_P       w_I/w_Roy       w_I/w_I

The relative weights of the sub-indicators are calculated using an eigenvector technique.
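A sketch of that eigenvector step: the AHP weights are the normalised principal eigenvector of the pairwise comparison matrix, and the gap between the principal eigenvalue and the matrix size gives a consistency check. The 3x3 matrix below is illustrative:

```python
import numpy as np

# Illustrative pairwise comparison matrix for Patents, Royalties, Internet
# (entry [i, j] = relative importance of indicator i over indicator j).
A = np.array([
    [1.0, 2.0, 3.0],
    [1/2, 1.0, 2.0],
    [1/3, 1/2, 1.0],
])

# Principal (largest-eigenvalue) eigenvector, normalised to sum to one: the AHP weights.
eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
w = np.abs(eigvecs[:, k].real)
w = w / w.sum()

# Consistency index: how far the judgements are from perfect consistency.
n = A.shape[0]
ci = (eigvals.real[k] - n) / (n - 1)

print("weights:", np.round(w, 3), "consistency index:", round(ci, 3))
```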

Figure 6.2. Results of the AHP: weights and standard deviations for the eight sub-indicators (Patents, Royalties, Internet hosts, Tech exports, Telephones, Electricity, Schooling, University).

Analytic hierarchy process. Advantages: the method can be used for both qualitative and quantitative data; it increases the transparency of the composite. Disadvantages: the method requires a high number of pairwise comparisons

The conjoint analysis (CA) is a decompositional multivariate data analysis technique frequently used in marketing (see Mcdaniel And gates,

Although this methodology uses statistical analysis to treat data, it operates with people (experts, politicians, citizens) who are asked to choose which set of sub-indicators they prefer,

This is because AHP rewards with high weights (more than 20%) two indicators, High tech exports and University enrolment ratio,

The role of the variability in the weights and their influence on the value of the composite will be the object of the section on sensitivity analysis (Section 7).

Table 6.6. Weights for the sub-indicators obtained using four different methods: equal weighting (EW), factor analysis (FA), budget allocation (BAL) and analytic hierarchy process (AHP).

     Patents  Royalties  Internet  Tech exports  Telephones  Electricity  Schooling  University
EW   0.13     0.13       0.13      0.13          0.13        0.13         0.13       0.13
FA   0.17     0.15       0.11      0.06          …           …            …          …

(or geometric) aggregations, or non-linear aggregations like the multi-criteria or the cluster analysis (the latter is explained in Section 3). This section reviews the most significant ones.

6.2.1 Additive methods. The simplest

[Table: raw TAI data, e.g. Finland: Patents 187, Royalties 125.6, Internet 200.2, Tech exports 50.7, Telephones 3.080, Electricity 4.150, Schooling 10, University 27.4.]

Data are not normalised; normalisation does not change the result of the multicriteria method whenever it does not change the ordinal information of the data matrix.

$e_{jk} = \sum_{q=1}^{Q} \left( w_q(\mathrm{Pr}_{jk}) + \tfrac{1}{2}\, w_q(\mathrm{In}_{jk}) \right)$   (6.15)

where $w_q(\mathrm{Pr}_{jk})$ and $w_q(\mathrm{In}_{jk})$ are the weights of the sub-indicators presenting a preference and an indifference relation, respectively.

which enters into the computation of the overall importance of country a, in a way which is consistent with the definition of weights as importance measures.

Finland and the USA shows that Finland has better scores for the sub-indicators Internet (weight 1/8), Telephones (weight 1/8), Electricity (weight 1/8) and University (weight 1/8). Thus the score for Finland is 4 x 1/8 = 0.5

each with pros and cons. One possible algorithm is the Condorcet-Kemeny-Young-Levenglick (CKYL) ranking procedure (Munda and Nardo 2003).
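A sketch in the spirit of the outranking computation (eq. 6.15) and of a Condorcet-type ranking search: the permutation of countries with the highest total pairwise support is retained. The data and weights are hypothetical, and the brute-force search is only viable for a handful of countries:

```python
import numpy as np
from itertools import permutations

# Hypothetical data: 4 countries x 5 sub-indicators, with equal weights.
X = np.array([
    [187, 125.6, 200.2, 50.7, 27.4],
    [289, 130.0, 179.1, 66.2, 13.9],
    [ 82,  40.0, 125.0, 80.8,  9.5],
    [ 23,  15.0,  30.0, 20.0,  5.0],
])
w = np.full(X.shape[1], 1.0 / X.shape[1])

# Outranking matrix: e[j, k] = sum of the weights of the sub-indicators on which
# country j is preferred to country k, plus half the weights of the ties (eq. 6.15).
n = X.shape[0]
e = np.zeros((n, n))
for j in range(n):
    for k in range(n):
        if j != k:
            e[j, k] = w[X[j] > X[k]].sum() + 0.5 * w[X[j] == X[k]].sum()

# Ranking: the permutation that maximises the support of all pairwise comparisons.
best = max(permutations(range(n)),
           key=lambda order: sum(e[order[a], order[b]]
                                 for a in range(n) for b in range(a + 1, n)))
print("ranking (best to worst):", best)
```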

only if data are all expressed in a partially comparable interval scale (i.e. temperature in Celsius or Fahrenheit) of the type:

Non-comparable data measured on a ratio scale (i.e. kilograms and pounds), where $x_i \to \alpha_i x_i$ with $\alpha_i > 0$ (i.e. $\alpha_i$ varying across sub-indicators), can only be aggregated meaningfully by using geometric functions,

or an algorithm to describe a real-world issue formal coherence is a necessary property. Yet, formal coherence is not sufficient.

and Factor analysis is employed usually as a supplementary method with a view to examine thoroughly the relationships among the subindicators.

because it lets the data decide on the weighting issue, and it is sensible to national priorities.

compatibility between aggregation and weighting methods. 27 Compensability of aggregations is studied widely in fuzzy sets theory, for example Zimmermann and Zysno (1983) use the geometric operator

and not importance coefficients.

7. Uncertainty and sensitivity analysis. The reader will recall from the introduction that composite indicators may send misleading,

A combination of uncertainty and sensitivity analysis can help to gauge the robustness of the composite indicator,

i. selection of sub-indicators, ii. data quality, iii. data editing, iv. data normalisation, v. weighting scheme, vi. weights'values, vii. composite

Uncertainty analysis (UA) and sensitivity analysis (SA). UA focuses on how uncertainty in the input factors propagates through the structure of the composite indicator

i. inclusion/exclusion of sub-indicators; ii. modelling of data error, e.g. based on available information on variance estimation; iii. alternative editing schemes,

e.g. multiple imputation, described in Section 4; iv. using alternative data normalisation schemes, such as rescaling, standardisation,

Also modelling of the data error, point (ii) above, will not be included as in the case of TAI no standard error estimate is available for the sub-indicators.

the exclusion of an indicator leads to a total rerun of the optimisation algorithm. When using BAL

Rank(CI_c) will be an output of interest studied in our uncertainty and sensitivity analysis. Additionally, the average shift in countries' rank will be explored.

and sensitivity analysis (both in the first and second TAI analysis), targeting the questions raised in the introduction on the quality of the composite indicator.

the relative sub-indicator q will be almost neglected for that run. 7.1.4 Data quality. This is not considered here, as discussed above. 7.1.5 Normalisation. As described in Section 5, several methods are available

As the model is in fact a computer programme that implements steps i) to (vii) above, the uncertainty analysis acts on a computational model.

X_1 (editing): 1 = use bivariate correlation to impute missing data; 2 = assign zero to the missing datum. The second input factor X_2 is the trigger to select the normalisation

[0, 1], and applying the so-called Russian roulette algorithm, e.g. for X_1 we select 1 if the uniform draw falls in [0, 0.5) and 2 if it falls in [0.5, 1],

We anticipate here that a scatter-plot based sensitivity analysis will allow us to track which indicator when excluded affects the output the most.

When BOD is selected the exclusion of a sub-indicator leads to a re-execution of the optimisation algorithm.

(either for the BAL or AHP schemes) are assigned to the data. Clearly the selection of the expert has no bearing

they will all be generated just the same by the random sample generation algorithm. The constructive dimension of this Monte Carlo experiment,

such as the variance and higher-order moments, can be estimated with an arbitrary level of precision that is related to the size N of the simulation.

7.1.7 Sensitivity analysis using variance-based techniques. A necessary step

when designing a sensitivity analysis is to identify the output variables of interest. Ideally these should be relevant to the issue tackled by the model,

In the following, we shall apply sensitivity analysis to the output variables Rank(CI_c) and R_S, for their bearing on the quality assessment of our composite indicator.

2000a, EPA, 2004), robust, model-free techniques for sensitivity analysis should be used for non linear models.

Variance-based techniques for sensitivity analysis are model free and display additional properties convenient for the present analysis:

and to explain; they allow for a sensitivity analysis whereby uncertain input factors are treated in groups instead of individually; they can be justified in terms of rigorous settings for sensitivity analysis,

as we shall discuss later in this section. How do we compute a variance-based sensitivity measure for a given input factor X_i?
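A sketch of variance-based estimators for the first-order and total-effect indices, using two independent sample matrices in the spirit of the Sobol'-Saltelli designs; the model f below is only a stand-in for the composite-indicator programme:

```python
import numpy as np

def sobol_indices(f, n_factors, N=10000, seed=0):
    """Estimate first-order (S_i) and total-effect (S_Ti) sensitivity indices
    for a model f acting row-wise on inputs drawn uniformly from [0, 1]^n_factors."""
    rng = np.random.default_rng(seed)
    A = rng.random((N, n_factors))
    B = rng.random((N, n_factors))
    fA, fB = f(A), f(B)
    var = np.var(np.concatenate([fA, fB]), ddof=1)

    S, ST = np.empty(n_factors), np.empty(n_factors)
    for i in range(n_factors):
        ABi = A.copy()
        ABi[:, i] = B[:, i]           # matrix A with column i taken from B
        fABi = f(ABi)
        S[i] = np.mean(fB * (fABi - fA)) / var          # first-order index
        ST[i] = 0.5 * np.mean((fA - fABi) ** 2) / var   # total-effect index
    return S, ST

# Stand-in model: a nonlinear function of three uniform input factors.
def model(X):
    return X[:, 0] + 2.0 * X[:, 1] ** 2 + X[:, 0] * X[:, 2]

S, ST = sobol_indices(model, n_factors=3)
print(np.round(S, 2), np.round(ST, 2))
```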

The usefulness of S_i and S_Ti, also for the case of non-independent input factors, is linked to their interpretation in terms of settings for sensitivity analysis.

for both dependent and independent input factors, are implemented in the freely distributed software SIMLAB (Saltelli et al.,

i e. by censoring all countries with missing data. As a result, only 34 countries could in theory be analysed.

as this is the first country with missing data, and it was preferred to analyse the set of countries whose rank was not altered by the omission of missing records.

we show in Figure 7. 2 a sensitivity analysis based on the first order indices calculated using the method of Sobol'(1993) in its improved version due to Saltelli (2002).

This underlines the necessity of computing higher-order sensitivity indices that capture the interaction effects among the input factors.

Figure 7.2. Sensitivity analysis results based on the first-order indices: decomposition of the variance of the country rank into expert selection, weighting scheme, aggregation system, exclusion/inclusion, normalisation and a non-additive (interaction) part.

Figure 7.3. Sensitivity analysis results based on the total-effect indices (total-effect sensitivity index per input factor: expert selection, weighting, aggregation, exclusion/inclusion, normalisation).

The sensitivity analysis results for the average-shift-in-ranking output variable (Equation 7.2) are shown in Table 7.2. Interactions are now between expert selection and weighting,

there is not much hope that a robust index will emerge, not even by the best provision of uncertainty and sensitivity analysis.

An example is the Human Development Index 2004 of the UNDP (see Figure 8. 1). This is a comprehensive approach to display results

Data retrieved on 4 october, 2004. A number of lines are superimposed usually in the same chart to allow comparisons between countries.

an assessment of progress can be made by comparing the latest data with the position at a number of baselines.

in a direction away from meeting the objective; insufficient or no comparable data.

8.5 Rankings. A quick and easy way to display country performance is to use rankings.

SPEAR contains a set of core sectors and indicators that have been derived from the literature on sustainability.

8.7 Dashboards. The Dashboard of Sustainability (see http://esl.jrc.it/envind/) is free, non-commercial software

on the internet site one can find the ecological footprint, a pure environmental composite, the environment sustainability index, presented by the World Economic Forum annual meetings, the European Environmental Agency's EEA Environmental Signals.

[Figure: the Dashboard of Sustainability.] 8.8 Nation Master. The following internet site is not strictly for composite indicators.

However, its graphical features can be helpful for presentational purposes. www.nationmaster.com is a massive central data source on the internet with a handy way to graphically compare nations.

Nation Master is a vast compilation of data from such sources as the CIA World Factbook, United Nations, World Health Organization, World Bank, World Resources Institute, UNESCO,

This internet site is considered the web's one-stop resource for country statistics on anything and everything.

Data selection. The quality of composite indicators also depends on the quality of the underlying indicators.

Imputation of missing data. The idea of imputation is both seductive and dangerous. Several imputation methods are available,

and the use of 'expensive to collect' data that would otherwise be discarded. The main disadvantage of imputation is that the results are affected by the imputation algorithm used.

Normalisation. Avoid adding up apples and pears. Normalisation serves the purpose of bringing the indicators into the same unit.

Robustness and sensitivity. The iterative use of uncertainty and sensitivity analysis during the development of a composite indicator can contribute to its well-structuring.

Uncertainty and sensitivity analysis are the suggested tools for coping with uncertainty and ambiguity in a more transparent and defensible fashion.

The Hague. 2. Anderberg, M.R. (1973), Cluster Analysis for Applications, New York: Academic Press. 3. Arrow, K.J. (1963), Social Choice and Individual Values, 2nd edition, Wiley, New York. 4. Arrow, K.J.,

Binder, D.A. (1978), "Bayesian Cluster Analysis", Biometrika, 65, 31-38. 7. Boscarino, J.A., Figley, C.R.,

Principal components analysis and exploratory and confirmatory factor analysis. In Grimm and Yarnold, Reading and Understanding Multivariate Analysis.

In Sensitivity Analysis (eds A. Saltelli, K. Chan, M. Scott), pp. 167-197. New York: John Wiley & Sons. 12.

and Seiford, L.M. (1995), Data Envelopment Analysis: Theory, Methodology and Applications. Boston: Kluwer. 13.

Cherchye, L. (2001), Using data envelopment analysis to assess macroeconomic policy performance, Applied Economics, 33, 407-416. 14.

Statistics and Data Analysis in Geology, John Wiley & Sons, Toronto, 646 p. 21. Pan American Health Organization (1996), Annual Report of the Director.

Dempster, A.P. and Rubin, D.B. (1983), Introduction, pp. 3-10, in Incomplete Data in Sample Surveys (vol. 2:

Modeling the likelihood of software process improvement. International Software Engineering Research Network, Technical Report ISERN-98-15. 29.

Environmental Protection Agency (EPA), Council for Regulatory Environmental Modeling (CREM), Draft Guidance on the Development, Evaluation, and Application of Regulatory Environmental Models, http://www.epa.gov/osp/crem/library/CREM%20guidance%20draft%2012 03.pdf. 30.

Everitt, B.S. (1979), "Unresolved Problems in Cluster Analysis," Biometrics, 35, 169-181. 37. Fabrigar, L.R., Wegener, D.T., MacCallum, R.C.,

Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4: 272-299. 38. Fagerberg J. (2001) Europe at the crossroads:

Forman E.H. (1983), The analytic hierarchy process as a decision support system, Proceedings of the IEEE Computer Society. 41.

Funtowicz S.O., Munda G., Paruccini M. (1990), The aggregation of environmental data using multicriteria methods, Environmetrics, Vol. 1 (4), pp. 353-36. 43.

Factor Analysis. Hillsdale, NJ: Lawrence Erlbaum. Orig. ed. 1974. 47. Gough C., Castells N. and Funtowicz S. (1998), Integrated Assessment:

issues and outlook. Journal of Consumer Research 5, 103-123. 50. Grubb D. and Wells W. (1993), Employment regulation and patterns of work in EC countries, OECD Economic Studies, n. 21 Winter, 7-58, Paris. 51.

and Black W.C. (1995), Multivariate Data Analysis with Readings, fourth ed. Prentice Hall, Englewood Cliffs, NJ. 52.

Hartigan, J.A. (1975), Clustering Algorithms, New York: John Wiley & Sons, Inc. 53. Harvey A. (1989), Forecasting, Structural Time Series Models and the Kalman Filter.

A step-by-step approach to using the SAS system for factor analysis and structural equation modeling. Cary, NC:

SAS Institute. Focus on the CALIS procedure. 55. Hattie, J. (1985) Methodology Review: Assessing unidimensionality of tests and items. 56.

An Empirical Analysis Based on Survey Data for Swiss Manufacturing, Research Policy, 25, 633-45. 58. Hollenstein, H. (2003):

A Cluster Analysis Based on Firm-level Data, Research Policy, 32 (5), 845-863. 59. Homma, T. and Saltelli, A. (1996) Importance measures in global sensitivity analysis of model output.

Reliability Engineering and System Safety, 52 (1), 1-17. 60. Hutcheson, G. and Sofroniou N. (1999).

Introduction to factor analysis: What it is and how to do it. Thousand Oaks, CA: Sage Publications, Quantitative Applications in the Social Sciences Series, No. 13. 63.

Factor analysis: Statistical methods and practical issues. Thousand Oaks, CA: Sage Publications, Quantitative Applications in the Social Sciences Series, No. 14. 64.

Karlsson J. (1998), A systematic approach for prioritizing software requirements, PhD Dissertation n. 526, Linköping, Sweden. 71.

Covers confirmatory factor analysis using SEM techniques. See esp. Ch. 7. 77. Koedijk K. and Kremers J. (1996), Market opening, regulation and growth in Europe, Economic Policy (0) 23, October. 78.

Factor analysis as a statistical method. London: Butterworth and Co. 81. Levine, M.S. (1977). Canonical Analysis and Factor Comparison.

and Schenker N. (1994), Missing Data, in Handbook for Statistical Modeling in the Social and Behavioral Sciences (G. Arminger, C.C. Clogg,

Little R.J.A. (1997) Biostatistical Analysis with Missing Data, in Encyclopedia of Biostatistics (P. Armitage and T. Colton eds.

Little R.J.A. and Rubin D.B. (2002), Statistical Analysis with Missing Data, Wiley Interscience, J. Wiley & Sons, Hoboken, New Jersey. 85.

Mahlberg B. and Obersteiner M. (2001), Remeasuring the HDI by Data Envelopment Analysis, Interim Report IR-01-069, International Institute for Applied Systems Analysis, Laxenburg

Massart, D.L. and Kaufman, L. (1983), The Interpretation of Analytical Chemical Data by the Use of Cluster Analysis, New York:

Milligan, G.W. and Cooper, M.C. (1985), "An Examination of Procedures for Determining the Number of Clusters in a Data Set," Psychometrika, 50, 159-179. 93.

OECD (1999) Employment Outlook, Paris. 105. OECD (2003a), Quality Framework and Guidelines for OECD Statistical Activities, available on www.oecd.org/statistics. 106.

Making sense of factor analysis: The use of factor analysis for instrument development in health care research. Thousand Oaks, CA:

Sage Publications. 109. Pré Consultants (2000) The Eco-indicator 99. A damage oriented method for life cycle impact assessment. http://www.pre.nl/eco-indicator99/ei99-reports.htm 110.

Computer Physics Communications, 145, 280-297. 122. Saltelli, A., Chan, K. and Scott, M. (2000a) Sensitivity Analysis, Probability and Statistics Series, New York:

John Wiley & Sons. 123. Saltelli, A. and Tarantola, S. (2002) On the relative importance of input factors in mathematical models:

Saltelli, A., Tarantola, S. and Campolongo, F. (2000b) Sensitivity analysis as an ingredient of modelling. Statistical Science, 15, 377-395. 125.

Saltelli, A., Tarantola, S., Campolongo, F. and Ratto, M. (2004) Sensitivity Analysis in Practice, a Guide to Assessing Scientific Models.

A software for sensitivity analysis is available at http://www.jrc.cec.eu.int/uasa/prj-sa-soft.asp. 126.

Sobol', I.M. (1993) Sensitivity analysis for nonlinear mathematical models. Mathematical Modelling & Computational Experiment 1, 407-414. 130.

USSR Computational Mathematics and Physics 7, 86-112. 131. Spath, H. (1980), Cluster Analysis Algorithms, Chichester, England: Ellis Horwood. 132.

SPRG (2001) Report of the Scientific Peer Review Group on Health Systems Performance Assessment, Scientific Peer Review Group (SPRG), WHO:

Tarantola, S., Jesinghaus, J. and Puolamaa, M. (2000) Global sensitivity analysis: a quality assurance tool in environmental policy modelling.

In Sensitivity Analysis (eds A. Saltelli, K. Chan, M. Scott), pp. 385-397. New York: John Wiley & Sons. 136.

and Kiers, H. (2001) Factorial k-means analysis for two-way data, Computational Statistics and Data Analysis, 37 (1), 49-64. 142.

Cited with regard to preference for PFA over PCA in confirmatory factor analysis in SEM. 144. World Economic Forum (2002) Environmental Sustainability Index, http://www.ciesin.org/indicators/ESI/index.html. 145.

Zimmermann H.J. and Zysno P. (1983) Decisions and evaluations by hierarchical aggregation of information, Fuzzy Sets and Systems, 10, pp. 243-260.

APPENDIX

TAI is made of a relatively small

raw data are freely available on the web, and issues of technological development are of importance to society

diffusion of the Internet (indispensable to participation), and by exports of high- and medium-technology products as a share of all exports.

Two sub-indicators are included here, telephones and electricity, which are especially important because they are needed to use newer technologies

However, the original data set contains a large number of missing values, mainly due to missing data in Patents and Royalties.

and hence have market value (1999).

Diffusion of recent innovations:
INTERNET - Internet hosts per 1,000 people; diffusion of the Internet, which is indispensable to participation in the network age (2000).
EXPORTS - Exports of high- and medium-technology products as a share of total goods exports (%) (1999).

Diffusion of old innovations:
TELEPHONES - Telephone lines

Units are given in Table A.1.

   Country  PATENTS  ROYALTIES  INTERNET  EXPORTS  TELEPHONES (log)  ELECTRICITY (log)  SCHOOLING  ENROLMENT
1  Finland  187      125.6      200.2     50.7     3.08              4

