Synopsis: ICT: Data


The Impact of Innovation and Social Interactions on Product Usage - Paulo Albuquerque & Yulia Nevskaya.pdf

We empirically test our model using a novel individual data set from the online gaming industry on daily content consumption, product innovation,

but research on post-purchase behavior and product usage has been limited because of the lack of revealed preferences data on consumption;

data collection mostly focuses on transactional information. As an alternative, until recently, surveys or self-reported questionnaires have been used to study usage behavior, especially regarding technology products (Ram and Jung, 1990;

we make use of a unique data set, from the popular online video game World of Warcraft, that tracks product usage, content consumption choices,

To the best of our knowledge, this is one of the first empirical studies to examine how different usage drivers specified by theoretical work influence consumption decisions using revealed-preferences data on product usage.

For example, using purchase data that mimics usage patterns closely, Hartmann and Viard (2008) and Kopalle et al.

and Yang (2013) use leisure activities data to investigate the relation between consumption usage decisions and consumer lifestyles.

A recent example by Huang, Khwaja and Sudhir (2012) studies the consumption decisions of drinks using intra-day data,

Section 3 provides details about the novel data set on post-purchase decisions used in the paper.

or remain without any connections to user communities.² We assume that consumers can make this decision at any
² We opt to model the consumer decision to join any group, instead of a specific group, in part due to data limitations and

but we found that one-day state dependence combined with content aging explains the data well.

the value of the highest level of any content enjoyed before time t by individual

We provide more details about W_it in the data section. The state variable p_t denotes the index of the most recently introduced product update, p_t ∈ {0, 1, …

and using historic data on update releases. The probability of an update occurrence, and hence the transition probability for p_t, after a number of time periods since the introduction of the previous update p, is defined as Pr(p_{t+1} = p_t + 1 | p_t) = 1/(1 + e^(…
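
As a reading aid for the truncated logistic expression above, here is a minimal sketch of such an update-arrival probability; the parameters gamma0 and gamma1 are hypothetical placeholders, not the paper's estimates:

```python
import numpy as np

def update_probability(periods_since_update, gamma0=-3.0, gamma1=0.25):
    # Illustrative logistic hazard: probability that the next product update
    # arrives, given the number of periods elapsed since the previous one.
    # gamma0 and gamma1 are made-up values, not the paper's estimates.
    return 1.0 / (1.0 + np.exp(-(gamma0 + gamma1 * periods_since_update)))

# The hazard rises as more time passes without an update.
for weeks in (1, 5, 10, 20):
    print(weeks, round(update_probability(weeks), 3))
```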

and Data The proposed approach can be used to obtain insights about the relation between product usage, sociability of actions,

We demonstrate its application with the study of consumer demand in the online computer gaming industry. 3.1 An Online Game. We use data from the online game World of Warcraft developed by Blizzard Entertainment, a division

Our data is related to the second expansion of the game, which sold more than 4 million copies in the first month alone (Blizzard Entertainment, 2008).

The game environment and related data are particularly suitable to the study of product usage for a number of reasons.

These data take the form of dates of first time completion of specific content consumption or a task performed in the game.

Several independent websites process this information into databases that allow cross-player comparisons and provide recommendations on how to progress in the game.

In this paper, we use a publicly available data set on product usage collected from such a site called Wowhead. We complement these data with information about product updates, their content, the firm's actions,

and abstract from price response. 3.2 Choice Sets and Product Updates. Our data set includes daily information about the game from November of 2008 to December of 2010.

we use the beginning periods in our data set to create a starting state for each player,

the data include only content related to the game's main storyline. There are other unrelated tasks that we do not include in our analysis.
Table header: Patch; Release time; Age at t=1; Size; #Tasks; Task level,

similar to the schedule observed in our data. Although there was not a predefined schedule, the time interval between updates varied by only a few weeks.

and timing of future updates. 3.3 Player Participation and Progression. The product usage data include actions of 206 users from one of the game servers, for

Our data set does not include new players for two reasons. First, the website used as a source of the data provides information about experienced users only.

Second, the content introduced by the firm during our analysis was dedicated almost entirely to increasing participation of experienced players.

We use these data to create an empirical distribution used to define consumer expectations about the schedule of product updates.

On average, players in our sample completed 44 tasks out of a total of 440 tasks, with a standard deviation of 32.
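
A minimal sketch of how the empirical update-schedule distribution mentioned above could be built; the release dates below are invented placeholders, not the actual patch history:

```python
import numpy as np

# Hypothetical patch release dates, in days since launch.
release_days = np.array([0, 70, 150, 220, 310, 400])
gaps = np.diff(release_days)  # observed waiting times between updates

# Empirical probability of each observed inter-update interval.
values, counts = np.unique(gaps, return_counts=True)
print(dict(zip(values.tolist(), (counts / counts.sum()).tolist())))
```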

and can grow
¹⁴ We note that our data are more detailed than the patterns presented in the figure,

The data set contains the dates when an individual decides to join or leave a game community,

with a clear break in the data, with a group of users significantly below 500 badges and another group clearly above that number.

We use data from the website World of Logs about the success rates for different content.

we discuss the data patterns that identify the parameters in our model. Starting with the content utility function, its intercept is identified by the average observed rates of participation for each content alternative.

the individual data on decisions of joining, remaining, or leaving a group identify the overall costs of joining

through the observed frequency in the data of remaining in groups versus the frequency of using the product while in groups.

In the data, we observe consumers attempting higher level tasks more frequently when a product update is about to be released,

Our data show that users play more and at higher levels just before and once they are part of a community,

and w(j, S_dit) is the content success rate that is available as data. We note that for the community membership decision

the log-likelihood of observing the data Y is LL(Y; V) = Σ_{i=1}^{N} log( Σ_{g=1}^{G} Pr(i ∈ g | a_i; V) · …

where Pr(i ∈ g | a_i; V) is the conditional probability that individual i belongs to segment g given her complete history of observed choices a_i,
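
Read together, the two fragments describe a standard latent-segment mixture likelihood; a toy numerical sketch (the segment shares and per-segment likelihoods are made up):

```python
import numpy as np

def mixture_loglik(lik, shares):
    # lik[i, g]: likelihood of individual i's full choice history under
    # segment g's parameters; shares[g]: prior probability of segment g.
    # Returns sum_i log( sum_g shares[g] * lik[i, g] ).
    return np.log(lik @ shares).sum()

def posterior_membership(lik, shares):
    # Pr(i in g | a_i): conditional segment membership given i's choices.
    joint = lik * shares
    return joint / joint.sum(axis=1, keepdims=True)

lik = np.array([[0.20, 0.05],    # two individuals, two segments (toy numbers)
                [0.01, 0.10]])
shares = np.array([0.6, 0.4])
print(mixture_loglik(lik, shares))
print(posterior_membership(lik, shares))
```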

First, we describe fit statistics that show evidence that the model explains the data well.

and consumers given the parameter values and data for each time period. Figure 3 shows actual and estimated participation over all tasks of the game.

A similar pattern is observed if we disaggregate content consumption by product update in our data. For our model, the hit-rate across all consumer choices
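
The hit rate referred to here is simply the share of choice occasions where the model's most likely alternative matches the observed choice; a minimal sketch:

```python
import numpy as np

def hit_rate(predicted, actual):
    # Share of choice occasions where the model's top alternative
    # matches the choice actually observed in the data.
    return float(np.mean(np.asarray(predicted) == np.asarray(actual)))

print(hit_rate([1, 2, 0, 2], [1, 2, 1, 2]))  # 0.75
```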

and the evolution of the average experience level of the user population in our data set.

In our data, we observe most updates concentrated in the first half of the product lifecycle,

Using data from the popular online game World of Warcraft, we find that motivations for product usage vary for different content.


The Impact of Innovation in Romanian Small and Medium-Sized Enterprises on Economic Growth Development - Oncoiu.pdf

and has a total of 45 questions. To collect data from interviewees, a total of 730 companies were contacted by phone or email between January 2013 and June 2013.

A factor analysis was undertaken on the constituents of the scoring variables, and the resulting factors were the input of a cluster analysis.
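
A minimal sketch of that two-stage pipeline (factor scores feeding a cluster analysis); the firm responses are synthetic stand-ins, and the factor and cluster counts are arbitrary choices, not those of the study:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(730, 45)).astype(float)  # Likert-style items

scaled = StandardScaler().fit_transform(responses)
factors = FactorAnalysis(n_components=5, random_state=0).fit_transform(scaled)
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(factors)
print(np.bincount(clusters))  # number of firms in each cluster
```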

The present analysis also aimed to investigate the state of planning and how the innovation process of Romanian SMEs is linked with it.

Such a research project would require investing considerable time in collecting data. We certainly have many casualties among SMEs due to the incorrect application of the innovation process


The Relationship between innovation, knowledge, performance in family and non-family firms_ an analysis of SMEs.pdf

Data from 430 small and medium-sized enterprises were analyzed through hierarchical regression analysis, and innovation was found to be a significant factor in both family and non-family samples.

The cross-sectional nature of the data collection limits potential findings and it is unclear if similar results would be found in a comparison of large companies.

a longitudinal approach would provide more reliable data. While this research combined two samples from different countries, evidence of how this process can enhance the study was presented.

Support of these variables in both sets of data in this study is consistent with prior research,

Finally, objective performance measures may yield different results with performance data that could be verified independently.

whether the two samples varied from the findings of the combined data sets in terms of nationality and industry.

In order to test for country effects, the data were broken into two subsets: (1) US family and non-family respondents and (2) Australian family and non-family respondents.

The Australian data had less explanatory power in the family sample: the adjusted R² was 0.15 but significant at the 1% level (β = 0.39, t = 4.42, p < .01).

parallel testing was conducted for all non-family samples with comparable results for each data set.

For example, the hierarchical model with the innovation variable in the US family data set explained 42% of the variance

As the data were collected from various industries, they were tested in further regression analysis for possible industry effects.

and factor analysis was used to reduce the number of items in some scales. Hierarchical linear regression analysis was utilized to analyze the relationships between the variables in the final model.
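
A sketch of such a hierarchical (block-entry) regression, entering controls first and the innovation variable second, then comparing adjusted R-squared; all variables below are simulated placeholders, not the study's data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 430                                      # sample size as in the study
controls = rng.normal(size=(n, 2))           # e.g. firm size, firm age (stand-ins)
innovation = rng.normal(size=n)
performance = 0.4 * innovation + controls @ [0.2, 0.1] + rng.normal(size=n)

step1 = sm.OLS(performance, sm.add_constant(controls)).fit()
step2 = sm.OLS(performance,
               sm.add_constant(np.column_stack([controls, innovation]))).fit()
print(step1.rsquared_adj, step2.rsquared_adj)   # gain from adding innovation
print(step2.params[-1], step2.tvalues[-1])      # beta and t for innovation
```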

gathered the data, and drafted the manuscript. MS contributed to the research design and performed the statistical analysis.

Expert Systems with Applications, 27, 459-465. Cohen, W.M., & Levinthal, D.A. (1990). Absorptive capacity: a new perspective on learning and innovation.


The Role of Open Innovation in Eastern European SMEs - The Case of Hungary and Romania - Oana-Maria Pop.pdf

whose representatives have prioritized contacting participating SMEs personally for data collection whenever possible. Last but not least, the surveyed SMEs were given the possibility to answer the questionnaire in their native language (with subsequent translation by the first author),

In the following sections we describe our data in relation to these topics. 3 A Characterization of Hungarian

the primary data for our explorative research was acquired through collaboration with well-established institutions as well as individual experts and consultants in two Eastern European countries:

In collecting data on SME innovativeness in terms of their new product/service introductions, we have followed the prescriptions of the Oslo Manual (2005).

The remaining data has produced a realistic overview of SME innovativeness in our sample and is summarized in Figure 4. Table 1: Innovations

and interpreting innovation data, Publications de l'OCDE. Fletcher, D., Helienek, E. & Zafirova, Z. (2009).

as well as direct dialogue (between the authors and their partner organizations and the SMEs during data collection) have stimulated participating SMEs from Hungary

accurate data for further analysis. Participants were very positive about this 'educational' aspect of the study,


The Young Foundation and the Web Digital Social Innovation.pdf

Government data is increasingly being made public, improving transparency and allowing software programmers to create extra value from underused data by, for example,

mapping out injuries and deaths to cyclists on London's roads. The Young Foundation researches

the Young Foundation has developed a framework to help local authorities use social media to improve the delivery of public services. Building upon the open data movement,

www.mydex.org, a new community interest company backed by the Young Foundation, aims to empower individuals by giving them back control of their own data.

The government holds data about citizens in hundreds of databases, with individuals having little control over it.


The Young Foundation-for-the-Bureau-of-European-Policy-Advisors-March-2010.pdf

social movements, business models, laws and regulations, data and infrastructures, and entirely new ways of thinking

Data from the Johns Hopkins study also found astounding growth rates within the nonprofit sector in all European countries where the sector's share of total employment could be compared for 1990 and 1995.

this is because most countries do not collect information on the number of social enterprises; instead they collect data on the number of organisations with particular legal forms, that is, the number of social cooperatives, associations, social purpose

and budgets; Prevention; ICT as an enabler of social innovation; Reducing bureaucracy. In each case we have tried to focus on examples where there is some data on impact

laboratory data, patient records, waiting list information from hospitals and so on. Evaluations of the portal show that roughly one third of users seeking information and advice on their health through Sundhed are reassured

The M-PESA application is installed on SIM cards and works on all handsets. M-PESA has revolutionised money transfer in Kenya and significantly reduced levels of financial exclusion,

For example, looking at the field of technological innovation, the success of Silicon Valley can be attributed largely to the clustering of technology firms

and better comparable cross-country data. In the UK, the Department for Innovation, Universities and Skills has commissioned NESTA to develop a new 'Innovation Index' to measure the UK's innovation performance

again within sectors and across sectors. Aggregating assessments of productivity and impact: however, data of this kind remains underdeveloped.

and stresses that measuring all of these dimensions of well-being requires both objective as well as subjective data.

and some for developers, running the gamut from methods using artificial neural networks and 'hedonic' price models to fuzzy logic methods and, for the eager, 'auto-regressive integrated moving averages methods' and 'triple bottom line property appraisal methods'.

http://technology.open.ac.uk/cru/Nfpgermanyuk04vssn.pdf
lxxvi http://ec.europa.eu/employment_social/equal/data/document/etg2-suc6-synergia


The_Basque_Country_ Smart Specialisation.pdf

clustering the regional efforts on achieving excellence in science and technology. Network of Technological Centres:


the_open_book_of_social_innovationNESTA.pdf

social movements, business models, laws and regulations, data and infrastructures, and entirely new ways of thinking

research, mapping and data collection are used to uncover problems, as a first step to identifying solutions.

Artificial intelligence, for example, has been used in family law in Australia

Research and mapping Many innovations are triggered by new data and research. In recent years, there has been a rise in the use of mapping techniques to reveal hidden needs and unused assets.

today policy and provision are much more interested in disaggregating data. There are also a range of tools for combining

and mining data to reveal new needs and patterns. These sites show how to run competitions for 'mash-up' ideas from citizens using government data, such as Sunlight Labs and Show Us a Better Way

9) Mapping physical assets. Within the social economy, especially amongst artists, entrepreneurs and community groups, there is a long tradition of taking advantage of empty, abandoned or derelict buildings and spaces.

Service users are responsible for all stages of the research process from design, recruitment, ethics and data collection to data analysis, writing up, and dissemination.

action research is geared normatively toward prescriptions emerging out of the data which can be employed for the improvement of future action. 16) Literature surveys

and analyse large quantities of data has been the basis for remarkable changes for example: in flexible manufacturing,

In Japanese factories data is collected by front-line workers, and then discussed in quality circles that include technicians.

embedded with GPS data pinpointing the exact location of the problem. These complaints will then get forwarded to the relevant city department. 18) Integrated user-centred data such as Electronic Patient Records in the UK,

which, when linked through grid and cloud computing models provide the capacity to spot emerging patterns.

A contrasting integrated system for monitoring renal patients has led to dramatic improvements in survival rates and cost reductions in the United States. 19) Citizen-controlled data,

and chart their own behaviour and actions. 20) Holistic services include phone-based services such as New York's 311 service which provide a database that can be analysed for patterns of recurring problems and requests. 21) Tools

The gathering and presentation of data requires a process of interpretation. This should ideally include those involved in the implementation of ideas and those affected by the proposals.

In analysing an issue or a set of data, it is useful to have the perspectives of a variety of professional disciplines,

and experiences that has a database of 4,000 ideas online, receives a quarter of a million visitors a year,

and research data to demonstrate effectiveness and value for money (see list of metrics below) as well as adapting models to reduce costs

Variations will include toolkits, oral histories, databases, and manuals. One new initiative by Open Business is the creation of a database of open business models. 199) Barefoot consultants.

There is an important role for consultants and those with specialist knowledge who can act as knowledge brokers and advisers in the new systems.

to provide funders or investors with data on impact;

methods using artificial neural networks and 'hedonic' price models (which attempt to define the various characteristics of a product or service), spatial analysis methods, fuzzy logic methods;

'auto-regressive integrated moving averages methods'; and 'triple bottom line property appraisal methods'. 223) Operational metrics,

For example, a study of the operational data of public housing repairs found that the time taken to do repairs varied from a few minutes to 85 days,

and user-generated metrics such as the 'sousveys', surveys undertaken by citizens on services provided by the state, used to gather chronic disease data in Sheffield

which provides support for every aspect of school management. 237) Personalized support services such as personal health and fitness coaches, increasingly backed up by shared data services and networks.


and more formal legal devices (like public databases). With the increasing mixing of voluntary and professional roles (for example around care for the elderly,

244) Data infrastructures. A different, and controversial, infrastructure is the creation of a single database of children deemed 'at risk' in the UK.

This was seen as crucial to creating a holistic set of services to deal with children's needs,

that combines rich data feedback with support structures which help patients understand and treat their own conditions more effectively. Strategic moves that accelerate systems change: every story of systemic innovation involves key moments

So while familiar data on income, employment, diseases or educational achievement continues to be gathered, there is growing interest in other types of measurement that may give more insights into

Requiring public agencies to publish data on their balance sheets, or to show disaggregated spending patterns,

as can the consolidation of spending data for particular areas or groups of people. Too often, public accounting has been structured around the issues of targets, control,

In 2009 it launched Wikiprogress, bringing together data and analysis on progress. The same year President Sarkozy commissioned Joseph Stiglitz to chair an inquiry into new measures of GDP.

This includes file sharing services such as Napster, and open-source software such as the Linux operating system, the Mozilla Firefox browser,

using transparent access to public financial and other data. 342) Audit and inspection regimes which overtly assess

and the coverage of core costs. 403) Direct funding for individuals, including the grants given by UnLtd, The Skoll Foundation,

which allow recipients to rate philanthropic foundations. 427) Providing extensive information on NGO performance, such as Guidestar's services and databases in many countries worldwide,

EveryBlock in Chicago provides a useful platform for aggregating ultra-local data. Prosumption: there has been a marked development of users becoming more engaged in the production of services.

Image courtesy of San Patrignano. This could include educational coaching services, relief and backup for home carers, health coaches, birthing



TOWARDS A NETWORK OF DIGITAL BUSINESS ECOSYSTEMS_2002.pdf

systems for e-commerce, e-procurement, supply chain management, customer relationship management, enterprise resource planning, logistics, planning, knowledge management, business intelligence, e-training.

data warehouses that improve customer relationships. The e-business opportunities are taken mainly by large organizations, whilst the single small organization faces well-known barriers:

databases and the know-how of millions of individuals, is the ultimate source of all economic life. 15 Organizations,

They might include systems for electronic payment, for certification and trust, enterprise resource planning, customer relationship management and e-procurement.

Generic software components and applications adapted for the specific sector (e.g. adaptation of customer relationship management systems,

which describe the semantics of data, services, and processes for that business sector; sector-specific education and training modules; knowledge bases;

and the data format are open and do not depend on a single provider, to guarantee independence from hardware and software platforms,

coverage of the territory; number of applications and services present; diffusion and availability of the infrastructure.


Triple_Helix_Systems.pdf

The activities of the Triple Helix actors are measured in terms of probabilistic entropy, which, when negative, suggests a self-organizing dynamic that may temporarily be stabilized in the overlay of communications among the carrying agencies (e.g.
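
The entropy measure referred to here is the configurational information of the Triple Helix literature (see the 'Configurational Information as Potentially Negative Entropy' reference below); a compact sketch computed from a toy co-occurrence table:

```python
import numpy as np

def H(p):
    # Shannon entropy (bits) of a probability array, ignoring zero cells.
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def configurational_information(counts):
    # counts[u, i, g]: co-occurrences across university, industry, government.
    # T_uig = H_u + H_i + H_g - H_ui - H_ug - H_ig + H_uig; negative values
    # are read as a self-organizing overlay of communications.
    p = counts / counts.sum()
    return (H(p.sum(axis=(1, 2))) + H(p.sum(axis=(0, 2))) + H(p.sum(axis=(0, 1)))
            - H(p.sum(axis=2).ravel()) - H(p.sum(axis=1).ravel())
            - H(p.sum(axis=0).ravel()) + H(p.ravel()))

toy = np.random.default_rng(2).integers(1, 20, size=(2, 2, 2)).astype(float)
print(configurational_information(toy))
```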

The Consensus Space has a broad coverage of the governance concept, including government and non-government actors who interact continuously to exchange resources

The large-scale research programmes in data mining funded by the Defence Advanced Research Projects Agency (DARPA) at Stanford and a few other universities provided the context for the development of the Google search algorithm that soon became the basis

Configurational Information as Potentially Negative Entropy: The Triple Helix Model. Entropy 10, 391-410. Leydesdorff, L., Etzkowitz, H. (1996).

Emergence of a Triple Helix of University-Industry-Government Relations. Science and Public Policy 23, 279-86.


Types of innovation, sources of information and performance in entrepreneurial SMEs.pdf

Research limitations/implications: As the analysis was based on self-reported data provided by the entrepreneurs of SMEs,

Section 3 introduces the data and the research methodology used in this study. In section 4 the results of statistical analysis are presented.

which leads to missing data and possibly biased results. When reviewing the existing literature on innovation-performance relationship more broadly than is possible to depict here,

The entrepreneurs'assessment of the impact of innovations on their firms'growth and profitability is used to measure this association (see Table I). 3. Data

and research methodology. 3.1 Sample and data. The primary data for this study were gathered in 2006 via a postal questionnaire among the SMEs located in the Northern Savo region in Eastern Finland, approximately 400

As a sample frame for constructing the database, we used the register of SMES in the region that was offered by Suomen Asiakastieto,

In this register, the latest financial statements data of 95 000 Finnish firms and groups are on one CD.

whether the firms had introduced or implemented completely new or radically improved innovations during the four-year period (2002-2006) prior to the data collection,

with cross-sectional data we are unable to prove the existence of a causal relationship or its direction,

However, as our data do not allow a more detailed investigation of this issue, the propositions presented above should be treated with caution,

As our analysis was based on self-reported data provided by the owner-managers of SMEs, we have to rely on the judgment of the entrepreneur regarding the newness of the innovation.

Second, on the basis of our data, we are unable to state whether the external information source used,

Fifth, in this study the data were gathered from single informants only: the owner-managers of the firms.

Guidelines for Collecting and Interpreting Innovation Data: Oslo Manual, 3rd ed., OECD, Paris. Pavitt, K. (2005), 'Innovation processes', in Fagerberg, J., Mowery, D.C. and Nelson, R.R. (Eds), The Oxford


U-Multirank Final Report - June 2011.pdf

4 Databases and data collection tools ... 79
4.1 Introduction ... 79
4.2 Databases ... 79
4.2.1 Existing databases ... 79
4.2.2 Bibliometric databases ... 80
4.2.3 Patent databases ... 82
4.2.4 Data availability according to EUMIDA ... 83
4.2.5 Expert view on data availability in non-European countries ... 85
4.3 Data collection instruments ... 87
4.3.1 Self-reported institutional data ... 88
4.3.1.1 U-Map questionnaire ... 88
4.3.1.2 Institutional questionnaire ... 89
4.3.1.3 Field-based ...

5 Pilot sample and data collection ... 97
5.1 Introduction ... 97
5.2 The global sample ... 97
5.3 Data collection ... 102
5.3.1 Institutional self-reported data ... 103
5.3.1.1 The process ... 103
5.3.1.2 Follow-up survey ... 106
5.3.1.3 Data cleaning ... 108
5.3.2 International databases ... 110
5.3.2.1 Bibliometric data ... 111
5.3.2.2 Patent data ... 115

6 Testing U-Multirank: results ... 119
6.1 Introduction ... 119
6.2 Feasibility of indicators ... 119
6.2.1 Teaching & Learning ... 122
6.2.2 Research ... 124
6.3 Feasibility of data collection ... 133
6.3.1 Self-reported institutional data ... 133
6.3.2 Student survey data ... 135
6.3.3 Bibliometric and patent data ... 135
6.4 Feasibility of up-scaling ... 137

7 Applying U-Multirank: ...

Table 4-1: Data elements shared between EUMIDA and U-Multirank: their coverage in national databases ... 84
Table 4-2: Availability of U-Multirank data elements in countries' national databases according to experts in 6 countries (Argentina/AR, Australia/AU, Canada/CA, Saudi Arabia/SA, South Africa/ZA, United States/US) ... 86
Table 5-1: Regional distribution of participating institutions ... 99
Table 5-2: Self-reported time needed to deliver data (fte staff days) ... 106
Table 5-3: Self-reported time needed to deliver data (fte staff days): European vs. non-European institutions ... 106
Table 6-1: Focused institutional ranking indicators: Teaching & Learning ...

Figure 5-1: U-Multirank data collection process ... 104
Figure 5-2: Follow-up survey: assessment of data procedures and communication ... 107
Figure 5-3: Follow-up survey: assessment of data collection process ... 107
Figure 5-4: Follow-up survey: Availability of data ... 108
Figure 5-5: Distribution of annual average patent volume for pilot institutes (N=165) ... 116
Figure 7-1: Combining U-Map and U-Multirank ... 142
Figure 7-2: User selection of indicators for personalized ranking tables ...

and business studies and should have a sufficient geographical coverage (inside and outside of the EU) and a sufficient coverage of institutions with different missions. In undertaking the project the consortium was assisted greatly by four groups that it worked closely with:

An Advisory Board constituted by the European Commission as the project initiator, which included not only representatives of the Directorate General:

Teaching and learning; Research; Knowledge transfer; International orientation; Regional engagement. On the basis of data gathered on these indicators across the five performance dimensions,

However, difficulties with the availability and comparability of information mean that it would be unlikely to achieve extensive coverage levels across the globe in the short-term.

or twenty times that number and extending its field coverage from three to around fifteen major disciplinary fields,

Some modifications need to be made to a number of indicators and to the data gathering instruments based on the experience of the pilot study.

and underlying database to produce authoritative expert institutional and field based rankings for particular groups of comparable institutions on dimensions particularly relevant to their activity profiles.

what sources of data are used and by whom? We concluded from our review that different rankings

It seems that availability of quantitative data has precedence over their validity and reliability.

The problem of field and regional biases in publication and citation data: many rankings use bibliometric data, ignoring that the available international publication

and citation databases mainly cover peer-reviewed journal articles, while that type of scientific communication is prevalent only in a narrow set of disciplines (most natural sciences, some fields in medicine) but not in many others (engineering, other fields in medicine and natural sciences, humanities

and social sciences). The problem of unspecified and volatile methodologies: in many cases, users cannot obtain the information necessary to understand how rankings have been made;

Recent reports on rankings such as the report of the Assessment of University-Based Research Expert Group (AUBR Expert Group, 2009) which defined a number of principles for sustainable collection of research data,

leading to a matrix of data that could be used in different constellations to respond to different scenarios (information needs).

data sources or lessons learned about data and data collection. The results of this part of the exercise will be reflected in the next chapters.

and the selection of data sources depends on the interest of research and the purpose of the measurement.

and how those are linked to the data they gather and display. The global rankings that we studied limit their interest to several hundred preselected universities,

A major reason why the current global rankings focus on research data is that this is the only type of data readily available internationally.

Use of statistics from existing databases. National databases on higher education and research institutions cover different information, based on differing national definitions of items, and are

therefore not easily used in cross-national comparisons. International databases such as those of UNESCO, OECD and the EU show those comparability problems

but moreover they are focused on the national level and are therefore not useful for institutional

or field comparisons. 3 International databases with information at the institutional level or lower aggregation levels are currently available for specific subfields:

Regarding research output and impact, there are worldwide databases on journal publications and citations (the well-known Thomson Reuters and Scopus databases).

These databases, after thorough checking and adaptation, are used in the research-based global rankings. Their strengths and weaknesses were mentioned above.

Patent databases have not been used until now for global rankings. Self-reported data collected by higher education and research institutions participating in a ranking.

This source is used regularly though not in all global rankings, due to the lack of externally available and verified statistics (Thibaud, 2009).

Self-reported data ought to be validated externally or verified; several methods to that end are available.

but are suited less for gathering factual data. Student satisfaction and to a lesser extent satisfaction of other stakeholders is used in national rankings,

Manipulation of opinion-type data has surfaced in surveys for ranking and is hard to uncover

i.e. data available in national public sources are entered into the questionnaires sent to higher education institutions for data collection, and institutions are given the opportunity to verify the 'pre-filled' data as well. The U-Map test with 'pre-filling' from national data sources in Norway appeared to be successful and resulted in a substantial decrease of the burden of gathering data at the level of higher education institutions. 1.4 Impacts of current rankings. According to many commentators,
³ The beginnings of European data collection as in the EUMIDA project may help to overcome this problem for the European region in years to come.

impacts of rankings on the sector are rather negative: they encourage wasteful use of resources,

institutional data and choice of publication language (English) and channels (journals counted in the international bibliometric databases).

The decision about an adequate number of 'performance categories' has to be taken with regard to the number of institutions included in a ranking and the distribution of data.
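
One simple way to let the distribution of the data drive the grouping is to cut scores at quantiles rather than at fixed league-table positions; the sketch below is an illustration of the idea, not U-Multirank's actual rule:

```python
import numpy as np

def performance_groups(scores, n_groups=3):
    # Assign each institution to an ordered performance category by cutting
    # the score distribution at equally spaced quantiles (a sketch only).
    scores = np.asarray(scores, dtype=float)
    edges = np.quantile(scores, np.linspace(0, 1, n_groups + 1)[1:-1])
    return np.digitize(scores, edges)  # 0 = lowest group, n_groups - 1 = top

print(performance_groups([12, 45, 47, 60, 88, 90], n_groups=3))
```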

Rankings have to use multiple databases to bring in different perspectives on institutional performance. As much as possible, available data sources should be used,

but currently their availability is limited. To create multidimensional 36 rankings, gathering additional data from the institutions is necessary.

Therefore, the quality of the data collection process is crucial. In addition rankings should be self-reflexive with regard to potential unintended consequences and undesirable/perverse effects.

Involvement of stakeholders in the process of designing a ranking tool and selecting indicators is crucial to keep feedback loops short,

The basic methodology, the ranking procedures, the data used (including information about survey samples) and the definitions of indicators have to be public for all users.

An important factor in the argument against rankings and league tables is the fact that often their selection of indicators is guided primarily by the (easy) availability of data rather than by relevance.

This is particularly an issue with survey data (e.g. among students, alumni, staff) used in rankings.

In surveys and with regard to self-reported institutional data, the operationalizing of indicators and formulation of questions requires close attention in particular in international rankings,

Hence the indicators and underlying data/measure must be comparable between institutions; they have to measure the same quality in different institutions.

In addition to the general issue of comparability of data across institutions, international rankings have to deal with issues of international comparability.

Indicators, data elements and underlying questions have to be defined and formulated in a way that takes such contextual variations into account. For example,

in order to harmonise data on academic staff (excluding doctoral students). Feasibility: The objective of U-Multirank is to design a multidimensional global ranking tool that is feasible in practice.

This will result in users creating their own specific and different rankings, according to their needs and wishes, from the entire database.

The other important components of the construction process for U-Multirank are the databases and the data collection tools that allow us to actually 'fill' the indicators.

These will be discussed further in chapter 4 as we explain the design of U multirank in more detail.

In chapters 5 and 6 we report on the U-Multirank pilot study during which we analysed the data quality

The first step in the indicator selection process was a comprehensive inventory of potential indicators from the literature and from existing rankings and databases.

which we presented information on the availability of data, the perceived reliability of the indicators,

Literature review → Review of existing rankings → Review of existing databases → First selection → Stakeholder consultation → Expert advice → Second selection → Pre-test → Revision → Selection

The measurement of the indicator is the same regardless of who collects the data or when the measure is repeated.

The data sources and the data to build the indicator are reliable. Comparability: The indicators allow comparisons from one situation/system/location to another;

so that data are comparable. Feasibility: The required data to construct the indicator is either available in existing databases and/or in higher education and research institutions,

or can be collected with acceptable effort. Based on the various stakeholders'and experts'assessments of the indicators as well as on our analyses using the four additional criteria,

the feasibility of the data collection instruments (i.e. the questionnaires used to collect the data) as well as the clarity of the definitions for the required data elements.

The outcome of the pre-test was then used as further input for the wider pilot, where the actual data was collected to quantify the indicators for U-Multirank at both the institutional and the field level.

they have different foci, use different data, different performance indicators and different 'algorithms' to arrive at judgments.

(including expenditure on teaching-related overhead) as a percentage of total expenditure. Data available. Indicator is an input indicator.

Data collection and availability problematic. 4 Relative rate of graduate (un)employment: The rate of unemployment of graduates 18 months after graduation as a percentage of the national rate of unemployment

Data availability poses a problem. 5 Time to degree: Average time to degree as a percentage of the official length of the program (bachelor and master). Reflects effectiveness of teaching process.

Availability of data may be a problem. Depends on the kind of programs. Field-based ranking (Definition / Comments): 6 Student-staff ratio: The number of students per fte academic staff. Fairly generally available.

existence of external advisory board (including employers). Problems with regard to availability of data. 13 Inclusion of work experience into the program: Rating based on duration (weeks/credits) and modality

(compulsory or recommended). Data easily available. 14 Computer facilities: internet access. Index including: hardware; internet access, including WLAN;

access to computer support. Data easily available. 15 Student gender balance: Number of female students as a percentage of total enrolment. Indicates social equity (a balanced situation is considered preferable).

In addition, data availability proved unsatisfactory for this indicator and comparability issues negatively affect its reliability.

research performance measurement frequently takes place through bibliometric data. Data on publication texts and citations is readily available for building bibliometric indicators (see Table 3-2). This is much less the case for data on research awards and data underlying impact indicators.

In addition to performance measures, sometimes input-related proxies such as the volume of research staff and research income are in use to describe the research taking place in a particular institution or unit.

Compared to such input indicators, bibliometric indicators may be more valid measures for the output or productivity of research teams and institutions.

One may mention audio-visual recordings, computer software and databases, technical drawings, designs or working models, major works in production or exhibition and/or award-winning

Expert Group on Assessment of University-Based Research (2010) Apart from using existing bibliometric databases,

While this may improve data coverage, such self-reported accounts may not be standardized or reliable, because respondents may interpret the definitions differently.

Data mostly available. Recommended by Expert Group on University-based Research. Difficult to separate teaching

Data largely available. Widely used in research rankings (Shanghai, Leiden ranking, HEEACT). Different disciplinary customs cause distortion.

Data availability may be weak. 5 Interdisciplinary research activities: Share of research publications authored by multiple units from the same institution (based on self-reported data). Research

These data refer to database years. Publishing in top-ranked, high impact journals reflects quality of research.

Data largely available. Books and proceedings are not considered. Never been used before in any international classification

Data suffers from lack of agreed definitions and lack of availability. Quantities difficult to aggregate. 9 Number of international awards

Data suffers from lack of agreed definitions and lack of availability. Quantities difficult to aggregate.

However, data availability is posing some challenges here. Research publications other than peer-reviewed journal publications are included,

While such data is available, it is limited only to national authors. During the indicator selection process the relevance of the indicator was questioned,

even though data availability and definitions may sometimes pose a challenge. Therefore it was decided to keep them in the list of indicators for U-Multirank's institutional ranking.

After pretesting the indicators it has become clear that there are some data availability issues in terms of clarity of definitions (for instance FTE staff) and the cost of collecting particular indicators.

A test of the indicators (and the underlying data elements) in the broader pilot study (see chapters 5 and 6),

and the near absence of (internationally comparable) data (see chapter 4)¹⁶, it proved extremely difficult to do so.

EC Framework programs) plus direct industry income as a proportion of total income. Signals KT success. Some data do exist

ISI databases available. Used in CWTS University-Industry Research Cooperation Scoreboard.
¹⁶ See also the brief section on the EUMIDA project,

One of EUMIDA's findings is that data on technology transfer activity and patenting is difficult to collect in a standardized way (using uniform definitions, etc.)

Data are available from secondary (identical) data sources. 5 Size of Technology Transfer Office Number of employees (FTE) at Technology Transfer Office related to the number of FTE

Data are mostly directly available. KT function may be dispersed across the HEI. Not regarded as core indicator by EGKTM. 6 CPD courses offered: Number of CPD courses offered per academic staff (fte). Captures outreach to professions. Relatively new indicator.

Data available from secondary sources (PATSTAT). 8 Number of spin-offs: The number of spin-offs created over the last three years per academic staff (fte). EGKTM regards spin-offs as a core indicator.

Data available from secondary sources. Clear definition and demarcation criteria needed. Does not reveal market value of spin-offs.

Data difficult to collect. 10 Annual income from licensing: The annual income from licensing agreements as a percentage of total income. Licensing reflects the exploitation of IP.

Data available from secondary (identical) data sources. Patents with an academic inventor but another institutional applicant are not taken into account.

and from the pre-test it became clear that data is difficult to collect. Therefore this indicator was not kept in the list for the pilot.

For many of the indicators data are available in the institutional databases. Hardly any of such data can be found in national or international databases.

The various manifestations and results of internationalization are captured through the list of indicators shown in Table 3-5. The table includes some comments made during the consultation process that led to the selection of the indicators.

Data availability good. Relevant indicator. Used quite frequently. Sensitive to relative 'size' of national language. 2 International academic staff: Foreign academic staff members (headcount) as percentage of total number of academic staff members (headcount).

Availability of data problematic. 4 International joint research publications Relative number of research publications that list one or more author affiliate addresses in another country relative to research staff

Data available in international databases but bias towards certain disciplines and languages. 5 Number of joint degree programs: The number of students in joint degree programs with foreign university (including integrated period at foreign university) as a percentage of total

Data available. Indicator not often used. Field-based ranking (Definition / Comments): 6 Incoming and outgoing students: Incoming exchange students as a percentage of total number of students and the number of students going abroad as a percentage of total

Data available. 7 International graduate employment rate The number of graduates employed abroad or in an international organization as a percentage of the total number of graduates employed Indicates the student preparedness on the international labor market.

Data not readily available. No clear international standards for measuring. 8 International academic staff Percentage of international academic staff in total number of (regular) academic staff See above institutional ranking 9 International

Data are available. Stakeholders question relevance. 10 Student satisfaction: Internationalization of programs Index including the attractiveness of the university's exchange programs, the attractiveness of the partner universities, the sufficiency of the number of exchange places;

Data available but sensitive to location (distance to border) of HEI. Stakeholders consider the indicator important. 13 Student satisfaction:

composite indicators depend on the availability of each data element. It should be pointed out here that one of the indicators is a student satisfaction indicator:

and data is available, stakeholders consider this indicator not very important. Moreover, the validity is questionable as the size of the international office as a facilitating service is a very distant proxy indicator.

No national data on graduate destinations.
19 http://epp.eurostat.ec.europa.eu/portal/page/portal/region_cities/regional_statistics/nuts_classification
20 http://www.oecd

Availability of data problematic. 3 Regional joint research publications Number of research publications that list one or more author-affiliate addresses in the same NUTS2 or NUTS3 region,

Data available (Web of Science), but professional (laymen's) publications not covered. 4 Research contracts with regional business The number of research projects with regional firms,

Definition of internship problematic and data not readily available. Disciplinary bias. Field-based ranking (Definition / Comments): 6 Degree theses in cooperation with regional enterprises: Number of degree theses in cooperation with regional enterprises as a percentage of total number

Data not readily available. Indicator hardly ever used. 9 Student internships in local/regional enterprises: Number of internships of students in regional enterprises (as a percentage of total students). See above institutional ranking,

Limited availability of data. Lack of internationally accepted definition of summer school courses. During the process of selection of indicators the list of indicators underwent a number of revisions.

While data may be found in international patent databases, the indicator is not used often and stakeholders did not particularly favor it.

but data constraints prevent the use of such an indicator. Public lectures that are open to an external

The above discussion makes it clear that regional engagement is a dimension that poses many problems with regard to availability of performance-oriented indicators and their underlying data.

In the next chapter we will discuss more extensively the data gathering instruments that are available. In chapters 5


U-Multirank: databases and data collection tools. 4.1 Introduction. In this chapter we will describe the databases

and data collection instruments used in constructing U-Multirank. The first part is an overview of existing databases mainly on bibliometrics and patents.

The second presents an explanation of the questionnaires and survey tools used for collecting data from the institutions (the self-reported data) at the institutional

and department levels and from students. 4.2 Databases. 4.2.1 Existing databases. One of the activities in the U-Multirank project was to review existing rankings

and explore their underlying databases. If existing databases can be relied on for quantifying the U-Multirank indicators, this would be helpful in reducing the overall burden for institutions in handling the U-Multirank data requests.

However, from the overview of classifications and rankings presented in chapter 1 (section 1.3) it is clear that international databases holding information at institution level

or at lower aggregation levels are currently available only for particular aspects of the dimensions Research and Knowledge Transfer.

For other aspects and dimensions, U-Multirank will have to rely on self-reported data. Regarding research output and impact, there are worldwide databases on journal publications and citations.

For knowledge transfer, the database of patents compiled by the European Patent Office is available. In the next two subsections

available bibliometric and patent databases will be discussed. To further assess the availability of data covering individual higher education and research institutions,

the results of the EUMIDA project were also taken into account.²¹ The EUMIDA project (see:

www.eumida.org) seeks to develop the foundations of a coherent data infrastructure (and database) at the level of individual higher education institutions.

Section 4.2.4 presents an overview of availability based on the outcomes of the EUMIDA project.

Our analysis on data availability was completed with a brief online consultation with the group of international experts connected to U-Multirank (see section 4.2.5). The international experts were asked to give their assessment of the situation with respect to data availability in some of the non-EU countries included in U-Multirank.
²¹ The U-Multirank project was granted access to the preliminary outcomes of the EUMIDA project in order to learn about data availability in the countries covered by EUMIDA.
4.2.2 Bibliometric databases. There are a number of international databases

which can serve as a source of information on the research output of a higher education and research institution (or one of its departments).

An institution's quantity of research-based publications (per capita) reflects its research output and can also be seen as a measure of scientific merit or quality.

In particular, if its publications are cited highly within the international scientific communities this may characterize an institution as high-impact and high-quality.

The production of publications by a higher education and research institute not only reflects research activities in the sense of original scientific research,

but usually also the presence of underlying capacity and capabilities for engaging in sustainable levels of scientific research.²² The research profile of a higher education

and research institution can be specified further by taking into account its engagement in various types of research collaboration.

For this one can look at joint research publications involving international, regional and private sector partners.

The subset of jointly authored publications is a testimony of successful research cooperation. Data on numbers and citations of research publications are covered relatively well in existing databases.
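
As an illustration of how such indicators are assembled from publication records, a hedged pandas sketch follows; the records and staff numbers are invented, and real indicators are computed from the bibliometric databases described below:

```python
import pandas as pd

pubs = pd.DataFrame({
    "institution":   ["A", "A", "A", "B", "B"],
    "intl_coauthor": [True, False, True, False, True],
    "citations":     [10, 3, 0, 7, 2],
})
staff_fte = pd.Series({"A": 120.0, "B": 45.0})   # hypothetical research staff

per_inst = pubs.groupby("institution").agg(
    publications=("citations", "size"),          # output volume
    intl_copub_share=("intl_coauthor", "mean"),  # international cooperation
    mean_citations=("citations", "mean"),        # crude impact proxy
)
per_inst["pubs_per_fte"] = per_inst["publications"] / staff_fte
print(per_inst)
```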

Quantitative measurements and statistics based on information drawn from bibliographic records of publications are usually called 'bibliometric data'.

These data concern the quantity of scientific publications by an author or organisation and the number of citations (references) these publications have received from other research publications.

There is a wide range of research publications available for characterizing the research profile and research performance of an institution by means of bibliometric data:

lab reports, journal articles, edited books, monographs, etc. The bibliometric methodologies applied in international comparative settings such as U multirank usually draw their information from publications that are released in scientific and technical journals.

This part of the research literature is covered ('indexed') by a number of international databases. In most cases the journals indexed are internationally peer-reviewed,

which means that they adhere to international quality standards. U-Multirank therefore makes use of international bibliometric databases to compile some of its research performance indicators

and a number of research-related indicators belonging to the dimensions of Internationalisation, Knowledge Transfer and Regional Engagement.
²² This is why research publication volume is a part of the U-Map indicators that reflect the activity profile of an institution.
Two of the most well-known databases that are available for carrying out

bibliometric analyses are the Web of Science and Scopus.²³ Both are commercial databases that provide global coverage of the research literature

and both are easily accessible. The Web of Science database is maintained by ISI, the Institute for Scientific Information,

which was taken over by Thomson Reuters a few years ago. The Web of Science currently covers about 1 million new research papers per year

published in over 10,000 international and regional journals and book series in the natural sciences, social sciences,

and arts and humanities. According to the Web of Science website, 3,000 of these journals account for about 75% of published articles

and over 90% of cited articles.²⁴ The Web of Science claims to cover the highest-impact journals worldwide,

The Scopus database was launched in 2004 by the publishing house Elsevier. It claims to be the largest abstract

and citation database containing both peer-reviewed research literature and web sources. It contains bibliometric information covering some 17

as well as data about conference papers from proceedings and journals. To compile the publications-related indicators in the U-Multirank pilot study,

bibliometric data was derived from the October 2010 edition of the Web of Science bibliographical database.

An upgraded 'bibliometric version' of the database is housed and operated by the CWTS (being one of the CHERPA Network partners) under a full license from Thomson Reuters. This dedicated version includes the 'standardized institutional names' of higher education

and multidisciplinary database, has its pros and cons. The bulk of the research publications are issued in peer-reviewed international scientific and technical journals,

and no books or
²³ Yet another database is Google Scholar. This is a service based on the automatic recording by Google's search engine of citations to any author's publications (of whatever type) included in other publications appearing on the worldwide web.
²⁴ See:

The Web of Science has relatively poor coverage of non-English-language publications. The coverage of publication output is quite good in the medical sciences, life sciences and natural sciences,

but relatively poor in many of the applied sciences and social sciences and particularly within the humanities.

The alternative source of bibliographical information, Elsevier's Scopus database, is likely to provide broader coverage of the global research literature in those underrepresented fields of science.

For the six indicators selected for inclusion in the U multirank pilot test (see chapter 6) one can derive data from the CWTS/Thomson Reuters Web of Science database.

4.2.3 Patent databases

As part of the indicators in the Knowledge Transfer dimension, U multirank selected the number of patent applications and co-patenting as indicators.

Data for the co-patenting and patents indicators may be derived from patent databases. For U multirank, patent data were retrieved from the European Patent Office (EPO).

Its Worldwide Patent Statistical Database (version October 2009), also known as PATSTAT, is designed and published on behalf of the OECD Taskforce on Patent Statistics.

Other members of this taskforce include the World Intellectual Property Organisation (WIPO), the Japanese Patent Office (JPO), the United States Patent and Trademark Office (USPTO), the US National Science Foundation (NSF),

and the European Commission, represented by Eurostat and by DG Research. The version used is held by the K.U. Leuven (Catholic University Leuven).

The PATSTAT patent database is especially designed to assist in advanced statistical analysis of patent data.

It contains patent data from over 80 countries, adding up to 70 million records (63 million patent applications and 7 million granted patents).

The patent data are sourced from offices worldwide, including of course the most important and largest ones such as the EPO, the USPTO, the JPO and the WIPO.

4.2.4 Data availability according to EUMIDA

Like the U multirank project, the EUMIDA project (see http://www.eumida.org) collects data on individual higher education and research institutions,

investigating whether a data collection effort can be undertaken by EUROSTAT in the foreseeable future. EUMIDA covers 29 countries (the 27 EU member states plus two additional countries:

Switzerland and Norway) and investigates the data available from national databases insofar as these are held/maintained by national statistical institutes, ministries or other organizations.

The EUMIDA project has demonstrated that a regular data collection by national statistical authorities is feasible across (almost) all EU member states.

The EUMIDA and U multirank project teams agreed to share information on issues such as definitions of data elements

and data sources, given that the two projects share a great deal of data (indicators). The overlap lies mainly in the area of data related to the inputs (or activities) of higher education and research institutions.

A great deal of this input-related information is used in the construction of the indicators in U-Map.

The EUMIDA data elements are therefore much more similar to the U-Map indicators, since U-Map aims to build activity profiles for individual institutions whereas U multirank constructs performance profiles.

The findings of EUMIDA point to the fact that for the more research-intensive higher education institutions, data for the dimensions of Education and Research are covered relatively well,

although data on graduate careers and employability are sketchy. Some data on scientific publications is available for most countries. However, overall, performance-related data is less widely available than input-related data items. (A patent family is a set of patents taken in various countries to protect a single invention; see www.uspto.gov.)

The role of national statistical institutes is quite limited here, and the underlying methodology is not yet consistent enough to allow for international comparability of data.

Table 4-1 below shows the U multirank data elements that are covered in EUMIDA and whether information on these data elements may be found in national databases (statistical offices, ministries, rectors' associations, etc.).

The table shows that EUMIDA primarily focuses on the Teaching & Learning and Research dimensions,

with some additional aspects relating to the Knowledge Transfer dimension. Since EUMIDA never had the intention to cover all dimensions of an institution's activity (or its performance), this limited coverage is not surprising.

The table illustrates that information on only a few U multirank data elements is available from national databases and,

moreover, what data exists is available only in a small minority of European countries. This implies

once again, that the majority of data elements will have to be collected directly from the institutions themselves.

Table 4-1: Data elements shared between EUMIDA and U multirank: their coverage in national databases

Dimension | EUMIDA and U multirank data element | European countries where the data element is available in national databases
Teaching & Learning | relative rate of graduate unemployment | CZ, FI, NO, SK, ES
Research | expenditure on research | AT*, BE, CY, CZ*, DK, EE, FI, GR*, HU, IT, LV*, LT*, LU, MT*, NO, PL*, RO*, SI*, ES, SE, CH

* indicates: there are confidentiality issues (e.g. national statistical offices may not be prepared to make data public without consulting individual HEIs). (p) indicates: data are only partially available (e.g. only for public HEIs, or only for (some) research universities).

4.2.5 Expert view on data availability in non-European countries

The Expert Board of the U multirank project was consulted to assess, for their six countries (Argentina, Australia, Canada, Saudi Arabia, South Africa and the US, all from outside Europe), the availability of data,

i.e. whether data was available in national databases and/or in the institutions themselves. Table 4-2 shows that the Teaching and Learning dimension scores best in terms of data availability.

The dimensions Research and Knowledge Transfer have far less data available on the national level,

but this is compensated by the data available at the institution level. The same holds true, to a lesser extent, for the dimension International Orientation, where little data is available in national databases.

The Regional Engagement dimension is the most problematic in terms of data availability. Here, data will have to be collected from the individual institutions.

Table 4-2: Availability of U multirank data elements in countries' national databases according to experts in six countries (Argentina/AR, Australia/AU, Canada/CA, Saudi Arabia/SA, South Africa/ZA, United States/US)

Dimension | U multirank data element | Available in national databases | Available in institutional databases
Teaching & Learning | expenditure on teaching | AR, US, ZA | AR, AU, SA, ZA
Teaching & Learning | time to degree | AR, CA, US, ZA | AR, AU, CA, SA, ZA
Teaching & Learning | graduation rate | AR, CA, US, ZA | AR, AU, SA, ZA
Teaching & Learning | relative rate of graduate unemployment | AU, CA | US
Research | expenditure on research | AR, AU, ZA | AR, AU, SA, US, ZA
Research | number of post-doc positions | CA, US, ZA |

In the Research dimension, Expenditure on Research and Research Publication Output data are represented best in national databases.

For other indicators, however, information is not really available in national databases. According to the experts consulted, more data can probably be found in institutional databases.

However, if that is the case, there is always a risk that different institutions may use different definitions.

Even if there is information available in databases (national, institutional, or other), our experts stressed that it is not always easy to obtain that information (for instance, in the case of data relating to the dimension Regional Engagement).

To obtain a better idea of data availability, we carried out a special pre-test (see section 4.3.3).

4.3 Data collection instruments

Due to the lack of adequate data sets,

the U multirank project had to rely largely on self-reported data (both at the institutional

and field-based levels), collected directly from the higher education and research institutions. The main instruments to collect data from the institutions were four online questionnaires:

three for the institutions and one for students. The four surveys are: the U-Map questionnaire, the institutional questionnaire, the field-based questionnaire and the student survey.

In designing the questionnaires, emphasis was placed on the way in which questions were formulated. It is important that they can only be interpreted in one way.

4.3.1 Self-reported institutional data

4.3.1.1 U-Map questionnaire

As explained, the U-Map questionnaire is an instrument for identifying similar subsets of higher education institutions within the U multirank sample.

Data is collected in seven main categories: general information: name and contact; public/private character and age of institution;

staff data: fte and headcount; international staff; income: total income; income by type of activity;

4.3.1.2 Institutional questionnaire

In the U multirank institutional questionnaire, data is collected on the performance of the institution. Like the U-Map questionnaire, it is structured along the lines of different data types to allow for more rapid data collection by the institution's respondents.

The questionnaire is therefore divided into the following categories: general information: name and contact; public/private character and age of institution;

coverage; research & knowledge transfer: publications; patents; concerts and exhibitions; start-ups. As the institutional questionnaire and the U-Map questionnaire partly share the same data elements,

institutions were advised to first complete the U-Map questionnaire. Data elements from U-Map are transferred automatically to the U multirank questionnaire using a 'transfer tool'.
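The report describes the 'transfer tool' only at this level of detail; a minimal sketch of what such a pre-fill step might look like is given below. The element names and the dictionary-style questionnaires are our own illustrative assumptions, not the actual schema.

```python
# Minimal sketch of a 'transfer tool' that pre-fills shared data elements
# from a completed U-Map questionnaire into the U multirank institutional
# questionnaire. Element names are illustrative, not the actual schema.
SHARED_ELEMENTS = ["name", "contact", "public_private", "year_founded", "total_income"]

def prefill(umap_answers: dict, umr_answers: dict) -> dict:
    """Copy shared answers across without overwriting entries already made."""
    for key in SHARED_ELEMENTS:
        if key in umap_answers and key not in umr_answers:
            umr_answers[key] = umap_answers[key]
    return umr_answers

# Example: the institution completed U-Map first, as advised.
umap = {"name": "Example University", "total_income": 250_000_000}
print(prefill(umap, {"contact": "rankings@example.edu"}))
```

Copying only into empty fields keeps answers the respondent has already entered authoritative, which is the natural design choice for a pre-fill step.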

The academic year 2008/2009 was selected as the default reference year.

4.3.1.3 Field-based questionnaire

The field-based questionnaire includes information on individual faculties/departments.

Like the institutional questionnaire, the field-based questionnaire is structured along the different types of data requested to reduce the administrative burden for respondents.

Data was collected for the reference period 2009/2010 for data elements expected to be subject to annual fluctuations;

for other data elements, data for three subsequent years was collected to calculate three-year averages (see Appendix 12 for the institutional questionnaire).

4.3.2 Student survey

The student survey asks for the students' basic demographic data and information on their programme. The main focus of the survey is on the assessment of teaching.

4.3.3 Pretesting the instruments

A first version of the three new data collection instruments (the institutional questionnaire, the field-based questionnaire and the student questionnaire) was pre-tested.

The U multirank questionnaires were tested in terms of cultural/linguistic understanding, clarity of definitions of data elements and feasibility of data collection.

Instead of asking them to provide all the data on relatively short notice, these institutions were asked to offer feedback on the clarity of the questions and on the availability of data.

According to the pre-test results, the general format and structure of the institutional questionnaire seemed to be clear and user-friendly.

Secondly, several indicators presented difficulties to respondents because the required data was not collected centrally by the institution.

Problems emerged, however, with some output-related data elements such as graduate employment, where data is often not collected at the institutional level.

Interdisciplinarity of programs proved to be another problematic indicator, due to the definition of the concept and the absence of the required data.

Research. Most data items in this dimension did not lead to problems. In fact, some of the key indicators are extracted from international bibliometric databases anyway

and did not need data provision from the institutions. As expected, some difficulties emerged for 'art-related outputs'.

Sharper definitions were called for here. Knowledge Transfer and Regional Engagement. Compared to Teaching and Research,

these two dimensions are less prevalent in existing national and institutional databases and therefore presented some data availability problems.

This was the case for 'graduates working in the region' and 'student internships in regional enterprises'.

Comprehensive information on start-up firms and professional development courses was not always available for institutions as a whole.

The pre-test did reveal a need for clearer definitions for some data elements. Pre-test results also indicated that some data elements

although highly relevant and valid, could not feasibly be collected because institutions did not have such data.

With respect to this issue the project team, with the help of the Advisory Board, had a critical look at the problematic indicators.

Problems with regard to the availability of data were reported mainly on issues of academic staff (e.g. fte data, international staff), links to business (in education/internships and research) and the use of credits (ECTS).

In order to come to a meaningful and comprehensive set of indicators at the conclusion of the U multirank pilot study, we had to aim for a broad data collection covering a wide range of indicators.

One will have to deal with the issue of institutions providing‘estimated'values instead of data from existing data sets.

Asking institutions to flag such estimates enabled us to get an impression of the precision of the data. For the student questionnaire the conclusion was that there was no need for changes in the design.

Comments received showed that the questionnaire is seen as a useful instrument.

4.3.4 Supporting instruments

In order to ensure that a comparable data set was obtained, a number of supporting instruments were developed.

This is particularly important as institutions from diverse national settings are an important source for data collection.

The following supporting instruments were provided to offer more clarity to the respondents during the process of data collection:

A glossary was provided and updated regularly throughout the data collection process. A 'frequently asked questions' (FAQ) section and a 'Helpdesk' function were launched on the website.

Protocols describing data collection and handling were developed to explain to the institutions in detail how the different steps were laid out, from the start through to the finalisation of the data collection.

A technical specifications protocol for U multirank was developed introducing additional functions in the questionnaire to ensure that a smooth data collection could take place:

the option to download the questionnaire in PDF format, the option to transfer data from the U-Map to the U multirank institutional questionnaire,

and the option to have multiple users access the questionnaire at the same time. We updated the U multirank website regularly

and provided information about the steps/time schedules for data collection. All institutions had clear communication partners from the U multirank team.

4.4 A concluding perspective

This chapter, providing a quick survey of existing databases,

underlines that there are very few international databases/sources where data can be found for our type of rankings.

The only sources that are available are international databases holding bibliometric and patent data. This implies that

in particular for a ranking that aims to sketch a multidimensional picture of an institution at the institutional and disciplinary field levels,

one will have to rely to a large extent on data collected by means of questionnaires sent to representatives of institutions, their students and possibly their 95 graduates.

The way the data are collected then becomes a critical issue, where compromises have to be struck between comprehensiveness, comparability and the burden placed on data providers.

Different questionnaires will have to be sent to the different data providers: institutions, representatives of departments in the institution and students.

As rankings order their objects in terms of their scores on quantitative indicators they require uniform definitions of the underlying data elements.

Given differences in interpretation, as well as different national customs and definitions of indicators, there are limits to the comparability of data.

Respondents should therefore be able to add explanations and comments to the data they submit through the questionnaires. In a few cases, one may have to allow respondents to provide estimates for some of the answers

if data is otherwise unavailable or too costly to collect. Checking the answers can be done based on internal consistency checks,

comparing data to that of other institutions, or making use of data from other sources, but this clearly also has its limits.

What this chapter has made clear is that the questionnaires and surveys need to be tested first on a small scale before embarking on a bigger survey.

Taking into account the experiences from other similar ranking/data collection projects and making use of the advice of external experts

and national correspondents in the testing and further execution of the survey is another element that needs to be part of the data collection strategy.

5 Testing U multirank: pilot sample and data collection

5.1 Introduction

Now that we have presented the design

and construction process for U multirank, we will describe the feasibility testing of this multidimensional ranking tool.

This test took place in a pilot study specifically undertaken to analyse the actual feasibility of U multirank on a global scale.

In this chapter we describe the process of recruiting the sample of pilot institutions and the data collection in the pilot study: the collection of both self-reported institutional data

and data from international databases.

5.2 The global sample

A major task of the feasibility study was the selection of institutions to be included in the pilot study.

The selection of the 150 pilot institutions (as specified in the project outline) needed to be informed by two major criteria:

including a group of institutions that reflects as much institutional diversity as possible; and making sure that the sample was regionally and nationally balanced.

In addition we needed to ensure sufficient overlap between the institutional ranking and the field-based rankings in business studies and two fields of engineering.

As has been indicated in chapter 2 of this report, one of the basic ideas of U multirank is the link to U-Map.

U-Map is an effective tool to identify institutional activity profiles of institutions similar enough to compare them in rankings.

Yet at this stage of its development U-Map includes only a limited number of provisional institutional profiles

which makes it insufficiently applicable for the selection of the sample of pilot institutions for the U multirank feasibility test.

Since U-Map cannot yet offer sets of comparable institutional profiles we needed to find another way to create a sample with a sufficient level of diversity of institutional profiles.

We do not (and cannot) claim that we have designed a sample that is representative of the full diversity of higher education in the world (particularly as there is no adequate description of this diversity)

but we have succeeded in including a wide variety of institutional types in our sample. Potential pilot institutions to be invited for the sample were identified in a number of ways:

The existing set of higher education institutions in the U-Map database was included. This offered a clear indication of a broad variety of institutional profiles. Some universities applied through the U multirank website to participate in the feasibility study.

Finally, 115 institutions submitted data as part of the pilot study.

Table 5-1: Regional distribution of participating institutions. Columns: initial proposal for number of institutions (July 2010); institutions in the final pilot selection (February 2011); institutions that confirmed participation (April 2011); institutions which delivered U multirank institutional data (April 2011); institutions which delivered U multirank institutional data and U-Map data.

Region and Country (population in millions) | Proposal | Selection | Confirmed | Delivered data | Delivered data and U-Map data
I. EU 27:
Austria (8m) | 2 | 2 | 5 | 5 | 4
Belgium (10m) | 3 | 3 | 5 | 3 | 3
Bulgaria (8m) | 2 | 3 | 3 | 3 | 3
Netherlands (16m) | 3 | 7 | 3 | 3 | 3
Poland (38m) | 6 | 12 | 7 | 7 | 6
Other Asia | 5 | 2 | | |
The Philippines | 1 | 1 | 1 | |
Taiwan | 1 | 1 | 0 | |
Vietnam | 2 | | | |

Students provided their data via an online questionnaire. After data cleaning we were able to include 5,901 student responses in the analysis:

45% in business studies, 23% in mechanical engineering and 32% in electrical engineering.

5.3 Data collection

The data collection for the pilot study took place via two different processes:

the collection of self-reported data from the institutions involved in the study (including the student survey) and the collection of data on these same institutions from existing international databases on publications/citations and patents.

In the following sections we discuss these data collection processes.

5.3.1 Institutional self-reported data

5.3.1.1 The process

The process of data collection from the institutions was organised in a sequence of steps

(see Figure 5-1). First we asked the institutions, after official confirmation of participation, to fill in a contact form.

The data collection entailed the following instruments: the U-Map questionnaire (to identify institutional profiles), the U multirank institutional questionnaire and the U multirank field-based questionnaires.

Figure 5-1: U multirank data collection process

The institutions were given seven weeks to collect the data, with deadlines set according to the dates on which the institutions confirmed their participation.

After the deadlines for data submission had passed, we checked the questionnaires submitted by the institutions.

These different steps allowed us to actively follow the data collection process and to assist institutions as needed.

An important element in terms of quality assurance of the data was a feedback loop built into the process.

After the institutions had submitted their questionnaires their data was checked and we provided comments and questions.

Institutions were then given the opportunity to check their data, correct inconsistencies and add missing information. Organising a survey among students on a global scale was one of the major challenges in U multirank.

The data collection through the student survey was organized by the participating institutions. They were asked to send invitation letters to their students.

Of the student responses received, 5,901 could be included in the analysis.

5.3.1.2 Follow-up survey

After the completion of the data collection process we asked those institutions that submitted data to share their experience of the process.

One particular issue was the burden of data delivery in the various surveys. As can be seen in Table 5-2 this burden differed substantially between the pilot institutions.

Table 5-2: Self-reported time needed to deliver data (fte staff days)

Data collection tool | N | Minimum | Maximum | Mean
Institutional questionnaire | 26 | 1.0 | 30 |

On average, the European institutions spent significantly less time on delivering the data than the institutions from outside Europe.

Table 5-3: Self-reported time needed to deliver data (fte staff days), European vs. non-European institutions

Data collection tool | Europe: Mean | Europe: N | Non-Europe: Mean | Non-Europe: N
Institutional questionnaire | 6.2 | 15 | 8.3 | 10
Field questionnaire, business studies | 2.5 | 10 | 7.3 | 7
Field questionnaire, electrical engineering | 3.5 | 8 | 7.0 |

Figure 5-2 shows that the data collection process and procedures were judged positively by the pilot institutions.

Figure 5-2: Assessment of data procedures and communication

Other questions in the follow-up survey referred to the efficiency of data collection and the clarity of the questionnaires.

In general the efficiency of data collection was reported to be good by the pilot institutions; critical comments indicated some confusion about the relationship between the U-Map and U multirank institutional questionnaires.

Figure 5-3: Assessment of data collection process

Some institutions were critical about the clarity of questions. Comments show that this criticism refers mainly to issues concerning staff data (e.g. the concept of full-time equivalents)

and to aspects of research and knowledge transfer (e.g. international networks, international prizes, cultural awards and prizes).

In the follow-up survey we also asked about major problems in delivering the data. Most pilot institutions reported no major problems with regard to student,

graduate and staff data. If they had problems these were mostly with research and third mission data (knowledge transfer,

regional engagement) (see Figure 5-4).

Figure 5-4: Availability of data

5.3.1.3 Data cleaning

As was indicated earlier, due to the lack of relevant and useful data sets we had to rely largely on self-reported data (both at the institutional

and the field-based level). This inevitably raises the question of the control and verification of data.

Based on the experiences from U-Map and from the CHE ranking we applied a number of mechanisms

and procedures to verify data. Verification refers to the identification and correction of errors due to:

misunderstanding of definitions; simple data errors; and potential manipulation of data. In order to reduce the number of errors due to misunderstanding of definitions,

we shared the glossary and the U multirank technical specification email (see appendices 10 and 11) with the institutions to ensure that a smooth data collection could take place.

The main part of the verification process consisted of the data cleaning procedures after receiving the data.

If such doubts could not be resolved, the particular data were not included in the pilot data analysis. (Figure 5-4 showed, per data type — student, graduate, staff, financial, research and third mission data — whether institutions reported no problems, lack of clarity of definitions, large effort, data not available, limited relevance or other problems.) The main data cleaning procedures carried out on the data provided by the institutions are described below.

The institutional questionnaires

For the institutional questionnaires we performed the following checks:

A check on the outliers in the data elements: the raw data (the answers provided by the institutions) were analysed first regarding outliers.

If a score was extremely high or low (compared to the scores of the other institutions on that data element),

the data element was flagged for further analysis. A check on the outliers in indicator scores:

the scores on the indicators were calculated using the raw data and the formulas. If a score was extremely high or low,

the data element was flagged for further analysis. A check for missing values: the data elements where data were missing

or not available were flagged. Comments regarding reasons for missing data were studied and the missing values were compared to data from other institutions from the same country.
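The three checks above might be sketched as follows. The flagging rule (a simple deviation-from-median heuristic), the column names and the example indicator are illustrative assumptions; the report does not specify the exact outlier criteria it used.

```python
# Illustrative sketch of the three screening checks; the median-based
# flagging rule and all column names are assumptions for the example.
import pandas as pd

def flag_outliers(series: pd.Series, factor: float = 5.0) -> pd.Series:
    """Flag values far above/below the other institutions' median."""
    med = series.median()
    return (series > factor * med) | (series < med / factor)

raw = pd.DataFrame({
    "institution": ["A", "B", "C", "D"],
    "students":  [12000, 9500, 11000, None],   # one missing value
    "graduates": [3000, 2400, 95000, 2600],    # one implausible value
})

# Check 1: outliers in the raw data elements
raw["graduates_outlier"] = flag_outliers(raw["graduates"])

# Check 2: outliers in indicator scores computed from the raw data
raw["grad_rate"] = raw["graduates"] / raw["students"]
raw["grad_rate_outlier"] = flag_outliers(raw["grad_rate"])

# Check 3: missing values, to be compared with comments and with
# institutions from the same country
raw["has_missing"] = raw[["students", "graduates"]].isna().any(axis=1)

print(raw[["institution", "graduates_outlier", "grad_rate_outlier", "has_missing"]])
```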

These three checks were performed first for the entire data set. In addition, more detailed checks were performed within a country or region.

The focus of these more detailed checks was on: Reference years: a basic check on the consistency of the reference years.

Comments: the comments were used as a source of information for missing values and for potential threats to the validity due to deviant interpretations.

For flagged data elements we checked publicly available sources to see whether we could find information regarding the relevant data element. The same procedure was followed when information was missing.

In the case of outliers, other publicly available data sources were identified and studied to find out whether the outlier was due to inadequate interpretation

and data provision regarding the question/data element or to a particular characteristic of the institution.

Feedback cycles during the data collection process. After the first deadline we reviewed the data delivered thus far

and inserted questions into the questionnaire, which was sent again to the institutions. Analyses of outliers:

The data provided were studied over time and specific changes in trends were analysed.

The student survey

For the student survey, after data checks we omitted the following responses from the gross student sample: responses with missing data on the student's institution; responses with missing data on the field of study (business studies, mechanical engineering, electrical engineering); students enrolled in programs other than bachelor/short national first degree programs and master/long national first degree programs; and students who had spent little time on the questionnaire and had not responded adequately. A sketch of such a filter is given below.
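In the sketch, the column names, the valid-level labels and the minimum completion time are assumptions; the report does not state the actual speeding threshold.

```python
# Illustrative filter implementing the cleaning rules listed above.
# Column names, valid levels and the 120-second threshold are assumptions.
import pandas as pd

VALID_FIELDS = {"business studies", "mechanical engineering", "electrical engineering"}
VALID_LEVELS = {"bachelor", "master"}  # incl. short/long national first degrees
MIN_SECONDS = 120                      # hypothetical 'too little time' cut-off

def clean_student_sample(responses: pd.DataFrame) -> pd.DataFrame:
    keep = (
        responses["institution"].notna()
        & responses["field"].isin(VALID_FIELDS)
        & responses["level"].isin(VALID_LEVELS)
        & (responses["seconds_spent"] >= MIN_SECONDS)
    )
    return responses[keep]
```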

As a result of these checks, the data of about 800 student questionnaires were omitted from the sample.

5.3.2 International databases

The data collection regarding the bibliometric and patent indicators took place by studying the relevant international databases

and extracting from them the information relating to the institutions and fields in the sample.

5.3.2.1 Bibliometric data

As indicated in chapter 4,

we analysed the October 2010 edition of the Web of Science database (WoS) to compile the bibliometric data of the institutions involved in the sample.

A crucial aspect of this analysis was the identification of the sets of publications produced by one and the same institution,

which is then labelled with a single, 'standardised' name tag. The institutions were delimited according to the set of WoS-indexed publications that contain an author affiliation address explicitly referring to that institution.

Statistics were produced only for institutions that are sufficiently represented in the WoS database, either in the entire WoS or in the preselected WoS fields of science. A sketch of this kind of affiliation matching is given below.
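The standardized names and the regular expressions in the sketch are invented for the illustration; the actual CWTS procedure for standardizing institutional names is considerably more elaborate.

```python
# Toy illustration of delimiting institutions via lexical queries on the
# affiliation (address) field; names and patterns are invented examples.
import re
from typing import Optional

STANDARDISED_NAMES = {
    "Univ Warwick": re.compile(r"\buniv(ersity)?\s+(of\s+)?warwick", re.I),
    "KU Leuven": re.compile(r"\bk\.?\s*u\.?\s*leuven|\bkatholieke\s+univ\w*\s+leuven", re.I),
}

def standardise(address: str) -> Optional[str]:
    """Return the standardised name tag for an affiliation address, if any."""
    for name, pattern in STANDARDISED_NAMES.items():
        if pattern.search(address):
            return name
    return None

for addr in ["Dept Econ, Univ Warwick, Coventry, England",
             "Katholieke Universiteit Leuven, Leuven, Belgium"]:
    print(addr, "->", standardise(addr))
```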

The bibliometric data in the pilot version of the U multirank database refer to one measurement per indicator.

In the case of indicators #1 to #4 (see section 4.2.2), the most recently available publication year was selected for producing the statistical data.

The statistics take the form of frequency data or frequency categories (frequency ranges). Also, in the case of indicators #2, #3,

and #4, the data were expressed as the share of co-publications within total publication input.

The citation impact data require a citation window stretching back into the recent past in order to collect a sufficiently large number of citations.

The publication count data are all based on a 'whole counting' method, where a publication is attributed in full to each main organization listed in the author addresses.
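Whole counting is easy to state precisely in a few lines of code; the toy records below are invented for the illustration.

```python
# 'Whole counting' as described above: each publication is credited in
# full to every distinct main organization in its author addresses.
# (A fractional scheme would divide the credit instead.) Toy data only.
from collections import Counter

publications = [
    {"id": 1, "orgs": {"Univ A", "Univ B"}},
    {"id": 2, "orgs": {"Univ A"}},
    {"id": 3, "orgs": {"Univ B", "Univ C"}},
]

whole_counts = Counter()
for pub in publications:
    for org in pub["orgs"]:
        whole_counts[org] += 1  # full attribution to each organization

# Univ A: 2, Univ B: 2, Univ C: 1 -- note the sum (5) exceeds the
# number of papers (3), which is characteristic of whole counting.
print(whole_counts)
```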

The annual statistics refer to publication years (rather than database years). The computation routine for the field-normalized citation rate indicator involved collecting citations to each publication according to a variable citation window; these citation data refer to database years.
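The exact CWTS routine is not spelled out here; a common construction for a field-normalized citation rate in the bibliometric literature takes the following shape, where the symbols are ours:

```latex
% Sketch of a field-normalized citation rate (symbols are illustrative;
% the report does not give the exact CWTS routine). For an institution
% with publications i = 1..n, c_i the citations observed within the
% citation window, and e_i the expected (world-average) citation rate
% for publications of the same field, year and document type:
\[
  \mathrm{FNCR} \;=\; \frac{\sum_{i=1}^{n} c_i}{\sum_{i=1}^{n} e_i}
\]
% so that FNCR = 1 corresponds to the world average, and values above 1
% indicate citation impact above the field-specific expectation.
```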

The research publications in the three fields of our pilot study (business studies, mechanical engineering and electrical engineering) are covered less well in the WoS database.

Hence, in these cases the available bibliometric data were insufficient to create valid and reliable information for the bibliometric performance indicators,

especially when the data is drawn from the WoS database for just a single (recent) publication year.

In such cases the options were either to remove the institution from all indicators that involve bibliometric data, or to include bibliometric information only for the overall profile across all fields of science,

with the annual field-specific thresholds set at 10 to 15 publications.

5.3.2.2 Patent data

As indicated in chapter 4 (section 4.2.3),

for our analysis of patents we collected data from the October 2009 version of the international PATSTAT database.

In this database the institutions participating in the sample were identified and studied in order to extract the institutional-level patent-data.

The extraction covers patents from the three largest patent offices worldwide: the European Patent Office (EPO), the US Patent and Trademark Office (USPTO) and the World Intellectual Property Organization (WIPO).

The extraction of institutional-level patent data is based on identification of the institute in the applicant field of the PATSTAT database (see Appendix 7).

The extraction was done in a top-down fashion, i.e. without an external 'bottom-up' verification of the extracted data by one or more representatives of each organization,

although the above-discussed harmonization steps imply high levels of accuracy and coverage (see also Magerman, 2009).

Using inventor information for extracting institution-level data is impossible, as patent documents contain no (systematic) information on the institutional affiliation of individual inventors.

Moreover, the data provided and discussed in the study by Lissoni et al. (2008) show that the extent of academic scientists' contribution to national patenting in France, Italy and Sweden is considerably larger than university-owned patent counts suggest.

As such, when interpreting institution-level patent data such as those provided in this study, one should bear in mind the relatively sizable volume of university-invented patents that is not retrieved by the institution-level search strategy, as well as the institutional and national variation in the size of the resulting bias.

Field-level patent indicators could not be produced due to a lack of concordance between our field definitions and the field classification that is present in the patent database.

We will first present the feasibility of the use of the various indicators presented in chapter 3. Next we will discuss the feasibility of the data collection procedures including the quality of the data sources.

reliability: the measurement of the indicator is the same regardless of who collects the data or when; comparability: the indicator measures the same phenomenon across institutions and countries; availability: the required data are available or can be collected with an acceptable level of effort. Using these criteria the indicators were preselected as the base for the pilot test.

The criteria used were: relevance; concept/construct validity; face validity; robustness (consisting of reliability and comparability); and availability (of data).

A score of 'B' indicates that there may be some problems, but in most cases data on the indicators can be collected and interpreted. A score of 'C' indicates that there are serious problems in collecting data on the indicator.

The (post-pilot) feasibility score is based on three criteria: data availability: the relative actual existence of the data needed to build the indicator.

If information on an indicator or the underlying data elements is/are missing for a relatively large number of cases,

the data availability is assumed to be low. conceptual clarity: the relative consistency across individual questionnaires regarding the understanding of the indicator.

If, in the information collected during the pilot study, there is a relatively large and/or diversified set of comments on the indicator in the various questionnaires,

the conceptual clarity is assumed to be low. data consistency: the relative consistency regarding the actual answers in individual questionnaires to the data needs of the indicator.

If in the information collected during the pilot study, there is a relatively large level of inconsistencies in the information provided in the individual questionnaires,

the data consistency is assumed to be low.

Indicators which were rated 'A' or 'B' in the (pre-pilot) preliminary rating but which received a 'C' in terms of the (post-pilot) feasibility score were reconsidered with regard to their inclusion in the final list of indicators.

Some indicators were regarded as too relevant to drop despite the problematic score; for these, efforts to enhance the data situation will be proposed and the indicators are kept 'in'. A sketch of how such a screening might be operationalised is given below.
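The report applies qualitative judgement rather than fixed cut-offs; the 30% threshold and the field names in this sketch are therefore purely illustrative assumptions.

```python
# Hypothetical operationalisation of the post-pilot feasibility criteria.
# The cut-off and field names are illustrative assumptions only; the
# report itself uses qualitative judgements rather than fixed thresholds.
from dataclasses import dataclass

@dataclass
class IndicatorStats:
    missing_share: float        # share of cases with missing data
    comment_share: float        # share of questionnaires with diverse comments
    inconsistent_share: float   # share of answers failing consistency checks

def feasibility(stats: IndicatorStats, cutoff: float = 0.30) -> dict:
    return {
        "data availability": "low" if stats.missing_share > cutoff else "ok",
        "conceptual clarity": "low" if stats.comment_share > cutoff else "ok",
        "data consistency": "low" if stats.inconsistent_share > cutoff else "ok",
    }

# An indicator with much missing data but clear, consistent answers:
print(feasibility(IndicatorStats(0.45, 0.10, 0.05)))
```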

Table 6-1 (excerpt; columns: preliminary rating, feasibility score on data availability, conceptual clarity and data consistency, recommendation): Graduation rate — A, b; Time to degree — B, b; Relative rate of graduate (un)employment.

Of the institutions that did provide data on the breakdown, a number indicated that the estimates were rather crude.

the indicators that have been built using the information from departmental questionnaires and the indicators related to student satisfaction data.

Table 6-2 (excerpt): Student/staff ratio — A, a; Graduation rate — A, b; Qualification of academic staff.

In addition, both institutional and national data, to which some institutions could refer, use different time periods in measuring employment status (e.g. six, 12 or 18 months after graduation).

Comparability of data is seriously hampered by these different time periods. The indicator was regarded in the same way as in the institutional ranking.

The indicator‘inclusion of work experience'is a composite indicator using a number of data elements (e g. internships, teachers'professional experience outside HE) on employability issues;

if one of the data elements is missing, the score for the indicator cannot be calculated, as in the sketch below.
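The element names and the equal weighting in this sketch are illustrative assumptions; only the missing-data rule comes from the text.

```python
# Sketch of the composite-indicator rule described above: if any
# constituent data element is missing, no score is produced. The element
# names and the equal weighting are illustrative assumptions.
from typing import Optional

def composite_score(elements: dict) -> Optional[float]:
    """Equal-weight composite; None as soon as any element is missing."""
    if any(value is None for value in elements.values()):
        return None
    return sum(elements.values()) / len(elements)

print(composite_score({"internships": 0.75, "staff_work_experience": 0.25}))  # 0.5
print(composite_score({"internships": 0.75, "staff_work_experience": None}))  # None
```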

Table 6-3 (excerpt): Organization of programme — A, a; Inclusion of work experience — A, a; Evaluation of teaching — A, a.

Feasibility scores for the research dimension (excerpt): Percentage of expenditure on research — A, b; Field-normalized citation rate* — A, a; Post-docs per fte academic staff and prizes won — B, c (Out); Highly cited research publications* — B, A; Interdisciplinary research activities — B, A. (*Data source: bibliometric analysis)

The comments on the‘post-doc'positions mainly regarded the clarity of definition and the lack of proper data.

The large amount of missing data and comments regarding art-related output was no surprise.

Stakeholders, in particular representatives of art schools, stressed the relevance of this indicator despite the poor data situation.

efforts should be made to enhance the data situation on cultural research outputs of higher education institutions. This cannot be done by producers of rankings alone;

initiatives should also come from providers of (bibliometric) databases as well as stakeholder associations in the sector.

Feasibility scores for the field-based research indicators (excerpt): External research income — A, a; Total publication output* — A, a; Student satisfaction. (*Data source: bibliometric analysis)

Observations from the pilot test: On the field level, the proposed indicators do not encounter any major feasibility problems.

In general, the data delivered by faculties/departments revealed some problems in the clarity of definitions of staff data.

Here a clearer yet concise explanation (including an example) should be used in future data collection.

The data on post-doc positions proved to be more problematic in business studies than in engineering.

Feasibility scores for knowledge transfer, institutional ranking (excerpt): Percentage of income from third-party funding — A, c (In); Incentives for knowledge exchange — B, b; Technology transfer office staff per fte academic staff — B, b; Co-patenting** — B, A. (**Data source: patent analysis)

making it difficult to compare the data.

Table 6-7: Field-based ranking indicators (excerpt):

University-industry joint research publications* — A, a; Number of licensing agreements — B, c (Out). (*Data source: bibliometric analysis; **patent analysis)

Observations from the pilot test:

The only indicator with an 'A' rating, indicating a high degree of feasibility, comes from bibliometric analysis. Availability of data on 'joint research contracts with private sector' is a major problem,

The indicators based on data from patent databases are feasible only for institutional ranking due to discrepancies in the definition and delineation of fields in the databases.

Only a small number of institutions could deliver data on licensing. There was agreement among stakeholders.

Feasibility scores for international orientation, field-based indicators (excerpt): Percentage of programs in foreign language — A, a; International joint research publications*; Percentage of international degree-seeking students — new indicator, B; Percentage of students coming in on exchanges — new indicator, A; Percentage of students sent out on exchanges — new indicator, A. (*Data source: bibliometric analysis)

Feasibility scores for international orientation, institutional ranking (excerpt): Percentage of international students — A, a; Incoming and outgoing students* — B, A; International research grants — B, b; International doctorate graduation rate — B, A. (*Data source: bibliometric analysis)

Observations from the pilot test:

Not all institutions have clear data on outgoing students. In some cases only those students participating in institutional or broader formal exchange programs were counted.

Availability of data was relatively low regarding the student satisfaction indicator, as only a few students had already participated in a stay abroad.

The indicator‘international orientation of programs'is a composite indicator referring to several data elements;

feasibility is limited by missing cases for some of the data elements. Some institutions could not identify external research funds from international funding organizations.

Feasibility scores for regional engagement, institutional ranking (excerpt): Percentage of income from regional sources — A, c (In); Percentage of graduates working in the region — B, c (In); Research contracts with regional partners — B, b; Regional joint research publications* — B, A; Percentage of students in internships in local enterprises — B, c (In). (*Data source: bibliometric analysis)

The low level of data consistency showed that there is a wide variety of region definitions used by institutions.

In both institutional and field-based data collection, most institutions could not deliver information on the regional labor market entry of graduates.

Feasibility scores for regional engagement, field-based indicators (excerpt): Graduates working in the region — B, c (In); Regional participation in continuing education — Out; Regional joint research publications* — new indicator, A. (*Data source: bibliometric analysis)

Observations from the pilot test:

Less than half of the pilot institutions could deliver data on regional participation in continuing education programs (and only one fifth in mechanical engineering).

There is probably no way to improve the data situation in the short term. While far from good, the data situation on student internships in local enterprises and degree theses in cooperation with local enterprises turned out to be less problematic in business studies than in the engineering fields.

Both internships and degree theses enable the expertise and knowledge of local higher education institutions to be utilized in a regional context, in particular in small-and medium-sized enterprises.

and in many non-metropolitan regions they play an important role in the recruitment of higher education graduates.

6.3 Feasibility of data collection

As explained in section 5.3, data collection during the pilot

study was carried out via self-reporting from the institutions and analysis of international bibliometric and patent databases.

6.3.1 Self-reported institutional data

For the collection of self-reported institutional data we made use of several questionnaires:

the U-Map questionnaire (to identify institutional profiles), the U multirank institutional questionnaire and the U multirank field-based questionnaire. We supported this data collection with extensive data cleaning processes,

in order to further assess the feasibility of the data collection. In general the organization and procedures of the self-reported institutional data collection were evaluated as largely positive or at least 'neutral' by the institutions.

Very few institutions were really dissatisfied with the processes. The collection of data by online questionnaires worked well

and the coordination of all data collection via a central contact person in participating institutions also proved successful. We made the following key observations regarding the process of collecting self-reported institutional data:

The parallel institutional data collection for U-Map and U multirank caused some confusion. Although a tool was implemented to pre-fill data from U-Map into U multirank,

some confusion remained concerning the link between the two instruments. In order to test some variants, institutional and field-based questionnaires were implemented with different features (e.g. the definition of international staff).

This procedure helped us to judge the relative feasibility of concepts and procedures. The glossary of indicators and data elements proved helpful in achieving a high degree of consistency in the data delivered by the institutions.

Yet the definitions and explanations of some elements (e.g. staff categories including fte, the delineation of regions) could be improved, bearing in mind that there is an apparent trade-off between adequate explanation

and the effort demanded of respondents while supplying their data. The effort to include a feedback cycle both in institutional and field-based data collection (with questions

and comments on the data already submitted) was greatly appreciated by the institutions. Although it implied a major investment of time by the project team,

this procedure proved to be very efficient and helped significantly to increase the quality and consistency of the data.

In some countries the U multirank student survey conflicted with existing national surveys, which in some cases are highly relevant for institutions.

Our major conclusion regarding the feasibility of the self-reported institutional data is that data availability is an issue in a number of cases.

In many of these cases the problem lies not with the relevance of the data elements but with the administrative processes related to data collection in some institutions. It may be assumed that when institutions increase their efforts regarding data collection

and data quality, this problem will be mitigated.

6.3.2 Student survey data

One of the major challenges regarding the feasibility of our global student survey is

whether the subjective evaluation of their own institution by students can be compared globally or whether there are differences in the levels of expectations or respondent behavior.

The results suggest that such comparisons are defensible, and thus that data collection through a global-level student survey is sufficiently feasible.

6.3.3 Bibliometric and patent data

The collection of bibliometric and patent data turned out to be largely unproblematic.

In bibliometric analysis the sets of publications produced by a specific institution (or a subunit of it) have to be identified in international bibliographic databases.

This is done via a procedure in which the institution is detected automatically by lexical queries on the author affiliation field (the address field) of the publications in the databases, i.e. by a query on keywords.

As a consequence, the completeness of the selected bibliometric data cannot be fully guaranteed. To assess the feasibility of our bibliometric data collection we studied the potential effects of a bottom-up verification process via a special case study of six French universities.

The aim of the case study was to shed light on how a bottom-up verification approach might collect relevant data that would

otherwise be missed. The case study showed that in some cases a substantial number of publications might indeed have been missed.

Nevertheless, the feasibility of the bibliometric data collection in the pilot study can be judged to be high.

Data were easily identified and analyzed, although a warning against placing too much reliance on the completeness of the data remains in place.

With respect to the collection of patent data (via PATSTAT) there are two important caveats. First, as mentioned before,

we were only able to identify our sample institutions in the database. Subunits for field analyses could not be found.

This implies that patents for which the intellectual property rights are assigned to companies, governmental funding agencies or individual scientists are not retrieved.

A second important caveat when extracting institutional-level patent data is that organizations register their patents under many different names and spelling variations.

PATSTAT data are no exception: applicant names are often misspelled, and their spelling varies from one patent to another.
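One way to picture the problem is approximate string matching of applicant names against a harmonised name list. In the sketch below, difflib is a stand-in for the actual (more elaborate) harmonisation procedure, and the name list is invented.

```python
# Sketch of mapping misspelled applicant names onto harmonised
# institution names. difflib is a stand-in here; the actual procedure
# used for PATSTAT is more elaborate (cf. Magerman, 2009).
from difflib import SequenceMatcher
from typing import Optional

HARMONISED = ["Katholieke Universiteit Leuven", "University of Warwick"]

def best_match(applicant: str, threshold: float = 0.85) -> Optional[str]:
    scored = [(SequenceMatcher(None, applicant.lower(), name.lower()).ratio(), name)
              for name in HARMONISED]
    score, name = max(scored)
    return name if score >= threshold else None

print(best_match("Katholieke Univresiteit Leuven"))  # typo, still matched
print(best_match("Acme Widgets NV"))                 # no match -> None
```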

Also with respect to the collection of patent data, the conclusion regarding feasibility is positive. However, it should be noted that patent data analysis could only be undertaken at the institutional level.

Is it possible to extend U multirank to comprehensive global coverage, and how easy would it be to add additional fields?

The pilot study suggests that a global multidimensional ranking is unlikely to prove feasible in the sense of achieving extensive coverage levels across the globe in the short term.

The prospects for widespread European coverage are encouraging. A substantial number of institutions both from EU and non-EU European countries participated in the projects.

The fact that a number of institutions confirmed participation but at the end of the day did not submit data suggests that data (non-)availability was a common theme.

in particular in those areas and fields which so far have largely been neglected in international rankings due to the lack of adequate data

including provision for guiding users through the data and a visual framework to display the result data.

In U multirank the presentation of data allows for both: a comparative overview on indicators across institutions,

and a detailed view of institutional profiles. The ideas presented below are mainly inspired by the U-Map visualisations and the presentation of results in the CHE ranking.

U multirank produces indicators and results on different levels of aggregation, leading to a hierarchical data model: data at the level of institutions (results of focused institutional rankings); data at the level of departments (results of field-based rankings); and data at the level of programs (results of field-based rankings). The presentation format for ranking results should be consistent across the three levels while still accommodating the particular data structures on those levels.

either based on genuine data on higher education systems (e.g. the University Systems Ranking published by the Lisbon Council), or by simply aggregating institutional data to the system level (e.g. the QS National System Strength Ranking).

The definition of the indicators, the processes of data collection and the discussion on modes of presentation have been based on intensive stakeholder consultation.

In accordance with EU policies on e-accessibility, barriers to accessing the U multirank results and data will be removed as much as possible.

However, translation of the web tool and the underlying data is a substantial cost factor.

grouping approach), a description of underlying data sources (e.g. self-reported institutional data, surveys, bibliometric data,

patent data) and a clear definition and explanation of indicators (including an explanation of their relevance).

The link between the two projects has been created by guaranteeing the use of U-Map data for the selection of comparable (and therefore 'rankable') institutions.

8.2 Scope

We would argue that U multirank should aim to achieve a relatively wide coverage of European higher education institutions as quickly as possible during the next project phase, since in Europe the feasibility has been demonstrated.

When this strategy leads to a substantial database within the next two years, recruitment could be reinforced.

The frequency of data collection is always a compromise between obtaining the most up-to-date information

and the workload that data-gathering imposes on the institutions. For the institutional ranking, data collection would probably take place via a full update, for instance every two or three years.

We suggest a rolling system for the field-based ranking. There is no definitive answer to the question of how many fields there are in international higher education.

If the rankings were updated on a three-year rolling schedule this would allow coverage of 15 fields.

At that stage a better informed decision about the feasibility of extending the coverage of the rankings to further fields could be taken.

This implies that data updates would not lead to the publication of a static ranking but would only feed into the database, allowing the user to rank on the basis of the most current information. (The Frascati manual has a similar structure.)

With U multirank it is also possible to create so-called 'authoritative' ranking lists from the database.

For instance, an international public organization might be interested in using the database to promote a ranking of the international, research-intensive universities in order to compare a sample of comparable universities worldwide.

This might be an important means of generating revenue from database-derived products. On the other hand, in the first phase of implementation, U multirank should be perceived by all potential users as relevant for their individual needs.

The quality of the results of any transparency tool depends to a large extent on the availability of relevant and appropriate data.

There is a strong need for a European data system, with institution and field data, preferably with clear relationships to other data systems in the world (such as IPEDS).

Such a European data system should continue the data collection in a follow-up of the recently finalized EUMIDA project.

The development of the European database resulting from EUMIDA should take into account the basic data needs of U multirank.

This would allow the pre-filling of institutional questionnaires with available data and would substantially reduce the workload for the institutions.

Some specific recommendations regarding the further development of the EUMIDA database can be made: First, there are some elements

such as staff data (the proper and unified definition of full-time equivalents and the specification of staff categories such as 'professor' is an important issue for the comparability of data),

or data related to students and graduates. EUMIDA could contribute to improving the data situation regarding employment-oriented outcome indicators.

An open question is how far EUMIDA is able to go into field-specific data; for the moment pre-filling from this source seems to be more realistic for the institution-level data than for field-based data.

A second aspect of integrated international data systems is the link between U multirank and national ranking systems.

U multirank implies a need for an international database of ranking data consisting of indicators which could be used as a flexible online tool

in order to create personalized rankings by users (according to the user's preferences). This database is a crucial starting point to identify

and rank comparable universities. Developing a European data system and connecting it to similar systems worldwide will strongly increase the potential for multidimensional global mapping and ranking.

Despite this clear need for cross-national/European/global data there will be a continued demand for information about national/regional higher education systems, in particular with regard to undergraduate higher education.

Furthermore it could also be used as a base for an international database and international rankings;

thus creating an increasing set of data systems to be combined into a joint database. How to deal with the top-down and bottom-up approach?

In the‘bottom-up'approach national rankings could feed their data into the international database, the U multirank unit will be able to pre-fill the data collection instruments

and would have to fill the gaps to attain European or worldwide coverage. At the same time, activities based on the top-down approach might help to make the system known

and to develop trust and credibility. Top-down rankings would also become less expensive to implement

if they could use existing national data and data collection infrastructures. Also, gaining sponsorship for the system could sometimes be easier starting from the national level;
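The division of labour between the two approaches can be sketched as follows (country codes, institutions, and figures are hypothetical): national feeds pre-fill the international database bottom-up, and the remaining gaps mark where top-down collection by the central unit is still needed.

```python
# Sketch of the bottom-up flow: national data sets feed the international
# database; leftover gaps show where central (top-down) collection is needed.
national_feeds = {
    "DE": {"U Berlin": {"students_total": 35000}},
    "NL": {"U Utrecht": {"students_total": 31000}},
}
target_institutions = ["U Berlin", "U Utrecht", "U Lisboa"]

international_db: dict[str, dict] = {}
for feed in national_feeds.values():
    international_db.update(feed)      # pre-fill from national rankings

gaps = [i for i in target_institutions if i not in international_db]
print(gaps)  # ['U Lisboa'] -> to be collected centrally for full coverage
```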

Finalisation of the various U-Multirank instruments
1. Full development of the database and web tool. The database and web tool have to be populated with data, tested, and put into operation. The prototypes of the instrument will demonstrate the outcomes and benefits of U-Multirank.
2. Setting of standards and norms and further development of underdeveloped dimensions and indicators.

Definitions of data concepts should be fixed and standardized elements of data collection tools should be developed. In the feasibility study we found indicators and dimensions where data collection was difficult but which have high relevance, and we discovered sufficient potential to develop adequate concepts and data collection methods.

These parts of the ranking model should be developed further.
3. Update of data collection tools/questionnaires according to the revision and further development of indicators and the experiences from the U-Multirank project.

Depending on the further development of indicators and their operationalization, the data collection instruments have to be adapted.

A major issue is to design the questionnaires in a way that reduces administrative burden for the institutions as far as possible.

Development of pre-filling in EU+ countries
4. Further development of pre-filling. In the first round of U-Multirank, pre-filling proved difficult.

and the international U-Multirank database should be realized.

Roll-out of U-Multirank across EU+ countries
5. Invitation of EU+ higher education institutions and data collection.

Within the next two years all identifiable European higher education institutions should be invited to participate in the institutional as well as in the three selected field-based rankings.

The objective would be to achieve full coverage of institutional profiles and have a sufficient number of comparable institutions.

This could be guaranteed by the smoothness of data collection and the services delivered to participants in the ranking process.

Elements of a new project phase (work package: products, deadline):
- Database and web tool: functioning database; functioning web tool prototype (06/2012)
- Standards and norms: description of standards and norms; final data model (06/2012)
- Finalized collection tools: collection tools (06/2012)
- Pre-filling: planning paper on pre-filling
- Data collection, data analysis and publication: 06/2012, 09/2012, 03/2013, 06/2013
- Specific focused rankings: two rankings conceptualized (12/2013); one benchmarking exercise (12/2012)

Therefore, efficiency in data collection is important. This criterion also refers to an efficient link between national rankings and the international ranking tool.

In addition, efficiency refers to the coordination of different European initiatives to create international databases (such as E3M, EUMIDA).

when the organizational structures do not lead to data monopolies. Service orientation: a key element of U-Multirank is the flexible, stakeholder-oriented, user-driven approach.

e.g. media companies (interested in publishing rankings), consulting companies in the higher education context, and data providers (such as the producers of bibliometric databases).

because if HEIs experience high workloads with data collection, they expect free products in return and are not willing to pay for basic data analysis.

The cost factors are first of all related to the necessary activities involved in the production of ranking data:
- Methodological development and updates
- Communication activities
- Implementation of (technical) infrastructure
- Development of a database
- Provision of tools for data collection
- Data collection (again including communication)
- Data analysis (including self-collected data as well as analysis based on existing data sets, e.g. bibliometric analysis)
- Data publication (including development and maintenance of an interactive web tool)
- Basic information services for users
- Internal organization

This determines the volume of data that has to be processed and the communication efforts.
- The number of countries/institutions which deliver data for free through a bottom-up system (this avoids costs).
- The number of fields involved. To limit costs, a ranking could not cover all fields with sufficient size.
- The surveys that are needed to cover all indicators outlined in the data models of U-Multirank.

and graduate surveys, or the use of databases charged with license fees, e.g. bibliometric and patent data.

- The frequency of updating the ranking data. A multidimensional ranking with data from the institutions will not be updated every year; the best timespan for rankings has to take into account the trade-off between obtaining up-to-date information and the workload for the institutions and the costs of updating data for the operative unit.

For the different steps we could identify the relevant cost factors, some fixed, some variable:

IT; indicators/databases used (e.g. license costs)
- Development of a database: staff; basic IT costs
- Provision of tools for data collection: staff; basic IT costs (incl. online survey systems and databases)
- Data analysis: staff; number of countries and institutions covered; range of indicators and databases; license fees of databases (e.g. bibliometric)
- Publication: staff; basic IT costs; features of web tool to present results
- Information services for users: staff; basic IT costs; number of countries and institutions covered; range of indicators and databases; scope of information services
- Internal organization

(2) two junior staff members with experience in statistics, empirical research, large-scale data collection, and IT; (3) secretarial support.

To ensure students' free access to U-Multirank data, the EC could also, in the long run, provide direct funding of the user charges that would

such as special analyses of ranking data, to cross-subsidize the instruments.
g) Financial contributions from media partners publishing the results.
h) Non-financial contributions from third parties, such as free data provision.
i) Free provision of data from national mapping and ranking systems (bottom-up approach).

and who will pay the variable costs for ranking/data collection. The scenarios try to develop a medium-term perspective.

Funding scenario 1 (cost factor: cost sharing)
- Basic fixed costs: 100% principals
- Rankings: 50% principal; 50% media partner, data providers

or data providers who benefit from being positioned in the ranking field. The principal's funding share could also include a contribution from the EC

% media partner, data providers, publishing companies; 30% selling of products, user charges. The different scenarios could be seen as extreme cases, each of them focusing strongly on one or a few of the potential funding sources.

Free data provision, especially from nationally financed and run projects or from other existing data sources, will lower the cost of data collection.

The more that national statistics offices harmonize data collection, the lower the costs will be. Charges to the users of the U-Multirank web tool would seriously undermine the aim of creating more transparency in European higher education, for example by excluding students;

but there is a possibility of some cross-subsidization from selling more sophisticated products such as data support to institutional benchmarking processes, special information services for employers, etc.

The EC could pay for student user charges. Project-based funding for special projects, for instance new methodological developments or rankings of a particular 'type' of institution, offers an interesting possibility with chances of cross-subsidization.

What kind of benefit must institutions receive in order for it to outweigh the costs of data gathering plus subscription fees?

EC, foundations, other sponsors) with a combination of a variety of market sources contributing to cost coverage, plus some cost reductions through efficiency gains.
8.9 A concluding perspective
U-Multirank

and should enlarge this database internationally, targeting the institutions required to reach sufficient coverage for all relevant profiles.

The nature of the ranking has to remain global and should not merely serve European interests.

Data from U-Multirank, U-Map, national field-based rankings, and national statistics should be integrated and coordinated


