Data

Clustering (43)
Data (486)
Data analysis (6)
Data gathering (7)
Data mining (18)
Database (127)
Network analysis (12)
Qualitative data (8)
Quantitative data (13)
Text mining (31)

Synopsis: Data


ART1.pdf

The main proposal in terms of process advanced the view that full use should be made of ICT in enabling data collection

There was no discussion of data-based systems, only judgement-based systems. A wide range of techniques and tools were used in complex combinations

and moulding expert opinions into good conclusions remains an elusive goal. 4. Tales from the frontier: The contributions to this session had a fairly common theme in that they focussed on the establishment of databases and the associated data collection,

and technology publication and patent abstract databases to better inform technology management. This paper describes, through a case study on solid oxide fuel cells, the value of quick text mining profiles of emerging technologies.

One of the main advantages of this technique (i.e. QTIP, Quick Technology Intelligence Processes) is that it allows a given technology analysis to be conducted within only a few days

instant database access, analytical software, automated routines, and decision process standardization. The paper discusses the importance of process management for 'tech mining',


ART10.pdf

Scenario skeletons can then be derived by clustering such storylines based on consistency, the so-called inductive approach. The deductive alternative is to analyse systematically the scenario space spanned by the most prioritised uncertain drivers.
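A minimal sketch of the consistency scoring that underlies this inductive route. The drivers, projections and consistency ratings below are hypothetical illustrations, not values from the article:

```python
from itertools import combinations

# Storylines are tuples of projections, one per uncertain driver.
# Both the storylines and the pairwise consistency ratings (1 = low,
# 5 = high) are assumed for illustration.
storylines = {
    "S1": ("growth", "strict regulation"),
    "S2": ("growth", "loose regulation"),
    "S3": ("stagnation", "strict regulation"),
}

consistency = {
    ("growth", "strict regulation"): 2,
    ("growth", "loose regulation"): 5,
    ("stagnation", "strict regulation"): 4,
    ("stagnation", "loose regulation"): 3,
}

def score(storyline):
    """Total internal consistency of one storyline."""
    return sum(consistency[pair] for pair in combinations(storyline, 2))

# The most internally consistent storylines become scenario skeletons.
ranked = sorted(storylines, key=lambda s: score(storylines[s]), reverse=True)
print(ranked)  # e.g. ['S2', 'S3', 'S1']
```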


ART11.pdf

the Project Team analyzed issues based on the assessment data. For each issue, key statistics were calculated (e.g.,


ART12.pdf

and monitoring their performance (i.e. data gathering and reporting strategies) and practices to review existing regulations.

The analysis of regulatory foresight in the narrow sense is based, first, on a broad survey of literature databases and the internet regarding regulatory impact assessments in general,

matching policy instruments and methodologies: innovation surveys; econometric models; control group approaches; cost-benefit analysis; expert panels/peer review; field/case studies; network analysis; foresight/technology assessment

Threats to health, safety and the environment can be identified by searches both in the patent data for related patent applications and in literature databases for articles addressing the various risk aspects.

Blind [25] shows, based on international and inter-sectoral cross-section data, that the output of formal standardisation bodies can be explained significantly by patent applications as a reliable indicator of the dynamics in the respective technologies.

Some studies based on OECD data and other internationally comparable data investigated the influence of the regulatory framework on R&D activities [27] or product innovation [28].

Science and technology indicators are easily available in publicly provided or commercially distributed databases. However, the methodological challenge is to meet the adequate level of specification and differentiation of the technology indicators,

Data requirements/indicators: The simple quantitative use of science and technology indicators in order to detect future challenges for the regulatory framework is not sufficient.

and regions; collection of survey data and preparation of the data set; definition of goal variables of the organisation depending on the possible requirements for regulations and standards;

Statistical or econometric data analysis and interpretation of results. 3.2.2. Examples: Although we cannot refer to a large number of regulation- and standard-related surveys,

whose data permits the assessment of the future needs for and impacts of regulations and standards.

who used the data to analyse the interrelationship between standardisation, research and export activities, taking subjective attitudes into account.

which reveals that standards for safety aspects, data security, data formats and customer interaction are most important for the surveyed German service companies.

furthermore standards to improve data security and information systems in general. In addition, the impacts of standards on central issues and assets of service companies have also been asked about

since they require the development of a questionnaire, the performance of a survey either via traditional postal mail or via online survey, the collection and cleaning of the data and finally, the analysis of the data.

Data requirements/indicators: The main advantage of surveys is that they allow the consideration of very specific regulatory challenges in the future,

Hence, they are able to provide unique data in this respect. Depending on the size of these surveys,

and lead to representative results, the data can be combined with indicator-based approaches representing the universe in science and technology.

and automatically checking the content of image data unsuitable for children which is available over networks.

Peta-bps per optical fibre: 2011; 4.20, 4.29, 2.39, 2.14, 3.61. Widespread use of an SCM (supply chain management) system to handle data

40. Practical use of systems capable of understanding and automatically checking the content of image data unsuitable for children

which confirms the positive linkage found in historical data [25], whereas the statistical connection between R&D support and regulation is rather vague. 3.3.4. General assessment: In general,

Data requirements/indicators: The application of the Delphi method to the issue of regulations and standards requires the development of questionnaires,

However, the databases provide further information about regulation-relevant contents, like health,

and assessment of regulatory foresight methodologies (table columns: Methodology / Type / Data requirements / Strengths / Limitations). Indicators: quantitative, also providing qualitative information; adequate science

and technology indicators combined with qualitative data. Systematic approach. Only quantitative data is not sufficient to detect emerging fields of regulation. Comparison across technologies,

and even stakeholders. Influence of non-technology-related factors cannot be considered. Surveys: quantitative; micro data of the respondent

the universe. Processing and analysis of data requires large human resources. Identification of adequate samples. Some types of information are difficult to obtain (answers to counterfactual questions

and semi-quantitative data from Delphi surveys. Consensus-building to reduce uncertainty about regulatory priorities and impacts. Impossibility to detect major technological breakthroughs and their regulatory requirements. Semi-quantitative. In case of conflicting interests, missing consensus about priorities. Identification of experts. Uncertainty increases with complexity of the context (technology, markets


ART13.pdf

both the integration of multiple functions and automated analysis and data handling remain to be accomplished in a self-contained cell-on-a-chip.

Manag. 18 (2001) 39-50. [10] R.N. Kostoff, E. Geisler, Strategic Management and Implementation of Textual Data Mining in Government Organizations, Technol.


ART15.pdf

Second, both employment and financial data, that is, spending on R&D activities by research-performing sectors, suggest a great diversity in terms of the 'weight' of these sectors.

[38], p. 65). Space limits prevent presenting data here; an extensive statistical annex can be found in the original report for the DG Research, EC, on which this article draws.

This section, in turn, relies on OECD data, published in [39]. A detailed analysis of some recent trends in universities' research activities can be found in [5]. Other key trends

Data also indicate that universities not only conduct basic research and it is not only universities who conduct basic research (on average,


ART17.pdf

Swanson explored knowledge discovery through database links [10]. Swanson demonstrates integrative capability by revealing new links between technologies, inherent in the data,

which were not readily apparent to the respective scientific communities. In this paper we examine techniques for exploring emergent structures or architectures of technology.

the Internet, science and technology databases, patent databases, newswires, and potentially also newsgroups or other online collaborative environments.

Thus, there is a rich basis of theoretical support for structuring technological component data in a hierarchical format.

The raw network data is not useful for this purpose. A structured representation of technology is needed for multiple reasons:

Without a theory of the data the technology analyst cannot distinguish between meaningful structure and possibly accidental corruption of the knowledge base.

Therefore, without a generative model of the data, the interpretation of the data may not be robust.

A structured representation of the data provides a principled account of where technological change is most likely to occur. The article

This knowledge is stored in databases of science and technology. New future-oriented technology analysis techniques, such as the approach suggested here,

Such organizations, not surprisingly, need access to these distributed databases of knowledge. Unlike conventional, disciplinary researchers, these organizations do not necessarily need the database to gain access to individual pieces of information

(whether this be press releases, patents, or research articles). Rather, the participating organizations use the database as a coordination mechanism,

enabling them to rapidly respond to the research and development efforts of their peers. Thus, they also require analytic support to gather, structure,

and comparing and analyzing these results given observed data. Real world evidence is used to tune and parameterize the specific model representations used.

Parent nodes are not directly observable in the data. Child nodes can be observed directly in the data,

and are given the corresponding labels. 3.1. Example of hierarchical random graph: An example of a hierarchical random graph is presented below.

which cannot be observed directly in the data, represent morphological principles actively at work in structuring the data.

In the example given below there is a 70% chance that nodes C and D are linked with nodes A and B;
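A minimal sketch of drawing one realization of this toy hierarchical random graph. The 0.7 cross-group probability comes from the example above; the within-group probability is an assumed value for illustration:

```python
import random

# Leaves A, B and C, D hang under two internal nodes; the root joins
# the two groups. P_CROSS is taken from the example in the text;
# P_WITHIN is a hypothetical value chosen for illustration.
P_CROSS = 0.7
P_WITHIN = 0.9

groups = [["A", "B"], ["C", "D"]]

def realize():
    """Draw one random graph consistent with the hierarchy."""
    edges = []
    # within-group links, governed by the lower internal nodes
    for g in groups:
        if random.random() < P_WITHIN:
            edges.append((g[0], g[1]))
    # cross-group links, governed by the root
    for u in groups[0]:
        for v in groups[1]:
            if random.random() < P_CROSS:
                edges.append((u, v))
    return edges

print(realize())  # e.g. [('A', 'B'), ('C', 'D'), ('A', 'C'), ('B', 'D')]
```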

and how concerned we are with a robust representation of the data in the presence of noise. The hierarchical representation of the data grows more attractive as the network grows larger,

The middle network shows clear clustering and yet is still a flat network. Members of the three groups in this network are likely to connect to each other,

Thus, the hierarchical random graph is a very expressive formalism capable of capturing many possible network relationships. 3.3. Fitting graphs to data: For each of these structures we must also estimate the associated probabilities of network linkages

The best fit is achieved by fitting probabilities according to the actual proportion of linkages observed in the data.

This is the maximum likelihood estimate of the model parameters given the data [21]. With only fifteen possibilities, we can exhaustively search the space of possible network structures.
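That fit can be written compactly. The notation below follows Clauset et al. [21] as best understood here, and the exact symbols are assumptions spelled out in the comments:

```latex
% Maximum-likelihood fit for a hierarchical random graph.
% For each internal node r of the dendrogram, let L_r and R_r be the
% numbers of leaves in its left and right subtrees, and E_r the number
% of observed edges running between them. The best-fit probability is
% the observed proportion of possible links, and the model likelihood
% is the product over internal nodes:
\[
  \hat{p}_r = \frac{E_r}{L_r R_r},
  \qquad
  \mathcal{L} = \prod_{r} \hat{p}_r^{\,E_r}\,
                \bigl(1-\hat{p}_r\bigr)^{\,L_r R_r - E_r}.
\]
```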

Every possible network consistent with the data can be enumerated, and the likelihood of each network model given the data can be calculated.

The analyst can then choose the network or networks which provide the best fit to the data.

Larger networks prevent this exhaustive search process. Nonetheless, a systematic technique for searching through the space of models is still necessary.

Gill [25] provides a relevant and comprehensive account of Monte Carlo simulation for data analysis. Fig. 2: Two realizations of the example hierarchical random graph.
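A minimal sketch of the Metropolis-style Monte Carlo search idea, with the space of dendrograms abstracted to indices and hypothetical log-likelihood scores; a real implementation would instead propose local rearrangements of the tree:

```python
import math
import random

# Hypothetical log-likelihoods of five candidate structures.
log_like = [-12.1, -10.4, -11.7, -9.8, -13.0]

def metropolis(steps=10_000):
    """Sample structures in proportion to their likelihood."""
    state = 0
    for _ in range(steps):
        proposal = random.randrange(len(log_like))
        # accept with probability min(1, L'/L)
        if math.log(random.random()) < log_like[proposal] - log_like[state]:
            state = proposal
    return state

print(metropolis())  # most often the best-scoring structure (index 3)
```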

include grid computing, the iPod and iPhone, virtualization and LAMP. 4.1. Data collection and comparative analysis: For the case study we collect data about Ajax and component technologies from the Internet.

The second is the clustering coefficient, which is a measure of excess links between closely related nodes.
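A minimal sketch of computing that coefficient on a toy undirected graph; the adjacency structure is hypothetical:

```python
from itertools import combinations

# Toy undirected graph as an adjacency dict (assumed for illustration).
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

def clustering(node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return links / (k * (k - 1) / 2)

avg_clustering = sum(clustering(n) for n in graph) / len(graph)
print(avg_clustering)
```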

even if a strict structuralist account of the data is not adopted. A graphical presentation of a subset of the Wikipedia network near AJAX is given.

which can be interpreted only in light of a more elaborate model of the data. A concluding section of the paper reflects upon the sociology of science,

in an effort to confront the observed data with sociological theory. In short, some descriptive statistics of the network are provided

despite the fact that the author does not endorse a structuralist account of the data. The network grows rapidly in size.

Apparent even in this graph representation of the Wikipedia web is the obvious structuring of the web pages through clustering and local hierarchies.

of hierarchical analysis of data [21]. These include a metabolic network, a grassland ecology, and a social network of terrorists.

Key attributes of the network include average degree k, average clustering coefficient (C), and average network diameter (d). It is interesting to compare this Wikipedia sample with these other known networks.

Fitting the data: The component technologies of Ajax may be represented in hierarchical random graph form. We apply the Monte Carlo simulation procedure of Clauset [21] to fit the 41 pages within one hop of Ajax (Programming) into a hierarchical random graph.

data. The resultant hierarchical random graph usefully distinguishes between high-level concepts

while external technologies do not reveal much hierarchical structure, at least in this sample of the data. One challenge to classification revealed by Figs. 5 and 6 is the placement of the various web browsers.

(Table: Claim / Claimant / Data.) Scientific and technical knowledge consists of a set of interdependent claims (Popper [31]). Networks of knowledge can be structured readily from science

Furthermore, the structured representation of the data may help identify areas where competences may need to be strengthened further or even completely restored.

and case studies. 6. Interpretations from the philosophy and sociology of science The hierarchical random graph is one possible model of science, technology and innovation data.

The alternative approach would be to expressly encode the configuration within the database of science and technology.

as well as presaging a significant reorganization of the science and technology database to better match technological progress.

Use of alternative databases, such as patenting data, would be an interesting item for extending the method.

Acknowledgements The author gratefully acknowledges the use of C++ code and Matlab scripts as provided by Aaron Clauset on his webpage (Clauset 2008).

He has worked as a data miner for large database companies, developed patents in the fields of pricing and promotion algorithms, been a research fellow at the Technology Policy and Assessment Center of Georgia Tech,


ART18.pdf

we have developed two graphical representations of the assessment data. The first one relates social preference for each option in each scenario to its potential social conflict level (see Fig. 2). For simplicity's sake,

However, these aspects can be taken into account in the interpretation of the specific empirical data.

we introduce a second visual representation of the data (see Fig. 3). As a first dimension,

uncertainties, trade-offs and decision making: The data generated in the workshops and core team sessions are finally synthesized by the core team into a recommendation for strategic planning.


ART19.pdf

and interpreting existing data, information and expert opinions. Creating shared understandings among the stakeholders about the possible future developments is also important in each field;

data on the system being analysed and on all the associated substances; an operational model of the system under analysis; a systematic hazard identification procedure and risk estimation techniques,

A systematic risk analysis typically starts, after the data gathering, with the identification of hazards and the associated hazardous scenarios according to a specific procedure defined by the selected risk analysis method.

Relevant probability data is seldom available and, as such, fully quantitative risk estimations are not normally performed in industry.


ART2.pdf

g) The potential offered by new sources of social data. © 2005 American Council for the United Nations University.

There are few attempts to aggregate futures data and build current work on proven prior work. The result, for better or worse, is that the field lacks the consistency and coherence that mark more scientific fields.

through networks, with diverse and changing sets of people, continually cross-referencing data, and monitoring decisions.

the possible use of behavioral data from which values may be inferred, the use of large numbers of computer generated scenarios to optimize policy choices 2,

First, analysts should recognize that random-appearing data and bizarre behavior may not be what they seem.

In the old days validity was tested by building models with data through some date in the past

8. New sources of social data: As large-scale databases become available in the future it will be possible to perform cluster analyses

These data will also be a stimulant to the search for correlates: what kind of behavior, for example, leads to a propensity for particular diseases.


ART20.pdf

This attainment raised national interest and critical debate about the reliability of the data basis and methodologies used in comparisons.

The criticism is related to the ways data and methodologies are used in comparisons. For example, one problem of comparisons based on composite indicators is that they give a backward-looking mirror perspective,

i.e. they are based only on past and often outdated data, and not on examination of future development.

Consequently the barometer gives both a compilation of ex-post data and strategic perspectives on how well the Finnish innovation environment is positioned now

The purpose of a technology barometer is to give data of how favorable and competitive the Finnish innovation environment is assessed to be now and in the future.

The data used by the barometer illustrate transitional phases and provide an overall image of how far the developed nations have come in a journey towards a knowledge-value society.

albeit the most important data is related to outcomes and impacts of inputs, like embedding of ICT into private

Developments which have already taken place are depicted in one element based on statistical data. The indicator-based data can be used for the generation of index figures to display the nations' techno-scientific base and level of societal development in comparison with the reference group.

The reference group used in the first three implementation rounds consisted of Denmark, Finland, Germany, Japan, The Netherlands, Sweden,

and the barometer publications consist of a lot of complementary and comparative data and analysis of considered indicators.

The processes for analyzing the collected data and synthesizing it into meaningful conclusions remain among the key tasks in technology barometer exercises.

calls for a high transparency of the methods used as well as transparency of all the utilized data. Transparency is of paramount importance for retaining the attention of the target groups

Implementing change and guiding desired actions through the decision-making chain requires sound analysis based on quantifiable data that is presented in an understandable format.

Recent, relatively radical changes of Finnish innovation policy are challenging the data basis and indicators of research and innovation,

and needs new data and novel indicators to be included in the barometer. In Finland, the sectoral research system of government administrations will be renewed,

JRC (2002) and compilation by OECD. Area/name of composite indicator: Economy: Composite of Leading Indicators (OECD); OECD International Regulation Database (OECD); Economic Freedom of the World


ART21.pdf

research in databases and the internet. This search was combined with a bibliometric approach. Literature was analysed.

internally assessed and reassessed several times via an internal database and scientific papers. As an input to the first workshop in November 2007, a first set of scientific papers describing the developments in the fields were written

The basis for the choice of participants was public databases and the results from bibliometrics. A reminder was sent at the end of September 2008

relatively simple charts (mainly in percent) had to be used in order not to over-interpret the database.


ART22.pdf

which would start a new cycle. 3 The European Environment Agency is a specialised agency of the European union with the prime task of providing targeted, timely, relevant and reliable data and information on the state and prospects of Europe's environment.

There are also data available on the types of businesses that use scenarios most often: large firms in capital-intensive industries with long (greater than 10 years) planning horizons.

This has been confirmed by studies that gather data on individual participants in a scenario planning project [20].


ART23.pdf

the resulting database of aggregate opinion ought to be open to wide access without infringing or diminishing civil and personal liberty. 3. Measurement of an improvement in participation in Foresight should be simple


ART24.pdf

This framework can help in structuring large amounts of heterogeneous data, aid the construction of complexity scenarios,

a firm developing food-packaging sensors uses the blog to collect data on user preferences, allowing targeting strategies.

but confidentiality of development hampers transparency (issues of competition) and thus watchdogs find it difficult to access data to assess practices.


ART27.pdf

international benchmarking data, and future-oriented 'intelligence'), 'the organisation of dialogic spaces that are not hijacked solely' by special interests

the collection of statistical data and bibliometric data; and a series of face-to-face interviews with stakeholders, including senior researchers within Luxembourg and abroad,

Setting Context/Identifying Priorities: data collection; bibliometrics; interviews; international research trends; evaluation of FNR programmes; mapping of Lux.

to avoid short-termism; to collect necessary background data and to ensure its use in the process;

the availability and use of background data; and the nature of processes of deliberation. 5.1. Variety and change in the meanings of Foresight: The FNR Foresight was born out of the necessity for the FNR to define new research programmes.

The consultants employed to coordinate the exercise in Phase 1 did a sterling job in such a limited time to collect baseline data

and benchmarking data, and more time spent on data collection and analysis. Similar shortcomings have also been noted by Meyer (2008), who comments that Luxembourg's 'current science policy appears to be almost too ambitious,..

too impatient in wanting to implement change'. 'Everyone (finally) realised that further discussions would be needed with the research community before agreement could be reached on priorities

Second, it is clear that a forward-looking process like foresight needs to be underpinned by sufficient and appropriate 'objectivised' data, e.g. publication data, statistics on the national R&D environment, reports on the state of economy, environment or society

as much national data was missing, while international benchmarking was of limited use owing to Luxembourg's small size.

A productive use of data requires a thorough scanning of what's available; its analysis and preparation in order to capture its essence;

and its introduction into the foresight process at specifically designed points in order to supply participants with the necessary data as and when required.


ART29.pdf

participation as well as structured/non-structured conversations and interviews are equally important sources of data' (Thygesen 2009, 56, n7).

The following data, however, shows that the collection of young people's contributions was preceded by the construction of a specific image of them as stakeholders.

adding survey data and material from other sources. These future pictures were then presented in a workshop with communal and cultural organisations to discuss which of these were most desirable.

and recommendations need to be based upon sound data of the past and present, as well as projections of those trends that can be projected with reasonable confidence of accuracy,


ART3.pdf

and technology publication and patent abstract databases to better inform technology management. To do so requires developing templates of innovation indicators to answer standard questions.

Text mining; knowledge discovery in databases. 1. Introduction: How long does it take to provide a particular future-oriented technology analysis (FTA)?

We traditionally perceived the answer calibrated in months, particularly for empirical technology analyses. This mindset contributes to many technology management

This paper makes the case for quick text mining profiles of emerging technologies. © 2005 Elsevier Inc. All rights reserved.

1) instant database access, 2) analytical software, 3) automated routines, and 4) decision process standardization. The first QTIP factor concerns information availability.

Of particular note to FTA, the great science and technology (S&T) databases cover a significant portion of the world's research output.

These databases can be searched from one's computer, enabling retrieval of electronic records in seconds.

Many organizations have unlimited use licenses to particular databases that allow for thousands of records to be located

Various databases compile information on journal and conference papers, patents, R&D projects, and so forth. In addition, many researchers share information via the Internet (e.g.,

Other databases cover policy, popular press, and business activities. These can be exploited to help understand contextual factors affecting particular technological innovations.

Namely, many aspects of data cleaning, statistical analyses, trend analyses, and information visualization can be done quite briskly.

That implies 'working back' from the decision support requirements to the data. It makes less sense for a 'data mining' mindset in

which we muck around in the data looking for things that might be of interest,

One would adapt these to one's data sources and managerial concerns to posit particular indicators.

We next need to align the other three factors to enable QTIP-database access, analytical tools,

Factor 1 is 'instant database access'. That is, access to the requisite information resources should be direct and seamless.

Often, an information services unit handles all data requests. In the past this typically meant that a researcher

They can negotiate fair licenses that enable desktop access to the most useful and affordable databases.

explaining database nuances, and by reviewing critical searches to suggest refinements. We suggest that information professionals consider expanding their skill sets to become expert on analytical tools

Organizations access such databases in various ways. For instance, Georgia Tech previously hosted key databases on its own server for access by students, staff, and faculty.

Presently it accesses some databases via a statewide consortium called Galileo, others directly using passwords through their internet sites,

and some using CD-ROMs. At times the Technology Policy and Assessment Center at Georgia Tech has accessed such sources through a gateway service,

Dialog. We accessed data via Dialog, a leading gateway to over 400 different databases.

and Dialog for access to the fuel cell data. Whatever the route

In general, we find that the databases provide much richer S&T information resources with a measure of quality control.

In general, we prefer to first exploit the R&d databases, then update and probe using the internet.

With another firm, we have been exploring text mining tool applications. We mutually recognized that certain preliminary analyses could be done in 3 minutes

in coordinating licenses and access to databases and analytical tools; technology analysts (e.g., power users of these capabilities on a regular basis);

occasional users of the databases and analytical tools; decision-makers (e.g., policy-makers and managers who weigh emerging technology considerations as either their main focus or as contributing factors,

Well, it turned out that actual data were better. Compiling and making available performance histories for machines

There would be no 'Six Sigma' quality standards without empirical manufacturing process data and analyses thereof.

Technology management, somewhat surprisingly, is among the least data-intensive managerial domains. One would think that scientists, engineers,

business) databases. As such it represents one advanced form of technology monitoring. This information can serve other FTA needs to various degrees:

which databases contain the most SOFC information. I select two that provide good coverage and are licensed for unlimited use by Georgia Tech.

A script runs data fusion and duplicate removal. An additional script profiles the leading researchers at each of the 'top 3 + Georgia Tech' American universities in the SOFC domain.
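A minimal sketch of what such a fusion-and-deduplication script might look like. The field names, normalisation rule and file names below are assumptions for illustration, not the authors' actual script:

```python
import csv

def normalise(title: str) -> str:
    """Crude key for matching near-duplicate records across databases."""
    return "".join(ch for ch in title.lower() if ch.isalnum())

def fuse(files):
    """Merge CSV downloads, keeping one copy of each record."""
    seen = {}
    for path in files:
        with open(path, newline="", encoding="utf-8") as f:
            for rec in csv.DictReader(f):
                # "title", "year" and "source" are assumed field names
                key = (normalise(rec["title"]), rec.get("year", ""))
                if key not in seen:
                    rec["sources"] = {rec.get("source", path)}
                    seen[key] = rec
                else:
                    seen[key]["sources"].add(rec.get("source", path))
    return list(seen.values())

# hypothetical database downloads
merged = fuse(["sci_db.csv", "ei_db.csv"])
print(len(merged), "unique records")
```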

An auxiliary search is run on a U.S. Department of Energy R&D projects database for these four universities.

Provide each researcher, development engineer, project manager, intellectual property analyst, etc. with direct, desktop access to a couple of the most useful S&T information databases.

Negotiate unlimited use licenses for those databases. License easy-to-use analytical software for all. Script the routine analytical processes.

Exploiting New Technologies for Competitive Advantage, Wiley, New York, 2005. [2] T. Teichert, M.-A. Mittermayer, Text Mining for Technology Monitoring, IEEE IEMC 2002 (2002) 596-601. [3] R

L.M. Galitsky, W.M. Pottenger, S. Roy, D.J. Phelps, A Survey of Emerging Trend Detection in Textual Data Mining, in:

M.W. Berry (Ed.), Survey of Text Mining: Clustering, Classification, and Retrieval, Springer, New York, 2004, pp. 185-224. [11] See http://www.kdnuggets.com/. [12] C. Chen, Mapping Scientific Frontiers:



