2015933946
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-84821-556-6
smithsonian.com) Satellites and the spread of the Internet and mobile devices, smartphones and tablets have led to a veritable deluge of data, further accelerating the move toward the Internet of Things.
mobile, cloud and big data [DIG 12]. The topics of DigiWorld 2013 were connected objects, video as a service, digital malls and digital money, smart city and digital living, the future Internet and games.
Big data offers unified access to information. It allows the large-scale dissemination, analysis and use of data for the benefit of consumers and citizens.
Analytics are used mainly to find information in large amounts of data. Other techniques of knowledge discovery, such as neural networks, genetic algorithms, induction or other multistrategy machine learning hybrid tools [PIA 91]
are available but underused. Due to sensors and embedded software, objects are becoming increasingly interactive. It is possible, for example,
Why do sharks seem drawn to the data cables that rest on the ocean floor? Do they feel attacked?
3 http://www.digiworldsummit.com.
10 The Innovation Biosphere
Google has evolved from a search engine into many other services related to data collected from users
Their vision, strategy and innovation attitude have been fruitful: in possession of a huge amount of data, they have become the "master of the world" through data, the new capital.
Social networks, especially Facebook, are another contributor to big data. All these data are stored in data centers that must be powered and cooled.
The first European data center of Facebook was established in Luleå, Sweden. Figure 1.3 presents its energy architecture.
Figure 1.3. Facebook Luleå data center, Sweden
This Facebook center provides local job creation and impacts the regional economy.
Concerning the environmental aspects, they mention the availability of cheap,
It lets them remove 70% of the diesel units for backup power compared to the same facility in the USA.
A different ethic of publishing on Facebook may considerably decrease the need for big data. Google's data centers are said to use 50% less energy than the typical data center.
Designed to best use the natural environment and conditions, they use outside air and sea water in cold climates (Hamina,
In 2010, they started buying renewable energy from wind farms near their data centers. Google also developed a machine-learning algorithm (artificial intelligence (AI)) that learns from operational data to model plant performance
and predict power usage effectiveness [GOO 14]. Energy efficiency of data centers and green IT are emerging as some of the most critical environmental challenges to be faced because of the rapid and unprecedented digitization of business processes,
such as online banking, e-government, e-health and digital entertainment. Worldwide, data centers' CO2 emissions reach about half of the airlines' overall CO2 emissions [GAM 10].
What is the percentage of useful information in these data centers? How many times are the same or similar data, including the same pictures, registered in different databases?
If we could verify before registering and create a link instead of storing multiple copies of the same object, it would certainly decrease data centers' need for energy and cooling and their environmental impact.
1.2.1. Example of applying environmental principles
Bull,
a French computer company founded in 1931, applies CSR and sustainable development principles and makes efforts to innovate in computing and to reduce the environmental impact of data centers. The main challenge for IT companies is eco-efficient IT, with economical use of raw materials, low energy consumption and an emphasis on recyclability.
Bull's offer includes big data, the cloud, green IT and digital simulation. The latter is very useful for creating a "greener" innovation: it allows us to simulate the potential impact before transformation
Their concern is to increase the eco-efficiency of data centers and control over their environmental impacts, green IT for energy challenges and IT for change in response to sustainability issues.
Data center heat is now recovered to warm offices at the site. The new modular outsourcing center opened in 2013 targets a high level of energy efficiency.
and allows older equipment in need of spare parts to be kept in service; control operations at its data centers
such as high-performance computing (HPC), big data, M2M, cloud, security and mobility, dovetail neatly with the most crucial health issues:
and mass data simulation for medical research. Connected objects may serve to dynamically control energy consumption, increasing in this way the need for data storage. They have to respect a "code of conduct".
Sims Recycling Solutions in Eindhoven recycles 60,000 tons of electronic waste annually, a third of which is in the form of monitors and televisions,
data, text and image mining tools, creativity "amplifiers", robots and drones, etc., mostly not eco-designed [MER 11, MER 13a].
IT is limited to data and information processing, whereas AI is about "knowledge", thinking and problem-solving.
Three-dimensional (3D) printers, invented in the 1980s, are able to print a 3D object of almost any shape from a 3D model or other electronic data source, primarily through an additive process in
The Internet has facilitated the exchange of medical data and experiences. Health care practices are now supported by electronic processes and communication (e-health).
The Internet's quick access to patients' data is useful in an emergency, but it may also be used maliciously.
Researchers point out the absence of data related to the risks of skin contact with or ingestion of nanoparticles [BAT 14].
such as intellectual and social, traditional and digital communication, natural resources and quality of life, and innovates continuously for the sustainable success of all participants.
Smart city is an intensive user of ICT, intelligent technology, big data, connected objects and others.
is in essence a way to use statistical analysis of historical data to transform seemingly non-routine tasks into routine operations that can be computerized.
Big data will be used to give organizations a competitive advantage in terms of marketing and customer relationships; lawyers will be displaced by e-discovery software that can rapidly determine which electronic documents are relevant to court cases;
- the valuation of big data. These challenges "can be seen as seven critical pillars to initiate in France the process of long-term prosperity and employment".
15 are concerned with exploring big data, 13 others address personalized medicine and 10 are concerned with energy storage.
One participant plans to match public employment data with data from companies to create a decision support system dedicated to both recruiters and jobseekers.
Among those related to IT, energy efficiency of data centers, and green IT in general, are emerging as some of the most critical environmental challenges to be faced.
and Earth science data recently made available on the Open NASA Earth Exchange (OpenNEX) platform on Amazon Web Services (AWS) in new and creative ways;
and research, and €19 billion for the industries of tomorrow, including support for SMEs, technologies for sustainable development and the digital economy.
energy storage, recycling of rare metals, exploration of sea resources, vegetable proteins and plant chemistry, personalized medicine, silver economy and longevity and valorization of Big data.
as well as the contribution of open data to the smartness of growing cities. Numerous initiatives supported by the EU programs are described in Research*eu Results Magazine1.
To deal with information overload, new representations allow people to assimilate data in a simpler way not only through the use of existing visual channels,
Econophysics, a new field of science, tackles the analysis of economic data based on a methodology developed in physics.
and analyzes health data of administrative (billing) and lab results to detect care gaps and neglected patients.
two involving cross-border and cross-sector collaboration, two on user-centered design, one on big data,
Sign bases are made with multimedia technologies (3D, high-definition television (HDTV)), which are the best means for showing examples from teachers
(http://vbrant.eu) supporting the development of virtual research communities involved in biodiversity science for managing biodiversity data on the Web, involving biologists and computer scientists.
and biometric information registered in a database. UID, branded as Aadhaar, guarantees only a person's identity, not rights, benefits or entitlements.
When completed, it will probably be the world's largest single-entity biometric database. This initiative is both technological and social.
The other cases described in the Open Innovation Yearbook 2014 are about innovation networks such as Oulu Innovation Alliance, Big data exploration, smart urban lighting and innovative services for lawyers.
What is the backup in case we lack energy? Social networks connect people who decide on common useful actions, but also influence them.
Each ommatidium is composed of a lens (172 microns), combined with an electronic pixel (30 microns).
Big data is collected from devices; we will be able to create our own apps and smart analytics to use them
- big data versus the world knowledge base. It is time to switch from "quick business, having more and showing what we have" to an awareness of the beauty
-and-its-impact-society video, http://www.itif.org/media/big-data-cloud-computing-how-it-creating-new-era-disruptive-innovation#video,
of 14 December 2012, http://ec.europa.eu/research/participants/data/ref/fp7/132141/h-wp-201301_en.pdf. [EUR 13] EUROPEAN COMMISSION, Europe
[GOO 14] GOOGLE GREEN, Designing efficient data centers, http://www.google.com/green/efficiency/datacenters/, 2014.
[OEC 05b] OECD, Oslo Manual: Guidelines for Collecting and Interpreting Innovation Data, 3rd ed., 2005.
[PIA 91] PIATETSKY-SHAPIRO G., FRAWLEY W., Knowledge Discovery in Databases, MIT Press, Cambridge, MA, 1991.
and manual labor (workflow), leading ultimately to cost-resilient processes. However, in the age of digitization,
and Richard Welch
Part III Driving Innovation Through Advanced Process Analytics
Extracting Event Data from Databases to Unleash Process Mining...
such as mobile and real-time technologies, the Internet of things, big data analytics, and social media, have come to the fore in recent years,
New technologies, including mobile and real-time technologies, the Internet of Things, big data analytics, and social media, clearly illustrate the enormous impact of IT on society in terms of enabling competitiveness and welfare (vom Brocke, Debortoli, Müller, & Reuter, 2014).
Another characteristic of the digital age relates to the fact that data is not only available anywhere (irrespective of location) but also anytime (irrespective of time).
In particular, it also relates to the idea of real-time availability of data. The possibility of receiving up-to-date information at any point in time is key for essential innovations in many business processes.
Integrating multiple kinds of real-time data, analytics today already enables the prediction of events like the spread of the flu
It is intriguing to think how such data integration will innovate our professional and private lives in the near future,
or to rest for a few minutes based on body data taken from the skin (vom Brocke,
and big data analytics that allow for real-time process decisions based on data available from products in use. Overall, we can observe distinct ways in which BPM can serve as a source of innovation.
for example, meanwhile allow for real-time mining of business processes based on the digital traces that single process steps leave or based on text-mining possibilities (Günther, Rinderle-Ma, Reichert, van der
Four chapters present latest findings on the role of analyzing extant data for realizing innovations in a process context.
Wil van der Aalst reports on "Extracting Event Data from Databases to Unleash Process Mining". He introduces an approach to create event logs from underlying databases as a fundamental prerequisite for the application of process-mining techniques
when information systems do not explicitly record events. Jan Recker gives insights on "Evidence-Based Business Process Management:
Existing big data technology can make information available on a real-time basis and at the same time enable prediction of future events,
and processing the data in real time is known. However, the existing organization and business processes become a barrier to improvements.
In general, the availability of data is never an issue. The usability of data, on the other hand, hinders these concepts from flourishing in the factories.
Interoperability is consistently an issue. Visibility in the value chain is a prerequisite for a proactive reaction.
Enterprise resource planning (ERP) systems and data warehouses (DW), as illustrated in Fig. 3 below. The data processing, from the time an event occurs in manufacturing (for example, a measurement of quality data) until management is able to make sense of the event and its consequences, requires the aggregation of information through several system layers, as illustrated below. Even though the information is available,
We want to be able to make decisions based on real-time data, which can be done using, e.g., in-memory database technologies. To sum up and to frame the research challenges of proactive value chains:
consistent and managed applications of models Dispersed intelligence Distributed intelligence Data, information, knowledge, models and expertise are used available
while rather narrow services such as an order-approval, database request, or an ERP-based shipping receipt event entry are at the other end.
and data service to its targeted customers (primarily teens and young adults) consistent with its youthful, innovative brand.
To offer this service (sign up to ongoing voice/data provisioning) it could have created all the secondary,
relations Procurement Service fulfillment Compensation Invoicing Product data management Service provisioning Component fabrication IT service management Product design & development Shipping Corporate communications Knowledge management
removing the need to re-enter data when they return to their office, and receive immediate feedback on potential drug interactions and suggested next steps.
An industrial site inspector can input inspection data directly, triggering maintenance requests. Enterprise mobile applications can improve efficiency
Big Data and Analytics
Information-filled events are generated by a wide variety of devices and systems: computers, mobile phones, vehicles, industrial equipment, sensors, security systems, building automation systems,
The result is a flood of data that may contain valuable information, if that information can be detected.
A variety of data-focused technologies are combined to achieve these goals, including complex event processing, pattern analysis and detection, big data processing, predictive analytics and automated decisioning.
Emerging Technologies in BPM
3 The Changing Nature of Work
The nature of work is changing:
Using predetermined process models, historical data from executing and past processes, and simulation techniques to project forward from a point in time, runtime simulation can compare "what-if" scenarios to determine optimal preemptive actions based on the current context of the process instance and historical data for similar instances.
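A minimal sketch of such runtime what-if simulation, assuming hypothetical step names, historical durations, and a "speedup" model for preemptive actions (none of these specifics come from the chapter itself):

```python
import random

# Hypothetical historical step durations (hours) observed in similar past
# process instances; names and numbers are invented for illustration.
HISTORY = {
    "review":  [4, 6, 5, 8, 7],
    "approve": [2, 3, 2, 4],
    "ship":    [10, 12, 9, 15],
}

def simulate(remaining_steps, speedup=None, runs=1000):
    """Project completion time by sampling historical step durations.

    `speedup` maps a step name to a duration multiplier, modelling a
    preemptive action such as assigning extra staff or a faster carrier.
    """
    totals = []
    for _ in range(runs):
        total = 0.0
        for step in remaining_steps:
            duration = random.choice(HISTORY[step])
            if speedup and step in speedup:
                duration *= speedup[step]
            total += duration
        totals.append(total)
    return sum(totals) / len(totals)

remaining = ["review", "approve", "ship"]
baseline = simulate(remaining)
expedited = simulate(remaining, speedup={"ship": 0.5})  # what-if: express carrier
print(f"baseline {baseline:.1f}h, expedited shipping {expedited:.1f}h")
```

Comparing the projected averages for each candidate action, from the current point in the process instance onward, is the essence of the what-if comparison described above.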
Absence of separation between content contributors and consumers as well as low input efforts mean lowered thresholds for contributing data and knowledge.
• Continuous assessment:
or risks of data loss, may prevent organizations from (successfully) implementing SM in a business process life cycle (Kemsley, 2010).
Data put online can quickly go viral. A typical case of the "virulence" and unpredictability of SM is the "United Breaks Guitars" video clip
or information, could be used as incriminating data in court proceedings.
The monitoring phase can benefit from including SM for (1) receiving the (quantitatively measured) data and feedback from all stakeholders of the network and (2) sharing the process performance results with co-workers and customers/end-users alike.
(3) statistical analysis of SM data to provide possibilities for process improvement.
4.1 Modeling Phase for Internal Participants
The modeling of business processes provides a shared and comprehensive understanding of the business
Employees are involved actively in preparing the process model by contributing the needed data or knowledge,
Table 1. A framework for classification of SM inclusion in the business process life cycle (columns: Internal participants | External participants)
Process modeling phase: involving the employees in process modeling | gathering data
present and share data or artifacts (video, pictures or other forms of non-textual content).
All participants should have access to the monitored data and thus, in some way receive feedback about the business process they are participating in
Gathering the data required for the analysis can be time-consuming and fragmentary. Achieving a high response rate with surveys
and similar data gathering tools can be challenging. The already existing involvement of users in SM can simplify data contribution.
Including SM in the monitoring process provides stakeholders throughout the organization with a chance to contribute
4.6 Monitoring Phase for External Participants
One possibility of incorporating SM in the monitoring phase is also to make acquired data publicly accessible.
The statistical analysis of available data flows and other SM measures enable the evaluation of alternative process designs.
The findings refer to data integration and business process support as the main benefits of enterprise systems
Sensor fusion technologies include the combination of data streams from several different sensors into the sought-for information,
An insurer can access actual driving behaviour data through an insurance telematics program. As a result, the insurance premium can be adjusted individually based on driving behaviour,
semi-fixed installation using the power and data outlets, or a smartphone, as illustrated in Fig. 1. The probe monitors
through use of these data a particular driver's behaviour can be assessed.
2.2 The Vendor Movelo's Motivation to Commercialize the Application
In late 2009,
and the price calculation is done based on risk-criteria data, i.e., age, number of years with a driver's license
Usage grade is calculated by the application by comparing the odometer data (mileage) with the mileage recorded in the application.
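The usage-grade comparison can be sketched as a small function. The exact formula is not given in the text, so the ratio below (app-logged distance over odometer distance, capped at 1) is an assumption for illustration:

```python
def usage_grade(odometer_km, app_recorded_km):
    """Share of total driving covered by the monitoring app (assumed formula).

    odometer_km: distance accumulated on the vehicle's odometer in the period
    app_recorded_km: distance the smartphone application logged in that period
    """
    if odometer_km <= 0:
        raise ValueError("odometer distance must be positive")
    # Cap at 1.0: the app cannot meaningfully cover more than all driving.
    return min(app_recorded_km / odometer_km, 1.0)

# A driver whose app logged 800 km of the 1,000 km on the odometer
print(usage_grade(1000, 800))  # -> 0.8
```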
filtering GPS data combined with sensor fusion from the accelerometer and gyroscope in the smartphone,
and combined with map-data in the smartphone (Händel et al., 2014). The complementary parameters for the risk-assessment process
The pilot generated in total big data containing 4,500 driving hours and 250,000 km of road vehicle traffic data (Händel et al.
The data quality was assured by rigorous soft computing methods. However, the results did not fulfil all the initiative goals.
road type, distance driven). The rich driving data help predict driving risks and the loss costs for the highest-risk driving behaviour.
Price calculation: based on static demographic data and historical statistics (before) vs. based on the dynamic changes of driving behaviour (UBI); customers get an accurate and personalized price.
information (data generated by the disruptive technology), business process design for the core technology implementation, product/services implementation, individual organization readiness for innovation implementation, towards business models and the outer
Process Innovation with Disruptive Technology in Auto Insurance
Part III Driving Innovation Through Advanced Process Analytics
Extracting Event Data from Databases to Unleash Process Mining
Wil M. P. van der Aalst
This paper uses a novel perspective to conceptualize a database view on event data. Starting from a class model and corresponding object models it is shown that events correspond to the creation, deletion,
The key idea is that events leave footprints by changing the underlying database. Based on this, an approach is described that scopes, binds, and classifies data to create "flat" event logs that can be analyzed using traditional process-mining techniques.
of event data is rapidly changing the Business Process Management (BPM) discipline (Aalst, 2013a; Aalst & Stahl, 2011;
and only organizations that intelligently use the vast amounts of data available will survive (Aalst, 2014).
Today's main innovations are intelligently exploiting the sudden availability of event data. Out of the blue, "Big Data" has become a topic in board-level discussions.
The abundance of data will change many jobs across all industries. Just like computer science emerged as a new discipline from mathematics
we now see the birth of data science as a new discipline driven by the torrents of data available in our increasingly digitalized world.1 The demand for data scientists is rapidly increasing.
all data related to social interaction. The IoP includes e-mail, Facebook, Twitter, forums, LinkedIn, etc.
• The Internet of Things (IoT):
refers to all data that have a spatial dimension. With the uptake of mobile devices (e.g.,
It is not sufficient to just collect event data. The challenge is to exploit it for process improvements.
Process-mining techniques form the toolbox of tomorrow's process scientist.
1 We use the term "digitalize" to emphasize the transformational character of digitized data.
Process mining connects process models and data analytics. It can be used:
• to automatically discover processes without any modeling (not just the control-flow, but also other perspectives such as the data-flow, work distribution, etc.),
• to find bottlenecks and understand the factors causing these bottlenecks,
Despite the abundance of powerful process-mining techniques and success stories in a variety of application domains,2 a limiting factor is the preparation of event data.
The Internet of Events (IoE) mentioned earlier provides a wealth of data. However, these data are not in a form that can be analyzed easily,
and need to be extracted, refined, filtered, and converted to event logs first. The starting point for process mining is an event log.
or data elements recorded with the event (e.g., the size of an order). If a BPM system or some other process-aware information system is used,
However, in most organizations one encounters information systems built on top of database technology. The IoE depends on a variety of databases (classical relational DBMSs or new "NoSQL" technologies).
Therefore, we provide a database view on event data and assume that events leave footprints by changing the underlying database.
Fortunately, database technology often provides so-called "redo logs" that can be used to reconstruct the history of database updates.
2 For example, http://www.win.tue.nl/ieeetfpm/doku.php?id=shared:process_mining_case_studies lists over 20 successful case studies in industry.
This is what we would like to exploit systematically. Although the underlying databases are loaded with data, there are no explicit references to events, cases, and activities.
Instead, there are tables containing records and these tables are connected through key relationships. Hence, the challenge is to convert tables and records into event logs.
Obviously, this cannot be done in an automated manner. To understand why process-mining techniques need "flat event logs" (i.e.
Therefore, we need to relate raw event data to process instances using a single well-defined view on the process.
we focus on the problem of extracting "flat event logs" from databases. First, we introduce process mining in somewhat more detail (Sect. 2). Section 3 presents twelve guidelines for logging.
this paper aims to exploit the events hidden in existing databases. We use a database-centric view on processes:
the state of a process is reflected by the database content. Hence, events are merely changes of the database.
In the remainder we assume that data is stored in a database management system and that we can see all updates of the underlying database.
This assumption is realistic (see, e.g., the redo logs of Oracle). However, how can the problem of converting database updates into event logs be approached systematically?
Section 4 introduces class and object models as a basis to reason about the problem. In Sect. 5 we show that class models can be extended with a so-called event model.
The event model is used to capture changes of the underlying database. Section 6 describes a three-step approach (Scope, Bind,
and Classify) to create a collection of flat event logs. The results serve as input for conventional process-mining techniques.
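A toy sketch of the Scope/Bind/Classify idea on hand-made change records. The table names, keys, and the correlation rule below are all invented for illustration; the paper's actual approach is formalized over class models and event models, not this simplified tuple format:

```python
from collections import defaultdict

# Invented change records as they might be reconstructed from a redo log:
# (timestamp, operation, table, row_key, values)
changes = [
    (1, "insert", "order",      "o1", {"customer": "c1"}),
    (2, "insert", "order_line", "l1", {"order": "o1", "item": "book"}),
    (3, "update", "order",      "o1", {"status": "paid"}),
    (4, "delete", "order_line", "l1", {"order": "o1"}),
]

def scope(changes, tables):
    """Scope: keep only changes to the tables relevant for the chosen process."""
    return [c for c in changes if c[2] in tables]

def bind(changes):
    """Bind: correlate each change to a case id (here: the order it belongs to)."""
    bound = []
    for ts, op, table, key, values in changes:
        case = key if table == "order" else values["order"]
        bound.append((case, ts, op, table))
    return bound

def classify(bound):
    """Classify: name activities and group events into flat per-case traces."""
    log = defaultdict(list)
    for case, ts, op, table in sorted(bound, key=lambda e: e[1]):
        log[case].append(f"{op} {table}")
    return dict(log)

event_log = classify(bind(scope(changes, {"order", "order_line"})))
print(event_log)
# {'o1': ['insert order', 'insert order_line', 'update order', 'delete order_line']}
```

The resulting per-case traces are exactly the "flat" form that conventional process-mining techniques expect as input.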
Section 7 discusses related work and Sect. 8 concludes this paper.
2 Process Mining
Process mining aims to discover,
or data elements recorded with the event (e.g., the size of an order). Table 1 shows a small fragment of a larger event log.
2013b) for more information on the data possibly available in event logs. Flat event logs such as the one shown in Table 1 can be used to conduct four types of process mining (Aalst, 2011).
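A minimal flat event log in this spirit, with an assumed record layout (case id, activity, timestamp), together with the directly-follows relation from which many discovery algorithms start:

```python
# A tiny flat event log in the spirit of Table 1; field names and
# activities are assumptions for illustration, not taken from the paper.
log = [
    {"case": 1, "activity": "register", "ts": "2014-01-01T09:00"},
    {"case": 1, "activity": "check",    "ts": "2014-01-01T10:00"},
    {"case": 2, "activity": "register", "ts": "2014-01-01T09:30"},
    {"case": 1, "activity": "decide",   "ts": "2014-01-01T12:00"},
    {"case": 2, "activity": "decide",   "ts": "2014-01-01T13:00"},
]

def directly_follows(log):
    """Compute the directly-follows pairs used by many discovery algorithms."""
    # Group events into per-case traces, ordered by timestamp.
    traces = {}
    for e in sorted(log, key=lambda e: (e["case"], e["ts"])):
        traces.setdefault(e["case"], []).append(e["activity"])
    # Collect every pair of consecutive activities within a trace.
    pairs = set()
    for trace in traces.values():
        pairs.update(zip(trace, trace[1:]))
    return pairs

print(directly_follows(log))
# e.g. {('register', 'check'), ('check', 'decide'), ('register', 'decide')}
```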
The ProM framework provides an open-source process-mining infrastructure.
Instead, we focus on the event data used for process mining.
3 Guidelines for Logging
The focus of this paper is on the input side of process mining:
event data. Often we need to work with the event logs that happen to be available,
There can be various problems related to the structure and quality of data (Aalst, 2011; Jagadeesh Chandra Bose, Mans, & Aalst, 2013.
Before we present our database-centric approach, we introduce twelve guidelines for logging. These guidelines make no assumptions on the underlying technology used to record event data.
In this section, we use a rather loose definition of event data: events simply refer to "things that happen"
and that they are described by references and attributes. References have a reference name and an identifier that refers to some object (person, case, ticket, machine, room, etc.)
and analyzing event data. Different stakeholders should interpret event data in the same way. GL2:
There should be a structured and managed collection of reference and variable names. Ideally, names are grouped hierarchically (like a taxonomy or ontology).
(d) Celonis process mining (Celonis GmbH) (color figure online)
specific extensions (see, for example, the extension mechanism of XES (IEEE Task Force
, usage of data. GL7: If possible, also store transactional information about the event (start, complete,
Event data should be as "raw" as possible. GL11: Do not remove events and ensure provenance.
For example, do not remove a student from the database after he dropped out since this may lead to misleading analysis results.
Sensitive or private data should be removed as early as possible (i.e., before analysis). However, if possible, one should avoid removing correlations.
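One common way to honor both guidelines, removing identities early while keeping events and their correlations intact, is salted pseudonymization. This is a generic sketch, not the paper's prescription; the salt handling and token length are assumptions:

```python
import hashlib

SALT = "project-specific-secret"  # assumed per-project salt, kept out of the log

def pseudonymize(value):
    """Replace an identifier with a stable pseudonym.

    The same input always yields the same token, so events belonging to the
    same person remain correlated, while the identity itself is removed.
    """
    return hashlib.sha256((SALT + value).encode()).hexdigest()[:12]

events = [
    {"student": "alice", "activity": "enroll"},
    {"student": "alice", "activity": "drop out"},
    {"student": "bob",   "activity": "enroll"},
]
sanitized = [{**e, "student": pseudonymize(e["student"])} for e in events]

# Correlation survives: both of alice's events carry the same token,
# and the drop-out event is kept rather than deleted (GL11).
assert sanitized[0]["student"] == sanitized[1]["student"]
assert sanitized[0]["student"] != sanitized[2]["student"]
```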
We aim to exploit the hidden event data already present in databases. The content of the database can be seen as the current state of one or more processes.
Updates of the database are therefore considered the primary events. This database-centric view on event logs is orthogonal to the above guidelines.
4 Class and Object Models
Most information systems do not record events explicitly. Only process-aware information systems (e.g., BPM/WFM systems) record event data in the format shown in Table 1. To create an event log
we often need to gather data from different data sources where events exist only implicitly.
In fact, for most process-mining projects event data need to be extracted from conventional databases. This is often done in an ad hoc manner.
Tools such as XESame (Verbeek, Buijs, van Dongen, & Aalst, 2010) and ProMimport (Günther & Aalst, 2006) provide some support,
but still the event logs need to be constructed by querying the database and converting database records (row in tables) into events.
Moreover, the âoeregular tablesâ in a database only provide the current state of the information system.
It may be impossible to see when a record was created or updated. Moreover, deleted records are generally invisible.3 Taking the viewpoint that the database reflects the current state of one or more processes,
we define all changes of the database to be events. Below we conceptualize this viewpoint.
Building upon standard class and object models, we define the notion of an event model. The event model relates a coherent set of changes of the underlying database to events used for process mining.
Section 5 defines the notion of an event model. To formalize event models, we first introduce
In this way all intermediate states of the database can be reconstructed. Moreover, marking objects as deleted instead of completely removing them from the database is often more natural, e.g., concerts are not deleted, they are canceled; employees are not deleted, they are fired; etc.
Definition 1 (Unconstrained Class Model) Assume V to be some universe of values (strings, numbers, etc.). An unconstrained class model is a tuple UCM = (C, A, r, val, key,
there cannot be two concerts on the same day in the same concert hall.
Fig. 2. Example of a constrained class model (color figure online)
and class models in a database. However, it is easy to map any class model onto a set of related tables in a conventional relational database system.
but it is obvious that the conceptualization agrees with standard database technology.
5 Events and Their Effect on the Object Model
Examples of widely used database management systems (DBMSs) are Oracle RDBMS (Oracle), SQL Server (Microsoft), DB2 (IBM), Sybase (SAP),
and PostgreSQL (PostgreSQL Global Development Group). All of these systems can store and manage the data structure described in Definition 4. Moreover,
all of these systems have facilities to record changes to the database. For example, in the Oracle RDBMS environment, redo logs comprise files in a proprietary format
which log a history of all changes made to the database. Oracle LogMiner, a utility provided by Oracle,
provides methods of querying logged changes made to an Oracle database. Every Microsoft SQL SERVER database has a transaction log that records all database modifications.
Sybase IQ also provides a transaction log. Such redo/transaction logs can be used to recover from a system failure.
The redo/transaction logs will grow significantly if there are frequent changes to the database. In such cases, the redo/transaction logs need to be truncated regularly.
This paper does not focus on a particular DBMS. However, we assume that through redo/transaction logs we can monitor changes to the database.
In particular, we assume that we can see when a record is inserted, updated, or deleted. Conceptually, we assume that we can see the creation of objects and relations, the deletion of objects and relations, and updates of objects.
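These assumptions can be illustrated with a small Python sketch. All names and structures below are illustrative stand-ins, not the chapter's formal definitions: each observed change becomes an atomic insert/update/delete event, and replaying the events reconstructs every intermediate state of the database, with deletion handled as marking rather than removal.

```python
# Illustrative sketch (not the chapter's formalization): every change
# captured from a redo/transaction log becomes an atomic event of kind
# "insert" (creation), "update", or "delete".
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str      # "insert" | "update" | "delete"
    cls: str       # class/table name, e.g. "Concert"
    key: str       # primary-key value identifying the object
    values: tuple  # (attribute, value) pairs carried by the change
    ts: int        # position or timestamp taken from the change log

def replay(events):
    """Reconstruct all intermediate states by replaying the change log."""
    state, history = {}, []
    for e in sorted(events, key=lambda e: e.ts):
        if e.kind == "insert":
            state[(e.cls, e.key)] = dict(e.values)
        elif e.kind == "update":
            state[(e.cls, e.key)].update(dict(e.values))
        else:  # mark as deleted rather than removing: concerts are
            state[(e.cls, e.key)]["_deleted"] = True  # canceled, not erased
        history.append({k: dict(v) for k, v in state.items()})
    return history
```

Replaying from the initial (empty) state yields one snapshot per event, which is exactly what makes the intermediate states recoverable.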
Definition 6 (Events). Let CM = (C,
If the customer is already in the database, the composite event cannot contain the creation of the customer object c6.
Next we define the effect of an event occurrence, i.e.,
This is denoted by OM0 →L OMn.
The formalizations above provide operational semantics for an abstract database system that processes a sequence of events.
However, the goal is not to model a database system. Instead, we aim to relate database updates to event logs that can be used for process mining.
Subsequently, we assume that we can witness a change log L = ⟨e1, e2, …, en⟩, i.e., a sequence of atomic events.
and/or user id). Definition 3 shows that this assumption allows us to reconstruct the state of the database system after each event, i.e.,
Table 1 shows the kind of input data that process-mining techniques expect. Such a conventional flat event log is a collection of events where each event has the following properties:
Dedicated process-mining formats like XES or MXML allow for the storage of such event data.
one may convert it into a conventional event by taking tsi as timestamp and eni as activity.
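This conversion can be sketched in a few lines; the entry layout and the `case_of` helper are hypothetical, with `case_of` standing in for the binding of events to process instances.

```python
# Hedged sketch: turn change-log entries (ts_i, en_i, record) into the
# flat event-log rows that process-mining formats such as XES expect.
def to_flat_events(change_log, case_of):
    """ts_i becomes the timestamp, en_i (the event name) the activity;
    `case_of` maps an entry's record to its process instance (case)."""
    rows = [{"case": case_of(record), "activity": en, "timestamp": ts}
            for ts, en, record in change_log]
    return sorted(rows, key=lambda r: (r["case"], r["timestamp"]))
```

A usage example: events of one order, arriving out of order in the log, end up as a time-ordered trace for that case.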
(1) scope the event data, (2) bind the events to process instances (i.e., cases), and (3) classify the process instances.
6.1 Scope:
Determine the Relevant Events The first step in converting a change log into a collection of conventional events logs is to scope the event data.
One way to scope the event data is to consider a subset of event names.
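Scoping by event names amounts to a simple filter. The sketch below assumes change-log entries of the illustrative form `(ts, name, record)`; this is one of several possible scoping criteria, not the only one.

```python
def scope(change_log, relevant_names):
    """Scope step: keep only events whose name is in a chosen subset
    of event names; everything else is out of scope for the analysis."""
    return [e for e in change_log if e[1] in relevant_names]
```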
Process cubes are inspired by the well-known OLAP (Online Analytical Processing) data cubes and associated operations such as slice,
However, there are also significant differences because of the process-related nature of event data. For example, process discovery based on events is incomparable to computing the average or sum over a set of numerical values.
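The cube idea can be made concrete with a simplified sketch (not the formal process-cube definition): events are grouped into cells by dimension values, each cell is itself a small event log, and slicing fixes one dimension to a single value while dropping it.

```python
from collections import defaultdict

def build_cube(events, dims):
    """Group events into process-cube cells keyed by the chosen
    dimensions (e.g. region, year); each cell is an event collection
    on which discovery or conformance checking could then be run."""
    cube = defaultdict(list)
    for e in events:
        cube[tuple(e[d] for d in dims)].append(e)
    return cube

def slice_cube(cube, dim_index, value):
    """Slice: fix one dimension to a single value and drop that dimension."""
    return {k[:dim_index] + k[dim_index + 1:]: v
            for k, v in cube.items() if k[dim_index] == value}
```

The process-related difference noted above still holds: what one computes per cell is a discovered model, not a sum or an average.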
The "scope, bind and classify" approach allows for the transformation of database updates into events populating process cubes that can be used for a variety of process-mining analyses.
7 Related Work
The reader is referred to (Aalst, 2011) for an introduction
Next to the automated discovery of the underlying process based on raw event data,
but about getting the event data needed for all of these techniques. We are not aware of any work systematically transforming database updates into event logs.
There are probably process-mining case studies using redo/transaction logs from database management systems like Oracle RDBMS, Microsoft SQL Server, IBM DB2, or Sybase IQ. However, systematic tool support seems to be missing. The binding step in our approach is related to the topic of event correlation but cannot be applied easily to settings of the Internet of Events (IoE) where data is distributed and heterogeneous
Process mining seeks the "confrontation" between real event data and process models (automatically discovered or handmade).
and filtering the event data. The twelve guidelines for logging presented in this paper show that the input-side of process mining deserves much more attention.
database systems. This paper focused on supporting the systematic extraction of event data from database systems.
Regular tables in a database provide a view of the actual state of the information system.
For process mining, however, it is interesting to know when a record was created, updated, or deleted.
Taking the viewpoint that the database reflects the current state of one or more processes,
we define all changes of the database to be events. In this paper, we conceptualized this viewpoint.
The event model relates changes to the underlying database to events used for process mining.
A logical next step is to develop tool support for specific database management systems. Moreover, we would like to relate this to our work on process cubes (Aalst, 2013b) for comparative process mining.
Slicing, dicing, rolling up and drilling down event data for process mining. In M. Song, M. Wynn,
Data scientist: The engineer of the future. In K. Mertins, F. Benaben, R. Poler, & J. Bourrieres (Eds.),
Replaying history on process models for conformance checking and performance analysis. WIREs Data Mining and Knowledge Discovery, 2(2), 182–192.
Aalst, W. van der, Barthelmess, P., Ellis, C.,
IEEE Transactions on Knowledge and Data Engineering, 16(9), 1128–1142. ACSI. (2013). Artifact-centric service interoperation (ACSI) project home page.
In Sixth International Conference on Extending Database Technology (Lecture Notes in Computer Science, Vol. 1377, pp. 469–483).
Data Mining and Knowledge Discovery, 14(2), 245–304. Barros, A., Decker, G., Dumas, M., & Weber, F. (2007).
Towards a unified view of data. ACM Transactions on Database Systems, 1, 9–36. Cohn, D.,
& Hull, R. (2009). Business artifacts: A data-centric approach to modeling business operations and processes.
IEEE Data Engineering Bulletin, 32(3), 3–9. Cook, J., & Wolf, A. (1998). Discovering models of software processes from event-based data.
ACM Transactions on Software Engineering and Methodology, 7(3), 215–249. Cook, J., & Wolf, A. (1999).
Software process validation: Quantitatively measuring the correspondence of a process to a model. ACM Transactions on Software Engineering and Methodology, 8(2), 147–176.
Distributed and Parallel Databases, 25(3), 193–240. Goedertier, S., Martens, D., Vanthienen, J., & Baesens, B. (2009).
It's high time we consider data quality issues seriously. In B. Hammer, Z. Zhou, L. Wang,
IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2013) (pp. 127–134). Singapore: IEEE. Motahari-Nezhad, H., Saint-Paul, R., Casati, F.,
IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2011) (pp. 184–191). Paris: IEEE. OMG.
Reichert, M., & Weber, B. (2012).
IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2011) (pp. 148–155). Paris: IEEE. Weijters, A.,
Rediscovering workflow models from event-based data using Little Thumb. Integrated Computer-Aided Engineering, 10(2), 151–162.
Moving to reliable, valid and ultimately credible decisions about innovations through evidence-based decision-making requires an ability to work with data
available and quality data that can be used as evidence. They also require a capability to collect, analyze and interpret such data to prepare for decisions. Table 2 summarizes relevant requirements. These scientific capabilities can obviously be provided by universities and research institutions.
and interpret data using rigorous scientific methods, research can provide additional innovation support services:
• Novel conceptual perspectives: Evidence that is converted from data gathered and analyzed scientifically can provide a solid and trustworthy platform for decision-making about innovations, their potential, pitfalls and consequences.
• Increased research bandwidth:
Table 2 Requirements for evidence-based innovation decisions
Data awareness: identifying appropriate data; finding available data; understanding the quality of data.
Data science:
and gather objective data that can be used as facts in innovation decisions. I use the term digital infrastructure in a deliberately loose manner,
or complement historical transaction data with real-time data and analytics, such as in-memory technology. These digital infrastructures provide ample opportunities for evidence-based management in process innovation.
that is, while more data can be generated, more can also be analyzed and used. A classical example is that of Google Analytics, which offers free analysis of web browsing behavior, ready at the fingertips of any decision-maker.
and analyze large volumes of data in real time (vom Brocke, Debortoli, Müller, & Reuter, 2014).
Traditionally, fact finding in support of decision-making (in the context of BPM methodologies such as Six Sigma and others) has always been hampered by sheer pragmatic concerns about the feasibility, resourcing and costing of data collection efforts.
Data that is generated on digital platforms is located typically at the other end of the scale:
Data points are generated well beyond the sample size required to reach conclusive findings about the data.
It is no longer acceptable not to peruse available data and evidence in making process-related decisions.
this is a research challenge where data such as store size, quality of baking, number of competitors in the market, customer demographics,
Having examined these factors by studying technology data (such as point-of-sales, HR and payroll systems, census data about customer demographics) as well as empirical data from studying the stores and process participants themselves,
conclusions can be drawn about the occurrence of positive deviance. In a nutshell, in our example the findings were as follows:
who use data from an information system, together with their detailed knowledge of local customers, local events and all other factors that will influence sales.
Combining evidence from past sales data, forecasting algorithms as well as observations and evidence from how store managers operate,
Fig. 4 Data analytics in the replenishment process (system vs. trend line forecast accuracy, MAPE and MAE, for high- and low-volume items)
In replenishment,
data scientists are becoming an essential resource in developing a capability to identify, understand, analyze and interpret evidence in support of innovation decisions about business processes.
Data scientist: The sexiest job of the 21st century. Harvard Business Review, 90(10), 70–76.
what data to gather and when and how to make decisions. A number of approaches for flexible process management have emerged in recent years.
and to collect different subsets of data at different points in the process, with as few restrictions as possible.
In a second step, data is collected with respect to the chosen performance measures. This is followed (third and fourth steps) by identifying samples of exceptional performance from the data
and analyzing the data in an exploratory manner in order to identify what factors might underpin the identified exceptional performance (positive deviance).
In a fifth step, statistical tests are used to identify correlations and causal links between the identified factors and positive deviance.
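Steps three to five of the method above can be illustrated with a toy sketch. The store data and the one-standard-deviation cutoff are made up for illustration, and a real study would use proper statistical tests rather than a bare correlation coefficient.

```python
# Illustrative sketch of identifying positive deviants (step 3) and
# relating a candidate factor to performance (steps 4-5).
from statistics import mean, stdev

def positive_deviants(perf):
    """Flag units whose performance exceeds the mean by more than one
    standard deviation (an illustrative cutoff, not a fixed rule)."""
    m, s = mean(perf.values()), stdev(perf.values())
    return {unit for unit, v in perf.items() if v > m + s}

def pearson(xs, ys):
    """Pearson correlation between a candidate factor and performance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)
```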
what input data values to provide, so that the likelihood of violation of business constraints is minimized.
150 M. Dumas and F. M. Maggi
In this paradigm,
and (2) the values of data attributes after each activity execution in a case. As an example, consider a doctor who needs to choose the most appropriate therapy for a patient.
Historical data referring to patients with similar characteristics can be used to predict what therapy will be the most effective one
and for every data input that can be given to this activity, the probability that the execution of the activity with the corresponding data input will lead to the fulfillment of the business goal.
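The shape of this prediction step can be sketched as follows. The data layout is hypothetical, and a plain frequency count over historical cases stands in for the actual learning technique purely for brevity:

```python
def fulfillment_probability(history, activity, data_input):
    """Estimate, from historical cases, the probability that executing
    `activity` with `data_input` leads to fulfillment of the business
    goal. A bare frequency estimate shown for illustration only."""
    hits = total = 0
    for case in history:
        if (activity, data_input) in case["steps"]:
            total += 1
            hits += case["fulfilled"]
    return hits / total if total else None  # None: no comparable cases
```

In the doctor example, `history` would hold past patients with similar characteristics, and the estimate would rank candidate therapies by their observed success rate.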
To this aim, we apply a combination of simple string matching techniques with decision tree learning. An approach for the prediction of abnormal terminations of business processes has been presented by Kang
and Pontieri (2012) propose a predictive clustering approach in which context-related execution scenarios are discovered
In Proceedings of the IEEE symposium on computational intelligence and data mining (CIDM) (pp. 111–118).
In Proceedings of the workshop on databases in networked information systems (DNIS) (pp. 1–14). Springer.
In Proceedings of the international conference on extending database technology (EDBT) (pp. 21–32). Springer.
In Proceedings of the SIAM international conference on data mining (SDM) (pp. 644–655). SIAM.
Identification of Business Process Models in a Digital World
Peter Loos, Peter Fettke, Jürgen Walter, Tom Thaler,
• Clustering: In a clustering step, the individual models are grouped such that models within one group are similar
and models belonging to different groups are different. Here, typical techniques of cluster analysis or multivariate statistics can be used.
The model synset created in phase 3 can support the grouping. Known similarity measures for enterprise models can also be applied (Dijkman et al.
and can comprise database schemata (e.g. Evermann, 2009) as well as arbitrary other model schemata. Process matching can be divided into two different fields: matching process models (1) and matching nodes of process models (2) (Thaler, Hake, Fettke, & Loos, 2014).
which decides on the similarity being 0 or sim (L1, L2). In the end, the Refmod-Miner/NSCM technique extracts binary matchings from the calculated node clusters.
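The final extraction step can be sketched as follows; node identifiers are illustrative, and the actual Refmod-Miner/NSCM matching is more involved than this pairwise expansion.

```python
from itertools import combinations

def binary_matchings(node_clusters):
    """Derive pairwise (binary) matchings from node clusters: every pair
    of nodes from different models that share a cluster is a match."""
    pairs = set()
    for cluster in node_clusters:
        for (m1, n1), (m2, n2) in combinations(sorted(cluster), 2):
            if m1 != m2:  # only match nodes across different models
                pairs.add(((m1, n1), (m2, n2)))
    return pairs
```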
an example is shown in Fig. 3. Three sample EPCs in a model variant collection represent the input data.
conversion and transformation as well as versioning of model data are available. Generally, two file formats are supported:
In a first step, clustering techniques are used to identify and reconstruct the given model groups. Since the model repository consists of 80 single models with 8 different processes and 10 variants each,
(3) creating a homogeneous data basis for different application and analysis scenarios. Moreover, the authors aim at publishing the model corpus in terms of open models,
It contains 98 reference model entries with lexical data and metadata, such as the number of contained single models.
The VLDB Journal, 10, 334–350. Rahm, E., & Bernstein, P. A. (2001b).
International Journal on Very Large Data Bases, 10(4), 334–350. Rehse, J.-R., Fettke, P.,
Data and Knowledge Engineering, 68(9), 793–818. doi:10.1016/j.datak.2009.02.015. Vogelaar, J. J. C. L., Verbeek, H. M. W., Luka, B.,
data view or organizational view) are modeled, it is important to maintain a systematic relationship between modeling elements from different views to ensure all models are integrated properly.
Data management (2nd ed.). Norwood, MA: Artech House. Sweller, J. (1988). Cognitive load during problem solving:
The empirical data was derived from a series of workshops and interviews with the key stakeholders along the process steps
Due to an increased degree of digital connectedness and increased flow of data from assets and actions within the ecosystem, there are great possibilities to ensure that the value production becomes even more coordinated
and collaboration between engaged actors through its ability to enable actors to share desired data.
and energy efficient flows (as means for KPA#3). Each of the (digital) innovations initiated in the future Airport project had the goal to contribute to these values (see Fig. 4 below) founded on patterns of data streams as the common information environment.
and performance metrics, allowing correct measurement data to be obtained and for the results to be interpreted based on relevant contextual factors (explanatory factors),
what data should be replicated in a management dashboard. 3. 8 Innovation 2: Information Sharing Platforms for Situational Awareness:
The ambition with a management dashboard is to enable digital images providing status of the D2D process for key stakeholders with relevant data in real time for the purpose of increased punctuality and customer satisfaction.
and turn-around.
208 M. Lind and S. Haraldson
The core of this innovation is a tool for providing digital images (see Fig. 6 above) based on information from different key actors,
and small third party developers design the latest traveller support services using commonly available data. There are a number of novel insights to be made.
Second, consumers of digital services (e.g. the travellers) are also suppliers of feedback data, encompassing feedback on digital services, new ideas on digital services, the use of physical infrastructures and transport
which data should be provided. Fig. 7 Example of a passenger dashboard channelized via different media 210 M. Lind
Extracting event data from databases to unleash process mining. In J. vom Brocke & T. Schmiedel (Eds.),
It is also responsible for obtaining data about the performance indicators, their historical data and current values,
and that uses business intelligence mechanisms to extract data from performance indicators. Both regular business processes and adapters are modeled using BPMN notation
All variables from the process instance are copied to the adapters so all adapters can have access to the process data.
On the basis of this data, the adapters can compute recommendations and store them in the Context Provider.
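A hypothetical sketch of this interplay, with all names illustrative rather than taken from the described system: an adapter receives a copy of the process-instance variables, derives a recommendation, and stores it in a context provider.

```python
class ContextProvider:
    """Illustrative store for recommendations computed by adapters."""
    def __init__(self):
        self.recommendations = {}
    def store(self, instance_id, rec):
        self.recommendations[instance_id] = rec

def run_adapter(instance_id, process_vars, context):
    """An adapter sees a copy of all process-instance variables and
    derives a recommendation from them (a made-up rule shown here)."""
    vars_copy = dict(process_vars)  # adapters get copies, not references
    if vars_copy.get("cycle_time", 0) > vars_copy.get("target", float("inf")):
        rec = "escalate"
    else:
        rec = "proceed"
    context.store(instance_id, rec)
    return rec
```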
, cyber physical systems, big data analysis, business intelligence approaches, or process mining provide more and more results in real-time.
, data elements, values, organizational units, or temporal characteristics. Finally, similar to classical "by design"
[Figure: Flexible Workflow Management System (WFMS), adapted workflow instance, design time vs. run time]
Current technological progress and the ongoing trends to analyze business data quasi in real-time will,
Data & Knowledge Engineering, 53, 129–162. Betke, H., Kittel, K., & Sackmann, S. (2013). Modeling controls for compliance – An analysis of business process modeling languages.
Data and Knowledge Engineering, 50, 9–34. Sackmann, S. (2011). Economics of controls. In Proceedings of the international workshop on information systems for social innovation 2011 (ISSI 2011) (pp. 230–236).
• Zachman's enterprise architecture framework (1987) categorizes different artifacts of organizational data that are required for IT development, e.g. design documents, specifications, and models.
(1) what (data), (2) how (function or process), (3) where (network), (4) who (people), (5) when (time),
or knowledge sharing databases. 4. 1 Learnings The process capability framework and the underlying maturity models illustrate that BPM can be approached from a technical perspective
Database Marketing and Customer Strategy Management, 18(1), 31–38. Basu, S. C., & Palvia, P. C. (2000).
Database Marketing and Customer Strategy Management, 18(1), 50–64. Zachman, J. A. (1987). A framework for information systems architecture.
Furthermore, Process Responsibility takes over ownership of processes, master data, and customized system settings. This includes the definition of process trainings and participation in the appointments of process management roles.
Due to the clustering of process instances, this person can work full-time as a Process Manager, and the professionalism (i.e.,
Master data management. Burlington: Morgan Kaufmann. Markus, M. L., & Jacobson, D. D. (2010). Business process governance.
a collaborative research center that gathers ten Estonian IT organizations with the aim of conducting industry-driven research in service engineering and data mining. From 2000 to 2007,
He has published more than 200 research papers, among others in ACM Transactions on Software Engineering and Methodology, IEEE Transactions on Software Engineering, Information Systems, Data & Knowledge Engineering,
100
Automatic layout, 183
B
Big data, 3, 7, 10, 22, 53, 95, 106, 250
Bottom-up approach, 61
BPM.
Data awareness, 135
Database, 13, 24, 32, 105–125, 163, 165, 271
Database management systems (DBMS), 107, 108, 116, 117, 119, 123–125
Data science, 106, 135
DBMS. See Database management systems (DBMS)
Deployment models, 79
Design principle, 13, 78, 98, 135, 146, 178, 179, 182, 183, 221
Deviance mining, 11, 13, 146–153
Digital age, 4, 257
Empirical evidence, 155
Enterprise software, 52, 54
Enterprise system, 13, 18, 21, 75–84, 139, 297
Event data, 13, 105–125
Event log
/modeling, 13, 60, 63–66, 78, 108, 177–189, 225, 263–268, 270–272, 287
clustering, 161–162, 170
collection, 170, 216
corpus, 171
harmonization, 164
and Distributed Co-creation
2.3 Events, Big Data and Analytics
3 The Changing Nature of Work
3.1 Social BPM
3.2 Dynamic Processes
Driving Innovation Through Advanced Process Analytics
Extracting Event Data from Databases to Unleash Process Mining
1 Introduction
2 Process Mining
3 Guidelines for Logging
4 Class