The potential problem is to find (or construct) an event log with particular characteristics. Artificially generated logs are needed for better testing and evaluation of process mining algorithms.

Developments in this area help researchers not only to verify the concepts behind algorithms but also to improve them based on model behaviour. Giving a researcher the opportunity to manipulate a large number of behavioural examples of a model leads to higher quality of the products being developed.

A possible evaluation approach can be defined as follows.
First, one generates an event log by simulating a carefully selected process model whose characteristics are suitable for testing a particular algorithm. Second, a novel process discovery algorithm is applied to this log. Finally, the discovered model is compared to the initial one.

In this chapter, we also describe a new process simulator tool that can generate an artificial event log with defined properties. The described solution for process model simulation may also be used for other purposes.

This chapter begins with Section 3.1, which describes the process simulation techniques that have been proposed in process mining to date.

3.1 Event Log Generation Techniques: Related Work

In this section, we focus on existing approaches and tools that are able to generate artificial event logs using different types of models in the context of process mining.
Their main features, strengths and weaknesses are considered.

Manual Event Log Generation

Manual generation of logs with particular characteristics is the most naive way to test process mining algorithms. However, in some cases it is useful. For example, during the implementation of a discovery algorithm one usually has several very simple samples to evaluate the code in very straightforward situations. Manual generation has evident limitations and disadvantages. Creating several larger sets of logs manually is extremely tedious and error-prone. It is usually also very time-consuming, even for an experienced researcher.

CPN Tools

An approach for the generation of artificial event logs using CPN Tools has been proposed by A.
de Medeiros and C. Günther [146]. CPN Tools is a widely used visual editor for coloured Petri nets with powerful simulation and analysis capabilities. The proposed extension for CPN Tools makes it possible to generate random logs based on a given Petri net. It produces the resulting log in the MXML format, on the assumption that the log will be used by the ProM Framework [26]. The main difficulty of the approach is that it requires writing scripts in the Standard ML language, which can cause problems when adapting the tool to a specific task. At the same time, the tool has many applications in the analysis and simulation of coloured Petri nets.

CPN Tools does not support the simulation of BPMN models directly. Although manual approaches for transforming BPMN subsets to CPN were discussed by M.
Zäuram [147] and M. Ramadan et al. [148], there are no well-defined and implemented algorithms to perform these transformations. Thus, a lot of manual work is needed to simulate BPMN models, which include control flow, data flow, messages, and other BPMN-specific concepts, using the coloured Petri nets notation.

DEVS Models Simulation

Yet another way to simulate BPMN models is to follow a two-step approach: (1) transform the BPMN model into a modeling formalism called DEVS (Discrete Event System Specification [149]), and (2) simulate the DEVS model using one of the DEVS simulation tools¹. Different implementations of such an approach are considered by D. Cetinkaya et al.
[150] and H. Bazoun et al. [151]. In these papers, the authors present two software tools for BPMN-to-DEVS transformation. Another BPMN-to-DEVS transformation is presented by S. Boukelkoul and R. Maamri [152].

In contrast to both of these approaches, our method works without an additional transformation step. We claim that BPMN models can be executed directly. Moreover, we present a corresponding semantics for BPMN models and a plug-in for the ProM Framework [26] that implements the proposed approach. Since ProM integrates various process discovery and log processing plug-ins, direct simulation based on the formal BPMN semantics can be combined with other process mining approaches to automate the testing of BPMN discovery methods.
With either of the two-step approaches, special scaffolding has to be constructed to generate an event log in a suitable format. This requires a substantial amount of additional work. One has to prepare scripts for the DEVS atomic components representing BPMN gateways and tasks. Data objects are also hard to deal with.

Processes Logs Generator

Processes Logs Generator² (PLG, and its later version PLG2) is a highly configurable toolbox for producing event data, developed by A. Burattin [153, 154]. PLG is a framework for generating artificial business process models in the form of dependency graphs, together with event logs of their execution. It is a plug-in for the ProM Framework [26] that makes it possible to create random BPMN models from common workflow patterns and to execute these models.
PLG supports model customization by changing the percentages of the basic patterns: loop percentage, single activity percentage, sequence percentage, AND split-join percentage, and XOR split-join percentage. Furthermore, it lets users select a distribution (Standard Normal, Beta, or Uniform) that is used to choose between the random methods that decide which activity will be used. Noisy log records can be generated during the simulated execution, and the noise level is configurable.

¹ The list of DEVS tools by G. Wainer: http://www.sce.carleton.ca/faculty/wainer/standard/tools.htm
² PLG official page: http://plg.processmining.it/

PLG2 [154] adds a data-flow perspective and claims the ability to generate “potentially infinite” streams of events from a process model, which makes it possible to test on-the-fly process mining techniques.

In both versions, a recursive composition of basic workflow patterns [155] is used to generate random models.
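The recursive composition of basic workflow patterns can be illustrated with a small sketch. This is not PLG's implementation: the pattern weights, the tuple-based tree encoding, the depth limit, the fixed loop-repeat probability of 0.5 and all function names below are assumptions made purely for illustration.

```python
import random

# Hypothetical pattern weights, loosely mirroring the pattern percentages
# described in the text (values chosen arbitrarily for this sketch).
PATTERN_WEIGHTS = {"activity": 0.4, "sequence": 0.3, "and": 0.1, "xor": 0.1, "loop": 0.1}


def random_model(rng, weights=PATTERN_WEIGHTS, depth=0, max_depth=3, counter=None):
    """Recursively compose basic patterns into a block-structured tree."""
    if counter is None:
        counter = [0]
    kinds = list(weights)
    kind = rng.choices(kinds, weights=[weights[k] for k in kinds])[0]
    if depth >= max_depth or kind == "activity":
        counter[0] += 1
        return ("activity", f"t{counter[0]}")
    if kind == "loop":
        return ("loop", random_model(rng, weights, depth + 1, max_depth, counter))
    # Sequence, AND and XOR blocks get two children each in this sketch.
    children = [random_model(rng, weights, depth + 1, max_depth, counter) for _ in range(2)]
    return (kind, children)


def simulate(node, rng):
    """Return one trace (a list of activity names) allowed by the model."""
    kind, body = node
    if kind == "activity":
        return [body]
    if kind == "sequence":
        return [a for child in body for a in simulate(child, rng)]
    if kind == "xor":
        return simulate(rng.choice(body), rng)
    if kind == "and":
        # Interleave the branch traces at random, keeping each branch's order.
        branches = [simulate(child, rng) for child in body]
        trace = []
        while any(branches):
            branch = rng.choice([b for b in branches if b])
            trace.append(branch.pop(0))
        return trace
    if kind == "loop":
        # Execute the body once, then repeat it with probability 0.5.
        trace = simulate(body, rng)
        while rng.random() < 0.5:
            trace += simulate(body, rng)
        return trace
    raise ValueError(f"unknown pattern: {kind}")


def generate_log(model, n_traces, seed=0):
    """Simulate the model n_traces times; each fired activity becomes an event."""
    rng = random.Random(seed)
    return [simulate(model, rng) for _ in range(n_traces)]
```

For instance, `generate_log(random_model(random.Random(7)), 100)` would produce one hundred traces of a randomly composed model. Because every block has a single entry and a single exit, models built this way are block-structured by construction.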
The author employs context-free grammars to build structured trees that form the control flow of the models. The user can specify a desired distribution for the number of recursive productions and for the number of forks at each split/join. A generated process model can be executed, with each fired activity registered as an event in a log. A similar technique for generating structured BPMN models and event logs is used in [156].

This tool is very useful for large-scale brute-force testing of an algorithm. The plug-in generates a set of models and an execution log for each model. Unfortunately, a user cannot use an existing model for log generation, so an experiment cannot be fine-tuned. Moreover, these approaches deal only with block-structured process models.

SecSy Tool

Yet another instrument for event log generation is the SecSy tool [157].
The purpose of this tool is to generate artificial event logs for testing algorithms in the field of security-oriented information systems modelling. SecSy has been developed as a standalone application that allows flexible configuration of process models and their executions. The tool generates an event log satisfying the behaviour predefined by a given specification. It can create sets of logs in a single run and add deviations from the original model. The results can be produced in both the MXML and XES formats.

This tool is designed for running experiments with security-oriented information systems.
It makes it possible to generate special event logs with particular parameters useful for the security analysis of processes. Unfortunately, this orientation imposes restrictions on the models that can be used by the tool.

BPMN Engines

Event log generation is also possible using well-known BPMN engines (Activiti³, Bizagi⁴, and others). These frameworks are developed to maintain real-life business processes using Java or other technologies, and can be extended with log generation functions. Unfortunately, these engines usually have a rather complex architecture. Thus, it is hard to apply them in a research setting.

³ Activiti BPM Platform: http://activiti.org/
⁴ Bizagi Engine: http://www.bizagi.com/en/

Moreover, engines usually do not follow the BPMN standard strictly. For example, the synchronization of parallel branches in Activiti (Camunda) differs from the specification.
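For reference, the synchronization rule that the BPMN 2.0 specification gives for a parallel gateway can be sketched as follows: the gateway is enabled only when every incoming sequence flow carries at least one token, and firing it consumes one token per incoming flow and produces one per outgoing flow. The function names and the token-counter representation are assumptions of this sketch, and it does not attempt to model how any particular engine deviates from the rule.

```python
from collections import Counter


def parallel_join_enabled(incoming_flows, tokens):
    """True when every incoming sequence flow holds at least one token."""
    return all(tokens[f] > 0 for f in incoming_flows)


def fire_parallel_join(incoming_flows, outgoing_flows, tokens):
    """Consume one token per incoming flow, produce one per outgoing flow."""
    if not parallel_join_enabled(incoming_flows, tokens):
        raise ValueError("join is not enabled")
    for f in incoming_flows:
        tokens[f] -= 1
    for f in outgoing_flows:
        tokens[f] += 1
    return tokens


# Usage: a token on only one of two branches does not enable the join.
tokens = Counter({"f1": 1})
waiting = parallel_join_enabled(["f1", "f2"], tokens)  # False
tokens["f2"] += 1                                      # second branch arrives
fire_parallel_join(["f1", "f2"], ["f3"], tokens)       # join fires
```

A simulator that implements this rule literally will wait at the join until all branches have arrived, which is the behaviour a generated event log should reflect if it is meant to conform to the standard.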