Диссертация (1137084), страница 8
Текст из файла (страница 8)
The algorithm discovers basic workflow patterns using special procedure on a33directly-follows graph constructed from event log. S. Leemans et al. [23] have also proposedimprovements for this method, which one can use to deal with incomplete event logs. Figure 1.11shows a Petri net discovered using Inductive miner from the event log (1.1).Figure 1.11: Model discovered using Inductive minerNote that inductive miner guarantees the soundness of a discovered block-structured processmodel [23]. The following model discovery algorithms also guarantee soundness, but output processmodels in other formalisms: Evolutionary tree miner [67], Maximal pattern miner [68], Constructscompetition miner [69,70].
Informally, a process model is sound iff it has (1) the option to complete,(2) a proper completion, (3) no dead transitions [71]. Sound process models are valuable becausethey can be properly executed. Note that soundness of process models is out of this thesis’ scope.What is more important in the context of this thesis, the inductive algorithm with certainparameters (in particular, the setting inductive miner incompleteness can be used) guarantees aperfect fitness of the discovered model.A wide variety of process model discovery techniques are present in the field. The geneticdiscovery approach has been described in a series of papers [43, 72, 73] by A.
de Medeiros et al.The ++ -algorithm [74], that aims at discovery of non-free-choice workflow patterns, and otherextensions of basic -algorithm have been proposed. Coloured Petri nets can be discovered aswell [75]. A. Kalenkova et al. [46, 76–78] proposed methods to discover BPMN 2.0 models. It isalso possible to discover declarative process models [79, 80] which are suitable when a constraintform process description is needed instead of imperative model. Hybrid process models uniteimperative and declarative modelling features. F. Maggi et al. [81] proposed a method for discoveryof hybrid models.
In recent times, the discovery of local process models has been developed actively[48,49,82,83]. These methods aim at discovery of models for locally structured fragments of processbehaviour in complex unstructured business processes. Finally, A. Augusto el al. [50] prepared acomprehensive review of process discovery approaches, to that we refer curious readers.Figure 1.12: Model visualized using Inductive visual minerNote also that many process discovery tools are visual and interactive. They provide a user withrich visualisations, and support fine-tuning of a discovered model. See, for example, how Inductive34visual miner constructs a model in Figure 1.12.
Activities and flows are coloured according to theirusage frequency. Currently replayed events are shown as yellow dots moving across the model.In this thesis we propose a process repair approach that uses process discovery whensynthesizing a model to replace a non-conforming model fragment. It is shown in Section 2.2 thata process discovery algorithm needs to guarantee perfect fitness of a discovered model to the eventlog. The definition of fitness, which is the most important criteria of a log-model conformance,will be explained in Section 1.2.2.
Careful investigation of existing process discovery techniquesrevealed that there are two well-elaborated techniques which provide a user with perfectly fittingPetri net models, if used with relevant parameters: Inductive [47] and ILP [24] algorithms. Thus,these mining algorithms are used in this thesis.1.2.2Conformance CheckingConformance checking techniques provide a user with evaluation of process model conformanceto a given event log w.r.t.
specified conformance criteria. Figure 1.13 shows the scheme of aconformance checking algorithm.Event LogConformanceCheckingConformanceIndicationModelFigure 1.13: Conformance checkingThe following four criteria are usually used to evaluate model quality: fitness, precision,generalization, and simplicity [5]. Usually, fitness and precision are used for checking theconformance between a model and an event log.Fitness shows to what extent the log traces can be replayed by the model. Let be a workflownet with transition labels from .
Its initial marking is (︀⌋︀ and final marking is (︀⌋︀. Let be a traceof . We say that a model perfectly fits a trace = ∐︀1 , . . . , ̃︀, if there is a sequence of transition1firings (︀⌋︀ = 0 → . . . → +1 = (︀⌋︀ such that the sequence of activities ∐︀(1 ),(2 ), . .
. , ( )̃︀coincides with after the deletion of all silent transitions. An event log perfectly fits (or perfectly fits ), if each trace from perfectly fits . For example, the event log ′ = (︀∐︀, , , , ̃︀ ,25∐︀, , , , ℎ, , , , ̃︀, ∐︀, , , , ̃︀ ⌋︀ perfectly fits the model in Figure 1.2.Precision determines to what extent the model can generate behaviour that is not recorded inthe log. A model is perfectly precise if it describes exactly the same behaviour as event log. These35two quality criteria are important in the context of this thesis. Thus, fitness and precision will bethoroughly considered in the remainder of this section.Generalization and simplicity are considered less important.Generalization shows the level ofmodel abstraction.
A model should somehow generalize the behaviour from an event log to beuseful. Otherwise, a model is called overfitted, that is too specific. Such a model is just anotherrepresentation of an event log. Nevertheless, existing process model repair techniques do not affectmodel generalization noticeably.Simplicity is also important, the model should not become too complicated after its repair.Bluntly, simplicity shows model’s compactness. Using this metric one may try to evaluate if amodel is easy-to-read by an arbitrary expert.
It can be defined in various ways.Note that simplicity touches both structural and behavioural aspects of a model. The modelis simple if it has a minimal possible number of modelling elements which are needed to replayall process behaviour. Besides, a simplicity measure can take the complexity of used control- anddata-flow constructs into account.In general, it is meaningless to compare unfitting models according to their simplicity, becausethe are not truly represent corresponding event log. Moreover, it is shown that one may alwaysconstruct a fitting and simple model for an arbitrary event log which is not very useful.
Theseare so-called flower models [5]. A model of that type can replay an event log, simple, but veryimprecise. For example, Figure 1.14 shows a workflow net that can replay all traces which beginwith activity a, end with activity f, and have any possible combination of activities b, c, d, ebetween them.t2t3bcaft1t6det4t5Figure 1.14: Flower workflow netMany techniques to evaluate simplicity (or complexity) of process models have been proposedin literature, which consider different aspects of modelling [84–92].
Ideally, a repaired model shouldbe as simple as the initial one. However, the most important for repair is to construct a perfectlyfitting model. Thus, in some cases repaired models are more complex. Petri net simplicity canbe measured as the number of modelling elements (places and transitions). We compare thesenumbers in repaired and corresponding initial models (see Section 4.3).36A large number of conformance checking techniques are presented in the literature [22,93–101].These techniques deal with models in different notations and are able to calculate various qualitymetrics. Most of fitness and precision calculation methods are based either on model replay, or onso-called alignments.
In the remainder of this section, both approaches are considered in detail.Conformance Checking using ReplayA. Rozinat and W. van der Aalst have proposed this method of conformance checking [102,103].When calculating the fitness using this method, we try to replay each trace from the event log, i.e.we try to reproduce each event from the trace by the firing a model transition with a correspondinglabel. If sets of transition labels and event names coincide, the idea of algorithm is as follows.Replay starts at the initial marking (the single token in the source place of a workflow net).The algorithm sequentially passes through each trace in the log. Two cases are possible for eachevent in a trace: either an enabled transition exists with a suitable label, which is equal to theevent’s name, or there is no such a transition.
In the first case, a suitable transition fires, netmarking changes according to the firing rule, the algorithm moves to the next event in the trace.In the second case, there is a transition with suitable label in the model, but it is not enabled,i.e. there are not enough tokens in its input places.
Then, the required number of artificial tokensis added to its input places. This allows for further execution. Herewith, the algorithm marksplace with corresponding number of missing tokens. More missing tokens will be added to thisplace, if the case repeats for other traces. The algorithm is executed until the end of a trace orthe final marking of the net is reached. Tokens, which are not consumed during replay and are inthe non-final places, are called remaining tokens. Such an algorithm executes for each traces inthe log.Finally, A. Rozinat [103] defines the fitness between a Petri net and an event log as follows: (, ) =11∑ ∑ (1 − ∈) + (1 − ∈ ) ,22∑∈ ∑∈ (1.2)where is a number of missing tokens, is a number of remaining tokens, is a number ofconsumed tokens, is a number of produced tokens, and is a trace from the event log.
Thus,the fitness is low in cases, when the number or missing and remaining tokens is large.Alignment-based Conformance CheckingFitness. In this thesis, conformance checking techniques based on alignments are used. Analignment is a special correspondence between a trace and a sequence of transition firings in aPetri net [22]. Figure 1.15 shows two examples of an alignment.