A premise is falsified if the image of $\delta(g_1) \sqcap \ldots \sqcap \delta(g_k) \sqcap \delta(g)$ contains an object of the opposite class. In order to control this issue, we add a third hyperparameter, the α-threshold. If the share of opposite-class examples that falsify the premise $\delta(g_1) \sqcap \ldots \sqcap \delta(g_k) \sqcap \delta(g)$ exceeds the α-threshold, the premise is considered falsified; otherwise the premise is α-weak and is subsequently used in the classification of the test object (see the sketch below).

The idea behind the algorithm is to check whether the test object is more similar to the set of positive or to the set of negative examples. Similarity is defined as the total support of the α-weak positive (negative) premises that contain the description of the test object.
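To make the falsification test concrete, the following minimal Python sketch checks whether a candidate premise is α-weak over interval (numerical) descriptions. It assumes that a premise is stored as a list of per-attribute intervals; the function and parameter names (contains, is_alpha_weak, alpha) are illustrative rather than taken from an actual implementation.

def contains(premise, x):
    """True if the description x (a vector of numbers) falls inside
    every interval of the premise."""
    return all(a <= v <= b for (a, b), v in zip(premise, x))

def is_alpha_weak(premise, opposite_examples, alpha):
    """A premise is alpha-weak if at most an alpha share of the
    opposite-class examples falsifies it (i.e. satisfies its intervals)."""
    falsifiers = sum(contains(premise, x) for x in opposite_examples)
    return falsifiers <= alpha * len(opposite_examples)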
The support of an α-weak positive description $d^{+}$ is defined as $|d^{+} \cap G_{+}|$, that is, the number of objects from the set of positive examples $G_{+}$ satisfying the description $d^{+}$. The support of an α-weak negative description $d^{-}$ is defined as $|d^{-} \cap G_{-}|$, that is, the number of objects from the set of negative examples $G_{-}$ satisfying the description $d^{-}$. Let there be $p$ α-weak positive descriptions and $n$ α-weak negative descriptions, all of which contain the description of the test object $\delta(g_{test})$, i.e.
\[
\forall i = 1, \ldots, p: \; d_i^{+} \sqsubseteq \delta(g_{test}) \quad \text{and} \quad \forall j = 1, \ldots, n: \; d_j^{-} \sqsubseteq \delta(g_{test}).
\]
The total support of the α-weak positive descriptions is
\[
P = \sum_{i=1}^{p} |d_i^{+} \cap G_{+}|,
\]
and the total support of the α-weak negative descriptions is
\[
N = \sum_{j=1}^{n} |d_j^{-} \cap G_{-}|.
\]
Based on the value $\Delta = P - N$, an estimate is made of whether the test object is more similar to the objects from the set of positive or of negative examples; this value serves as a credit score in the borrower's creditworthiness assessment. The algorithm uses three hyperparameters: the subsample size, the number of iterations and the α-threshold.

The first hyperparameter is expressed as a percentage of the observations in the set of positive (negative) examples. At each step a subsample is extracted and the descriptions of the objects in the subsample are intersected with the description of the test object.

The number of times (i.e. the number of iterations) we randomly extract a subsample from the set of positive (negative) examples is the second hyperparameter of the algorithm; it is also tuned through grid search.
If the share of opposite-class examples that falsify the premise $\delta(g_1) \sqcap \ldots \sqcap \delta(g_k) \sqcap \delta(g)$ exceeds the α-threshold, the premise is considered falsified; otherwise the premise is α-weak and is subsequently used in the classification of the test object.

These steps are performed for each test object, for the sets of positive and negative examples separately, producing a set of positive and a set of negative α-weak premises. The final output for the test object is the difference between the total number of objects from the set of positive examples supporting the positive premises and the total number of objects from the set of negative examples supporting the negative premises; a sketch of the whole procedure is given below.
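The following Python sketch assembles the pieces under the same assumptions as above: random subsamples of positive (negative) examples are intersected with the test description, the candidate premise is kept if it is α-weak, and the test object receives the score $\Delta = P - N$. All names (meet, alpha_weak_premises, credit_score, share, n_iter) are illustrative, not the actual implementation used in this work.

import random

def meet(descriptions):
    # Interval similarity: the component-wise convex hull of the values,
    # i.e. delta(g1) ⊓ ... ⊓ delta(gk) ⊓ delta(g_test) for numerical data.
    columns = list(zip(*descriptions))
    return [(min(c), max(c)) for c in columns]

def contains(premise, x):
    return all(a <= v <= b for (a, b), v in zip(premise, x))

def alpha_weak_premises(examples, opposite, x_test, share, n_iter, alpha):
    premises = []
    k = max(1, int(share * len(examples)))           # subsample size
    for _ in range(n_iter):                          # number of iterations
        subsample = random.sample(examples, k)
        premise = meet(subsample + [x_test])
        falsifiers = sum(contains(premise, x) for x in opposite)
        if falsifiers <= alpha * len(opposite):      # alpha-threshold test
            premises.append(premise)
    return premises

def credit_score(x_test, G_pos, G_neg, share, n_iter, alpha):
    pos = alpha_weak_premises(G_pos, G_neg, x_test, share, n_iter, alpha)
    neg = alpha_weak_premises(G_neg, G_pos, x_test, share, n_iter, alpha)
    P = sum(sum(contains(h, x) for x in G_pos) for h in pos)   # total positive support
    N = sum(sum(contains(h, x) for x in G_neg) for h in neg)   # total negative support
    return P - N                                               # Delta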
It has to be mentioned that a similar idea of a fuzzy threshold was introduced in [79]. That paper extends the JSM-method to the case of fuzzy hypotheses. The author introduces a drop, a triple $\langle f, n, m \rangle$, which to some extent serves as an analogue of an α-weak description. The first element of a drop is $f$, a fragment in Grigoriev's terms, which is an analogue of what is called a description in this dissertation. The constants $n$ and $m$ define, respectively, the minimum required number of objects from the set of positive examples that must satisfy the fragment $f$, and the maximum allowed number of objects from the set of negative examples that may satisfy it.
If a fragment is falsified "rarely" (by fewer than $m$ counterexamples from the set of opposite examples), then the fragment $f$ is said to be a "cause" of a particular property of an object. Analogously, the α-threshold regulates the maximum acceptable percentage of objects in the set of opposite examples that may satisfy the description.
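For binary attributes, the drop acceptance rule can be read as the following check. This is a hypothetical rendering of the rule described above, not code from [79]; objects and fragments are represented as sets of attributes.

def is_cause(fragment, positives, negatives, n, m):
    # An object satisfies a fragment if the fragment is a subset of
    # the object's attribute set.
    n_pos = sum(fragment <= g for g in positives)
    n_neg = sum(fragment <= g for g in negatives)
    return n_pos >= n and n_neg < m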
To continue with this analogy, an α-weak description is considered to be a "cause" of the target property (a defaulting or non-defaulting client). Also, [79] introduces a Boolean function "suit", which is similar to our aggregating operator in that it provides the voting over α-weak descriptions.

To contrast this research with [79], it has to be mentioned that the previous work does not use any randomized approach and is therefore far more theoretical, since computing a drop is dramatically costly in memory and CPU time. For this reason, the previous work resorts to the assumption that the number of possible fragments is low. In our case, relatively large data samples are used and the dimension of the feature space is high, so this assumption does not hold.
Grigoriev uses traditional JSM-type reasoning, where all hypotheses are generated at the first stage, whereas in our work only query-based hypotheses are generated, which reduces the memory used and the time needed for a single classification. There is also a difference in the attribute space: Grigoriev operates exclusively with binary attributes, while in our case the classification problem is considered over interval descriptions (numerical data), since this is the typical case for credit risk assessment data.
In the next section, further steps of the proposed query-based classification algorithms are discussed.

3.4 Voting Schemes

In order to produce a classification for a test object, we construct a voting scheme over the α-weak premises. In the most general case, a voting scheme $F$ is a mapping
\[
F(g_{test}, h_1^{+}, \ldots, h_p^{+}, h_1^{-}, \ldots, h_n^{-}) \to \{-1, 1, \emptyset\},
\]
where $g_{test}$ is the test object with unknown class, $h_i^{+}$, $i = 1, \ldots, p$, are the positive α-weak premises, $h_j^{-}$, $j = 1, \ldots, n$, are the negative α-weak premises, $-1$ is the label of the negative class, and $1$ is the label of the positive class (i.e.
defaulters). In other words, $F$ is an aggregating rule that takes premises as input and gives a classification label as output. Note that we allow for an empty label: if the label is empty, the voting rule is said to abstain from classification. There may be different approaches to building aggregating rules. A voting scheme is built from a weighting function $\omega(\cdot)$, an aggregation operator $A(\cdot)$ and a comparison operator $\otimes$:
\[
F(\omega(\cdot), A(\cdot), \otimes) = \big( A_{i=1}^{p} [\omega(h_i^{+})] \big) \otimes \big( A_{j=1}^{n} [\omega(h_j^{-})] \big)
\]
In order to configure a new voting scheme, it is sufficient to define the two operators and the weighting function.
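This decomposition can be mirrored directly in code. In the sketch below (with illustrative names), a voting scheme is a triple of callables, so that each concrete scheme discussed next only has to supply its own weighting function, aggregator and comparator.

def vote(pos_premises, neg_premises, omega, aggregate, compare):
    # Generic voting scheme F(omega, A, compare): weight every alpha-weak
    # premise, aggregate the weights per class, compare the two aggregates.
    # Returns -1, 1 or None; None plays the role of the empty label.
    a = aggregate([omega(h) for h in pos_premises])
    b = aggregate([omega(h) for h in neg_premises])
    return compare(a, b)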
In this work we first use the number of positive versus the number of negative α-weak premises. In this case the rule allows the test object to satisfy both positive and negative premises, which decreases the number of rejections from classification. The weighting function, aggregation operator and comparison operator are defined as follows:
\[
A(h) = \sum h, \qquad
\omega(h) =
\begin{cases}
1, & \text{if } \delta(g_{test}) \sqsubseteq h \\
0, & \text{otherwise}
\end{cases}, \qquad
a \otimes b =
\begin{cases}
\operatorname{sign}(b - a), & \text{if } a \neq b \\
\emptyset, & \text{if } a = b
\end{cases}
\]
So the label for a test object $g_{test}$ is defined by the following mapping:
\[
F(g_{test}, h_1^{+}, \ldots, h_p^{+}, h_1^{-}, \ldots, h_n^{-}) = \Big( \sum_{i=1}^{p} [\delta(g_{test}) \sqsubseteq h_i^{+}] \Big) \otimes \Big( \sum_{j=1}^{n} [\delta(g_{test}) \sqsubseteq h_j^{-}] \Big)
\]
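Under the assumptions of the earlier sketches (premises stored as interval lists), this first scheme can be written as follows; the tie case returns None, i.e. the rule abstains.

def sign(x):
    return (x > 0) - (x < 0)

def contains(premise, x):
    return all(a <= v <= b for (a, b), v in zip(premise, x))

def count_scheme(x_test, pos_premises, neg_premises):
    a = sum(contains(h, x_test) for h in pos_premises)   # positive premises containing the test description
    b = sum(contains(h, x_test) for h in neg_premises)   # negative premises containing it
    return None if a == b else sign(b - a)               # sign(b - a), as defined above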
However, one can think of the margin $b - a$ as a measure of discrimination between the two classes and choose the decision boundary based on ROC analysis, for instance. This approach is good for decreasing the number of rejections from classification, but it does not account for the support of the premises. Naturally, one would give more weight to a premise with a larger image (with higher support). Also, if the numbers of positive and negative premises are equal, the rule abstains from classification.

Secondly, voting schemes can be varied in order to better account for the evidence in the data. As one possible variation, the total number of objects within the positive premises versus the total number of objects within the negative premises can be considered. The voting rule then takes the following form:
\[
F(g_{test}, h_1^{+}, \ldots, h_p^{+}, h_1^{-}, \ldots, h_n^{-}) = \Big( \sum_{i=1}^{p} |(h_i^{+})^{\square}| \Big) \otimes \Big( \sum_{j=1}^{n} |(h_j^{-})^{\square}| \Big)
\]
Here, only the weighting function is changed in comparison to the previous scheme:
\[
\omega(h) = |h^{\square}|,
\]
where $h^{\square}$ denotes the image of the premise $h$, i.e. the set of objects from the corresponding set of examples whose descriptions contain $h$. This approach accounts for the power of the premise image, but it can count the same objects supporting the positive (negative) class several times, since these objects may appear in the images of several premises.
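A minimal sketch of this support-weighted variant under the same assumptions; the image of a premise is computed by scanning the corresponding example set, so an object supporting several premises is deliberately counted several times.

def contains(premise, x):
    return all(a <= v <= b for (a, b), v in zip(premise, x))

def image_size(premise, examples):
    # |h^square|: the number of examples whose descriptions contain h
    return sum(contains(premise, x) for x in examples)

def support_scheme(pos_premises, neg_premises, G_pos, G_neg):
    a = sum(image_size(h, G_pos) for h in pos_premises)
    b = sum(image_size(h, G_neg) for h in neg_premises)
    return None if a == b else ((b > a) - (b < a))       # sign(b - a)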
Thirdly, the total number of unique objects within the positive premises versus the total number of unique objects within the negative premises can be considered. In this case the aggregation operator takes over some features of the weighting function, and the latter turns into a mapping:
\[
F(g_{test}, h_1^{+}, \ldots, h_p^{+}, h_1^{-}, \ldots, h_n^{-}) = \Big| \operatorname{unique} \Big[ \bigcup_{i=1}^{p} (h_i^{+})^{\square} \Big] \Big| \otimes \Big| \operatorname{unique} \Big[ \bigcup_{j=1}^{n} (h_j^{-})^{\square} \Big] \Big|
\]
The $A(\cdot)$ operator is redefined over the images of the premises, and the weighting function is redefined as the Galois mapping:
\[
A(h^{\square}) = |\operatorname{unique}[h^{\square}]|, \qquad \omega(h) = h^{\square}
\]
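This unique-objects variant differs from the previous sketch only in taking the union of the premise images before counting:

def contains(premise, x):
    return all(a <= v <= b for (a, b), v in zip(premise, x))

def unique_support(premises, examples):
    # |unique[union of images]|: each supporting example is counted once,
    # no matter how many premises it satisfies
    return sum(any(contains(h, x) for h in premises) for x in examples)

def unique_scheme(pos_premises, neg_premises, G_pos, G_neg):
    a = unique_support(pos_premises, G_pos)
    b = unique_support(neg_premises, G_neg)
    return None if a == b else ((b > a) - (b < a))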
Further, a geometrical interpretation in the feature space can be used. A premise (given numerical data) represents, in effect, a multidimensional parallelepiped whose edges are the intervals for the particular attributes. The more generic the premise, the wider its intervals and, thus, the greater its geometrical volume. We want to favor, and give more weight to, premises with smaller volume, since they are more specific to the test object. Here one can draw an analogy with k-nearest neighbors, where a test object is classified according to its closest (in some sense) neighbors: the closer the neighbor, the more weight it carries for the predicted class.
\[
F(g_{test}, h_1^{+}, \ldots, h_p^{+}, h_1^{-}, \ldots, h_n^{-}) = \Big( \sum_{i=1}^{p} \prod_{k=1}^{M} \big( b_k^{h_i^{+}} - a_k^{h_i^{+}} \big) \Big) \otimes \Big( \sum_{j=1}^{n} \prod_{k=1}^{M} \big( b_k^{h_j^{-}} - a_k^{h_j^{-}} \big) \Big)
\]
where $M$ is the number of attributes and $a_k^{h_i^{+}}, b_k^{h_i^{+}}$ are the left and right borders of the corresponding premise interval. Note that the comparison operator is redefined and negated in contrast to the previous cases:
\[
A(h) = \sum h, \qquad
\omega(h) = \prod_{k=1}^{M} (b_k^{h} - a_k^{h}), \qquad
a \otimes b =
\begin{cases}
-\operatorname{sign}(b - a), & \text{if } a \neq b \\
\emptyset, & \text{otherwise}
\end{cases}
\]
The weighting function is the geometrical volume of the premise. The comparison operator is negated because now premises with greater volume are less important in voting.
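A sketch of this volume-weighted scheme under the same interval representation; math.prod requires Python 3.8 or later, and the comparator follows the negated definition above.

from math import prod

def volume(premise):
    # Geometrical volume of the premise's parallelepiped: the product
    # of its M interval lengths.
    return prod(b - a for a, b in premise)

def volume_scheme(pos_premises, neg_premises):
    a = sum(volume(h) for h in pos_premises)
    b = sum(volume(h) for h in neg_premises)
    return None if a == b else -((b > a) - (b < a))      # -sign(b - a)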