978-0-262-31709-2-ch014 (Rice), страница 3
Описание файла
Файл "978-0-262-31709-2-ch014" внутри архива находится в следующих папках: Rice, Другое. PDF-файл из архива "Rice", который расположен в категории "". Всё это находится в предмете "конструирование компиляторов" из 7 семестр, которые можно найти в файловом архиве МГУ им. Ломоносова. Не смотря на прямую связь этого архива с МГУ им. Ломоносова, его также можно найти и в других разделах. .
Просмотр PDF-файла онлайн
Текст 3 страницы из PDF
First, the use of theid function as a tag prevents the bloat operator from beingapplied within i f expressions which were themselves justcreated. Second, the shrink operator reverses bloat mutations, and bloat mutations not yielding a decrease in selfreplication cost are strongly selected against during leanepochs.The combined effect on fitness of these two mutation operators is complex. After a pair of bloat and shrink mutations, a more complex source code must be analyzed by amore complex compiler, a change which might (but morelikely will not) pay for itself by an increase in the efficiencyof the generated object code.ShrinkThe rules defining the shrink operator serve two purposes.the first purpose is to reverse mutations introduced by thebloat operator; the fourth shrink rule removes the tagged i fexpressions generated by the bloat operator so that a bloatmutation followed by a shrink mutation (of this type) has nonet effect.
The second purpose is to simplify function applications; the last shrink rule replaces an expression wherea function is applied to one or more values with just oneof those values. Because these rules also remove the identity function tags inserted by the bloat operator, the expression which results from a shrink mutation is again subject tobloating. The five rules which define the shrink operator areCrossoverBecause the self-hosting compiler is a complex lexicallyscoped program, variables which are defined in one scopewill not necessarily be defined in other scopes. If we employed the standard method of non-homologous crossoverused in most work on genetic programming, then subtreescould be inserted into scopes where one or more variablesare undefined, and this would break the compiler.
We address this problem by employing the homologous crossovermethod described by (D’haeseleer, 1994). In this method,the crossover operator descends into both parent trees in parallel; points where the two parent trees differ are subject tocrossover, with the child receiving the subtree of either parent with equal probability. D’haeseleer notes that homologous crossover facilitates convergence (fixation) since children resulting from the crossover of identical parents willalso be identical to the parents.1.
(lambda[+] e1 ) → (lambda[+] e1 )2. ((lambda[+] e1 ) e2 ) → ((lambda[+] e1 ) e2 )3. (if e1 e2 e3 ) → (if e1 e2 e3 )4. (if e1 (id e2 ) e3 ) → (if e1 (id e2 ) e3 ) e2 | e35. ( f e1 . . . eN ) → ( f e1 . . . eN ) e1 | . . . | eNECAL 2013A8551901610001590014800median number of returnsmedian number of non-trivial testsECAL - General Track131211109877006005004003002000102030405060generation708010090Figure 6: The median number (in a population of size 200)of non-trivial test bytecodes averaged over 20 runs (errorbars show plus or minus one standard deviation). Becauseeach non-trivial test bytecode results from a bloat mutationat a distinct point in the ϕ expression, this graph demonstrates that mutation is in no way restricted to the two pointsrelevant to the evolution of tail-call optimization.0102030405060generation708090Figure 7: The median number (in a population of size 200)of return bytecodes executed during self-replication averaged over 20 runs (error bars show plus or minus one standard deviation).genitors in the population.The population is then subjected to crossover using tournament selection.
In each tournament, four individuals arechosen at random (with replacement). The winners of twotournaments are then combined using crossover, and the resulting individual is tested for viability. The crossover operation is repeated until it yields two hundred viable individuals which comprise the population of the next generation.GenotypesFunction applications involving one and two arguments arecompiled at two different points in the ϕ expression andeach of these points is a potential target for a pair of bloatand shrink mutations which would partially implement tailcall optimization.
We call the genotype of programs whichperform neither optimization A, one (or the other) optimization B (or C), and both optimizations, D. Both optimizationscheck to see if the continuation is a return bytecode, whichperforms a frame stack pop. If so, the push-pop sequenceis not generated, resulting in significant time and space savings. See Figure 4.
Lower bounds for the complexity andself-replication cost of each of the four genotypes are shownin Table 1. Finally, the relative fitnesses of the four genotypes are shown graphically, in the context of the fitnesslandscapes for the flush and lean epochs, in Figure 5.The above process is repeated for nine more generations, then the epoch is switched to lean (in which fitness isequated with self-replication efficiency).
The genetic algorithm is run for a total of 100 generations (five flush epochsinterrupted by five lean epochs).In an initial experiment, the system was run twenty times.The median number of interesting test bytecodes containedin the compiled ϕ expression and the median number of return bytecodes executed during self-replication were thenplotted as a function of generation; see Figures 6 and 7. Asexpected, both complexity and self-replication cost increasein flush epochs and decrease in lean epochs.
After 40 generations (two flush-lean cycles), the median complexity at theend of flush epochs is nearly double its initial value, whichmeans that the majority of individuals contain 7 or morepredicates which compile to non-trivial test bytecodes notpresent in the initial population. Furthermore, the mediancomplexity at the end of lean epochs is always 10 or more,which suggests that either 1) the shrink operator is not fullyable to reverse the effects of the bloat operator so that oneor more bloat mutations (on average) survive through leanepochs; or 2) one (or both) of the B and C alleles is fixedin the population. Examination of Figure 7 shows that after40 generations, the median self-replication cost at the endof lean epochs is slightly more than half of its initial value.Experimental ResultsThe initial population consisted of two hundred identical individuals of genotype A at the beginning of a flush epoch (inwhich fitness is equated with effective complexity).
In thefirst step of the genetic algorithm, the bloat and shrink operators are applied to all individuals in the population and themutants which result are tested for viability. To test for viability, the mutant is evaluated to produce a daughter, and thedaughter is evaluated to produce a granddaughter. The mutant is classified as viable if the daughter and granddaughtercontain the same number (greater than zero) of bytecodes(this is done in lieu of a much more expensive test of actualstructural equivalence). Viable mutants replace their pro-91ECAL 2013ECAL - General TrackThis is consistent with evolution of one or both of the B andC genotypes. Self-replication cost continues to increase anddecrease (depending on epoch) eventually reaching a pointwhere the median value at the end of the fifth lean epochis nearly three times smaller than the initial value.
This isconsistent with the evolution of the D genotype.After running the system 100 times, the probabilities ofthe B, C, and D genotypes evolving and for the mutationsbecoming fixed in the population were estimated. See Table2. Notably, the most complex and most efficient genotype,D, evolved within 100 generations 81 times. Additionally,the average and median number of generations required foreach genotype to evolve and for the mutations to becomefixed were also estimated. Considering only the 81 runs inwhich the D genotype evolved, the average number of generations required was approximately 36 and the median number was 29.Table 3: Probabilities of pathways to D genotype.tB < tC = tD0.33probabilitymeanstd.
dev.medianC0.9124.522.013D0.8135.824.529B0.8929.921.117C0.7834.322.233tC < tB < tD0.09Future WorkThis paper describes work that, although preliminary, opensmany avenues for further exploration, including• Determining whether or not a self-replicating programwhich reproduces by compiling itself can evolve the optimum order for the tests comprising the decision treewhich performs syntactic analysis; this would require anew mutation operator which can reorder nested-if expressions.D0.7043.324.536• Determining whether or not it is possible to evolve deadcode elimination, which would be a useful optimizationin a system which includes mutation operators (like bloat)which (in effect) introduce dead code; to accomplish this,the bloat operator would have to generate a much largerset of φ expressions, including dereferencing source codewith car and cdr combinations.If we know the average numbers of individuals of a givengenotype in each generation, then we can compute cumulative distribution functions for evolution and fixation of thatgenotype; see Figure 8.