2013. An Improving Method for Loop Unrolling, page 2


(IJCSIS) International Journal of Computer Science and Information Security, Vol. 11, No. 5, May 2013
http://sites.google.com/site/ijcsis/ ISSN 1947-5500

{
13. lp=lp->next;
14. Count++;
15. }

Instructions 6, 7, 8 and 9 form a basic block, but because of data dependencies a superscalar processor cannot execute them in parallel. The benefit of this unrolled loop comes from reduced loop overhead and not from ILP. We therefore suggest a new way to solve this problem (traversing a linked list and counting its nodes), in the hope that the new method increases the level of parallelism. It is not a general solution and solves only this problem; however, it suggests a new idea: using more pointers to traverse the list from different positions. The solution is as follows.

Proposed Solution: We use a two-way (doubly) linked list that also has two pointers named first (pointing to the first node) and last (pointing to the last node). This gives the following algorithm:

1. F=first;
2. L=last;
3. Count=2;
4. While ((F!=L) && (F->right!=L))
5. {
6. F=F->right;
7. L=L->left;
8. Count+=2;
9. }
10. If (F==L)
11. Count-=1;

In this algorithm we encounter two possible states, as follows:

1. The number of list nodes is odd. In this state, when the pointers F and L move to the middle of …

IV. POWER CONSUMPTION, ENERGY USAGE AND SPEED UP

-Simulation or measuring. Program code plays a significant role in the power consumption of a processor, so some research has studied the impact of compiler optimizations on power consumption. Given a particular architecture, the programs that run on it have a significant influence on the energy usage of the processor. The relative effect of program behavior on processor energy and power consumption can be demonstrated in simulation. However, factors such as clock generation and distribution, energy leakage and power leakage make it difficult for an architecture-level simulation to give accurate enough information about the effect of a program on a real processor [1]. Therefore, we have to measure the effect of a program on a real processor and not just in simulation.

-Results. Here we review the results of some experiments done to study the impact of the loop unrolling technique on three factors: the power consumption and energy usage of a superscalar processor, and program speed up.

Seng and Tullsen [1] study the effect of loop unrolling on power consumption and energy usage. They measure the energy usage and power consumption of a 2.0 GHz Intel Pentium 4 processor. They run different benchmarks compiled with various optimizations using the Intel C++ compiler and quantify the energy and power differences when running the different binaries. They conclude that “when applying loop unrolling, there is a slight measurable reduction in energy, for little or no effect on performance. For the binaries where loop unrolling is enabled, the total energy is reduced as well as the power consumption. The difference in terms of energy and power is very small, though.”

Mahlke et al. [2] study the effect of loop unrolling as a technique to achieve ILP on supercomputers that contain superscalar node processors. They report that “with conventional optimization taken as a baseline, loop unrolling and register renaming yields an overall average speed up of 5.1 on an issue-8 processor”. An issue-8 processor can fetch and issue at most 8 instructions per cycle. Their other result is that the ILP transformations, including loop unrolling, increase the register usage of loops.
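The two list traversals discussed in the linked-list example can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Node type, the function names count_forward and count_two_ended, the link_nodes helper and the empty-list guard are all hypothetical; only the link names left and right and the first/last pointers come from the text. The sketch starts the counter at 2 (the two end nodes) and subtracts 1 when both pointers meet on the same middle node. Whether the two independent pointer chains actually yield more parallelism depends on the compiler and the processor and is not measured here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type; the text names only the links left and right. */
typedef struct Node {
    struct Node *left;
    struct Node *right;
} Node;

/* Single-pointer traversal unrolled by two. Every p = p->right needs the
   result of the previous one, so the body forms one dependence chain. */
size_t count_forward(const Node *first) {
    size_t count = 0;
    const Node *p = first;
    while (p != NULL) {
        count++;
        p = p->right;
        if (p == NULL) break;
        count++;
        p = p->right;
    }
    return count;
}

/* Two-pointer traversal: f walks right from the first node while l walks
   left from the last node. The two pointer updates do not depend on each
   other. Assumes a well-formed doubly linked list. */
size_t count_two_ended(const Node *first, const Node *last) {
    if (first == NULL) return 0;      /* assumption: empty list counts as 0 */
    assert(last != NULL);
    const Node *f = first;
    const Node *l = last;
    size_t count = 2;                 /* first and last are counted up front */
    while (f != l && f->right != l) {
        f = f->right;
        l = l->left;
        count += 2;
    }
    if (f == l) count -= 1;           /* odd length: middle node counted twice */
    return count;
}

/* Hypothetical helper: link an array of n nodes into a doubly linked list. */
void link_nodes(Node *nodes, size_t n) {
    for (size_t i = 0; i < n; i++) {
        nodes[i].left = (i > 0) ? &nodes[i - 1] : NULL;
        nodes[i].right = (i + 1 < n) ? &nodes[i + 1] : NULL;
    }
}
```

A caller would build a list, for example with link_nodes(nodes, n), and then compare count_forward(&nodes[0]) with count_two_ended(&nodes[0], &nodes[n - 1]); both should return n for odd and even lengths alike.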

V. CONCLUSION

In this study we review ideas from several other papers on compiler optimization techniques. Focusing on loop unrolling and superscalar architecture, we discuss the idea of generalized loop unrolling presented by J.C. Hang and T. Leng and then present a new method of traversing a linked list to get a better result from loop unrolling in that case. Comparing and examining these ideas, we reach the following results. Loop unrolling has a slight measurable effect on energy usage and power consumption, with no large change in performance. It could, however, be an effective method for program speed up. An important issue is that loop unrolling generally will not bring the expected performance to programs without other optimization techniques such as register renaming. These results were obtained using measurement together with simulation.

VI. FUTURE WORK

Additional work that we would like to perform is to change existing algorithms that work on data structures such as linked lists, or to present new ones, in order to reduce the probability of hazards (such as read-after-write hazards) that force compilers to shorten basic blocks and therefore fail to use the abilities of superscalar processors effectively. In other words, we want to optimize the way code is written for data structures and arrive at standard programming rules that result in effective use of superscalar architecture. Alternatively, we can give this task to compilers (rather than programmers), which would apply standard rules in code transformations, or we may reach a tradeoff between programmers and compilers in using such rules. We also suspect that the rules we want to use may conflict with some software engineering considerations in programming, so another tradeoff is needed there as well.

REFERENCES

[1] John S. Seng, Dean M. Tullsen, “The effect of compiler optimizations on Pentium 4 power consumption”, in Proceedings of the 7th Workshop on Interaction between Compilers and Computer Architectures, 2003, IEEE.
[2] Scott A. Mahlke, William Y. Chen, John C. Gyllenhaal, Wen-mei W. Hwu, Pohua P. Chang, Tokuzo Kiyohara, “Compiler Code Transformations for Superscalar-Based High-Performance Systems”, in Proceedings of Supercomputing, 1992.
[3] J.C. Hang and T. Leng, “Generalized Loop-Unrolling: a method for program speed up”, University of Houston, in Proc. IEEE Symp. on Application-Specific Systems and Software Engineering and Technology, 1999.
[4] John L. Hennessy, David A. Patterson, “Computer Architecture: A Quantitative Approach”, 2nd Edition, 1995.

Authors’ information

Meisam Booshehri was born in Iran. He received his Master's degree in Software Engineering from IAUN in 2012. Currently, he is a lecturer at Payame Noor University (PNU), Iran. He is also a member of the Young Researchers Club, Sepidan Branch, Islamic Azad University, Sepidan, Iran. His research interests include parallel and distributed computing, compilers and the Semantic Web.
Email: m_booshehri@sco.iaun.ac.ir

Abbas Malekpour* is currently an Assistant Professor in the Institute of Distributed High Performance Computing at the University of Rostock. He received his Master's degree from Stuttgart University and his Ph.D. degree from the University of Rostock, Germany. From 2002 to 2004 he was with the Institute of Telematics Research Group at the University of Karlsruhe, Germany, and from 2004 to 2010 he was a research assistant in the MICON Electronics and Telecommunications Research Institute at the University of Rostock, Germany. His current research interests include the areas of Mobile and Concurrent Multi-path Communication prototyping.
* Corresponding Author at: Chair of Distributed High Performance Computing, Institute of Computer Science, University of Rostock, Rostock, Germany
Email: abbas.malekpour@uni-rostock.de

Peter Luksch finished his study in computer science and received his Ph.D. degree in Parallel Discrete Event Simulation on Distributed Memory Multiprocessors from Technische Universität München, Germany, in 1993. Currently, he is a Professor at the University of Rostock and Head of the Chair of Distributed High Performance Computing. From 1993 to 2003 he was a Senior Research Assistant and Lecturer at LRR-TUM at TUM. He finished his Postdoctoral Lecture Qualification (Habilitation) in Increased Productivity in Computational Prototyping with the Help of Parallel and Distributed Computing in 2000. His current research topics include parallel and distributed computing and computational prototyping.
Email: peter.luksch@uni-rostock.de
