Disk storage management for the LHCb data storage system based on data popularity prediction
The second term expresses the time needed to restore from magnetic tape those files that were removed from disk because of an algorithm error. The third term is the time needed for all users to download the restored files.

The first 78 weeks of the file access history are used as input data for the algorithms. The last 26 weeks are used to measure the quality of the algorithms and to determine the number of accesses (downloads) to the files.

6.3 Results

Files that were created and first used before week 78 are used to compare the algorithms. 7375 files took part in the comparison. The following parameter values were used to optimize the loss function: C_disk = 100, C_tape = 1, C_miss = 2000.
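The evaluation setup described above (78 weeks of input, 26 weeks of evaluation, and only files first used before week 78) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used in this work: the weekly access counts, the file names, and the helper functions are hypothetical, and the first access is used as a stand-in for the "created and first used" condition because creation times are not modelled here.

```python
from typing import Dict, List, Optional, Tuple

TRAIN_WEEKS = 78  # weeks used as algorithm input (value from the text)
TEST_WEEKS = 26   # weeks used to measure quality (value from the text)


def split_history(history: List[int]) -> Tuple[List[int], List[int]]:
    """Split one file's weekly access counts into input and evaluation parts."""
    if len(history) != TRAIN_WEEKS + TEST_WEEKS:
        raise ValueError("expected 104 weekly access counts")
    return history[:TRAIN_WEEKS], history[TRAIN_WEEKS:]


def first_used_week(history: List[int]) -> Optional[int]:
    """Index of the first week with at least one access, or None if never used."""
    for week, count in enumerate(history):
        if count > 0:
            return week
    return None


def select_files(histories: Dict[str, List[int]]) -> List[str]:
    """Keep files whose first access falls inside the 78-week input window."""
    selected = []
    for name, history in histories.items():
        week = first_used_week(history)
        if week is not None and week < TRAIN_WEEKS:
            selected.append(name)
    return selected


# Toy data: hypothetical file names and weekly access counts.
histories = {
    "old_file": [3] + [0] * 103,
    "late_file": [0] * 90 + [5] + [0] * 13,
    "unused_file": [0] * 104,
}

selected = select_files(histories)
train, test = split_history(histories["old_file"])
future_accesses = sum(test)  # accesses observed in the evaluation period
```

Only "old_file" passes the selection here, since the other two files have no access inside the input window.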
The following parameter values were used to express the download time: t_disk = 0.1 hours/GB, t_tape = 3 hours/GB, and K_tape = 24 hours. These values reflect the ideas that hard disk capacity is limited, that restoring a file from magnetic tape to disk takes a long time, and that the number of algorithm errors must be kept to a minimum.

Tables 2 and 3 show the comparison between our recommender system, with the maximum number of file replicas set to 4, and the LRU algorithm. The access time ratio is the ratio of the file access (download) time after the algorithm is applied to the initial access time. The "Disk space saving" column shows how much hard disk space can be saved with the algorithm.
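As a rough illustration of how the quantities reported in the tables relate to each other, the sketch below applies an LRU-style removal rule to toy data and computes the disk space saving, the number of errors (files removed from disk but used afterwards), and the access time ratio. It rests on assumptions: the parameter N is interpreted here as the number of weeks without access after which a file is removed, which the text does not state explicitly, and the record fields, function names, and numbers are hypothetical.

```python
from typing import List, NamedTuple


class FileRecord(NamedTuple):
    size_gb: float           # file size on disk
    weeks_since_access: int  # weeks since the last access at decision time
    future_accesses: int     # accesses observed in the evaluation period


def lru_remove(files: List[FileRecord], n_weeks: int) -> List[bool]:
    """Assumed LRU-style rule: remove files unused for at least n_weeks."""
    return [f.weeks_since_access >= n_weeks for f in files]


def space_saving_percent(files: List[FileRecord], removed: List[bool]) -> float:
    """Share of the total disk volume freed by the removals, in percent."""
    total = sum(f.size_gb for f in files)
    freed = sum(f.size_gb for f, r in zip(files, removed) if r)
    return 100.0 * freed / total


def error_count(files: List[FileRecord], removed: List[bool]) -> int:
    """Number of files that were removed from disk but used afterwards."""
    return sum(1 for f, r in zip(files, removed) if r and f.future_accesses > 0)


def access_time_ratio(time_after: float, time_before: float) -> float:
    """Access time after applying the algorithm divided by the initial one."""
    return time_after / time_before


# Toy data (hypothetical values).
files = [
    FileRecord(size_gb=10.0, weeks_since_access=30, future_accesses=0),
    FileRecord(size_gb=20.0, weeks_since_access=12, future_accesses=4),
    FileRecord(size_gb=10.0, weeks_since_access=1, future_accesses=7),
    FileRecord(size_gb=10.0, weeks_since_access=40, future_accesses=0),
]

removed_n10 = lru_remove(files, n_weeks=10)
removed_n25 = lru_remove(files, n_weeks=25)
saving_n10 = space_saving_percent(files, removed_n10)
errors_n10 = error_count(files, removed_n10)
saving_n25 = space_saving_percent(files, removed_n25)
errors_n25 = error_count(files, removed_n25)
```

On this toy data a larger N frees less space and produces fewer errors, which is the same trade-off that the LRU results show.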
The "Number of errors" column shows the number of files that were removed from the hard disk but were then used.

Both algorithms save a comparable amount of disk space. However, our recommender system makes far fewer errors. The tables also show that our system, with the maximum number of replicas per file set to 4, slightly reduces the data access time.

Table 2: Results for the LRU algorithm.

N                      1       2       5       10      15      20      25
Access time ratio      1.33    1.28    1.4     1.11    1.07    1.03    1.02
Disk space saving, %   63      58      50      44      38      33      30
Number of errors       1973    1659    1357    966     635     370     193

Table 3: Results of our system with the maximum number of replicas set to 4.

Alpha                  0       0.01    0.05    0.1     0.5     1       2
Access time ratio      3.3571  0.9946  0.9634  0.9630  0.9623  0.9619  0.9616
Disk space saving, %   [values missing]
Number of errors       9       9       9       9       9       9       9

Table 4 shows that with the maximum number of replicas set to 7 our system helps to save up to 40% of hard disk space and to reduce the file download time by 30%.

Table 4: Results of our system with the maximum number of replicas set to 7.

Alpha                  0       0.001   0.005   0.01    0.05    0.1
Access time ratio      3.3571  1.0357  0.7240  0.6834  0.6311  0.621
Disk space saving, %   [values missing]
Number of errors       8       8       8       8       8       8

The analysis was carried out with the Reproducible Experiment Platform [15], a platform for solving data analysis problems.

7 Library

The recommender system presented here is implemented as a Python library.
The library is called datapop [13] and can be downloaded, together with detailed usage instructions, from the repository https://github.com/hushchyn-mikhail/DataPopularity.

8 Service

A web service for our recommender system has also been developed. The datapopserv service [13] is written in Python using the Flask library. The service can be run on a local machine as a Docker container [14]. In that case Python and its libraries do not need to be installed on the local machine.
The server is accessed through HTTP requests. In addition, a Python client for the server, datapopclient [14], has been developed. Datapopclient provides a convenient interface for users of the service. The service, detailed instructions for launching and using it, and the client can be downloaded from the repository https://github.com/hushchyn-mikhail/DataPopularity.

9 Talks and publications

The recommender system described in this work was presented in oral talks at the following conferences:
• 57th MIPT Scientific Conference, "Optimization of the file popularity system in high energy physics experiments".
• 21st International Conference on Computing in High Energy and Nuclear Physics (CHEP2015), "Disk storage management for LHCb based on Data Popularity estimator".
The results were also published in the proceedings of the respective conferences.

10 Practical application

The recommender system is currently being tested at LHCb.
The system and its web service will be maintained and improved further so that the recommender system can be used to manage the disk storage of the LHCb data storage system.

11 Conclusion

This work presents a recommender system for managing disk storage in a hybrid data storage system (hard disks and magnetic tapes). It was shown that our method yields a very low number of erroneous file removals from the hard disk. The results show that our recommender system achieves a substantial saving of disk space and significantly reduces the average data access time.

References

[1] Hastie T., Tibshirani R., Friedman J. 2009 The Elements of Statistical Learning (Berlin: Springer)
[2] Hyndman R., Athanasopoulos G. Forecasting: principles and practice (https://www.otexts.org/book/fpp)
[3] Lipeng W., Zheng L., Qing C., Feiyi W., Sarp O., Bradley S. 2014 30th Symposium on Mass Storage Systems and Technologies (MSST): SSD-optimized workload placement with adaptive learning and classification in HPC environments (California: IEEE)
[4] Beermann T., Stewart A., Maettig P. 2014 The International Symposium on Grids and Clouds (ISGC) 2014: A Popularity-Based Prediction and Data Redistribution Tool for ATLAS Distributed Data Management (PoS) p 4
[5] Beermann T. 2013 Popularity Prediction Tool for ATLAS Distributed Data Management: J. of Phys.: Conf. Ser. 513 (2014) 042004 (IOP Publishing)
[6] Jamali S., Rangwala H. Digging Digg: Comment Mining, Popularity Prediction, and Social Network Analysis (http://cs.gmu.edu/~hrangwal/sites/default/files/GMU-CSTR-2009-7.pdf)
[7] Quan H., Milicic A., Vucetic S., Wu J. A Connectivity-Based Popularity Prediction Approach for Social Networks (http://www.dabi.temple.edu/~vucetic/documents/Quan12icc.pdf)
[8] Gupta M., Gao J., Zhai C., Han J. 2012 Predicting Future Popularity Trend of Events in Microblogging Platforms (https://www.asis.org/asist2012/proceedings/Submissions/207.pdf)
[9] Li J., Hong S., Xia S. 2012 Neural Network Based Popularity Prediction For IPTV System (J. of Networks, vol. 7, No. 12, Dec. 2012)
[10] Li H., Ma X., Wang F., Liu J., Xu K. 2013 On Popularity Prediction of Videos Shared in Online Social Networks (http://www.cs.sfu.ca/~jcliu/Papers/OnPopularityPrediction.pdf)
[11] Anonymous Author 2014 Predict the Popularity of YouTube Videos Using Early View Data (http://www.cs.ubc.ca/~nando/540-2013/projects/p16.pdf)
[12] Figueiredo F. 2013 On the Prediction of Popularity of Trends and Hits for User Generated Videos (http://homepages.dcc.ufmg.br/~flaviov/papers/figueiredo2013wsdmdoc.pdf)
[13] Python module and web service URL https://github.com/hushchyn-mikhail/DataPopularity
[14] Docker URL https://www.docker.com
[15] Reproducible Experiment Platform (REP) URL https://github.com/yandex/rep