Crawling AJAX by Inferring User Interface State Changes (2008)


Delft University of Technology
Software Engineering Research Group
Technical Report Series

Crawling AJAX by Inferring User Interface State Changes

Ali Mesbah, Engin Bozdag, and Arie van Deursen

Report TUD-SERG-2008-022

Published, produced and distributed by:
Software Engineering Research Group
Department of Software Technology
Faculty of Electrical Engineering, Mathematics and Computer Science
Delft University of Technology
Mekelweg 4
2628 CD Delft
The Netherlands

ISSN 1872-5392

Software Engineering Research Group Technical Reports: http://www.se.ewi.tudelft.nl/techreports/
For more information about the Software Engineering Research Group: http://www.se.ewi.tudelft.nl/

Note: This paper is a pre-print of: Ali Mesbah, Engin Bozdag, and Arie van Deursen. Crawling Ajax by Inferring User Interface State Changes. In Proceedings of the 8th International Conference on Web Engineering (ICWE’08), New York, USA. IEEE Computer Society.

© copyright 2008, by the authors of this report. Software Engineering Research Group, Department of Software Technology, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology. All rights reserved. No part of this series may be reproduced in any form or by any means without prior written permission of the authors.

Ali Mesbah, Delft University of Technology, The Netherlands, A.Mesbah@tudelft.nl
Engin Bozdag, Delft University of Technology, The Netherlands, V.e.Bozdag@tudelft.nl
Arie van Deursen, Delft Univ. of Technology & CWI, The Netherlands, Arie.vanDeursen@tudelft.nl

Abstract

AJAX is a very promising approach for improving rich interactivity and responsiveness of web applications. At the same time, AJAX techniques shatter the metaphor of a web ‘page’ upon which general search crawlers are based. This paper describes a novel technique for crawling AJAX applications through dynamic analysis and reconstruction of user interface state changes. Our method dynamically infers a ‘state-flow graph’ modeling the various navigation paths and states within an AJAX application. This reconstructed model can be used to generate linked static pages. These pages could be used to expose AJAX sites to general search engines. Moreover, we believe that the crawling techniques that are part of our solution have other applications, such as within general search engines, accessibility improvements, or in automatically exercising all user interface elements and conducting state-based testing of AJAX applications. We present our open source tool called CRAWLJAX which implements the concepts discussed in this paper. Additionally, we report a case study in which we apply our approach to a number of representative AJAX applications and elaborate on the obtained results.

1 Introduction

The web as we know it is undergoing a significant change. A technology that has gained a prominent position lately, under the umbrella of Web 2.0, is AJAX (Asynchronous JavaScript and XML) [13], in which a clever combination of JavaScript and Document Object Model (DOM) manipulation, along with asynchronous server communication, is used to achieve a high level of user interactivity. Highly visible examples include Google Maps, Google Documents, and the recent version of Yahoo! Mail.

With this new change in developing web applications comes a whole set of new challenges, mainly due to the fact that AJAX shatters the metaphor of a web ‘page’ upon which many web technologies are based.

Among these challenges are the following:

Searchability: ensuring that AJAX sites are indexed by the general search engines, instead of (as is currently often the case) being ignored by them because of the use of client-side scripting and dynamic state changes in the DOM;

Testability: systematically exercising dynamic user interface (UI) elements and states of AJAX to find abnormalities and errors;

Accessibility: examining whether all states of an AJAX site meet certain accessibility requirements.

One way to address these challenges is through the use of a crawler that can automatically walk through different states of a highly dynamic AJAX site, create a model of the navigational paths and states, and generate a traditional linked page-based static version. The generated static pages can be used, for instance, to expose AJAX sites to general search engines or to examine the accessibility [2] of different dynamic states. Such a crawler can also be used for conducting state-based testing of AJAX applications [16] and automatically exercising all user interface elements of an AJAX site in order to find, e.g., link coverage, broken links, and other errors.

To date, no crawler exists that can handle the complex client code that is present in AJAX applications. The reason for this is that crawling AJAX is fundamentally more difficult than crawling classical multi-page web applications. In traditional web applications, states are explicit, and correspond to pages that have a unique URL assigned to them. In AJAX applications, however, the state of the user interface is determined dynamically, through changes in the DOM that are only visible after executing the corresponding JavaScript code.

In this paper, we propose an approach to analyze and reconstruct these user interface states automatically. Our approach is based on a crawler that can exercise client-side code, and can identify clickable elements (which may change with every click) that change the state within the browser’s dynamically built DOM. From these state changes, we infer a state-flow graph, which captures the states of the user interface, and the possible transitions between them. This graph can subsequently be used to generate a multi-page static version of the original AJAX application. The underlying ideas have been implemented in a tool called CRAWLJAX.¹

We have performed an experiment of running our crawling framework over a number of representative AJAX sites to analyze the overall performance of our approach, evaluate the effectiveness in retrieving relevant clickables, assess the quality and correctness of the detected states and generated static pages, and examine the capability of our tool on real sites used in practice and the scalability in crawling sites with thousands of dynamic states and clickables.
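As a rough illustration of the state-flow graph introduced above, the following Python sketch models user interface states as nodes and clickable-triggered transitions as labeled edges. It is not the CRAWLJAX implementation: the class name `StateFlowGraph`, its method names, the choice of a SHA-1 hash of the serialized DOM as state identity, and the sample DOM strings and XPath labels are all assumptions made for this example.

```python
import hashlib
from collections import deque


class StateFlowGraph:
    """Toy state-flow graph: nodes are UI states, edges are fired events."""

    def __init__(self):
        self.states = {}  # state id -> serialized DOM (assumed representation)
        self.edges = {}   # state id -> list of (event, xpath, target state id)

    def add_state(self, dom):
        # Assumption: identical serialized DOMs denote the same UI state.
        state_id = hashlib.sha1(dom.encode("utf-8")).hexdigest()
        if state_id not in self.states:
            self.states[state_id] = dom
            self.edges[state_id] = []
        return state_id

    def add_transition(self, source_dom, event, xpath, target_dom):
        # Register both endpoints, then record the edge once.
        source = self.add_state(source_dom)
        target = self.add_state(target_dom)
        edge = (event, xpath, target)
        if edge not in self.edges[source]:
            self.edges[source].append(edge)
        return source, target

    def path_to(self, start, goal):
        """Breadth-first search for a shortest event sequence from start to goal."""
        queue = deque([(start, [])])
        seen = {start}
        while queue:
            state, path = queue.popleft()
            if state == goal:
                return path
            for event, xpath, target in self.edges[state]:
                if target not in seen:
                    seen.add(target)
                    queue.append((target, path + [(event, xpath)]))
        return None


# Hypothetical DOM snapshots and clickables, invented for this example.
graph = StateFlowGraph()
index_dom = "<div id='content'>home</div>"
news_dom = "<div id='content'>news</div>"
item_dom = "<div id='content'>news item 1</div>"

index, news = graph.add_transition(index_dom, "onclick", "//DIV[1]/SPAN[4]", news_dom)
_, item = graph.add_transition(news_dom, "onclick", "//DIV[1]/A[1]", item_dom)
# Returning to an already known state must not create a duplicate node.
graph.add_transition(item_dom, "onclick", "//DIV[1]/SPAN[1]", index_dom)

print(len(graph.states))
print(graph.path_to(index, item))
```

Under these assumptions, a static-page generator could walk `graph.edges` and emit one page per state and one hyperlink per edge, while `path_to` gives the sequence of events needed to bring a browser back to a given state.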

The cases in this experiment range from internal to academic and external commercial AJAX web sites.

The paper is structured as follows. We start out, in Section 2, by exploring the difficulties of crawling and indexing AJAX. In Sections 3 and 4, we present a detailed discussion of our new crawling techniques, the generation process, and the CRAWLJAX tool. In Section 5 the results of applying our methods to a number of AJAX applications are shown, after which Section 6 discusses the findings and open issues. Section 7 presents various applications of our crawling techniques. We conclude with a brief survey of related work, a summary of our key contributions, and suggestions for future work.

2 Challenges of Crawling AJAX

AJAX has a number of properties making it extremely difficult for, e.g., search engines to crawl such web applications.

2.1 Client-side Execution

The common ground for all AJAX applications is a JavaScript engine which operates between the browser and the web server, and which acts as an extension to the browser. This engine typically deals with server communication and user interface rendering. Any search engine willing to approach such an application must have support for the execution of the scripting language. Equipping a general search crawler with the necessary environment complicates its design and implementation considerably. The major search giants such as Google² currently have little or no support for executing JavaScript due to scalability and security issues.

¹ The tool is available for download from http://spci.st.ewi.tudelft.nl/crawljax/.
² http://googlewebmastercentral.blogspot.com/2007/11/spiders-view-of-web-20.html

2.2 State Changes & Navigation

Traditional web applications are based on the multi-page interface paradigm consisting of multiple (dynamically generated) unique pages each having a unique URL. In AJAX applications, not every state change necessarily has an associated REST-based [11] URI [20]. Ultimately, an AJAX application could consist of a single-page [19] with a single URL. This characteristic makes it very difficult for a search engine to index and point to a specific state on an AJAX application. For crawlers, navigating through traditional multi-page web applications has been as easy as extracting and following the hypertext links (or the src attribute) on each page. In AJAX, hypertext links can be replaced by events which are handled by the client engine; it is not possible any longer to navigate the application by simply extracting and retrieving the internal hypertext links.

2.3 Dynamic Document Object Model (DOM)

Crawling and indexing traditional web applications consists of following links, retrieving and saving the HTML source code of each page. The state changes in AJAX applications are dynamically represented through the run-time changes on the DOM. This means that the source code in HTML does not represent the state anymore. Any search engine aimed at crawling and indexing such applications will need to have access to this run-time dynamic document object model of the application.

2.4 Delta-communication

AJAX applications rely on a delta-communication [20] style of interaction in which merely the state changes are exchanged asynchronously between the client and the server, as opposed to the full-page retrieval approach in traditional web applications. Retrieving and indexing the delta state changes, for instance, through a proxy between the client and the server, could have the side-effect of losing the context and actual meaning of the changes. Most of such delta updates become meaningful after they have been processed by the JavaScript engine on the client and injected into the DOM.

    <a href="javascript:OpenNewsPage();">
    <a href="#" onClick="OpenNewsPage();">
    <div onClick="OpenNewsPage();">
    <a href="news.html" class="news">
    <input type="submit" class="news"/>
    <div class="news">
    <!-- jQuery function attaching events to elements
         having attribute class="news" -->
    $(".news").click(function() {
      $("#content").load("news.html");
    });

Figure 1. Different ways of attaching events to elements.

To illustrate the difficulties involved in crawling AJAX, consider Figure 1. It is a highly simplified example, showing different ways in which a news page can be opened. The example code shows how in AJAX sites, it is not just the hypertext link element that forms the doorway to the next state. Note the way events (e.g., onClick, onMouseOver) can be attached to DOM elements at runtime.
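To make the point of Figure 1 concrete, the sketch below scans markup statically with Python's standard `html.parser` module and reports only those elements whose attributes visibly carry a JavaScript trigger (a `javascript:` href or an inline `on*` attribute). The class name `InlineHandlerFinder` and the sample markup, adapted from Figure 1 with closing tags added, are assumptions for illustration. The `on*` prefix test is a simple heuristic, not a complete list of event attributes.

```python
from html.parser import HTMLParser


class InlineHandlerFinder(HTMLParser):
    """Collects elements whose markup alone reveals a JavaScript trigger."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, attribute name, attribute value)

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before this call.
        for name, value in attrs:
            value = (value or "").strip()
            if name == "href" and value.lower().startswith("javascript:"):
                self.found.append((tag, name, value))
            elif name.startswith("on"):
                # Heuristic: treat any on* attribute as an inline event handler.
                self.found.append((tag, name, value))


# Sample markup adapted from Figure 1 (closing tags added for this example).
SAMPLE = """
<a href="javascript:OpenNewsPage();">news 1</a>
<a href="#" onClick="OpenNewsPage();">news 2</a>
<div onClick="OpenNewsPage();">news 3</div>
<a href="news.html" class="news">news 4</a>
<input type="submit" class="news"/>
<div class="news">news 6</div>
"""

finder = InlineHandlerFinder()
finder.feed(SAMPLE)
finder.close()

for tag, name, value in finder.found:
    print(tag, name, value)
```

The three elements with `class="news"` are not reported, because their click handler is bound by the jQuery call at runtime and leaves no trace in the attributes. A purely static scan of this kind therefore misses them, which is consistent with the need to execute client-side code and inspect the run-time DOM.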
