On the uniform sampling of the web: An improvement on bucket based sampling

Heidari, S ; Sharif University of Technology | 2009

  1. Type of Document: Article
  2. DOI: 10.1109/ICCSN.2009.164
  3. Publisher: 2009
  4. Abstract:
  5. The Web is one of the biggest sources of information. Its tremendous size, dynamicity, and structure have made information retrieval on the Web a challenging issue. Web Search Engines (WSEs) have started to help users with this matter. However, to perform effectively, these applications need current information about many characteristics of the Web. One way to determine these characteristics is statistical sampling of Web pages. In this kind of approach, a smaller and more uniform set of Web pages is used instead of analyzing a large number of Web pages. This research analyzes the previously presented methods for generating uniform samples of pages from the World Wide Web. It specifically focuses on a new method called BBS [7]. In brief, we improved the uniformity of the BBS samples by at least 4.45%. Using this improved BBS, we estimated the size of the public indexable Web at 27.4 billion pages. The index sizes of some commercial WSEs are also estimated and compared. © 2009 IEEE
  6. Keywords:
  7. Uniform sampling ; Web ; Web search engine ; Statistical sampling ; Web page ; Web search engines ; Bulletin boards ; Information retrieval ; Information services ; Sampling ; Search engines ; World Wide Web
  8. Source: 2009 International Conference on Communication Software and Networks, ICCSN 2009, Macau, 27 February 2009 through 28 February 2009 ; 2009, Pages 205-209 ; 9780769535227 (ISBN)
  9. URL: https://ieeexplore.ieee.org/document/5076840