Ads-portal domains: Identification and measurements

Journal Article

An ads-portal domain is a web domain that shows only advertisements, served by a third-party advertisement syndication service in the form of an ads listing. We develop a machine-learning-based classifier that identifies ads-portal domains with 96% accuracy, and we use it to measure the prevalence of ads-portal domains on the Internet. Surprisingly, 28.3% of two-level *.com web domains and 25% of two-level *.net web domains are ads-portal domains. Moreover, 41% of *.com and 39.8% of *.net ads-portal domains are typos of well-known domains, also known as typo-squatting domains. We also use the classifier together with DNS trace files to estimate how often Internet users visit ads-portal domains: ∼5% of the two-level *.com, *.net, *.org, *.biz, and *.info web domains in the traces are ads-portal domains, and ∼50% of these accessed ads-portal domains are typos. These numbers show that ads-portal domains, including typo-squatting ones, are prevalent on the Internet and succeed in attracting many visits. Our classifier is a step toward better categorization of web documents. It can also help search engines' ranking algorithms, help identify web spam that redirects to ads-portal domains, and be used to discourage access to typo-squatting ads-portal domains. © 2010 ACM.
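The abstract does not say how a domain is judged to be a "typo" of a well-known domain, nor which features the classifier uses. The sketch below is a hypothetical illustration only, not the authors' method: it assumes a domain counts as a typo if it is exactly one edit (insertion, deletion, substitution, or adjacent transposition) away from a name on a popularity list, and it uses a caller-supplied `is_ads_portal` function as a stand-in for the classifier. The names `is_one_edit_typo`, `find_typo_targets`, and `trace_summary` are illustrative.

```python
from typing import Callable, Iterable, List, Tuple


def is_one_edit_typo(candidate: str, target: str) -> bool:
    """True if candidate differs from target by exactly one insertion,
    deletion, substitution, or adjacent transposition (assumed typo model)."""
    if candidate == target:
        return False
    lc, lt = len(candidate), len(target)
    if abs(lc - lt) > 1:
        return False
    if lc == lt:
        diffs = [i for i in range(lc) if candidate[i] != target[i]]
        if len(diffs) == 1:
            return True  # single-character substitution
        if len(diffs) == 2:
            i, j = diffs
            # adjacent transposition, e.g. "gogole" vs "google"
            return j == i + 1 and candidate[i] == target[j] and candidate[j] == target[i]
        return False
    # Lengths differ by one: skip the first mismatch in the longer string.
    shorter, longer = (candidate, target) if lc < lt else (target, candidate)
    i = 0
    while i < len(shorter) and shorter[i] == longer[i]:
        i += 1
    return shorter[i:] == longer[i + 1:]


def find_typo_targets(domain: str, popular: List[str]) -> List[str]:
    """Return the popular domains that `domain` is a one-edit typo of."""
    d = domain.lower().rstrip(".")
    return [p for p in popular if is_one_edit_typo(d, p.lower())]


def trace_summary(
    accessed: Iterable[str],
    is_ads_portal: Callable[[str], bool],  # hypothetical stand-in for the classifier
    popular: List[str],
) -> Tuple[float, float]:
    """Return (share of distinct accessed domains classified as ads-portal,
    share of those ads-portal domains that are one-edit typos)."""
    domains = {d.lower().rstrip(".") for d in accessed}
    ads = [d for d in domains if is_ads_portal(d)]
    if not domains or not ads:
        return 0.0, 0.0
    typos = [d for d in ads if find_typo_targets(d, popular)]
    return len(ads) / len(domains), len(typos) / len(ads)
```

For example, `find_typo_targets("gooogle.com", ["google.com", "example.net"])` returns `["google.com"]`. Comparing full names (label plus TLD) means a missing-dot typo such as `googlecom` is also caught as a single deletion. Real typo-squatting studies typically use richer typo models and much larger popularity lists.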

Cited Authors

  • Almishari, M; Yang, X

Published Date

  • April 1, 2010

Published In

  • ACM Transactions on the Web

Volume / Issue

  • 4 / 2

Electronic International Standard Serial Number (EISSN)

  • 1559-114X

International Standard Serial Number (ISSN)

  • 1559-1131

Digital Object Identifier (DOI)

  • 10.1145/1734200.1734201

Citation Source

  • Scopus