Ads-portal domains: Identification and measurements
An ads-portal domain refers to a Web domain that shows only advertisements, served by a third-party advertisement syndication service, in the form of ads listing. We develop a machine-learning-based classifier to identify ads-portal domains, which has 96% accuracy. We use this classifier to measure the prevalence of ads-portal domains on the Internet. Surprisingly, 28.3/25% of the (two-level) *.com/*.net web domains are ads-portal domains. Also, 41/39.8% of *.com/ *.net ads-portal domains are typos of well-known domains, also known as typo-squatting domains. In addition, we use the classifier along with DNS trace files to estimate how often Internet users visit ads-portal domains. It turns out that ∼5% of the two-level *.com, *.net, *.org, *.biz and *.info web domains on the traces are ads-portal domains and ∼50% of these accessed ads-portal domains are typos. These numbers show that ads-portal domains and typo-squatting ads-portal domains are prevalent on the Internet and successful in attracting many visits. Our classifier represents a step towards better categorizing the web documents. It can also be helpful to search engines ranking algorithms, helpful in identifying web spams that redirects to ads-portal domains, and used to discourage access to typo-squatting ads-portal domains. © 2010 ACM.
Duke Scholars
Published In
DOI
EISSN
ISSN
Publication Date
Volume
Issue
Related Subject Headings
- Information Systems
- 4608 Human-centred computing
- 4606 Distributed computing and systems software
- 4605 Data management and data science
- 0807 Library and Information Studies
- 0806 Information Systems
- 0805 Distributed Computing
Citation
Published In
DOI
EISSN
ISSN
Publication Date
Volume
Issue
Related Subject Headings
- Information Systems
- 4608 Human-centred computing
- 4606 Distributed computing and systems software
- 4605 Data management and data science
- 0807 Library and Information Studies
- 0806 Information Systems
- 0805 Distributed Computing