Trails of Data: Three Cases for Collecting Web Information for Social Science Research

Publication, Journal Article
Li, F; Zhou, Y; Cai, T
Published in: Social Science Computer Review
October 2021

As the availability of online data grows rapidly, researchers are confronted with a pressing question: How should social scientists collect Internet data for research? This study focuses on one of the most commonly used data collection techniques: web scraping. Going beyond canned approaches by leveraging a general framework of data communication, this study illustrates how online information can be systematically queried and fetched for reproducible research. To generalize our approaches, we additionally explore the variations in site security and architecture that analysts may encounter during the scraping process before they are given access to the desired data. The approaches we introduce do not rely on any proprietary software and can be easily implemented on any computing platform with programming languages such as Python or R. The methodological discussion in this study is meant to be applicable to current web-based research efforts. We include three examples with complete Python implementation. We also present an integrated workflow that enables researchers to produce analytical data sets that are traceable and thus verifiable for analysis or replication. Lastly, options related to the validity and efficiency of data are discussed, and we highlight the ongoing debate surrounding the ethics of online data collection, ultimately advocating for the fair use of online data.

Published In

Social Science Computer Review

DOI

10.1177/0894439319886019

EISSN

1552-8286

ISSN

0894-4393

Publication Date

October 2021

Volume

39

Issue

5

Start / End Page

922 / 942

Publisher

SAGE Publications

Related Subject Headings

  • General Arts, Humanities & Social Sciences
  • 46 Information and computing sciences
  • 0899 Other Information and Computing Sciences
  • 0807 Library and Information Studies
  • 0806 Information Systems
 

Citation

APA

Li, F., Zhou, Y., & Cai, T. (2021). Trails of Data: Three Cases for Collecting Web Information for Social Science Research. Social Science Computer Review, 39(5), 922–942. https://doi.org/10.1177/0894439319886019

Chicago

Li, Fumin, Yisu Zhou, and Tianji Cai. “Trails of Data: Three Cases for Collecting Web Information for Social Science Research.” Social Science Computer Review 39, no. 5 (October 2021): 922–42. https://doi.org/10.1177/0894439319886019.

ICMJE

Li F, Zhou Y, Cai T. Trails of Data: Three Cases for Collecting Web Information for Social Science Research. Social Science Computer Review. 2021 Oct;39(5):922–42.

MLA

Li, Fumin, et al. “Trails of Data: Three Cases for Collecting Web Information for Social Science Research.” Social Science Computer Review, vol. 39, no. 5, SAGE Publications, Oct. 2021, pp. 922–42. Crossref, doi:10.1177/0894439319886019.

NLM

Li F, Zhou Y, Cai T. Trails of Data: Three Cases for Collecting Web Information for Social Science Research. Social Science Computer Review. SAGE Publications; 2021 Oct;39(5):922–942.