Scholars@Duke publication: Web Scraping in the Statistics and Data Science Curriculum: Challenges and Opportunities

Publication, Journal Article

Dogucu, M; Çetinkaya-Rundel, M

Published in: Journal of Statistics and Data Science Education

January 1, 2021

Best practices in statistics and data science courses include the use of real and relevant data as well as teaching the entire data science cycle starting with importing data. A rich source of real and current data is the web, where data are often presented and stored in a structure that needs some wrangling and transforming before they can be ready for analysis. The web is a resource students naturally turn to for finding data for data analysis projects, but without formal instruction on how to get those data into a structured format, they often resort to copy-pasting or manual entry into a spreadsheet, both of which are time-consuming and error-prone. Teaching web scraping provides an opportunity to bring such data into the curriculum in an effective and efficient way. In this article, we explain how web scraping works and how it can be implemented in a pedagogically sound and technically executable way at various levels of statistics and data science curricula. We provide classroom activities where we connect this modern computing technique with traditional statistical topics. Finally, we share the opportunities web scraping brings to the classrooms as well as the challenges to instructors and tips for avoiding them.
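To make the idea in the abstract concrete, the following is a minimal sketch of the scrape-then-wrangle workflow it describes. It is not taken from the article: the choice of Python, the sample HTML fragment, the city names, and the helper names `TableParser`, `scrape_table`, and `clean` are all assumptions made for illustration. The sketch parses an HTML table that is embedded as a string (standing in for a downloaded page) using only the standard library, then converts the text cells into analysis-ready values.

```python
from html.parser import HTMLParser

# Hypothetical HTML fragment standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Springfield</td><td>1,200</td></tr>
  <tr><td>Shelbyville</td><td>950</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def clean(records):
    """Wrangling step: drop thousands separators and convert counts to int."""
    return [
        {"City": r["City"], "Population": int(r["Population"].replace(",", ""))}
        for r in records
    ]


records = clean(scrape_table(SAMPLE_HTML))
print(records)
```

In a classroom setting the string would be replaced by the contents of a real page, and the cleaning step would depend on how that page formats its values. The point of the sketch is only that the data arrive as text inside markup and need a transformation step before analysis, in contrast to copy-pasting into a spreadsheet.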

Duke Scholars

Author: Mine Cetinkaya-Rundel, Statistical Science

Published In

Journal of Statistics and Data Science Education

DOI

10.1080/10691898.2020.1787116

EISSN

2693-9169

Publication Date

January 1, 2021

Volume

29

Issue

S1

Start / End Page

S112 / S122

Citation

APA

Dogucu, M., & Çetinkaya-Rundel, M. (2021). Web Scraping in the Statistics and Data Science Curriculum: Challenges and Opportunities. Journal of Statistics and Data Science Education, 29(S1), S112–S122. https://doi.org/10.1080/10691898.2020.1787116

Chicago

Dogucu, M., and M. Çetinkaya-Rundel. “Web Scraping in the Statistics and Data Science Curriculum: Challenges and Opportunities.” Journal of Statistics and Data Science Education 29, no. S1 (January 1, 2021): S112–22. https://doi.org/10.1080/10691898.2020.1787116.

ICMJE

Dogucu M, Çetinkaya-Rundel M. Web Scraping in the Statistics and Data Science Curriculum: Challenges and Opportunities. Journal of Statistics and Data Science Education. 2021 Jan 1;29(S1):S112–22.

MLA

Dogucu, M., and M. Çetinkaya-Rundel. “Web Scraping in the Statistics and Data Science Curriculum: Challenges and Opportunities.” Journal of Statistics and Data Science Education, vol. 29, no. S1, Jan. 2021, pp. S112–22. Scopus, doi:10.1080/10691898.2020.1787116.

NLM

Dogucu M, Çetinkaya-Rundel M. Web Scraping in the Statistics and Data Science Curriculum: Challenges and Opportunities. Journal of Statistics and Data Science Education. 2021 Jan 1;29(S1):S112–S122.
