
Lost in a random forest: Using Big Data to study rare events

Publication, Journal Article
Bail, CA
Published in: Big Data and Society
December 27, 2015

Sudden, broad-scale shifts in public opinion about social problems are relatively rare. Until recently, social scientists were forced to conduct post-hoc case studies of such unusual events, which ignore the broader universe of possible shifts in public opinion that never materialize. The vast amount of data recently made available by social media sites such as Facebook and Twitter, together with the mass digitization of qualitative archives, provides an unprecedented opportunity for scholars to avoid such selection on the dependent variable. Yet the sheer scale of these new data creates a new set of methodological challenges. Conventional linear models, for example, minimize the influence of rare events as "outliers", especially within analyses of large samples. While more advanced regression models exist to analyze outliers, they suffer from an even more daunting challenge: equifinality, or the likelihood that rare events may occur via different causal pathways. I discuss a variety of possible solutions to these problems, including recent advances in fuzzy set theory and machine learning, but ultimately advocate an ecumenical approach that combines multiple techniques in iterative fashion.

Duke Scholars


Published In

Big Data and Society

DOI

10.1177/2053951715604333

EISSN

2053-9517

Publication Date

December 27, 2015

Volume

2

Issue

2

Related Subject Headings

  • 4701 Communication and media studies
  • 4406 Human geography
  • 2001 Communication and Media Studies
 

Citation

APA: Bail, C. A. (2015). Lost in a random forest: Using Big Data to study rare events. Big Data and Society, 2(2). https://doi.org/10.1177/2053951715604333
Chicago: Bail, C. A. "Lost in a random forest: Using Big Data to study rare events." Big Data and Society 2, no. 2 (December 27, 2015). https://doi.org/10.1177/2053951715604333.
ICMJE: Bail CA. Lost in a random forest: Using Big Data to study rare events. Big Data and Society. 2015 Dec 27;2(2).
MLA: Bail, C. A. "Lost in a random forest: Using Big Data to study rare events." Big Data and Society, vol. 2, no. 2, Dec. 2015. Scopus, doi:10.1177/2053951715604333.
NLM: Bail CA. Lost in a random forest: Using Big Data to study rare events. Big Data and Society. 2015 Dec 27;2(2).