Scholars@Duke publication: BigNN: An open-source big data toolkit focused on biomedical sentence classification

Publication, Conference

Tafti, AP; Behravesh, E; Assefi, M; Larose, E; Badger, J; Mayer, J; Doan, A; Page, D; Peissig, P

Published in: Proceedings 2017 IEEE International Conference on Big Data Big Data 2017

July 1, 2017

Every day, a massive amount of text data is generated by medical data sources such as scientific literature, medical web pages, health-related social media, clinical notes, and drug reviews. Processing this wealth of data is a daunting task, and it forces us to adopt smart and scalable computational strategies, including machine intelligence, big data analytics, and distributed architectures. In this contribution, we designed and developed an open-source big data neural network toolkit, bigNN, which tackles the problem of large-scale biomedical text classification efficiently and facilitates fast prototyping and reproducible text analytics research. bigNN scales up a word2vec-based neural network model over Apache Spark 2.10 and Hadoop Distributed File System (HDFS) 2.7.3, allowing for more efficient big data sentence classification. The toolkit supports big data computing and simplifies rapid application development in sentence analysis by allowing users to configure and examine different internal parameters of both Apache Spark and the neural network model. bigNN is fully documented and is publicly and freely available at https://github.com/bircatmcri/bigNN.

Duke Scholars

Author

David Page, Biostatistics & Bioinformatics, Division of Biostatistics

Published In

Proceedings 2017 IEEE International Conference on Big Data Big Data 2017

DOI

10.1109/BigData.2017.8258394

Publication Date

July 1, 2017

Volume

2018-January

Start / End Page

3888 / 3896

Citation

APA

Tafti, A. P., Behravesh, E., Assefi, M., Larose, E., Badger, J., Mayer, J., … Peissig, P. (2017). BigNN: An open-source big data toolkit focused on biomedical sentence classification. In Proceedings 2017 IEEE International Conference on Big Data Big Data 2017 (Vol. 2018-January, pp. 3888–3896). https://doi.org/10.1109/BigData.2017.8258394

Chicago

Tafti, A. P., E. Behravesh, M. Assefi, E. Larose, J. Badger, J. Mayer, A. Doan, D. Page, and P. Peissig. “BigNN: An open-source big data toolkit focused on biomedical sentence classification.” In Proceedings 2017 IEEE International Conference on Big Data Big Data 2017, 2018-January:3888–96, 2017. https://doi.org/10.1109/BigData.2017.8258394.

ICMJE

Tafti AP, Behravesh E, Assefi M, Larose E, Badger J, Mayer J, et al. BigNN: An open-source big data toolkit focused on biomedical sentence classification. In: Proceedings 2017 IEEE International Conference on Big Data Big Data 2017. 2017. p. 3888–96.

MLA

Tafti, A. P., et al. “BigNN: An open-source big data toolkit focused on biomedical sentence classification.” Proceedings 2017 IEEE International Conference on Big Data Big Data 2017, vol. 2018-January, 2017, pp. 3888–96. Scopus, doi:10.1109/BigData.2017.8258394.

NLM

Tafti AP, Behravesh E, Assefi M, Larose E, Badger J, Mayer J, Doan A, Page D, Peissig P. BigNN: An open-source big data toolkit focused on biomedical sentence classification. Proceedings 2017 IEEE International Conference on Big Data Big Data 2017. 2017. p. 3888–3896.