Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs.

Publication, Journal Article
Formenti, G; Abueg, L; Brajuka, A; Brajuka, N; Gallardo-Alba, C; Giani, A; Fedrigo, O; Jarvis, ED
Published in: Bioinformatics
September 2, 2022

MOTIVATION: With the current pace at which reference genomes are being produced, the availability of tools that can reliably and efficiently generate genome assembly summary statistics has become critical. Additionally, with the emergence of new algorithms and data types, tools that can improve the quality of existing assemblies through automated and manual curation are required.

RESULTS: We sought to address both these needs by developing gfastats, as part of the Vertebrate Genomes Project (VGP) effort to generate high-quality reference genomes at scale. Gfastats is a standalone tool to compute assembly summary statistics and manipulate assembly sequences in FASTA, FASTQ or GFA [.gz] format. Gfastats stores assembly sequences internally in a GFA-like format. This feature allows gfastats to seamlessly convert FAST* to and from GFA [.gz] files. Gfastats can also build an assembly graph that can in turn be used to manipulate the underlying sequences following instructions provided by the user, while simultaneously generating key metrics for the new sequences.

AVAILABILITY AND IMPLEMENTATION: Gfastats is implemented in C++. Precompiled releases (Linux, MacOS, Windows) and commented source code for gfastats are available under MIT licence at https://github.com/vgl-hub/gfastats. Examples of how to run gfastats are provided in the GitHub. Gfastats is also available in Bioconda, in Galaxy (https://assembly.usegalaxy.eu) and as a MultiQC module (https://github.com/ewels/MultiQC). An automated test workflow is available to ensure consistency of software updates.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
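As a rough illustration of the two ideas the abstract combines (summary statistics and FASTA-to-GFA conversion), the sketch below parses FASTA text, computes N50, and writes each sequence as a GFA 1 segment line. It is not gfastats code and does not reflect its internals or command-line interface: the function names are hypothetical, and N50 is chosen here only as a representative assembly statistic, since the abstract does not list which metrics the tool reports.

```python
from typing import Dict, Iterable, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record name: sequence} (hypothetical helper)."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the record name.
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no sequences given


def to_gfa_segments(records: Dict[str, str]) -> str:
    """Emit a GFA 1 header line plus one S (segment) line per sequence."""
    lines = ["H\tVN:Z:1.0"]
    for name, seq in records.items():
        lines.append(f"S\t{name}\t{seq}")
    return "\n".join(lines) + "\n"
```

Keeping sequences in a name-to-sequence mapping is what makes both operations cheap here: the statistic only needs the lengths, and the conversion only needs to re-serialise each entry. A real assembly graph would also carry links and paths between segments, which this sketch omits.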

Published In

Bioinformatics

DOI

10.1093/bioinformatics/btac460

EISSN

1367-4811

Publication Date

September 2, 2022

Volume

38

Issue

17

Start / End Page

4214 / 4216

Location

England

Related Subject Headings

  • Workflow
  • Software
  • Licensure
  • Genome
  • Bioinformatics
  • Algorithms
  • 49 Mathematical sciences
  • 46 Information and computing sciences
  • 31 Biological sciences
  • 08 Information and Computing Sciences
 

Citation

APA
Formenti, G., Abueg, L., Brajuka, A., Brajuka, N., Gallardo-Alba, C., Giani, A., … Jarvis, E. D. (2022). Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs. Bioinformatics, 38(17), 4214–4216. https://doi.org/10.1093/bioinformatics/btac460

Chicago
Formenti, Giulio, Linelle Abueg, Angelo Brajuka, Nadolina Brajuka, Cristóbal Gallardo-Alba, Alice Giani, Olivier Fedrigo, and Erich D. Jarvis. “Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs.” Bioinformatics 38, no. 17 (September 2, 2022): 4214–16. https://doi.org/10.1093/bioinformatics/btac460.

ICMJE
Formenti G, Abueg L, Brajuka A, Brajuka N, Gallardo-Alba C, Giani A, et al. Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs. Bioinformatics. 2022 Sep 2;38(17):4214–6.

MLA
Formenti, Giulio, et al. “Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs.” Bioinformatics, vol. 38, no. 17, Sept. 2022, pp. 4214–16. PubMed, doi:10.1093/bioinformatics/btac460.

NLM
Formenti G, Abueg L, Brajuka A, Brajuka N, Gallardo-Alba C, Giani A, Fedrigo O, Jarvis ED. Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs. Bioinformatics. 2022 Sep 2;38(17):4214–4216.
