GigaScience

journal article

Open Access Collection

Correction to: Open-source benchmarking of IBD segment detection methods for biobank-scale cohorts

2022 GigaScience

doi: 10.1093/gigascience/giac129 pmid: 36579713

journal article

Open Access Collection

LMAS: evaluating metagenomic short de novo assembly methods through defined communities

Mendes, Catarina Inês; Vila-Cerqueira, Pedro; Motro, Yair; Moran-Gilad, Jacob; Carriço, João André; Ramirez, Mário

2022 GigaScience

doi: 10.1093/gigascience/giac122 pmid: 36576131

Background: The de novo assembly of raw sequence data is key in metagenomic analysis. It allows recovering draft genomes from a pool of mixed raw reads, yielding longer sequences that offer contextual information and provide a more complete picture of the microbial community.

Findings: To better compare de novo assemblers for metagenomic analysis, LMAS (Last Metagenomic Assembler Standing) was developed as a flexible platform allowing users to evaluate assembler performance given known standard communities. Overall, in our test datasets, k-mer De Bruijn graph assemblers outperformed the alternative approaches but came with a greater computational cost. Furthermore, assemblers branded as metagenomic specific did not consistently outperform other genomic assemblers in metagenomic samples. Some assemblers still in use, such as ABySS, MetaHipmer2, minia, and VelvetOptimiser, perform relatively poorly and should be used with caution when assembling complex samples. Meaningful strain resolution at the single-nucleotide polymorphism level was not achieved, even by the best assemblers tested.

Conclusions: The choice of a de novo assembler depends on the computational resources available, the replicon of interest, and the major goals of the analysis. No single assembler appeared an ideal choice for short-read metagenomic prokaryote replicon assembly, each showing specific strengths. The choice of metagenomic assembler should be guided by user requirements and characteristics of the sample of interest, and LMAS provides an interactive evaluation platform for this purpose. LMAS is open source, and the workflow and its documentation are available at https://github.com/B-UMMI/LMAS and https://lmas.readthedocs.io/, respectively.

journal article

Open Access Collection

Near-chromosomal de novo assembly of Bengal tiger genome reveals genetic hallmarks of apex predation

Shukla, Harsh; Suryamohan, Kushal; Khan, Anubhab; Mohan, Krishna; Perumal, Rajadurai C; Mathew, Oommen K; Menon, Ramesh; Dixon, Mandumpala Davis; Muraleedharan, Megha; Kuriakose, Boney; Michael, Saju; Krishnankutty, Sajesh P; Zachariah, Arun; Seshagiri, Somasekar; Ramakrishnan, Uma

2022 GigaScience

doi: 10.1093/gigascience/giac112

journal article

Open Access Collection

Accurate and fast clade assignment via deep learning and frequency chaos game representation

Avila Cartes, Jorge; Anand, Santosh; Ciccolella, Simone; Bonizzoni, Paola; Della Vedova, Gianluca

2022 GigaScience

doi: 10.1093/gigascience/giac119 pmid: 36576129

Background: Since the beginning of the coronavirus disease 2019 pandemic, there has been an explosion of sequencing of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus, making it the most widely sequenced virus in history. Several databases and tools have been created to keep track of genome sequences and variants of the virus; most notably, the GISAID platform hosts millions of complete genome sequences, and it continues to expand every day. A challenging task is the development of fast and accurate tools that are able to distinguish between the different SARS-CoV-2 variants and assign them to a clade.

Results: In this article, we leverage the frequency chaos game representation (FCGR) and convolutional neural networks (CNNs) to develop an original method that learns how to classify genome sequences, which we implement in CouGaR-g, a tool for the clade assignment problem on SARS-CoV-2 sequences. On a testing subset of the GISAID, CouGaR-g achieved a 96.29% overall accuracy, while a similar tool, Covidex, obtained a 77.12% overall accuracy. As far as we know, our method is the first using deep learning and FCGR for intraspecies classification. Furthermore, by using some feature importance methods, CouGaR-g allows the identification of k-mers that match SARS-CoV-2 marker variants.

Conclusions: By combining FCGR and CNNs, we develop a method that achieves better accuracy than Covidex (which is based on random forest) for clade assignment of SARS-CoV-2 genome sequences, also thanks to our training on a much larger dataset, with comparable running times. Our method implemented in CouGaR-g is able to detect k-mers that capture relevant biological information that distinguishes the clades, known as marker variants.

Availability: The trained models can be tested online by providing a FASTA file (with 1 or multiple sequences) at https://huggingface.co/spaces/BIASLab/sars-cov-2-classification-fcgr. CouGaR-g is also available at https://github.com/AlgoLab/CouGaR-g under the GPL.
