
In my previous post, I described how to search for public NGS data and fetch its metadata from the INSDC databases. Today I'd like to walk through how to download the selected raw data.

Several methods and tools can be used to download the data. Of these options, I chose the nf-core/fetchngs pipeline for the following reasons.

Why use the nf-core/fetchngs pipeline?

  • Portability & Reproducibility:
    nf-core pipelines, like fetchngs, enable consistent and reproducible analyses across different environments, supported by containerization (e.g., Singularity).

  • Scalability:
    Fetchngs efficiently manages simultaneous downloads of multiple files, enhancing throughput and expediting data acquisition in large-scale studies.

  • Re-entrancy:
    With Nextflow's -resume option, a failed run can be restarted from the point of failure: tasks that already completed are reused from the cache instead of being repeated, saving time and computational resources.

  • Connectivity:
    Designed for direct interoperability, fetchngs seamlessly connects with nf-core’s data processing pipelines (rnaseq, atacseq, viralrecon & taxprofiler) for integrated analyses.

How to use it

Required input file

The pipeline needs an input file that lists one accession number per line.

# Create ids.csv with one accession per line, e.g. in vim:
vim ids.csv

# Contents of ids.csv:
ERX2744752
ERX2744753
SRX15288999
SRX15289000
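If you prefer not to open an editor, the same file can be written directly from the shell. This is a minimal sketch; the format check is my own rough heuristic that only covers INSDC-style accessions (SRA/ENA/DDBJ prefixes such as SRX, ERR or DRP followed by digits), so it will also flag other ID types fetchngs accepts, such as GEO IDs.

```shell
# Write one accession per line (same IDs as above) without an editor
printf '%s\n' ERX2744752 ERX2744753 SRX15288999 SRX15289000 > ids.csv

# Rough format check (assumption: INSDC-style IDs only, e.g. SRX/ERR/DRP + digits);
# grep prints every non-matching or empty line together with its line number
if grep -Evn '^[SED]R[APRSX][0-9]+$' ids.csv; then
    echo "Check the lines above"
else
    echo "All IDs look valid"
fi

# Print any accession that appears more than once (no output means no duplicates)
sort ids.csv | uniq -d
```

Lines ending in Windows-style carriage returns also fail the pattern, which is a useful side effect when the list was prepared on another machine.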

How to set up and run

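# Download the fetchngs pipeline (release 1.2.0) together with its Singularity images
# -r: release, -d: include institutional config profiles, -x none: no compression,
# -s singularity: container system, -u amend: store images in $NXF_SINGULARITY_CACHEDIR
nf-core download fetchngs \
    -r 1.2.0 -d -x none -s singularity -u amend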
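With `-u amend`, nf-core download saves the images into the Singularity cache directory given by the `NXF_SINGULARITY_CACHEDIR` environment variable (check the nf-core/tools documentation for your version), so that variable should be set before the download. Nextflow reads the same variable when the pipeline is run with `-profile singularity`, which lets it reuse the downloaded images instead of pulling them again. A minimal sketch; the cache path below is a hypothetical example:

```shell
# Hypothetical cache location: any directory with enough free space works
export NXF_SINGULARITY_CACHEDIR="$HOME/singularity_cache"
mkdir -p "$NXF_SINGULARITY_CACHEDIR"

# Report any required command that is not on PATH
for tool in nf-core nextflow singularity; do
    command -v "$tool" >/dev/null 2>&1 || echo "Not found on PATH: $tool"
done
```

Adding the export line to `~/.bashrc` (or your shell's equivalent) keeps the setting for later sessions, so `nextflow run` finds the same cache.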

# Print the pipeline's usage and available parameters
nextflow run nf-core/fetchngs --help


# Run the pipeline
nextflow run nf-core/fetchngs \
    -profile singularity \
    --input ids.csv \
    --outdir 00_raw \
    -resume  # reuse completed tasks when re-running after an error

Results

The output files are described on the "Results" page of the nf-core/fetchngs documentation.
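As far as I know, fetchngs already checks MD5 sums while downloading FASTQ files from ENA, but it can be handy to re-check them later, for example after copying the data to another server. The sketch below assumes an output layout of `<outdir>/fastq/*.fastq.gz` with matching `.md5` files in `<outdir>/fastq/md5/`, each holding a `<md5>  <file name>` line; this layout may differ between releases or for files fetched with sra-tools, and `verify_fastq_md5` is a hypothetical helper name.

```shell
# Hypothetical helper: re-check downloaded FASTQ files against fetchngs' .md5 files.
# Assumes <outdir>/fastq/*.fastq.gz and <outdir>/fastq/md5/*.md5 (verify for your version).
verify_fastq_md5() {
    local outdir="${1:-00_raw}"
    # The .md5 files name the FASTQ files relative to fastq/, so check from there;
    # the subshell keeps the caller's working directory unchanged
    (cd "$outdir/fastq" && md5sum -c md5/*.md5)
}

# Usage: verify_fastq_md5 00_raw
# md5sum prints "<file>: OK" per file and exits non-zero if any file does not match
```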

