Parallelising freebayes with snakemake
A wrapper for freebayes,
- When a chunk is completed, that cpu core will not move onto the next region until all cores have completed their respective chunk. This is particularly problematic in regions of variable coverage, and so one can attempt to split the genome into regions of roughly equal coverage. Unfortunately, this still results in many cores being unused for substantial periods of time.
I was implementing a
To do so, we can write a snakemake rule (below) which runs an R script to read in the genome index (.fai) file, and output multiple bed files, breaking the genome into chunks of equal size. By using an extra snakemake wildcard, the index of each genome chunk, we can produce, and supply freebayes with different bed files. Finally, after concatenating the vcfs with
The benefit of this, is that snakemake will automatically run the next job when each chunk is complete, reducing overall computation time as compared with
1# Read in the desired number of genome chunks from the config.yaml, and arange a sequence 1-n.2chunks = np.arange(1, config['chunks'])34# Note - in this case the script also produces some other files for Freebayes5rule GenerateFreebayesParams:6 input:7 ref = config['ref']['genome'],8 index = config['ref']['genome'] + ".fai",9 bams = expand("resources/alignments/{sample}.bam", sample=samples)10 output:11 bamlist = "resources/bam.list",12 pops = "resources/populations.tsv",13 regions = expand("resources/regions/genome.{chrom}.region.{i}.bed", chrom=config['chroms'], i = chunks) # bed files14 log:15 "logs/GenerateFreebayesParams.log"16 params:17 metadata = config['samples'],18 chroms = config['chroms'],19 chunks = config['chunks']20 conda:21 "../envs/diffsnps.yaml"22 script:23 "../scripts/GenerateFreebayesParams.R"
Update - 09/03/2021 - In the beautiful nature of open source software development, I have now written a parallelisation section in the freebayes documentation :)
I'll leave you with a song I've been enjoying recently. Happy variant calling.
More Posts
Browse all posts