Greg introduced me to a pipeline he has been working on called ZED-align, an alignment and methylation-calling pipeline for the Zea Epigenomics Database. He gave me the 5geno sample, which contains 5 pairs of reads, each about 40 GB in size, to run through his pipeline. Greg estimated that each pair of reads would take 8 hours to run. Because the pairs are independent computations, running all 5 sequentially would take about 40 hours, which is needlessly slow. Greg therefore gave me the task of parallelizing the run using the launcher module. After digging through the launcher documentation, I learned that the module parallelizes batch jobs by running each line of a file of batch commands on a different node. So I wrote a simple batch script that locates all the pairs of reads (in fq format) and echoes the command for each pair on its own line. To my surprise, the pipeline finished all 5 pairs of reads in about 7 hours.
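To make the idea concrete, here is a minimal Python sketch of a script that writes one command per pair of reads. It is not the script I actually ran, and several names in it are assumptions made for illustration: the `_R1.fq`/`_R2.fq` naming of the paired files, the `zed-align.sh` command, and the `jobfile.txt` output name are all hypothetical.

```python
from pathlib import Path


def build_job_lines(read_dir, command="zed-align.sh",
                    r1_suffix="_R1.fq", r2_suffix="_R2.fq"):
    """Return one command line per pair of reads found in read_dir.

    The command name and the file suffixes are hypothetical placeholders;
    substitute whatever the real pipeline and data actually use.
    """
    lines = []
    for r1 in sorted(Path(read_dir).glob(f"*{r1_suffix}")):
        # Derive the mate file name by swapping the suffix.
        r2 = r1.with_name(r1.name[: -len(r1_suffix)] + r2_suffix)
        if r2.exists():
            # Only emit a command when both files of the pair are present.
            lines.append(f"{command} {r1} {r2}")
    return lines


def write_job_file(read_dir, job_file="jobfile.txt", **kwargs):
    """Write the commands to job_file, one per line, and return them."""
    lines = build_job_lines(read_dir, **kwargs)
    Path(job_file).write_text("".join(line + "\n" for line in lines))
    return lines
```

Calling `write_job_file("reads/")` on a directory holding 5 complete pairs would produce a 5-line file, one independent task per line. How that file is handed to the launcher depends on the module. If it is the TACC launcher, I believe the file is passed through the `LAUNCHER_JOB_FILE` environment variable, but that is worth checking against the module's own documentation.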