Splitting a SAM/BAM file into two paired-end FASTQ files for RNA-seq


Sequencing centers often return data as aligned BAM files. Sometimes, however, the BAM files need to be converted back into FASTQ format, for example to realign the reads. To convert from BAM to FASTQ, I use Picard. Below is a script I wrote that does this conversion, splits the paired-end reads into two separate files, and keeps a consistent naming convention.

General code

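# $bam is the input file (e.g. SAMPLE.bam); $installdir is wherever picard-tools is installed
b=$bam
java -Xmx2g -jar "$installdir/picard-tools/SamToFastq.jar" \
    INPUT="$b" \
    FASTQ="${b%.bam}_R1.fastq" \
    SECOND_END_FASTQ="${b%.bam}_R2.fastq"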

The $bam file (for example, SAMPLE.bam) is split into two paired-end FASTQ files, SAMPLE_R1.fastq and SAMPLE_R2.fastq.
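The naming convention comes from stripping the .bam suffix and appending _R1.fastq or _R2.fastq. Below is a minimal sketch of that step on its own, using a hypothetical helper, fastq_names, that is not part of Picard. It uses ${1%.bam}, which removes .bam only when it is the trailing suffix. A pattern substitution such as ${b/.bam/_R1.fastq} instead rewrites the first ".bam" found anywhere in the path, which matters if a directory name also contains ".bam".

```shell
#!/bin/bash
# Hypothetical helper (not part of Picard): print the R1/R2 FASTQ names
# derived from a BAM path using the SAMPLE_R1 / SAMPLE_R2 convention.
fastq_names() {
    # ${1%.bam} removes ".bam" only when it is the trailing suffix.
    # By contrast, ${b/.bam/_R1.fastq} rewrites the FIRST ".bam" anywhere,
    # e.g. run.bam.d/SAMPLE.bam -> run_R1.fastq.d/SAMPLE.bam
    stem="${1%.bam}"
    printf '%s\n' "${stem}_R1.fastq" "${stem}_R2.fastq"
}

fastq_names "SAMPLE.bam"
fastq_names "run.bam.d/SAMPLE.bam"
```

The first call prints SAMPLE_R1.fastq and SAMPLE_R2.fastq; the second keeps the directory name intact and prints run.bam.d/SAMPLE_R1.fastq and run.bam.d/SAMPLE_R2.fastq.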

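For more options and information on Picard's SamToFastq, consult its documentation. Newer Picard releases ship a single picard.jar, so the tool is invoked as java -jar picard.jar SamToFastq rather than through a separate SamToFastq.jar.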

I keep all my BAM files in one folder, /bam, so I can also use a loop to convert every BAM file in that folder to FASTQ format and write the results to a separate /fastq folder in the same parent directory.
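The folder layout described above maps each .../bam/SAMPLE.bam to an output prefix .../fastq/SAMPLE in the sibling folder. A minimal sketch of that mapping, assuming the BAM files sit directly inside the bam folder, is shown below; fastq_prefix is a hypothetical helper for illustration. Deriving the path from dirname and basename, rather than by string substitution on the whole path, avoids surprises when another part of the path happens to contain "bam".

```shell
#!/bin/bash
# Hypothetical helper: map a BAM in a ".../bam/" folder to the output
# FASTQ prefix in the sibling ".../fastq/" folder (assumed layout).
fastq_prefix() {
    bamdir=$(dirname "$1")          # e.g. /data/project/bam
    parent=$(dirname "$bamdir")     # e.g. /data/project
    sample=$(basename "$1" .bam)    # e.g. SAMPLE
    printf '%s\n' "$parent/fastq/$sample"
}

fastq_prefix "/data/project/bam/SAMPLE.bam"
# prints /data/project/fastq/SAMPLE
```

Appending _R1.fastq and _R2.fastq to this prefix gives the two output files.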

As a loop

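#!/bin/bash
# Convert every BAM in the bam/ folder into R1/R2 FASTQ files in the sibling fastq/ folder
BAMDIR="/data/home/jfan/Projects/Walsh_C1/data-raw/bam"
FASTQDIR="/data/home/jfan/Projects/Walsh_C1/data-raw/fastq"
PICARD="/data/home/jfan/Lib/picard-tools-1.113/SamToFastq.jar"
mkdir -p "$FASTQDIR"             # make sure the output folder exists
for b in "$BAMDIR"/*.bam         # *.bam skips index files such as SAMPLE.bam.bai
do
    echo "$b"
    name=$(basename "$b" .bam)   # SAMPLE.bam -> SAMPLE
    java -Xmx2g -jar "$PICARD" \
        INPUT="$b" \
        FASTQ="$FASTQDIR/${name}_R1.fastq" \
        SECOND_END_FASTQ="$FASTQDIR/${name}_R2.fastq"
done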
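Because R1 and R2 from the same paired-end sample should hold the same number of reads, a quick count after the loop can catch a truncated or interrupted conversion. This is a minimal sketch, assuming uncompressed FASTQ with exactly four lines per record; check_pair is a hypothetical helper, not part of Picard.

```shell
#!/bin/bash
# Hypothetical sanity check (not part of Picard): R1 and R2 of a pair
# should contain the same number of reads.
# Assumes uncompressed FASTQ with exactly 4 lines per record.
check_pair() {
    n1=$(( $(wc -l < "$1") / 4 ))
    n2=$(( $(wc -l < "$2") / 4 ))
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 read pairs"
    else
        echo "MISMATCH: $1 has $n1 reads, $2 has $n2 reads" >&2
        return 1
    fi
}

# Example usage over an output folder (path is a placeholder):
# for r1 in /path/to/fastq/*_R1.fastq; do
#     check_pair "$r1" "${r1%_R1.fastq}_R2.fastq"
# done
```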

So, what do you think?