Speeding up GATK with Python multithreading: an example
Updated: 2022-07-01 11:02:34  Author: 陳光輝_花生所
This article walks through an example of speeding up GATK with Python multithreading, for readers who may find it useful as a reference.
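GATK variant analysis
Variant calling can be slow on large samples, so the work can be split by chromosome and run in parallel with multiple threads.
Below is a Python multithreading script I wrote. It is offered for reference only, and corrections are welcome.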
    #!/usr/bin/python3
    import os
    import threading
    import time

    muthreads = []  # one HaplotypeCaller command line per chromosome
    bam_file = "a.mkdup.bam"
    out_file_prefix = "flower"
    chr_list = ["CHR01", "CHR02", "CHR03", "CHR04", "CHR05", "CHR06", "CHR07",
                "CHR08", "CHR09", "CHR10", "CHR11", "CHR12", "CHR13"]

    # Build one command per chromosome, restricted to it with --intervals
    for chr in chr_list:
        threads_comonder_name = (
            "gatk HaplotypeCaller --intervals " + chr
            + " -R /mnt/j/BSA/02-read-align/Tifrunner2.fasta -I " + bam_file
            + " -ERC GVCF -O " + out_file_prefix + "-" + chr + ".erc.g.vcf"
        )
        muthreads.append(threads_comonder_name)

    exitFlag = 0


    class myThread(threading.Thread):
        def __init__(self, threadID, name, counter, comander):
            threading.Thread.__init__(self)
            self.threadID = threadID
            self.name = name
            self.counter = counter    # delay in seconds before the command starts
            self.comander = comander  # shell command this thread will run

        def run(self):
            print("Starting thread: " + self.name)
            print_time(self.name, self.counter, 5, self.comander)
            print("Exiting thread: " + self.name)


    def print_time(threadName, delay, counter, comander):
        if exitFlag:
            return  # threadName is a plain string and has no exit() method
        time.sleep(delay)
        print(comander)
        os.system(comander)  # run the command through the operating system shell


    # Create the threads.
    # Note: the slice [0:11] starts only the first 11 commands (CHR01 to CHR11);
    # use muthreads without the slice to process all 13 chromosomes.
    threadlist = []
    for i, threadsnu in enumerate(muthreads[0:11]):
        print(i)
        print(threadsnu)
        threadsnew = myThread(1, "Thread-" + str(i), 2, threadsnu)
        threadlist.append(threadsnew)

    # Start the threads
    for threads in threadlist:
        threads.start()

    # Wait for every thread to finish
    for threads in threadlist:
        threads.join()

    print("All threads finished; exiting main thread")
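The same per-chromosome fan-out can also be sketched with a bounded thread pool. This is not the author's script: it swaps the hand-written `Thread` subclass for `concurrent.futures.ThreadPoolExecutor` and `os.system` for `subprocess.run`, so that the number of simultaneous GATK processes is capped and each exit code can be checked. The helper names (`build_command`, `run_command`, `run_all`) and the `max_workers=4` value are assumptions made for illustration; the file names and reference path are the ones used above.

```python
import shlex
import shutil
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(chrom, bam_file, reference, out_prefix):
    """Return the HaplotypeCaller argument list for one chromosome."""
    return [
        "gatk", "HaplotypeCaller",
        "--intervals", chrom,
        "-R", reference,
        "-I", bam_file,
        "-ERC", "GVCF",
        "-O", f"{out_prefix}-{chrom}.erc.g.vcf",
    ]


def run_command(cmd):
    """Run one command and return its exit code (0 means success)."""
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd).returncode


def run_all(commands, max_workers=4, runner=run_command):
    """Run commands with at most max_workers at once.

    Exit codes are returned in the same order as the input commands.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, commands))


# CHR01 .. CHR13, as in the script above
chr_list = [f"CHR{i:02d}" for i in range(1, 14)]
commands = [
    build_command(c, "a.mkdup.bam",
                  "/mnt/j/BSA/02-read-align/Tifrunner2.fasta", "flower")
    for c in chr_list
]

if __name__ == "__main__":
    if shutil.which("gatk") is None:
        # gatk is not installed here: only show what would be run
        for cmd in commands:
            print(shlex.join(cmd))
    else:
        codes = run_all(commands, max_workers=4)  # assumed worker count
        failed = [c for c, rc in zip(chr_list, codes) if rc != 0]
        print("chromosomes with a non-zero exit code:", failed)
```

Threads are sufficient here because each one only waits for an external `gatk` process. The worker count should be chosen to fit the CPU cores and memory actually available, since every HaplotypeCaller process starts its own JVM.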
The following comes from the web and has not been verified.
Merging the per-chromosome VCF files of one sample
    # for i in {1..22} X Y ;do echo "-I final_chr$i.vcf" '\';done
    # for i in {10..19} {1..9} M X Y ;do echo "-I final_chr$i.vcf" '\';done
    module load java/1.8.0_91
    GATK=/home/jianmingzeng/biosoft/GATK/gatk-4.0.3.0/gatk
    $GATK GatherVcfs \
        -I final_chr1.vcf \
        -I final_chr2.vcf \
        -I final_chr3.vcf \
        -I final_chr4.vcf \
        -I final_chr5.vcf \
        -I final_chr6.vcf \
        -I final_chr7.vcf \
        -I final_chr8.vcf \
        -I final_chr9.vcf \
        -I final_chr10.vcf \
        -I final_chr11.vcf \
        -I final_chr12.vcf \
        -I final_chr13.vcf \
        -I final_chr14.vcf \
        -I final_chr15.vcf \
        -I final_chr16.vcf \
        -I final_chr17.vcf \
        -I final_chr18.vcf \
        -I final_chr19.vcf \
        -I final_chr20.vcf \
        -I final_chr21.vcf \
        -I final_chr22.vcf \
        -I final_chrX.vcf \
        -I final_chrY.vcf \
        -O merge.vcf
When merging, note that the VCF files must be listed in the same order as the contigs appear in the header of each VCF file.
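One way to avoid ordering mistakes is to read the contig order from a VCF header and build the `-I` list from it. The sketch below is a minimal illustration and, like the command above, is untested against a real GATK run. The function names (`contig_order`, `gather_command`) are hypothetical, and it assumes the header contig IDs are `chr1`, `chr2`, and so on, matching the `final_chr*.vcf` naming used above. The sample header is illustrative data, not taken from a real file.

```python
import re

# Matches header lines such as: ##contig=<ID=chr1,length=248956422>
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)")


def contig_order(header_lines):
    """Return contig IDs in the order they appear in the VCF header."""
    order = []
    for line in header_lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        match = CONTIG_RE.match(line)
        if match:
            order.append(match.group(1))
    return order


def gather_command(contigs, pattern="final_{contig}.vcf",
                   output="merge.vcf", gatk="gatk"):
    """Build a GatherVcfs argument list with inputs in the given order."""
    cmd = [gatk, "GatherVcfs"]
    for contig in contigs:
        cmd += ["-I", pattern.format(contig=contig)]
    cmd += ["-O", output]
    return cmd


# Illustrative header lines (assumed, not from a real file)
header = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=chr1,length=248956422>",
    "##contig=<ID=chr2,length=242193529>",
    "##contig=<ID=chrX,length=156040895>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]

print(" ".join(gather_command(contig_order(header))))
```

In practice the header lines would come from one of the per-chromosome files, and any contig without a matching file (for example unplaced scaffolds) would need to be filtered out before building the command.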