[英]python - Scan through a large file and write lines when conditions met
I was trying to scan with positions from one file through positions in second file to find, if the features are overlaping between them. 我试图通过一个文件中的位置扫描到第二个文件中的位置来查找,如果这些特征在它们之间重叠。
I have a file Pt looking like this: 我有一个文件Pt看起来像这样:
chr10 0 60985
chr10 60988 60990
chr1 165014865 165014867
chr1 1161693 1161695
chr1 158851689 158851689
chr10 64766 64767
chr10 63600 64703
chr11 647696 647697
And file a (it has of course many many lines like the one below): 并提交文件(它当然有很多行,如下所示):
chr1 1161693 chr1uGROUPERuDELu0u832 TGCTCTTTCCAGAAACCCTCAACCCTGTACGGTCAGGAGGAAACATGGCACCTCCCCTCTGGGG T 63 NormalSupport;MinSampleCount;LowSomaticScore CLUSTER_NUM=5454;CONTIG=GGTGCAGGGAAGCAGGAAGGAAGTGAAGCTCAAAAGCCCCTAGGACAGGGCACCTCCCCTCTGGATGCTCTTTCCAGAAACCCTCAACCTTGTACGGTCAGGAGAAAACACATCCCACAAG;CONTIG_NUM=5840;DOWNSTREAM=GCTCTTTCCAGAAACCCTCAACCCTGTACGGTCAGGAGAAAACACATCCCACAAG;END=1161756;NS=1;READSOURCES=(0:3:0,1:2:13);SOMATICSCORE=19;SVLEN=-63;SVTYPE=DEL;UPSTREAM=GGTGCAGGGAAGCGGGAAGGAAGTGAAGCTCAAAAGCCCCTAGGACAGGGCACCTCCCCTCTGGAT;ensembl_gene_id=ENSG00000078808 GT:GQ 1/.:.
chr1 158851689 chr1uGROUPERuDELu3u4452 GGGGAGTAATTCTTATTCATGATATGAAAACTCTAATGTGTTTCTTATTCCAGAAAA G 100 NormalSupport CLUSTER_NUM=25182;CONTIG=CATATTTTGCTATATCTCACATCATTGTTCATCTGATAATATATGAAAACTACAATGTGTTTCTTATTCCAGAAAGGGGAGTAATTCTTATTCATGAATAAACACTGAAGGAGAAAGATTATGGATCATAGTGGGAAAAGCCACAATACCATCTACATTC;CONTIG_NUM=24300;DOWNSTREAM=GGGAGTAATTCTTATTCATGAATAAACACTGACGGAGAAAGATTATGGATCATAGTGGGAAAAGCCACAATACCATCTACATTC;END=158851745;NS=1;READSOURCES=(0:11:0,1:3:18);SOMATICSCORE=55;SVLEN=-56;SVTYPE=DEL;UPSTREAM=CATATTTTGCTATATCTCACATCATTGTTCATCTGATAATATATGAAAACTCCAATGTGTTTCTTATTCCAGAAAG;ensembl_gene_id=ENSG00000229849 GT:GQ 1/.:.
chr1 165014865 chr1uGROUPERuDELu3u7344 ACTGGCATTAGCTATGCTTCCTTAGGCAGACAGCATGTTGAGAAATTCACATTCATCAG A 100 NormalSupport CLUSTER_NUM=40249;CONTIG=CTCCAGTAAAGAGCATCTTTTAATGAAGTGTATCTGCCTGGGCTAGAAAGGCAGCTGCCTCCACTAAAGCAGGGCTGGTCCAGAAATATTACCACTTGCCTAATCCTTATAGTAATCCTAACTGGCAGGTATTATTATATCCCAATTCACACACTTAGAGG;CONTIG_NUM=38845;DOWNSTREAM=CTTGCCTAATCCTTATAGTAATCCTAACTGGCAGGTATTATTATATCCCAATTCACACACTTAGAGG;END=165014923;NS=1;READSOURCES=(0:32:0,1:9:18);SOMATICSCORE=60;SVLEN=-58;SVTYPE=DEL;UPSTREAM=CTCCAGTAAAGAGCATCTTTTAATGAAGTGTATCTGCCTGGGCTAGAAAGGCAGCTGCCTCCACTAAAGCAGGGCTGGTCCAGAAATATTACCA GT:GQ 1/.:.
chr1 176569763 chr1uGROUPERuDELu3u12313 GATCGCGCCACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAA G 100 NormalSupport;LowSomaticScore CLUSTER_NUM=65333;CONTIG=GCGTGGTAGCGGGCGCCTGTAGTCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCACAGAGCTCAAGCTCACAATTCCATTATACTGTTACTC;CONTIG_NUM=62936;DOWNSTREAM=ATCACAGAGCTCAAGCTCACAATTCCATTATACTGTTACTC;END=176569833;NS=1;READSOURCES=(0:14:0,1:8:7);SOMATICSCORE=22;SVLEN=-70;SVTYPE=DEL;UPSTREAM=GCGTGGTAGCGGGCGCCTGTAGTCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAG;ensembl_gene_id=ENSG00000116183 GT:GQ 1/.:.
chr1 184683773 chr1uGROUPERuDELu3u15990 TAACAGTTTGGATGAAAAAAATGTAAGGTATGCTCATCTAAACTATAGATCATTGAAAACTGGTAGTTTAGCTAATGAGATTCAACCTCTAGACCAAAATCTAGAAACAAAACAAAAAAAGAAATTTTGCTGAGTTAAATATAAAAGTTCTAAGTTTACACTAAAAAAAAGAA T 93 PASS CLUSTER_NUM=82731;CONTIG=TAAACCACCACATGCAAAGAGCCTGTAACTGAAAGCTCTTGAGTGCAGTGCCACAAGGCACTGGTTGGGGTCCAACCAAAACTTCTTCCTAACTTGGCTGCTCAAAGGCAGGGTGGAGAACACTCATTTGTCAGCAGACCATAC;CONTIG_NUM=79822;DOWNSTREAM=AACTTGGCTGCTCAAAGGCAGGGTGGAGAACACTCATTTGTCAGCAGACCATAC;END=184683945;NS=1;READSOURCES=(0:12:46,1:0:44);SOMATICSCORE=30;SVLEN=-172;SVTYPE=DEL;UPSTREAM=TAAACCACCACATGCAAAGAGCCTGTAACTGAAAGCTCTTGAGTGCAGTGCCACAAGGCACTGGTTGGGGTCCAACCAAAACTTCTTCCT;ensembl_gene_id=ENSG00000116406 GT:GQ 1/.:.
chr1 193557238 chr1uGROUPERuDELu3u20250 TGTGTGTATACACACACACACATATATGTGTGTATACACACACACACATATATATATGTGTGTATACACACACACACATATATATGTGTGTATACACACAC T 100 NormalSupport CLUSTER_NUM=103112;CONTIG=TTTAAAATAAGGGGGGAAATTTATATATATATATATATATATATGTGTGTGTGTATACACACACACATATATATATATACACACACACATATATATATATATACACATACACACACACACACACACACACACACACACACACTGTTTGAAATA;CONTIG_NUM=99338;DOWNSTREAM=ACATATATATATATGTGTGTATACACACACACATATATATATATACACACACACATATATATATATACACATACACACACACACACACACACACACACACACACACACTGTTTGAAATA;END=193557338;NS=1;READSOURCES=(0:23:0,1:3:26);SOMATICSCORE=60;SVLEN=-100;SVTYPE=DEL;UPSTREAM=TTTAAAATAAGGGGGGAAATTTATATATATATAT GT:GQ 1/.:.
chr1 211021468 chr1uGROUPERuDELu4u5565 ACAAGCTGTTGGGTTATCTCTTTATGATCTTCAACTACACTAAGAAGTGTGTCAATTGTATTCAGAATTCCCATAGCAGTAACTGCTTTGTCATCACTACCTTCTTCATCTGGCCCTGTCTGGATTACTTGGCTAAATGTCATTGCCAACTGTTGTGTCATTTCTACTGCAATAGGAGTAACTTCTTCACTATTTTCACAGATCATTTTCTGAATTACATTGGTAAGGTCATCATTTTCTGTTTCTGTTATAATATGAAGAAGAGCCTGCATTACAGGTCTGATAAATGCTGTGATACATTCTTTAGATTTTTCTTGATTGCTGATAAATACTTGAAGGACAATGGCAGCTTCCACTTTCACAGGCATGTCTCTGTCATCAATCAGACATCTTCTTGTTAGCTCTAAAGCTGTTTGAATGTTCTGATCACTTTTGAACTTTACTTCACAAAAATAGTGAAGTACTCAGCAAGCCCTTGCTCTCATGTAGCCTAGTTCACTGCTGAAGAGAGAGAACACATGATTCTGCAACATGTATTCCATCTGATCATTACAGATCTTTTTCTTCAGAAGTGTTTCAGCTAAAGAGCCAATCATGCAGGGCTCCATCTTTTTTTCAAAGGTCAGCATTGGGTTCTGTAAGAATCTGGTAACAAAATCCCATAGTCTTTTGAAGTACCTCCTTCCTCTTACTATGGGCTGTAAACAAAAGCTTCTGGGCAGCAGTGGCAAAGGAAATGAAATCTTCAGACACATCAAACTTCATGCGTATATACTCGTAAGGGTCTTCTTCCCAAAGTTCCTCATCAGCATCTGTATAACACATCAATGGAAAAATAACATCTTGGATAATGCCTTGTATATGGGGCTTCAGATTCTTCCAGGTGAGAGCATGAGAAACTCCTTGATTAATAGAATTTAATGTCTGTTGTAAAACTTGAGGAGCCATATATTGCTTCTCGTTGTACTGTTATAACACTTTCAATAAAACTTGCTGGACACCAACAACAAATTCCTTCAGAAATACTTGAGCAAATTCATTATACTCCTAGGAAACACTGCCAGAGCTTCCATATCTTTCAAAAAGTCTTGCTAAAATATGTAAGGCCCACTTCTTGCATTTCCATTATGATAACTCAGGTCGGTCATCTTCTTCAATTCGAAGTGTTTCAGTCTTTAAAATTTCTACCCATTCTGTCAGGTTCTGTTGGTTTATCAGTTCCAGTGGTATAGAGTATAGAGTATATACTGAACAAGAGCATAGAGTATATACTGAACAAGATCATAGAAGATCTTGAATATTTGTTTCTGGATGACGACAGACTGATCAGACTGGTCAGAAAGAAGCTGGATAAAATGATCCTTTAGAACTGACAGAAAATGCTGCATTGCTGCTACCAATGGACTCCACTCCTCTAGTTTTTTATACTCATAAGTTTTCACAAGCTGATAAAGGCAAATAATTCCTATCCAACAAGCACTGTTATCACACTGAAGATAAAAGCCAGTTTTGTCCACAATGGCAGTCCAGCAGCTTGGATAATCACGTTTGGTGATGTGATGAATGCATGTAGTAAGCTGTACCCTGATGAGCTCAGGAGGATGGATAATGGCTTCTACAATATTTTCTCAAATACAATGGCAATCTTCTTCTGGAATAGTATAAGGGGATATATACTTTTGTGCTGTTTCTTGACCAGGCCAATACTGTGTTATATTTTTCAAATAGATAACACCTGCCTGTCTCACAGGTAAATCCAGCTGTTCCGACATAGTAATCTGGAGCAGCGTTGAGACAAAATTCAGAGATTTGTGTGCTTCATTGAGCTGGCGCTCCATGGCCTCTTGCAGGGCTGGGTCCATGGTGCCCCGCAGGGCCTCGATAATGGTGTTGGGGTCCATTGCAGCATGAACTAGGTCAAACCCAGGGCTTGAGTGCTACTGGGCCAGGAATAGCACTACTCACTGCACACATGGACCTGCCGCAGCGGCAACTGGCGCAAAAGGGCAATGGTGCAATCTTAACTCACTGTAACCTTGAACTCCTGGGCTCAAGTGATCCTCCCACCTCAGCCTCCCAAGTAGCTGGGACTGCAGGCTCACGCTACCATGCCAAGCTGATTTTGTGTTGTTGTAGAGATAGGGTCTCACTATGTTGCCCAGGCTGGTCTTGAACTCCTGGTCTTAAGCAATCATCCTGCATCAGCCTCCCAAAGTGCTGGATTTACAAGCCTGAGTCACCATGCCTGGCCAATATTTTCAATAGTTAGAGGCAGGATTGAAAAACAATTCCTTTTTGCTTTGCTCAAAATAAGTATTTATGAGCATCCACTTACGAGTTACTGTGCTAGATGCTGGACATACAAATAGAAATAAGACCCAGTTACTGCTGTTGTGGAAAGGGCAACATTAGAGAAATGTTCAGGAAATGGAGGAAAGGCCCTTATCTCAGCTTAAGGAAGCCTTAACTCACTATTGTTTGGCTGAATCTCAAAAATGTACAAACCAATAGGAGTGTCCCCTTCTTCCCTACAGATTCCCTGAAGCCAGTGGGCTGTCTGGCAGGAAAACCAAATACTAACTGTGATTTGCCCATTCTAGAAGGTAAGAGAAGGGATTCAGGGCATGCGTGTAAAGTTAGGCTTTGATGACTTGTGTTAGAAGGTTCAGGAAGAAAGCCGCATCACTTATCCCCTATGGAAAAAAAGGAATGGCCAAGAGAACTTCCTTGAATCCATGAAGAGCTTCCAAAAAGAGAAATTTTAAGTTTAGGGATGATAAGGAGCAGAAAGGCTTGGTCTGCTTTACCTGGTGAGCCTATCAATGCACCCACCAAGCACATGCTTGTTACCCAGCAGAGTGTTGGGCACTAGGGGGTGGAGGAGGATAGAATCTAAGATTACTTTTAGCTCTGAAAATCTCAAGACCATCTAAGTTAGGCTCTTCATTTTACAAAAGACAAAGTGCAGACCCAGAAAAGGCCTTATCCAAAATCACATTACTAGCTCTTGAGTACAAGATTACTAGCAGGCTGCAATCTGGGAAGATGGCTGAAGTGGACTTGACATCATATTAAACTCCAGCATCAGTACTTTGGGCAACATGTAGTCACCAGAGGTCTCTGAGCTGGTGACCAGCTTAGTTAAAACCACTTTCCCCCTTGATAATAGTAAATGCCATTTCCAGTTAAGTTACAGATGACAGATTTTATGGAATGTTTCCACCTTAATGTGCGAGATCTGAATGGTACCTCCATCACTGACATTACATTTTGTTTCTCAACCCTCCTCTTCACAGCTCTTATTAGGAAATCGGGGAAAGTCAGGTGCTGAGGCCCAAAGGAGCTGTGCCTTGCTGGTGTTCCCTCATGAAAGGCTGCAGCCAGAACTGTGTCCTTCCTTCTCAATGCAGGTCTCTATGCTAAACTTGTTCACCCTCTGTTCCAGAGCTTTAGGTGCTCCACACGAAGTACTCTTGAACTCTGTCAACCCTGACCTCTCCCCTGTCATGTAGAAAGGCCTCAAGTGGTAGTTTTTGGAGCTCCCGATCATACAAGCACATGCACCCTTCTCAGGAGAGGGCAATTAGGAAACCTGCTGCTAACTAGAGGTGCCATGGCAGGTGCCAACTGGATCAGTGCAGGATGGAACAGCACATTCCAGACAGCCTCATGAGTTCATTGCTAAGGGTAGAGCTAATTTACAGGAAACATGCAGAGGGTTTGACTGGATCCCCTCTGATACCAGCTGAGCCCAGCTCTCACATGCCTATGGCATGGAGTGGATGGTGCTGGGGCAGGCTTCTCTCAGCCTGACAGCAGAGTAGCTGTCTCCACTATTGAGCCAGGTGTGACCCCAGAATGCCATCTCTACCCCCTCAGCATGGTAGCAATGCCACGGCAGTGAGGATGTGGGGAATGAGGAGCAGCTTAGGAGAGACTAAGGCATTGCAGGACTGAGGTCCACTATGCAGCTCCAGGTCCCCCTACTATGCTCCTTCAAGAACAGTGTTGGTAGTAAGAGATTATCACCAAGTTCCTCTCAACTCAGCAAAACAAGTGGGGAAGCGAGAAAACAAAGGAGGAGGAAGATAGCTTCCTGTGCTGTGAGGACTAATTGCAAGCAAAATATGTGTCAGCTGCCACTGCTCACAGTAAACACTCAACGGATGGTAACCCTAAGGGTTTGGGCTGCAGCTTTAGTGCCCAAATCCTACCTCTATGGAGGGAGTGAGACACTTGGACAACAACAATCAAGGCAGGTGGAGAAAATTCTGGTTTTGCTTGTCCCAAAACAAATCACCTCACCCATAGCAGTGTAAGACAACTAGCATTTTACTATGTTCACAGATTCTGAGCACGAGAAATTTAGAAAAGGCCCAGCAGGATGGCTTGTCTCTCTTTCTTATACCTGGAGCCTCAGCCGGAAAGATTCAAAGACTGGGATGACTTAGTTGGAATCATCTATGAGCCAGTTCATTCACATAACTGGTGGTTGACCCATGCTGTCAGCTAGAACCTGAAGCTGTAAGCTGGAACACCTATACATGGCCCTTCCATGTGGCTCTTTGAATTTCTTTACAGAATGGTAGCTGGGTTCTAAGAGCAGGTGTCCCAAACAAATCACACAGAATCTGTATGACCTTTGATGACTTAGCCTATCACTATAATGTGACTTCTGCCATAGCCACAAGGCTTTCCAGGTTCAAGGGTAGAGAACAAGTATCAAAGGCAAATTTTTAGAAGAACATGTGGAATGGAAGGTAATGTTAAAGCTATCTCTGGAATATACAATCTGCTTCATTTGTCAAGCTAAACATATGCCCCCATAGTCTCCCAATGCAGAGATATTATGTACAATGTCTGGAACAAATATTTTCATATGACAGCTCATTAATTCAGTCCTTACAATAATACTGCAAGGTAATTTATTATTACTCCACGTTTTACCTGAAACTGAGCTCAAAAAGTTTAGATAACTTGACCAAGGTCATATAGTTTTTAAGCACCAAAGCAGTATCAAATCCAGATCTAAATCTGTTAACTACTTGTTAAAACTCAAAGCTCATAGTCTTTAGAGAACATTTTGGGTCAAACTAAATGATCTTGGTTCTAGGGCAGTAGGAATAAGGTAGCAAAGAAAAAAAAAAGGCAGCTGTTCATTTGATGCCTAAATGTTCACAGTGTACACACAATGCTGAAGCTACGCTTTGTAACTCTTAAGTGTTATTTCTTTTGCTAATAAATGCATTTTATGTAAGGAAAGACTTTGAAAAACAGTATCTCCAGATAGAATGGAAGCTGGGCCACCTTGCTAGGCTCTCCTACACCCCAATCGCATTTTCCAAAGTAGGGAGAAAGGTACAGTTCAGAACCTCTGTTTTTCAAACTGGAATAAGGCCTCTCTCTTACTTTTCCCCCAACTCTTTAAGACAGACAGTTTTTATTTTGAAACTTGGCCTAGATTGTGGGGCAAAGCACAACCAAATGAAATGGGTGAATTCCCATTTTGATCCTTATTGTCATTCTCCATCTCAGGTAGCTGATGGATCCAGGCTAAATGTGGGTCAAAAAGTATGTTATAGATCAGCTTTAGTTGAAAAAAAATGTGTATACACCAATACAACTCAATGGGGGAATAAAAACTTTTTCAACAAATAGTGTTAAGACAACTGGATATCTGCATGCAAGTGAATGAAGTTGGACCCCTACCTCACACCATACACAAAATTTAACTCAAAATGGATGATAGTCTTAATTGTAAGAACTAATGCTCTAAAACTCTTAGATGAAAATATAGGGGCAAATCTTTATGACCCTGAGTTAGGCAAAGCCTTTTTAATTGTGACACCAAAAGTACAGTCTACAGAAGAAAACTAGAAAAAATGGACTATATGAAAACCTAAACTTTTGTACTGCAAAAGATACCATCAAGCTAGTAAAAAATGCAACCCACAGACTCGCAGAAAATATTTGCAAATAAAATACCTTATAAGAGATTTGTATCCAGAATATATTACAAAAAACTCATAACTCAATAGTGAAAAAATAACCCAATTAAAAATAGGCAAAGAATCTAAATAGACATTTCTCCAAAGAAGATATAGAAATGGCCAATAAGCACATGAAAAGATGTTCAACATCCCTAGTCATACGGGAAATGCAAATCAAAATCACAATGAGACATCACTTTACACCCATTAAAATGGCCATAATCAAAACGACAAGTGTTGGTGATGATGTAGAGACATTAGAACACTGAAATTGTTGGAAGAATGTAAAGTGGTGCAGATGCTTTGGAAAACAGTTTAGCAGTTCCTCAAGATGTTAAATGTGGAGTTACCAGATGACTCAGCAATTCCACTCCTAAGTCTATACCCAAAAAAGGCAGAAACAAATGTCCATATGAAAACTTGTACATGAATGCTCATAATGTTCATAGGAACATTGTTCATAATGGCCAAAAATGTGAAAACAGCCCATATACCCATCAACTGATGAACAGATGAATTAGATGTGGTATAGCCATACAATAAATTATTATTCAGTAATAAAAAGGAATGAAATACTGATAAATGCTACAAAATATTAGAACCTTGAAAATGTTATTCCAAGTGAAATAAGCCAGTCACAAAAGACCACATATTGTACAATTCAATTTATATGAAGTGTCCAGAACAGGTAAATCTACATTTAGAGAAAGTAGATTAGTGGCTGCCTAGGGCTAGGAAATGTGAGGAGAAATGGAAAGTGACTGAAAATGGGTATGGGGTTTCTTTTGGGAGTTATGAAAATGTTCTGAAATTGATAGTGGTGATGGTTGCACAACTCAATAATATACTTAAAACTATTGAATTGTACATTTTAATTATGCAAATTGTATGGTAGGTGAGTTTTATTTCAATATTTATATCCACACACACCTACACATAGAATTATATGTATATGTTTACACACACACACACACACACACACACACACACATATATATATATATATGTGTCCAAAGCAAATTCATATTAAAGTGGGGGCAATGTCAAGTATAGAAAACCACCCACAGATACTCCTCTGCTTTAGCCTAACATGTGGCTGTGTGACCAAAGGTACTGTGAAAAGCAATTAGATGGTGTTTTCTGAAAAAAAATTTTATTGAGGTATAATTTACATGTGACAAAGTACTCCCATTTCAGTTCATAGTTTGATGTGTTTTCACAAATGTGACCACCATATAACCAACACATTCAAGATACAGAATATTTTTATTACCCCAAAAAGTCCCTTGTGCCCCTCTGTCTCAAAATGCCCAACCCTAGGCAACAATTGATCTACTTTATGTCATTTTAAGTAAGTTTTGCCCTTTTCTATAATTCCAGATAAATTAAATCCTACAGCATGTACTATTTGGGACCTGGACTTTTTTATAACTCAGGATAATATATTTGAGATTCAACCATGTGTCATGCATATCTCTAATACATTGCTTTTTTATTGCTGAGAGTATTCCCTTATATAAATAGACTAAAGTTTGTTTTGCACTCATCTGCTTAAGGACATTTGATATTTTTCCCTCATTTTAGTTATTATGAACACTGATATACAAGTTTTTGTGTGAACATATGTTTTCACTGCTCTTTGGTAAATGAACAGAAGTGGAAACACTAGGTCTTATAAGTGTATGCCTAACTTTATAAGAAACTGCCAACCTGATTTCCAAAATGTTAATACTATTCTACATACCCACAAGCAATATATGTAAGTCTTTTTAATTGTATTCATATAGTGGGTATCTCATGTAGTTTTCAATTGCCTTTCCTTGATGACTAATTAAATGGAACATCTTTTCATGTGCCTTTGGCCATTCATGTGTGCGTGTAAAGTGTCTGTCCAAATCTTTTGCCTATTTAAAAAATTTATTTTTAATTGAGTTGTATAAGTTTTTTTTAATACATTCTGGATTCAAACCTTTATCAGCTTAGTCTGTGGCTAAAAATTTCATTTTCTTAATGGTGTCTTTCTTAATAGTGCCTTTGATAAAGTCTAATGTATAATTTTTTCTTTTATGTTTCAAATGTTTTGTGTCGTAAGAAATCTTCGCATATGCCAATGTTGAAAAACTTCTAGTTTTACAGTTTTAATTTTTATGTTTCAGTATACTACCTATTTCAAGTTAATTTTTGAGTCTACAGTAAAGGGTTGAAGTTCATTTTTTGTATATGAATATCCAATTTTGTAGAACCATTTGTTGAAAGCCTATTTCCCCACTAAAATATCTTCTTAACAACATTTAATCACAAGATTCTTCATTTCCCTTTTATAGTCACACCCACTTCCTTCCAACTTTCCTTTAATCCTCTTCTCATTAACTCCTACCAACCACTAATCTGTTTTCCCTTTCTATAATTTGGTCATTTCAAGAATATTGACTATATGGAATCATAAAGTATATAACCTTTTGGGATTTGCTGTTTATTCTTTGCATTTATCCATGTATGATGTTTTTCCTTCATTTGTATAGATCTAAATTTCATTGATAGTATTTGCCTTGCATCTGAAGAACTTGCATTTTAAGTCAGCTGGTGAATAATTCTGTCAGCTTTTGTTTGCCTGAAAAAGTTTTTATTTCATATTTATTTTTGAATGGTATTTTTATTGGATATAGAATTCTAGGTTGACAGTTTGCTGTTTTTGTTACAGCTCTTTAAAGACGTCATGTCATTTTCTTCTGATTTAAAAGTTTCTGACAAGACATATGTGAGTATTGTTATCTTTGTTCCTCTGTATGTAATTTTTTTGTCAGCTCTTAAAATTTTCTCTTTATCAATTTTGTTCAGTAATTTGATCATGAACTCCTTTGGTGTGATTTTATTTTGTGTTTATCCTTCTTGGAGTTGTTGACCTGCTTGGATCTATGAGTTTATAATTTTCATAAAATATGAAAAGTTTTTAGCCACTACTTGTGATATTTTTTCTGCCCTTCATCCTTTCTGGGACTCCAGTGATATGCATATTTACAATAGCTGCTTAAAAGACCTTGTTTGTTAATTCTATTTTCTCTGTCATTTTAGGTCTATTTCTGTTAACTGATTTTTCTCCTAGTTATAAGTCGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGCTTTTAGGCATGCCTGGTAATTTTGGATTGGATGCTGGACATTGCGCCTTCATATTGTTGAATGCTGGATATTGTCCTCTTTAATGGATGTTGAACTGTGCTCTTACTGGCAGTTAACTTAATTACATATCAGCTTGATCCTTTTTTAGGCTTTATTAAAACTTTATTAAGTAGACTTTACTCTAGGGCTTATTTAGTCCTATAACTAAGCTGTGACTTATTTTGGAATGCCCTCAATATTAAGCAAGGACCCTAAATTTACCCTGGCTGGTCATAACTCAAACTTCTACCAGCCCTGTATTGCCTCTGGAAATATTCAACTTACAGATACCTGGTAGTTCTTTGCCTGGCTTGTGCAATTTCAGCCTACATATAGCAGTCAGCTGTAGTTACAAGAGGAATGCTATAAAGATCCTCAGCATTTTCTCTACTGAGCACATTCCAGCCACTCAGCCTCTTCAAATTCAAATTTTTGTCTTCTCAACTCTGTGATATCATGCTCTTCTTGAGTGTGTATTGTGGTCCAGAAAGTATCTCCAGGCAGAAAGTTGGTATTATCATAGGTCTTATCTCATTTGTTTCCTTTCTCTGAGGAGTCACAGAACTGTACTAATTACCAAAAGGAGTTTTTTTCATAGATTTTTGTCTGGTTTTCTAGTTGTTTACAGTAAGAGAATAAGTCCAACTCCATTCACTCTCTTGTGTCTGGAAGTAGAAATCACTGATATTATCTATGTTAATACAGTGAAAAAACATCCAGTGTTGTCCCTCTTTCCATCCTCACCCCCACACCTCCACCACTAATACTGTCTCAGTGATGTGCAATATCCAAGGCTTGATTTAGGTCTAATGAATATCTCTCAACACAGAACACTTTGGGTTTCCCAGCACAAGAATTGAAAAAAGCACATGGTACTCTCCCAGCTGAGATTCTTAGCCTCTTTTTTCCCCTACTGTGAGCAGTGTCACCCAGGTGCCCTCACTCATCTATTTTAAATACACTCACCATCTCTGCTAACAGCTTCAGCCTCTGTTAATCAGGTAGCTTTTCATTAATGTTCTTATCCTTGTCCCAGCATCTTATTTGTCATGCTTTATTAGTAGACATGATAAGTCATAAATCTTAGTTCCTCTTAGCTTCTTAGTATCTGTTGTCAAGTTGAAGATCCCAGACAGAGGGAGATGAGAACTGGTTCATTACAGAGTGACTCTTTCTAATATCTCCTTGGTTCTTTATGATACTAATACATTCTTCCCACTGCAGTTGCCTTTTTCTCTATTTAGCTCCAATACAATTCATAGAAATGAGATCATTCGGCTTAAGGGTACATTATCTTCACTTCAACCCATCCTTCTTTAAGATGAACAATGCTGCCTTTAGATTCTGATACTGGCTCTGCCACTTATTAGCTGGATAGCCCCAGACAATTTACTGACTCTGAGTTTGTTTCCATACTTATGGAATAGGGAAAATATATTCTGCATATCTCATGGAGTGGCAACAAAAGCCAAAATTGACAAATGGGATCTAATTAAACTAAAGAGCTTCTGCACAGCAAAAGAAACTACCATCAGAGTGAACAGGCAACCTACAGAATGGGAGAAAATCTTCACAACCTATTCATCTGACAAAGGGCTAATATCCAGAATCTACAATGAATGCAAACAAATTTACAAGACAAAAACAAACAACCCCATCAAAAAGTGGGCAAAGGACATGAACAGACACTTCTCTAAAGAAGACATTTATGCAGACAAAAAACACATGAAAAAATGCTCACCATCACTGGCCAACAGAGAAATGCAAATCAAAACCACAATGAGATACCATCTCACACCAGTTAGAATGGCAATCATTAAAAAGTCAGGAAACAACAACTGCTGGAGAGGATGTGGAGAAATAGGAACACTTTTACACTGTTGGTGGGACTGTAAACCAGTTCAACCATTGTGGAAGTCAGTGTGGCAATTCCTCAGGGATCTAGAACTAGAAATACCATTTGACCCAGCCACCCCATTACTGGGTATATACACAAAGGACTATAAATCATGCTGCTATAAAGACACATGCACACGTATGTTTATTGCGGCATTATTCACAATAGCAAAGACTTGGAACCAATCCAAATGTCCAACAAAGATAGACTGGATTAAGAAAATGTGGCACATATACACCATGGAATACTATGCAGCCATAAAAAAGGATGAGTTCATGTCCTTTGTAGGGACATGGATGAAACTGGAAATCATCATTCTCAGTAAACTATCGCAAGAACAAAAAACCAAACACCGCATATTCTCACTCATAGGTGGGAATTGAACAATGAGAACACATGGACACAGGAAGGGGAACATCACACTCTGGGGACTGTTGTGGGGTGGGGGGAGGGGGGAGGGATAGCATTGGGAGATATACCTAATGCTAGAGGACGAGTTAGTGGGTGCAGCGCACCAGCATGTCACATGTATACATATGTAACTAACCTGCACATTGTGCAGATGTACCCCAAAACTTAAAGTATAATAATAATAAATTAAAAAAAACAAAATATATACATAATATGATCTCGGCTATGGAAAAGAAAAACATTCAGTGGAAAAAAGCTTAAAGGGAAGAGCACCAAAAAAAAAAAAAAAAAAAGATCAAGCAAGATAATCGATGTTAAGTACTTTATATAGTGCCTGTACCATGGTAAATGCTTAATAATTGTTAGCTATGATGACAATAATGATGATTAAAATGGTCTTTCCATACACTCTGCATACCATCCCTCTGACTGACCTGATGATTATAATTATTCCCTAACTACTAGCAAAGAAGCTTCAATCTCCCTTCACTTCTGCTTTTAAACAGTTTTCTCCTATTTTACAAAAAAGAGTGTGTCCTTTTCTTCCCTTTTCCAGCTCTTCATAAACACAGTGTATTAGTCCATTCCTTCATTGCTATAAATAAATACCTGAGACTGGGTAATTTATAAAGAAAAGAGGTTTAATTGTCTCATGGTTCTGCAGGCTGTACAGAAAGCACGATGCTGACATCTGCTCAGCTTCTGGGGAGTCCTCAGGAAACTTACAATCATGGCAAAAGGTAAAGGGGGAGCAAGGTGTCTTAC A 100 PASS CLUSTER_NUM=25425;CONTIG=TGGAAGGCAACTGTCGTTTAAGATGAAAAACAGTGATAACTGCTGAGAAATCTCAGCCTTAAGTGTGTTGAAAGTTGATAATGTCCATCTATGTGGTGAACAATTTGTGAAACAAGACTGTCAAAGAGAGTTATAGGTTCTTGGAAGTAAGAGGCAACATCTTATGATCAAGCTGTTGGGTTATCTCTTTATGATCTTCAACTACACTAAGAAGTGTGTCAATTGTATTCAGAATTCCCATAGCAGTAACTGCTTTGTCATCACTACCTTCTTCATCTGGCCCTGTCTGGATTACTTGGCTAAATGTCATTGCCAACTGTTGTGTCATTTCTACTGCAATAGGAGTAACTTCTTCACTATTTTCACAGATCATTTTCTGAATTACATTGGTAAGGTCATCATTTTCTGTTTCTGTTATAATATGAAGAAGAGCCTGCATTACAGGTCTGATAAATGCTGTGATACATTCTTTAGATTTTTCTTGATTGCTGATAAATACTTGAAGGACAATGGCAGCTTCCACTTTCACAGGCATGTCTCTGTCATCAATCAGACA;CONTIG_NUM=24722;DOWNSTREAM=TCAAGCTGTTGGGTTATCTCTTTATGATCTTCAACTACACTAAGAAGTGTGTCAATTGTATTCAGAATTCCCATAGCAGTAACTGCTTTGTCATCACTACCTTCTTCATCTGGCCCTGTCTGGATTACTTGGCTAAATGTCATTGCCAACTGTTGTGTCATTTCTACTGCAATAGGAGTAACTTCTTCACTATTTTCACAGATCATTTTCTGAATTACATTGGTAAGGTCATCATTTTCTGTTTCTGTTATAATATGAAGAAGAGCCTGCATTACAGGTCTGATAAATGCTGTGATACATTCTTTAGATTTTTCTTGATTGCTGATAAATACTTGAAGGACAATGGCAGCTTCCACTTTCACAGGCATGTCTCTGTCATCAATCAGACA;END=211033725;NS=1;READSOURCES=(0:33:68,1:0:69);SOMATICSCORE=60;SVLEN=-12257;SVTYPE=DEL;UPSTREAM=TGGAAGGCAACTGTCGTTTAAGATGAAAAACAGTGATAACTGCTGAGAAATCTCAGCCTTAAGTGTGTTGAAAGTTGATAATGTCCATCTATGTGGTGAACAATTTGTGAAACAAGACTGTCAAAGAGAGTTATAGGTTCTTGGAAGTAAGAGGCAACATCTTATGA;ensembl_gene_id=ENSG00000143473 GT:GQ 1/.:.
chr1 249175897 chr1uGROUPERuDELu4u25993 CCATACTGAACTATTAAAGTTATTTGAAATGACAATTGTAATAATATCTTCCTTGAGGAGTTCTACAATCTTTGCTGTTATTTCTTTAAGTCCTTCCTTTAATGAGTACTGTTTGGTGCATGTAACCTGCTGTGGTGTAGACAGTGTTATGGACTTCATTTTAATTTGAACTAGGTTAGAAAATTTTAGTTCCTCTAGTTTCCTTTAATATAAGTTAAAAAGATTTGGAATAAAATTCATTCCTGTAATGTCTTATAATTTGGGTGAGCAGTAAAAAGTGCATAGAGCAGTATAGAAGCAGAGG C 100 PASS CLUSTER_NUM=140439;CONTIG=TACTGAACTATTTAGATATCCCTGTGGTAATGTTTTGAATTGGTATTGTTTACCTTCCCATGTAAAGTTGGTATATTCCTGGCTTGCTTTATTTATTGGTTTGGCAAAGAACACATCTGACATGTCTATGATTGTATAGTATTTGTCATTATGATTAATGATCTTATTAACTAGCTCTTCTACATCTGGTAATGCTCTTGGCATTTTAGTTGAGACTTTATTTAAATTTCTATAGTCAATAGTTAATCTATAGCTCCTGTTTGTTTTTAAGACAGGCCAAGAAAGCTTCAAGTTTATTTACCTCTGAGTCCTTCTGTATGAATGTTTATTGTGGGATCTGGCCAGCAGCCCGCAATGCAATGGGGCTCTCTCTTTGTTCCCAGGCAGATCGGCAGGTTGAGAAATAATAGACACACACAAGATAGTGAAAGCTGGGTCCAGGGGGGTCACCGCCTTCTGGTCCCACGGAGCCAAAAATGCACGGGATATACCAGCATTTATTATTAAGTTTAGTGAGGGCAGGGGTAGGTTAGTGAGGGATTTAGGGTCATTTGATTATGAGGTTAGATGGTCACATGGGGATGAAGTAATTCTTTAACATAACATCTGTATGCAGAAGTACAGTAT;CONTIG_NUM=136911;DOWNSTREAM=CAAGAAAGCTTCAAGTTTATTTACCTCTGAGTCCTTCTGTATGAATGTTTATTGTGGGATCTGGCCAGCAGCCCGCAATGCAATGGGGCTCTCTCTTTGTTCCCAGGCAGATCGGCAGGTTGAGAAATAATAGACACACACAAGATAGTGAAAGCTGGGTCCAGGGGGGTCACCGCCTTCTGGTCCCACGGAGCCAAAAATGCACGGGATATACCAGCATTTATTATTAAGTTTAGTGAGGGCAGGGGTAGGTTAGTGAGGGATTTAGGGTCATTTGATTATGAGGTTAGATGGTCACATGGGGATGAAGTAATTCTTTAACATAACATCTGTATGCAGAAGTACAGTAT;END=249176200;NS=1;READSOURCES=(0:44:63,1:0:64);SOMATICSCORE=60;SVLEN=-303;SVTYPE=DEL;UPSTREAM=TACTGAACTATTTAGATATCCCTGTGGTAATGTTTTGAATTGGTATTGTTTACCTTCCCATGTAAAGTTGGTATATTCCTGGCTTGCTTTATTTATTGGTTTGGCAAAGAACACATCTGACATGTCTATGATTGTATAGTATTTGTCATTATGATTAATGATCTTATTAACTAGCTCTTCTACATCTGGTAATGCTCTTGGCATTTTAGTTGAGACTTTATTTAAATTTCTATAGTCAATAGTTAATCTATAGCTCCTGTTTGTTTTTAAGACAGGC GT:GQ 1/.:.
chr10 20219603 chr10uGROUPERuDELu0u11231 AAAAAAAGGCTGGCACGGTGGCTCACACCTGTAAATCCCAGCACTTTGGGAGGCCGAGGTGGGTGGGTCACCTGAGGTTGG A 47 PASS CLUSTER_NUM=60987;CONTIG=GTATACTGATTTTGGAAAATATGTCAGCTCAATTTGGAAGATTGCTAAACCACCTAAAACAGAGCCTGTTTAAAAAATAAATAAATAAAAAATAATAGTTCAAGGCCAGCCTGACCAACAAGGTGAAATCCCATCTCTACTAAAACTACAAAAATTAGC;CONTIG_NUM=59978;DOWNSTREAM=GAGTTCAAGGCCAGCCTGACCAACAAGGTGAAATCCCATCTCTACTAAAACTACAAAAATTAGC;END=20219683;NS=1;READSOURCES=(0:7:18,1:0:60);SOMATICSCORE=40;SVLEN=-80;SVTYPE=DEL;UPSTREAM=GTATACTGATTTTGGAAAATATGTCAGCTCAATTTGGAAGATTGCTAAACCACCTAAAACAGAGCCTGTTTAAAAAATAAATAAATAAAAAATAA;ensembl_gene_id=ENSG00000120594 GT:GQ 1/.:.
chr11 56932527 chr11uGROUPERuDELu1u8703 CCAGAGCACATCATGAGATCCTGGAGCCAGACCTAGAAACCTATTAAACAAGGGAACCCCAGCATGTCTCATTTATTACCCAAAGGAAGGAAATTAGCATCACATGTATAAAGCACTCAGTAGTCTATAAAATGCTCTTAGCAATTCACTTCGTGAGGAAGTGCCTTTTCCCACTTCCACAGAGGGATACCGAGCCTCAAGGGATTAGGAGACTAATCCAGGCTCACACAGCTGATAAGGAACAGCCCAGACATTTTGGCCCAGTGCTGCTAGCCCTCAATCTGGTGCTTTGCCCTCTGCACCGCCTGCCATGCAGGGAATACATGTTAATATCTCTCTGTATTAGTCTGTTCTCATGCTGCTAATAAAGACATACCCAAGACTGGTTAATTTATAAAGGAAAGAGGTTTAACTGAATCACAGTTGCACATGGCTGGCATCATGGTGGAATGCAAAGGAGGAGCAAGGCCACATCTCACATGGTGGCAGGCAAGAGGGCATGTGCAGGGGAACTCCCCTTTATAAAACCATCAGCTCTCTGCTGGGTGTGGTGGCTCACACCTGTAATCCCAGCACTTTGGGAGGCCGAGACGGGTGGATCACCTGAAGTCAGGAGTTCGAGAACAGTCTGGCCAACATGGCGAAACCCCATCTCTATTAAAAATACAAAAAAAAATTAGTTGGGCGTGGCAGTGAGTGCCTGTAATCCCAGCTACTCGAGAGGCTGAGGCAGGAGAATCACCTGAGCCCAGGAGGCAGAGGTTGCAGTGAGCCAAGATCACGCCACTGCACTCCAGCCTGGGTGATAAGAGTGAAACTCGGTCTCAAAAAAAAAAAAAAAAAAAAAATCACATCTCACGAGACTTATTCACTATCATGAGAACAGCATGGGAAAACCCTGCCCCCATGATTCAATTACCTCCCACTGGGTCCCTCCTATGACATGTGGGGATTATTACAATTCATGGTGAGATTTGGGTGGGGACACAAAGCCAAACCATATCACTCTCTTTCTTGGCCACAGGTGGACTTGAAAAACCCTTTCCCTTAACCAAGCAGGAGCCCCAGTAGCTGCTTTGTTCAACGTCTGTTTCTCTAAGGTCTCCTACTCTGGAATATTTAGGAAAACCCAAGGTGGCTCAAAAAGATCATCCCTGTACCAAGCCTCAGGGATTCTAGTGTGACCCAACCTCTCCCAGTCCCTATAGGCATCTCTTTCAAACATCAGAACTGGTGCAGACTCAATAGAAGAAAGGTGTGACTCCAAGATGTCCTCCCTCACAGCCCTTTATCACAATGGGGCTCTCTTCACCGTAGAAGGGAAGGCTGAGGCTCAGAAATATGGTGTAGGCTTGGACTTTGGAAGGAGAATGACTAAGCTGAAGTTCTTCCTCTGCCATTTACAAGCTGTGTGACTTTGGAAAACTTTCTTAATGTTTTTGCTTTTAATTTCCTCATGTATAAAAGATGGGGTTGTCATACTTAGGTTTAAGACTATTGCACGTGCCTGTAATTCTAGCACTTTGGGAGGCCGAGGTGGGCAGATCACTTGAGGTCAGGAGTTTGAGACCAGCTGGCCAACATGGTGAAACACCATCGCTACTAAAAATATAAAAATTAGCCAGGCATGGTGATGCATGCCTGTAATCCCAGCTACTCGGGAGGCTGAGGCAGGAGATTCGCTTGAACCTGGGACGTGGAGGTTGCGTTGAGGCAAGATCACGCCACTGCACTCCAGCCTGGGCAACAGAATGAGACTCCACCTCAAAAAAATAGAAGACTATTGCAAAGATTTTAAAAGATGGCACAGGTACAACACTGAACACAGTGTTTCATGAGTGACAGTCATAAGCTTTTCTACCACCTAACTGTGTTACTTTTAGCAAACTACCTAATTTATCTCGGCCTCGGTTGTTTTGTGATTCTTTTCAAGGGAAAAAAAAAGCACCTCCCTTCATAAGCCTGCTATAAGAACAATATAAAATAGTTGACATAAAGCACTTAATCTCTTGCCTGTCACAGGAAAGGTGCTCAACACCTGTTAGCTCTGTGTTGATGTCCCCAGATCAGAGAGCCAGAAAAGGAGGGCTAAAGTTCCAACCAGGATCTTCTGGTTCCAACCCAGCAGCACCTGAGCAAGGTTGATAATGGTAAAGAAAAGAAGAAATAAGCAATAGGGGCCTCACAAACACAGGCAGGAATAAATCACTCCCATGTGGCAGTTTTCTGTGCCTGACCCAAGAGATAGGTGTTGCACCTTTATCCATGTGGACAGATAAGAAAGAGCAAAGGGCATAGGTCACAGAGGTGAGTATGACCTCATTGCAGTGCCAGGATATTTCCAGGCAATTGGCAAGACTTATTGACAATCATCCAATAAATAGGATAAATAGTTCTTACTTCCCAAGTGCCTGGGAAGAGAGCTTGCAGCTCTTTCCAATTCCTGCCTCTCATCTCTTCCTCCTCGGACACTGCAGCTCTCCGTTTCTGGAAAATCTCAGGGCAGAATCTGGGCTCCTTCCCACTTTCCTCTGTCCAGAATTCTAAGGATGCCTGGGAGAAGGCATCTGCCCTGAATGAATGGGTCAGAGGCATGATTCCAAGTGACTGGCTGGTAATTGGAAGAGCTCAGTGGTCTGGCTGGGGAGATGAGAGCCTGCTAAGCACATCTGGCCAGCATCAGAATCACAAATTGAAGAGAGCTTGGAAGGACTCAGCCCCCACTGACATTCACAAAGGAAGGAAAATGTAGGGAGCCAGACAGAAAGGAGTGATGCTCTTCAAAGTCAACAGGCTTATAACAAACACCATCTCATAGGTGAAAGTGGGAGCACAGAACAGGCACACTCAGCTAACACACAGCTTTCTGAGGGAGGTCATCCTCCACCAAATGAAAATAGCCCTGCTTTTTCATTTTTTAATTTTTATTAATTTTTAAATCAACAAATAAAAATTATATATATTGGCCAGGTACAGTGGTTCATGCCTATAATCCCAGCACTTTGGGACGCCAAGGCAGGAGAACCACTTGAGACCAGGAGTTCAAGACCAGCCTGGACAACATAGTGAGATCCCATCACTACAAAAGAAAAAATATTTAAGGAAAAAAATTGTATATATTTATGGCATACGATGTGAAGTTTTGACATATCTACACATTGTGAAATGATTAAATCAAGCTAATTAACATATCCATCATGCCACATCCTTAAATTTTTATGGTGAGAACATTTAAGATCTATCTCAGCAATTTTGAAGTGCATGCTATTGTCACCATGCTACACAATAGACATCCAGAATTTATTCATCTTGTCTAGCTGAAACATTGTATCCTTTGACCAACATCTCCATACCTCTCCTGCATACCTCCCAGCCCCTGGTAACCATTATTCTCCTCTGCTGCTATGAGTTCGATTTGTTCGGATTCCACATATAAATAAGATCGTGCAATATATTTCTGTTTATGCCTGGCTTATTTCACTTAGCAAAATGTCCTCCAGTTTCATCCATGTTGTCACAAATGACAAGATCTCCTTCTTTTTTAAGACTGAATAGTATTCTATTGTGTACATGTACCACATTTTCTTTATCTGCTGTATTAGTCTGTTTTCACACTGCTGATAAAGACATATCCAAAACTGGGAACAAAAAGAGGTTTAATTGGACTTACAGTTCCACATGGCTGGAGACGCCTCAGAATCATCACAGGAGGTGAAAGGCAGTTCTTACATAGCAGCAGCAAGAAAAAATGAGGAGGAAGCAAAAGCAGAAACCCCTGATAAACCCATCAGATCTCATGAGACTTATTCACTATCACGAGAATAGCACGGGAAAGACCGGCCCCCATGATTCAATTACACACTCCCCGCCCCTGCTGGGTCCCTTCCACAACACGTGGGAATTCTGGACAATATAATTCAAGTTGAGATTTTGGTGGGGACACAGCCAAACCGTATCAT C 100 PASS CLUSTER_NUM=52260;CONTIG=CTTCTACCCCTGGCAGTGCAAAGTCCAGGACCAGGCAGGTGGGGGGTGCTGGAAAAGTTAGCAATTGAGTGATTGTACAGCCAATTTGTCACTTTCATGGGATCGGAGTGAGGCTATCTCAGAATCTTCTGTATCTACTTCATCTCTTGCTCTTTCCATTCTTTGATACTTTGACACATCCACATCCACTGCTCCTGGCCCCTCCGAATCTCATGTCCTCACATTTCAAAATCAATCATGCCTTCCCAACAGTCCCCCAAAGTCTTAACTTATTTCAGCATTAATCCAAAAGTCCACAGTCCAAAGTCTTATCTGAGACAAGGCAAGTCCCTTTCACCTATGAGCCTGCAAAATCAAAAGCAAGCTAGTTACTTCCTAGATACAATGGGGATACAGGTACTAAGTAAATACTGCTGATCCAAATGGGAGAAATTGGCCAAAACAAAGGGGCTACAGGGCCCATGCAAGTCT;CONTIG_NUM=52670;DOWNSTREAM=TCTGCTCCTGGCCCCTCCGAATCTCATGTCCTCACATTTCAAAATCAATCATGCCTTCCCAACAGTCCCCCAAAGTCTTAACTTATTTCAGCATTAATCCAAAAGTCCACAGTCCAAAGTCTTATCTGAGACAAGGCAAGTCCCTTTCACCTATGAGCCTGCAAAATCAAAAGCAAGCTAGTTACTTCCTAGATACAATGGGGATACAGGTACTAAGTAAATACTGCTGATCCAAATGGGAGAAATTGGCCAAAACAAAGGGGCTACAGGGCCCATGCAAGTCT;END=56936485;NS=1;READSOURCES=(0:21:23,1:0:65);SOMATICSCORE=60;SVLEN=-3958;SVTYPE=DEL;UPSTREAM=CTTCTACCCCTGGCAGTGCAAAGTCCAGGACCAGGCAGGTGGGGGGTGCTGGAAAAGTTAGCAATTGAGTGATTGTACAGCCAATTTGTCACTTTCATGGGATCGGAGTGAGGCTATCTCAGAATCTTCTGTATCTACTTCATCTCTTGCTCTTTCCATTCTTTGATACTTTGACACATCCACATCC GT:GQ 1/.:.
I used a code: 我用了一个代码:
out = open('/home/istolarek/OUTintersectPT','w')
masterlist = [row for row in Pt]
for line in a:
g=[]
if line.startswith('chr'):
line = line.strip().split()
g.append(line[0])
## print line[0]
##print len(w)
for row in masterlist:
row = row.strip().split()
f = range(int(row[1]),int(row[2]))
w=[]
for i in g:
if i == row[0]:
w.append(int(line[1]))
for i in w:
## print line[0],row[0],line[1]
## out.write(str(line[0])+'\t'+str(row[0])+'\t'+str(line[1])+'\t'+str(f)+'\n')
if int(i) in f:
out.write(str(line)+'\n')
else:
break
else:
break
out.close()
This code seems to work, but takes far too much time. 这段代码似乎有用,但需要花费太多时间。
So if column 1 in both files matches, go to comparing second column. 因此,如果两个文件中的第1列匹配,请转到比较第二列。 Second column from file a is just a number (line[1]), in file Pt it is a range of values f = range(int(row[1]),int(row[2])).
文件a中的第二列只是一个数字(第[1]行),在文件Pt中,它是一系列值f = range(int(row [1]),int(row [2]))。 So if the first condition about matching columns 1 (these with chr values) and if values from second column from file a are in the range of f, I want to write these lines to the output.
因此,如果关于匹配列1的第一个条件(这些与chr值匹配)以及来自文件a的第二列的值是否在f的范围内,我想将这些行写入输出。
I wrote another one: 我写了另一个:
I wrote: 我写:
masterlist = [row for row in Pt]
for line in a:
line = line.strip().split()
for row in masterlist:
row = row.strip().split()
b = int(line[1])
f = range(int(row[1]),int(row[2]))
if (line[0] == row[0]):
if a in f:
print b,f
These both should be a match. 这两者都应该是匹配的。 But the script only reports the first ontry from Pt file.
但该脚本仅报告Pt文件中的第一个文件。 If the first entry is not matched, the output is none.
如果第一个条目不匹配,则输出为none。 I want the script to output all matches
我希望脚本输出所有匹配项
I'm fighting with this for quite some time. 我已经和它斗争了很长一段时间。
At least, don't do this: 至少,不要这样做:
row = row.strip().split()
f = range(int(row[1]),int(row[2]))
w=[]
print row[0]
if (line[0] == row[0]):
w.append(int(line[1]))
for i in w:
if (int(i) in f):
Instead: 代替:
f = int(row[1]), int(row[2])
...
if f[0] <= int(i) <= f[1]:
or similar. 或类似的。
Well, first of all, you are looping on a
, but you assigned a value to a
inside the loop, so it's not likely to get very far. 好吧,首先,你是循环上
a
,但你分配一个值,以a
内循环,所以它不可能走得很远。
Second of all, I believe that strip().split()
is redundant. 其次,我认为
strip().split()
是多余的。 You don't need strip()
because it's implied in split()
. 你不需要
strip()
因为它隐含在split()
。
Third of all, you should only split
each line in the master file once. 第三,你应该只
split
主文件中的每一行。 You are doing that for each line of input, which is bound to increase processing time a bit. 您正在为每行输入执行此操作,这必然会增加处理时间。
I am not entirely certain I understand your requirements from your code, but it seems to me something along these lines should help you: 我并不完全确定我从您的代码中了解您的要求,但在我看来,这些方面的内容可以帮助您:
import sys
from collections import defaultdict
master = defaultdict(list)
with open('Pt') as Pt:
for entry in Pt:
n, low, high = entry.split()
master[n].append(map(int, (low, high)))
with open('a') as a:
for line in a:
n, i = line.split()[:2]
for low, high in master[n]:
if low <= int(i) <= high:
sys.stdout.write(line)
break
To explain: First read and process all the data in the master file just once. 解释:首先只读取并处理主文件中的所有数据。 Storing the master data in a defaultdict is handy here because it allows you to scan only the rows that matched the first column.
将主数据存储在defaultdict中非常方便,因为它允许您只扫描与第一列匹配的行。
map(int, ...)
converts to ints. map(int, ...)
转换为int。
When processing the input file, we can retrieve the ranges against which to compare the second value using the first value. 处理输入文件时,我们可以检索使用第一个值比较第二个值的范围。 Since
master
is a defaultdict(list)
, if there are no matches for the first column, we'll end up iterating an empty list. 由于
master
是defaultdict(list)
,如果第一列没有匹配项,我们最终会迭代一个空列表。
Note that your original code using range()
would have been equivalent to a condition 请注意,使用
range()
原始代码将等同于条件
low <= i < high
You'll have to adjust the comparison operators as needed. 您必须根据需要调整比较运算符。
UPDATE oops. 更新哎呀。 I put the
break
outside the condition. 我把
break
放在了条件之外。 After fixing it I get the following three items: 修好后,我得到以下三项:
chr1 1161693 chr1uGROUPERuDELu0u832 TGCTCTTTCCAGAAACCCTCAACCCTGTACGGTCAGGAGGAAACATGGCACCTCCCCTCTGGGG T 63 NormalSupport;MinSampleCount;LowSomaticScore CLUSTER_NUM=5454;CONTIG=GGTGCAGGGAAGCAGGAAGGAAGTGAAGCTCAAAAGCCCCTAGGACAGGGCACCTCCCCTCTGGATGCTCTTTCCAGAAACCCTCAACCTTGTACGGTCAGGAGAAAACACATCCCACAAG;CONTIG_NUM=5840;DOWNSTREAM=GCTCTTTCCAGAAACCCTCAACCCTGTACGGTCAGGAGAAAACACATCCCACAAG;END=1161756;NS=1;READSOURCES=(0:3:0,1:2:13);SOMATICSCORE=19;SVLEN=-63;SVTYPE=DEL;UPSTREAM=GGTGCAGGGAAGCGGGAAGGAAGTGAAGCTCAAAAGCCCCTAGGACAGGGCACCTCCCCTCTGGAT;ensembl_gene_id=ENSG00000078808 GT:GQ 1/.:.
chr1 158851689 chr1uGROUPERuDELu3u4452 GGGGAGTAATTCTTATTCATGATATGAAAACTCTAATGTGTTTCTTATTCCAGAAAA G 100 NormalSupport CLUSTER_NUM=25182;CONTIG=CATATTTTGCTATATCTCACATCATTGTTCATCTGATAATATATGAAAACTACAATGTGTTTCTTATTCCAGAAAGGGGAGTAATTCTTATTCATGAATAAACACTGAAGGAGAAAGATTATGGATCATAGTGGGAAAAGCCACAATACCATCTACATTC;CONTIG_NUM=24300;DOWNSTREAM=GGGAGTAATTCTTATTCATGAATAAACACTGACGGAGAAAGATTATGGATCATAGTGGGAAAAGCCACAATACCATCTACATTC;END=158851745;NS=1;READSOURCES=(0:11:0,1:3:18);SOMATICSCORE=55;SVLEN=-56;SVTYPE=DEL;UPSTREAM=CATATTTTGCTATATCTCACATCATTGTTCATCTGATAATATATGAAAACTCCAATGTGTTTCTTATTCCAGAAAG;ensembl_gene_id=ENSG00000229849 GT:GQ 1/.:.
chr1 165014865 chr1uGROUPERuDELu3u7344 ACTGGCATTAGCTATGCTTCCTTAGGCAGACAGCATGTTGAGAAATTCACATTCATCAG A 100 NormalSupport CLUSTER_NUM=40249;CONTIG=CTCCAGTAAAGAGCATCTTTTAATGAAGTGTATCTGCCTGGGCTAGAAAGGCAGCTGCCTCCACTAAAGCAGGGCTGGTCCAGAAATATTACCACTTGCCTAATCCTTATAGTAATCCTAACTGGCAGGTATTATTATATCCCAATTCACACACTTAGAGG;CONTIG_NUM=38845;DOWNSTREAM=CTTGCCTAATCCTTATAGTAATCCTAACTGGCAGGTATTATTATATCCCAATTCACACACTTAGAGG;END=165014923;NS=1;READSOURCES=(0:32:0,1:9:18);SOMATICSCORE=60;SVLEN=-58;SVTYPE=DEL;UPSTREAM=CTCCAGTAAAGAGCATCTTTTAATGAAGTGTATCTGCCTGGGCTAGAAAGGCAGCTGCCTCCACTAAAGCAGGGCTGGTCCAGAAATATTACCA GT:GQ 1/.:.
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.