Consider the file testbam.txt:
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg001G.GRCh38DH.target.bam
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg002G.GRCh38DH.target.bam
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg014G.GRCh38DH.target.bam
and the file testbai.txt:
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg001G.GRCh38DH.target.bai
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg002G.GRCh38DH.target.bai
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg014G.GRCh38DH.target.bai
They always have the same number of lines, and I created a function to find it:
def file_len(fname):
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1
n = file_len('/groups/cgsd/alexandre/python_code/src/testbai.txt')
print(n)
3
Then I created two lists by opening the files and doing some manipulation:
content = []
with open('/groups/cgsd/alexandre/python_code/src/testbam.txt') as bams:
    for line in bams:
        content.append(line.strip().split())
print(content)
content2 = []
with open('/groups/cgsd/alexandre/python_code/src/testbai.txt') as bais:
    for line in bais:
        content2.append(line.strip().split())
print(content2)
Now I have a JSON-type file called mutec.json, in which I would like to replace certain parts with the items of the lists:
{
  "Mutect2.gatk_docker": "broadinstitute/gatk:4.1.4.1",
  "Mutect2.intervals": "/groups/cgsd/alexandre/gatk-workflows/src/interval_list/Basic_Core_xGen_MSI_TERT_HPV_EBV_hg38.interval_list",
  "Mutect2.scatter_count": 30,
  "Mutect2.m2_extra_args": "--downsampling-stride 20 --max-reads-per-alignment-start 6 --max-suspicious-reads-per-alignment-start 6",
  "Mutect2.filter_funcotations": true,
  "Mutect2.funco_reference_version": "hg38",
  "Mutect2.run_funcotator": true,
  "Mutect2.make_bamout": true,
  "Mutect2.funco_data_sources_tar_gz": "/groups/cgsd/alexandre/gatk-workflows/mutect2/inputs/funcotator_dataSources.v1.6.20190124s.tar.gz",
  "Mutect2.funco_transcript_selection_list": "/groups/cgsd/alexandre/gatk-workflows/mutect2/inputs/transcriptList.exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt",
  "Mutect2.ref_fasta": "/groups/cgsd/alexandre/gatk-workflows/src/ref_Homo38_HPV/Homo_sapiens_assembly38_chrHPV.fasta",
  "Mutect2.ref_fai": "/groups/cgsd/alexandre/gatk-workflows/src/ref_Homo38_HPV/Homo_sapiens_assembly38_chrHPV.fasta.fai",
  "Mutect2.ref_dict": "/groups/cgsd/alexandre/gatk-workflows/src/ref_Homo38_HPV/Homo_sapiens_assembly38_chrHPV.dict",
  "Mutect2.tumor_reads": "<<<N_item_of_list_content>>>",
  "Mutect2.tumor_reads_index": "<<<N_item_of_list_content2>>>",
}
Please note that in this section:
  "Mutect2.tumor_reads": "<<<N_item_of_list_content>>>",
  "Mutect2.tumor_reads_index": "<<<N_item_of_list_content2>>>",
<<<N_item_of_list_content>>> and <<<N_item_of_list_content2>>> should be replaced by their respective list items, and I would like to write the result of each modification to a new file.
The final result would be 3 files: mutect1.json with the first item from testbam.txt and the first item from testbai.txt, mutect2.json with the second item from testbam.txt and the second item from testbai.txt, and a third file with the same reasoning applied.
Please note that the notation I wrote, <<<N_item_of_list_content>>> and <<<N_item_of_list_content2>>>, isn't necessarily hard-coded into the file; I wrote it myself just to make clear what I would like to replace.
First, and even if it is unrelated to the question, some of your code is not really Pythonic:
def file_len(fname):
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1
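You use a for loop over enumerate only to count the lines, when you could simply do:
def file_len(fname):
    with open(fname) as f:
        return sum(1 for _ in f)
because f is an iterator over the lines of the file. Note that len(f) would not work here, since file objects do not define a length. This version also returns 0 for an empty file, where the original raises an error because i is never assigned.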
Now to your question. You want to replace some elements in a file with data found in two other files.
In your initial question, the strings were enclosed in triple angle brackets.
I would have used:
import re

rx = re.compile(r'<<<.*?>>>')    # how to identify what is to be replaced

with open('.../testbam.txt') as bams, open('.../testbai.txt') as bais, \
        open('.../mutect.json') as src:
    for i, pair in enumerate(zip(bams, bais), 1):    # gets a pair of replacement strings at each step
        reps = [s.strip() for s in pair]    # drop the trailing newline of each line
        src.seek(0)    # rewind src file
        with open(f'mutect{i}.json', 'w') as fdout:    # open the output file
            rep_index = 0    # will first use rep string from first file
            for line in src:
                if rx.search(line):    # is the string to replace there?
                    # a function replacement is inserted literally (no backslash escapes)
                    line = rx.sub(lambda m: reps[rep_index], line)
                    rep_index = 1 - rep_index    # next time will use the other string
                fdout.write(line)
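To see what the alternating substitution does without touching any files, here is a minimal in-memory sketch. The helper name fill_placeholders and the shortened paths are made up for illustration; only the regex and the alternation idea come from the code above.

```python
import re

rx = re.compile(r'<<<.*?>>>')  # same pattern as above

def fill_placeholders(lines, reps):
    """Replace placeholders line by line, cycling through reps (hypothetical helper)."""
    out = []
    rep_index = 0
    for line in lines:
        if rx.search(line):
            # a function replacement is inserted literally, with no backslash handling
            line = rx.sub(lambda m: reps[rep_index], line)
            rep_index = (rep_index + 1) % len(reps)  # move on to the next replacement
        out.append(line)
    return out

template = [
    '"Mutect2.make_bamout": true,\n',
    '"Mutect2.tumor_reads": "<<<N_item_of_list_content>>>",\n',
    '"Mutect2.tumor_reads_index": "<<<N_item_of_list_content2>>>",\n',
]
# shortened, made-up paths standing in for one line of each list file
result = fill_placeholders(template, ['/data/pfg001G.bam', '/data/pfg001G.bai'])
print(''.join(result), end='')
```

Lines without a placeholder pass through unchanged, so the rest of the JSON file is copied as-is.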
In the comments, you proposed instead replacing the first line of each list file, wherever it appears in the JSON file, with each of the other lines. The code could become:
with open('.../testbam.txt') as bams, open('.../testbai.txt') as bais, \
        open('.../mutect.json') as src:
    # pairs of (bam, bai) paths, without their trailing newlines
    it = ((bam.strip(), bai.strip()) for bam, bai in zip(bams, bais))
    to_find = next(it)    # we will have to find that
    for i, reps in enumerate(it, 2):    # gets a pair of replacement strings at each step
        src.seek(0)    # rewind src file
        with open(f'mutect{i}.json', 'w') as fdout:    # open the output file
            for line in src:
                line = line.replace(to_find[0], reps[0])    # just try to replace
                line = line.replace(to_find[1], reps[1])
                fdout.write(line)
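If the template is strictly valid JSON (which means no trailing comma after the last entry), another option is to skip text matching altogether and set the two keys through the json module. This is only a sketch under that assumption: the function name write_configs and its parameters are hypothetical, while the two key names are taken from your file.

```python
import json

def write_configs(template_path, bam_list_path, bai_list_path,
                  out_pattern='mutect{}.json'):
    """Write one JSON file per (bam, bai) pair and return the written file names.

    Hypothetical helper: assumes the template parses as strict JSON.
    """
    with open(template_path) as src:
        config = json.load(src)  # raises json.JSONDecodeError on invalid JSON

    written = []
    with open(bam_list_path) as bams, open(bai_list_path) as bais:
        # zip pairs line N of one list with line N of the other
        for i, (bam, bai) in enumerate(zip(bams, bais), 1):
            config['Mutect2.tumor_reads'] = bam.strip()
            config['Mutect2.tumor_reads_index'] = bai.strip()
            out_name = out_pattern.format(i)
            with open(out_name, 'w') as fdout:
                json.dump(config, fdout, indent=2)
            written.append(out_name)
    return written
```

Calling write_configs('.../mutect.json', '.../testbam.txt', '.../testbai.txt') would then produce mutect1.json, mutect2.json and so on in the current directory. This does not depend on how the placeholders are written in the template, since the values are set by key.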