Create as many files as the number of items from two lists with the same number of items in Python

Question

Consider the file testbam.txt :

/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg001G.GRCh38DH.target.bam
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg002G.GRCh38DH.target.bam
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg014G.GRCh38DH.target.bam

and the file testbai.txt :

/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg001G.GRCh38DH.target.bai
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg002G.GRCh38DH.target.bai
/groups/cgsd/alexandre/gatk-workflows/src/exomesinglesample_out/bam/pfg014G.GRCh38DH.target.bai

Both files always have the same number of lines, and I wrote a function to count them:

def file_len(fname):
    with open(fname) as f:
        for i,l in enumerate(f):
            pass
        return i+1

n = file_len('/groups/cgsd/alexandre/python_code/src/testbai.txt')
print(n)
3

Then I created two lists by opening the files and doing some manipulation:

content = []
with open('/groups/cgsd/alexandre/python_code/src/testbam.txt') as bams:
    for line in bams:
        content.append(line.strip().split())

print(content)

content2 = []
with open('/groups/cgsd/alexandre/python_code/src/testbai.txt') as bais:
    for line in bais:
        content2.append(line.strip().split())

print(content2)

Now I have a JSON-like file called mutect.json in which I would like to replace certain parts with the items of the lists:

{
    "Mutect2.gatk_docker": "broadinstitute/gatk:4.1.4.1",
    "Mutect2.intervals": "/groups/cgsd/alexandre/gatk-workflows/src/interval_list/Basic_Core_xGen_MSI_TERT_HPV_EBV_hg38.interval_list",
    "Mutect2.scatter_count": 30,
    "Mutect2.m2_extra_args": "--downsampling-stride 20 --max-reads-per-alignment-start 6 --max-suspicious-reads-per-alignment-start 6",
    "Mutect2.filter_funcotations": true,
    "Mutect2.funco_reference_version": "hg38",
    "Mutect2.run_funcotator": true,
    "Mutect2.make_bamout": true,
    "Mutect2.funco_data_sources_tar_gz": "/groups/cgsd/alexandre/gatk-workflows/mutect2/inputs/funcotator_dataSources.v1.6.20190124s.tar.gz",
    "Mutect2.funco_transcript_selection_list": "/groups/cgsd/alexandre/gatk-workflows/mutect2/inputs/transcriptList.exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt",
  
    "Mutect2.ref_fasta": "/groups/cgsd/alexandre/gatk-workflows/src/ref_Homo38_HPV/Homo_sapiens_assembly38_chrHPV.fasta",
    "Mutect2.ref_fai": "/groups/cgsd/alexandre/gatk-workflows/src/ref_Homo38_HPV/Homo_sapiens_assembly38_chrHPV.fasta.fai",
    "Mutect2.ref_dict": "/groups/cgsd/alexandre/gatk-workflows/src/ref_Homo38_HPV/Homo_sapiens_assembly38_chrHPV.dict",
    
    "Mutect2.tumor_reads": "<<<N_item_of_list_content>>>",
    "Mutect2.tumor_reads_index": "<<<N_item_of_list_content2>>>",
  }

Please note that this section:

   "Mutect2.tumor_reads": "<<<N_item_of_list_content>>>",
   "Mutect2.tumor_reads_index": "<<<N_item_of_list_content2>>>",

Here, <<<N_item_of_list_content>>> and <<<N_item_of_list_content2>>> should be replaced by the corresponding items of the lists, and I would like to write the result of each substitution to a new file.

The final result would be 3 files: mutect1.json with the first item from testbam.txt and the first item from testbai.txt, mutect2.json with the second item from each file, and a third file built the same way.

Please note that the notation <<<N_item_of_list_content>>> and <<<N_item_of_list_content2>>> isn't necessarily hard-coded into the file; I wrote it myself just to make clear what I would like to replace.

Answer 1

First, and even if it is unrelated to the question, some of your code is not really Pythonic:

def file_len(fname):
    with open(fname) as f:
        for i,l in enumerate(f):
            pass
        return i+1

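The for loop over enumerate works, but it raises UnboundLocalError on an empty file, because i is never assigned. You could simply do:

def file_len(fname):
    with open(fname) as f:
        return sum(1 for _ in f)

because f is an iterator over the lines of the file. A file object does not support len(), so the lines have to be counted by iterating.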

Now to your question. You want to replace some elements in a file with data found in two other files.

In your initial question, the strings were enclosed in triple angle brackets.

I would have used:

import re

rx = re.compile(r'<<<.*?>>>')        # how to identify what is to replace

with open('.../testbam.txt') as bams, open('.../testbai.txt') as bais, \
     open('.../mutect.json') as src:
    for i, pair in enumerate(zip(bams, bais), 1): # gets a pair of replacement strings at each step
        reps = [s.strip() for s in pair]  # drop the trailing newline of each path
        src.seek(0)                  # rewind src file
        with open(f'mutect{i}.json', 'w') as fdout:  # open the output file
            rep_index = 0            # will first use rep string from first file
            for line in src:
                if rx.search(line):  # is the string to replace there?
                    line = rx.sub(reps[rep_index], line)
                    rep_index = 1 - rep_index    # next time will use the other string
                fdout.write(line)

In the comments, you proposed instead that the JSON file already contains the first line of each list file, and that those two paths should be replaced with the remaining lines. The code could become:

with open('.../testbam.txt') as bams, open('.../testbai.txt') as bais, \
     open('.../mutect.json') as src:
    # pairs of paths, with the trailing newline of each line removed
    it = ((bam.strip(), bai.strip()) for bam, bai in zip(bams, bais))
    to_find = next(it)          # we will have to find that
    for i, reps in enumerate(it, 2): # gets a pair of replacement strings at each step
        src.seek(0)                  # rewind src file
        with open(f'mutect{i}.json', 'w') as fdout:  # open the output file
            for line in src:
                line = line.replace(to_find[0], reps[0])    # just try to replace
                line = line.replace(to_find[1], reps[1])
                fdout.write(line)
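
If the template is valid JSON (the excerpt in the question would first need the trailing comma after its last entry removed), another option is to set the two keys with the json module rather than substituting text. The following is only a minimal sketch under that assumption; the function name write_mutect_inputs and its parameters are hypothetical and do not come from the question:

```python
import json
from pathlib import Path


def write_mutect_inputs(template_path, bam_list_path, bai_list_path, out_dir="."):
    """Write one mutect<N>.json per (bam, bai) pair and return the written paths."""
    # Assumption: the template parses as strict JSON (no trailing commas).
    template = json.loads(Path(template_path).read_text())
    out_dir = Path(out_dir)
    written = []
    with open(bam_list_path) as bams, open(bai_list_path) as bais:
        # zip pairs line N of the first list with line N of the second
        for i, (bam, bai) in enumerate(zip(bams, bais), 1):
            config = dict(template)  # shallow copy: only top-level keys change
            config["Mutect2.tumor_reads"] = bam.strip()
            config["Mutect2.tumor_reads_index"] = bai.strip()
            out_path = out_dir / f"mutect{i}.json"
            out_path.write_text(json.dumps(config, indent=4) + "\n")
            written.append(out_path)
    return written
```

With the three-line list files from the question, this would write mutect1.json, mutect2.json and mutect3.json into out_dir. Note that zip stops at the shorter input, so a length mismatch between the two lists would go unnoticed.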
