
Python: merge dictionaries, with each dictionary in a new column of the output CSV file

With the following script, I parse 3 files into dictionaries in Python. The dictionaries do not all share the same keys, and I want the values of each dictionary in a separate column of my output CSV file. So the keys must all be in one column, followed by one column per dictionary containing its values. The problem with my script is that it only appends values if they exist, so the values of the different dictionaries end up in the wrong columns of the output CSV file. My script is as follows:

    import csv
    import os
    from collections import defaultdict
    from itertools import chain


    def get_file_values(find_files, output_name):
        for root, dirs, files in os.walk(os.getcwd()):
            # only process directories that contain all of the input files
            if all(x in files for x in find_files):
                outputs = []
                for f in find_files:
                    d = {}
                    with open(os.path.join(root, f), 'r') as f1:
                        for line in f1:
                            ta = line.split()
                            d[ta[1]] = int(ta[0])
                    outputs.append(d)

                # values are appended only when a key exists in a dictionary
                d3 = defaultdict(list)
                for k, v in chain(*(d.items() for d in outputs)):
                    d3[k].append(v)

                with open(os.path.join(root, output_name), 'w+', newline='') as fnew:
                    writer = csv.writer(fnew)
                    writer.writerow(["genome", "contig", "genes", "SCM", "plasmidgenes"])
                    for k, v in d3.items():
                        fnew.write(os.path.basename(root) + ',')
                        writer.writerow([k] + v)
                        print(d3)


    get_file_values(['genes.faa.genespercontig.csv', 'hmmer.analyze.txt.results.txt', 'genes.fna.blast_dbplasmid.out'], 'output_contigs_SCMgenes.csv')

My output now is:

genome contig  genes   SCM     plasmidgenes
Linda     9     359     295    42
Linda     42    1       2      
Linda     73    29      5   
Linda     43    17      6   
Linda     74    4       
Linda     48    11      
Linda     66    27      

And I want it to look like this:

genome contig  genes   SCM     plasmidgenes
Linda     9     359     295    42
Linda     42    1       2      0
Linda     73    0       29     5    
Linda     43    17      0      6    
Linda     74    0       0      4        
Linda     48    0       11     0    
Linda     66    27      0      0

Easiest fix: check whether the value exists in each dictionary; if it does, append it, otherwise append a 0 to your data array.
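A minimal sketch of this fix, assuming the per-file dictionaries have already been collected in a list like `outputs` in the question. The helper name `merge_with_default` and the sample values are illustrative and not part of the original script.

```python
def merge_with_default(dicts, default=0):
    """Return {key: [value from each dict, or default if the key is missing]}."""
    # Collect keys in first-seen order so the rows come out in a stable order.
    all_keys = []
    seen = set()
    for d in dicts:
        for k in d:
            if k not in seen:
                seen.add(k)
                all_keys.append(k)
    # One slot per dictionary for every key, so the columns always line up.
    return {k: [d.get(k, default) for d in dicts] for k in all_keys}


# Illustrative data: three dictionaries with only partly overlapping keys.
genes = {"9": 359, "42": 1, "43": 17}
scm = {"9": 295, "42": 2, "73": 29}
plasmid = {"9": 42, "73": 5, "43": 6}

merged = merge_with_default([genes, scm, plasmid])
for contig, values in merged.items():
    print(contig, values)
```

In the question's script, something like this could stand in for the `defaultdict(list)` loop, since every key then always carries exactly one value per input file.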

A probably more complicated fix: use a different data structure, such as a pandas DataFrame or a two-dimensional array that resembles your data.

Example with a two-dimensional array:

You would first loop through the files and fill the d3 array so that it is indexed as d3[lineNumber][key]. For example, d3[0]['genome'] would be the first row's first column.
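One way the filling step might look, as a sketch only: the function name `build_rows`, the `columns` argument, and the sample values are assumptions made for illustration, and the per-file dictionaries are taken as already parsed.

```python
def build_rows(genome, columns):
    """Turn {column_name: {contig: value}} into {row_number: {column: value}}."""
    # Gather every contig seen in any column, in first-seen order.
    contigs = []
    for values in columns.values():
        for contig in values:
            if contig not in contigs:
                contigs.append(contig)

    rows = {}
    for line_number, contig in enumerate(contigs):
        row = {"genome": genome, "contig": contig}
        for name, values in columns.items():
            # Fall back to 0 when a file has no entry for this contig.
            row[name] = values.get(contig, 0)
        rows[line_number] = row
    return rows


# Illustrative data only.
d3 = build_rows("Linda", {
    "genes": {"9": 359, "42": 1},
    "SCM": {"9": 295, "73": 29},
    "plasmidgenes": {"9": 42},
})
print(d3[0]["genome"], d3[1])
```

Because every row is built from the same column names in the same order, each row has the same keys, which is what makes writing a header from `d3[0]` workable.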

Then you should be able to output the file with the following block:

with open(os.path.join(root, output_name), 'w+', newline='') as fnew:
    writer = csv.writer(fnew)

    # write header row: the keys of the first row are the column names
    # (csv.writer expects a sequence of fields, not a pre-joined string)
    writer.writerow(list(d3[0].keys()))

    # write data rows; each row is assumed to already hold its 'genome'
    # value as the first key, as in d3[0]['genome'] above
    for key, row in d3.items():
        writer.writerow(list(row.values()))
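As a possible alternative to building each row by hand, the standard library's `csv.DictWriter` accepts a `restval` argument that is written for any field missing from a row. The sketch below uses an in-memory buffer instead of a real file, and the rows are made-up sample data.

```python
import csv
import io

fieldnames = ["genome", "contig", "genes", "SCM", "plasmidgenes"]

# Illustrative rows; some columns are deliberately missing.
rows = [
    {"genome": "Linda", "contig": "9", "genes": 359, "SCM": 295, "plasmidgenes": 42},
    {"genome": "Linda", "contig": "42", "genes": 1, "SCM": 2},
    {"genome": "Linda", "contig": "74", "plasmidgenes": 4},
]

buffer = io.StringIO()  # stands in for the real output file
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval=0)
writer.writeheader()
writer.writerows(rows)  # missing fields are filled with restval

csv_text = buffer.getvalue()
print(csv_text)
```

With this approach the column order comes from `fieldnames` rather than from the order of keys in each row, so sparse rows cannot shift values into the wrong column.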


 