With the following script, I parse three files into one dictionary in Python. The dictionaries do not all share the same keys, and I want the values of each dictionary in their own column in my output CSV file. So all the keys must be in one column, followed by one column per dictionary containing its values. The problem with my script is that it only appends values if they exist, so the values of the different dictionaries end up in the wrong columns of the output CSV file. My script is as follows:
import csv
import os
from collections import defaultdict
from itertools import chain

def get_file_values(find_files, output_name):
    for root, dirs, files in os.walk(os.getcwd()):
        if all(x in files for x in find_files):
            outputs = []
            for f in find_files:
                d = {}
                with open(os.path.join(root, f), 'r') as f1:
                    for line in f1:
                        ta = line.split()
                        d[ta[1]] = int(ta[0])
                outputs.append(d)
            d3 = defaultdict(list)
            for k, v in chain(*(d.items() for d in outputs)):
                d3[k].append(v)
            with open(os.path.join(root, output_name), 'w+', newline='') as fnew:
                writer = csv.writer(fnew)
                writer.writerow(["genome", "contig", "genes", "SCM", "plasmidgenes"])
                for k, v in d3.items():
                    fnew.write(os.path.basename(root) + ',')
                    writer.writerow([k] + v)
            print(d3)

get_file_values(['genes.faa.genespercontig.csv', 'hmmer.analyze.txt.results.txt', 'genes.fna.blast_dbplasmid.out'], 'output_contigs_SCMgenes.csv')
My output now is:
genome contig genes SCM plasmidgenes
Linda 9 359 295 42
Linda 42 1 2
Linda 73 29 5
Linda 43 17 6
Linda 74 4
Linda 48 11
Linda 66 27
And I want it to look like this:
genome contig genes SCM plasmidgenes
Linda 9 359 295 42
Linda 42 1 2 0
Linda 73 0 29 5
Linda 43 17 0 6
Linda 74 0 0 4
Linda 48 0 11 0
Linda 66 27 0 0
Easiest fix: check whether each value exists; if it does, append it, otherwise append a 0 to your data array.
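A minimal sketch of that first fix, assuming `outputs` is the list of per-file dictionaries built in the question's script. The helper name `align_counts` is hypothetical and the sample values are copied from the desired output in the question. It collects every key once, then looks each key up in every dictionary with `dict.get(key, 0)`, so a missing value becomes 0 instead of being skipped:

```python
def align_counts(outputs):
    """Return {key: [one value per dict]}, using 0 where a dict lacks the key."""
    # Collect keys in first-seen order so the row order stays stable.
    keys = []
    seen = set()
    for d in outputs:
        for k in d:
            if k not in seen:
                seen.add(k)
                keys.append(k)
    # One value per input dict, in the same order as `outputs`.
    return {k: [d.get(k, 0) for d in outputs] for k in keys}


# Sample data taken from the desired output in the question.
genes = {"9": 359, "42": 1, "43": 17, "66": 27}
scm = {"9": 295, "42": 2, "73": 29, "48": 11}
plasmid = {"9": 42, "73": 5, "43": 6, "74": 4}

aligned = align_counts([genes, scm, plasmid])
print(aligned["73"])  # [0, 29, 5]
```

The result can replace `d3` in the original writing loop, since every list now has exactly one entry per input file.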
A probably more complicated fix: use a different data structure, such as a pandas DataFrame or a two-dimensional array that mirrors your data.
Example with a two-dimensional array:
First loop through the files and fill the d3 array so that each cell is addressed as d3[lineNumber][key]. For example, d3[0]['genome'] would be the first column of your first row.
Then you should be able to output the file with the following block:
with open(os.path.join(root, output_name), 'w+', newline='') as fnew:
    writer = csv.writer(fnew)
    # write header row; writerow expects a list of fields, not a joined string
    # (assumes d3[0] holds every column, starting with 'genome')
    header = list(d3[0].keys())
    writer.writerow(header)
    # write data rows; a column missing from a row is written as 0
    for key, row in d3.items():
        writer.writerow([row.get(col, 0) for col in header])
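For completeness, here is one way the fill step could look. It is only a sketch: the helper names `build_rows` and `write_rows` are hypothetical, the sample values come from the question, and it writes to an in-memory buffer rather than a real file. Instead of the `csv.writer` used above it uses `csv.DictWriter`, whose `restval` argument supplies the value for any column a row does not contain:

```python
import csv
import io


def build_rows(genome, named_counts):
    """named_counts maps a column name to a {contig: count} dict."""
    rows = {}
    for column, counts in named_counts.items():
        for contig, value in counts.items():
            # Create the row the first time a contig is seen.
            row = rows.setdefault(contig, {"genome": genome, "contig": contig})
            row[column] = value
    return list(rows.values())


def write_rows(rows, fieldnames, fileobj):
    """Write row dicts as CSV, filling missing columns with 0."""
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames, restval=0)
    writer.writeheader()
    writer.writerows(rows)


named_counts = {
    "genes": {"9": 359, "42": 1},
    "SCM": {"9": 295, "73": 29},
    "plasmidgenes": {"9": 42, "73": 5},
}
rows = build_rows("Linda", named_counts)
buffer = io.StringIO()
write_rows(rows, ["genome", "contig", "genes", "SCM", "plasmidgenes"], buffer)
print(buffer.getvalue())
```

Because each row is a dictionary keyed by column name, the column order is fixed by `fieldnames` alone, and a contig that is absent from one file cannot shift the other values sideways.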