How to write values from one csv to another on condition using Python
Write data from one csv to another python
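I have three CSV files with the attributes Product_ID, Name, Cost, and Description. Each file contains Product_ID. I want to combine Name (file 1), Cost (file 2), and Description (file 3) into a new CSV file that has Product_ID plus all three attributes. I need efficient code, because the files contain more than 130,000 rows.
After combining all the data into the new file, I have to load that data into a dictionary, with Product_ID as the key and Name, Cost, and Description as the values.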
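It may be more efficient to read each input .csv into a dictionary before creating the aggregated result.
Here is a solution that reads each file and stores the columns in a dictionary keyed by Product_ID. I assume that every Product_ID value exists in each file and that each file includes a header row. I also assume that, apart from Product_ID, no column appears in more than one file.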

    import csv
    from collections import defaultdict

    entries = defaultdict(list)
    files = ['names.csv', 'costs.csv', 'descriptions.csv']
    headers = ['Product_ID']

    for filename in files:
        with open(filename, newline='') as f:  # Open each file in files.
            reader = csv.reader(f)             # Create a reader to iterate csv lines
            heads = next(reader)               # Grab first line (headers)
            pk = heads.index(headers[0])       # Get the position of 'Product_ID' in
                                               # the list of headers
            # Add the rest of the headers to the list of collected columns (skip 'Product_ID')
            headers.extend([x for i, x in enumerate(heads) if i != pk])
            for row in reader:
                # For each line, add new values (except 'Product_ID') to the
                # entries dict with the line's Product_ID value as the key
                entries[row[pk]].extend([x for i, x in enumerate(row) if i != pk])

    # Open file to write csv lines (Python 3: text mode with newline='')
    with open('result.csv', 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(headers)            # Write the headers first
        for key, value in entries.items():
            writer.writerow([key] + value)  # Write the product IDs
                                            # concatenated with the other values

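
The question also asks for the merged file to be loaded back into a dictionary keyed by Product_ID. A minimal sketch of that step follows. It assumes the merged file has a header row containing a `Product_ID` column, as written by the code above. The helper name `load_products` is hypothetical and not part of the original answer.

```python
import csv


def load_products(path):
    """Return {Product_ID: [other column values]} read from a merged CSV file."""
    products = {}
    with open(path, newline='') as f:
        reader = csv.reader(f)
        heads = next(reader)             # Header row of the merged file
        pk = heads.index('Product_ID')   # Position of the key column
        for row in reader:
            # Keep every value except the key itself, in column order
            products[row[pk]] = [x for i, x in enumerate(row) if i != pk]
    return products
```

Calling `load_products('result.csv')` would then give a mapping such as Product_ID to `[Name, Cost, Description]`. If the dictionary is all that is needed, the `entries` dict built during the merge may already serve the same purpose without re-reading the file.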
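A general solution produces one record, possibly incomplete, for each id it encounters while processing the 3 files. This requires a specialized data structure, which fortunately is just a list with a preassigned number of slots.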

    # fn1, fn2, fn3: paths of the three input files. Each line is assumed to be
    # exactly "id,value", with no header row and no embedded commas.
    d = {id: [name, None, None] for id, name in [line.strip().split(',') for line in open(fn1)]}

    for line in open(fn2):
        id, cost = line.strip().split(',')
        if id in d:
            d[id][1] = cost
        else:
            d[id] = [None, cost, None]

    for line in open(fn3):
        id, desc = line.strip().split(',')
        if id in d:
            d[id][2] = desc
        else:
            d[id] = [None, None, desc]

    for id in d:
        if all(d[id]):
            print(','.join([id] + d[id]))
        else:
            # The info for this id is incomplete, so you have to decide
            # on your own what you want to do with it; here we just pass
            pass

If you are sure that you never want to process incomplete records any further, the code above can be simplified:

    d = {id: [name] for id, name in [line.strip().split(',') for line in open(fn1)]}

    for line in open(fn2):
        id, cost = line.strip().split(',')
        if id in d:
            d[id].append(cost)   # append the cost read from this line

    for line in open(fn3):
        id, desc = line.strip().split(',')
        if id in d:
            d[id].append(desc)

    for id in d:
        # Only ids found in all three files end up with three values
        if len(d[id]) == 3:
            print(','.join([id] + d[id]))

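
If incomplete records should be kept, the per-file loops of the general variant can also be folded into a single loop over a list of files, with one preassigned slot per file. The sketch below is an illustration under stated assumptions rather than part of the answer. The name `merge_with_slots` is hypothetical, and the files are assumed to have no header row and two columns per line. It swaps `str.split` for `csv.reader`, so a quoted value containing a comma is read as one field.

```python
import csv


def merge_with_slots(paths):
    """Return {id: [v0, v1, ...]} with one slot per file; None marks a missing value."""
    merged = {}
    for slot, path in enumerate(paths):
        with open(path, newline='') as f:
            for row in csv.reader(f):
                if len(row) != 2:
                    # Assumption: blank or malformed lines are silently skipped
                    continue
                key, value = row
                # Create the preallocated list on first sight of this id,
                # then fill the slot that belongs to the current file
                merged.setdefault(key, [None] * len(paths))[slot] = value
    return merged
```

With `merge_with_slots([fn1, fn2, fn3])`, a record is complete when none of its slots is `None`, which is slightly stricter than `all(...)` because an empty string counts as a present value here.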