How do I merge two csv files?
I have two csv files. EMPLOYEES contains a dict of every employee at a company, with 10 fields of information about each one. SOCIAL contains a dict of employees who filled out a survey, with 8 fields of information. Every employee in the survey is also in the master dict. Both dicts have a unique identifier (the EXTENSION).

I want to say "If an employee is in the SOCIAL dict, add fields 4, 5, 6 to their row in the EMPLOYEES dict". In other words, if an employee filled out a survey, additional information should be appended to the master dict.

Currently, my program pulls out all information from EMPLOYEES for employees who have taken the SURVEY. But I don't know how to add the additional fields of information to the EMPLOYEES csv. I have spent much of the day reading StackOverflow about DictReader and Dictionary and am still confused.

Thank you in advance for your guidance.
Sample EMPLOYEE:

Name   Extension  Job
Bill   1111       plumber
Alice  2222       fisherman
Carl   3333       rodeo clown

Sample SURVEY:

Extension  Favorite Color  Book
2222       blue            A Secret Garden
3333       green           To Kill a Mockingbird

Sample OUTPUT:

Name   Extension  Job          Favorite Color  Favorite Book
Bill   1111       plumber
Alice  2222       fisherman    blue            A Secret Garden
Carl   3333       rodeo clown  green           To Kill a Mockingbird

import csv

with open('employees.csv', "rU") as npr_employees:
    employees = csv.DictReader(npr_employees)
    all_employees = {}
    total_employees = {}
    for employee in employees:
        all_employees[employee['Extension']] = employee

with open('social.csv', "rU") as social_employees:
    social_employee = csv.DictReader(social_employees)
    for row in social_employee:
        print all_employees.get(row['Extension'], None)

You could try:

for row in social_employee:
    employee = all_employees.get(row['Extension'], None)
    if employee is not None:
        # Copy each survey field onto the matching employee record
        employee['additionalinfo1'] = row['additionalinfo1']
        employee['additionalinfo2'] = row['additionalinfo2']
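
The idea in this answer can be tried end to end without touching any files. The following is a minimal Python 3 sketch, not the answerer's code: the function name attach_survey_fields and the in-memory sample strings are made up for illustration, and the data is copied from the samples in the question.

```python
import csv
import io

# Hypothetical in-memory stand-ins for employees.csv and social.csv.
EMPLOYEES_CSV = """Name,Extension,Job
Bill,1111,plumber
Alice,2222,fisherman
Carl,3333,rodeo clown
"""

SOCIAL_CSV = """Extension,Favorite Color,Book
2222,blue,A Secret Garden
3333,green,To Kill a Mockingbird
"""


def attach_survey_fields(employees_file, social_file, fields):
    """Index employees by Extension, then copy `fields` from each survey row."""
    all_employees = {
        row['Extension']: dict(row) for row in csv.DictReader(employees_file)
    }
    for row in csv.DictReader(social_file):
        employee = all_employees.get(row['Extension'])
        if employee is not None:
            # Update the matching record in place; unmatched rows are skipped.
            for field in fields:
                employee[field] = row[field]
    return all_employees


merged = attach_survey_fields(
    io.StringIO(EMPLOYEES_CSV),
    io.StringIO(SOCIAL_CSV),
    fields=['Favorite Color', 'Book'],
)
print(merged['2222'])
print(merged['1111'])
```

Employees who did not take the survey (Bill here) simply keep their original fields, so the missing columns still have to be filled with empty strings when the rows are written out.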

In Python 2, you can merge two dictionaries using the following (on duplicate keys, the values from d2 win):
dict(d1.items() + d2.items())
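
In Python 3, items() returns a view that does not support +, so the expression above raises a TypeError there. A rough Python 3 equivalent, as a sketch, is dictionary unpacking; the helper name merge_dicts and the sample records are only illustrative.

```python
def merge_dicts(d1, d2):
    """Return a new dict with d1's items, overridden by d2 on shared keys."""
    return {**d1, **d2}


employee = {'Name': 'Alice', 'Extension': '2222', 'Job': 'fisherman'}
survey = {'Extension': '2222', 'Favorite Color': 'blue'}

combined = merge_dicts(employee, survey)
print(combined)
```

Neither input is modified. On Python 3.9 and later, d1 | d2 should give the same result, and d1.update(d2) is the in-place alternative.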

Using a dict, all_employees, with the key as 'Extension' works perfectly to link a "social employee" row with its corresponding "employee" row.

Then you need to go through all the updated employee info and output their fields in a consistent order. Since Python 2 dictionaries are inherently unordered, we keep a list of the headers, output_headers, as we see them.

import csv

# Store all the info about the employees, keyed by extension
all_employees = {}
output_headers = []

# First, get all employee record info
with open('employees.csv', 'rU') as npr_employees:
    employees = csv.DictReader(npr_employees)
    for employee in employees:
        ext = employee['Extension']
        all_employees[ext] = employee
    # Add headers from "all employees"
    output_headers.extend(employees.fieldnames)

# Then, get all info from social, and update employee info
with open('social.csv', 'rU') as social_employees:
    social_employees = csv.DictReader(social_employees)
    for social_employee in social_employees:
        ext = social_employee['Extension']
        # Combine the two dictionaries.
        all_employees[ext] = dict(
            all_employees[ext].items() + social_employee.items()
        )
    # Add headers from "social employees", but don't add duplicate fields
    output_headers.extend(
        [field for field in social_employees.fieldnames
         if field not in output_headers]
    )

# Finally, output the records ordered by extension
with open('output.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerow(output_headers)
    # Write the new employee rows. If a field doesn't exist,
    # write an empty string.
    for employee in sorted(all_employees.values(),
                           key=lambda e: e['Extension']):
        writer.writerow(
            [employee.get(field, '') for field in output_headers]
        )

Outputs:

Name,Extension,Job,Favorite Color,Book
Bill,1111,plumber,,
Alice,2222,fisherman,blue,A Secret Garden
Carl,3333,rodeo clown,green,To Kill a Mockingbird
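
The script above is written for Python 2: it relies on the 'rU' and 'wb' file modes and on adding the results of items() together, none of which carry over cleanly to current Python 3. Below is a hedged Python 3 sketch of the same approach. The function name merge_csv and its key parameter are made up for this example, and it swaps in csv.DictWriter with restval='' in place of the manual employee.get(field, '') lookup. Unlike the script above, it skips survey rows that have no matching employee instead of raising a KeyError.

```python
import csv


def merge_csv(employees_path, social_path, output_path, key='Extension'):
    """Left-join survey rows onto employee rows and write the result."""
    all_employees = {}
    output_headers = []

    # newline='' is what the csv module docs recommend for files in Python 3.
    with open(employees_path, newline='') as f:
        reader = csv.DictReader(f)
        for row in reader:
            all_employees[row[key]] = dict(row)
        output_headers.extend(reader.fieldnames or [])

    with open(social_path, newline='') as f:
        reader = csv.DictReader(f)
        for row in reader:
            # Skip survey rows with no matching employee instead of raising.
            if row[key] in all_employees:
                all_employees[row[key]].update(row)
        # Add the survey headers, but don't add duplicate fields.
        output_headers.extend(
            [field for field in (reader.fieldnames or [])
             if field not in output_headers]
        )

    with open(output_path, 'w', newline='') as f:
        # restval='' fills in fields that an employee record lacks.
        writer = csv.DictWriter(f, fieldnames=output_headers, restval='')
        writer.writeheader()
        # Write the records ordered by the key column (compared as strings).
        for ext in sorted(all_employees):
            writer.writerow(all_employees[ext])
```

Calling merge_csv('employees.csv', 'social.csv', 'output.csv') with the sample files from the question should produce the same rows as shown above.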

Let me know if you have any questions!