
How do I merge two CSV files?

I have two CSV files. EMPLOYEES contains a dict of every employee at a company, with 10 columns of information about each one. SOCIAL contains a dict of employees who filled out a survey, with 8 columns of information. Every employee in the survey is also in the master dict. Both dicts have a unique identifier (the EXTENSION).

I want to say: "If an employee is in the SOCIAL dict, add columns 4, 5 and 6 to their row in the EMPLOYEES dict." In other words, if an employee filled out a survey, the additional information should be appended to the master dict.

Currently, my program pulls out all information from EMPLOYEES for employees who have taken the SURVEY. But I don't know how to add the additional columns of information to the EMPLOYEES CSV. I have spent much of the day reading StackOverflow about DictReader and dictionaries and am still confused.

Thank you in advance for your guidance.

Sample EMPLOYEE:

Name  Extension   Job
Bill  1111        plumber
Alice 2222        fisherman
Carl  3333        rodeo clown

Sample SURVEY:

Extension   Favorite Color    Book
 2222          blue          A Secret Garden
 3333          green         To Kill a Mockingbird

Sample OUTPUT:

Name  Extension   Job           Favorite Color     Favorite Book
Bill  1111        plumber
Alice 2222        fisherman         blue             A Secret Garden
Carl  3333        rodeo clown       green            To Kill a Mockingbird


import csv

# Index every employee record by its extension
with open('employees.csv', newline='') as npr_employees:
    employees = csv.DictReader(npr_employees)
    all_employees = {}
    total_employees = {}
    for employee in employees:
        all_employees[employee['Extension']] = employee

# Print the employee record for everyone who took the survey
with open('social.csv', newline='') as social_employees:
    social_employee = csv.DictReader(social_employees)
    for row in social_employee:
        print(all_employees.get(row['Extension'], None))

You could try:

for row in social_employee:
    employee = all_employees.get(row['Extension'], None)
    if employee is not None:
        # Copy the survey fields onto the matching employee record
        employee['additionalinfo1'] = row['additionalinfo1']
        employee['additionalinfo2'] = row['additionalinfo2']
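
To make the idea above concrete, here is a minimal sketch that copies only selected survey columns onto the matching employee record. It assumes Python 3 and uses in-memory strings in place of the two files; the helper name copy_survey_fields and the FIELDS_TO_COPY list are hypothetical and only for illustration, while the sample data comes from the question.

```python
import csv
import io

# Sample data from the question, as in-memory stand-ins for the two files.
EMPLOYEES_CSV = """Name,Extension,Job
Bill,1111,plumber
Alice,2222,fisherman
Carl,3333,rodeo clown
"""
SURVEY_CSV = """Extension,Favorite Color,Book
2222,blue,A Secret Garden
3333,green,To Kill a Mockingbird
"""

# Hypothetical choice of which survey columns to carry over.
FIELDS_TO_COPY = ["Favorite Color", "Book"]


def copy_survey_fields(employees_file, survey_file, fields):
    """Index employees by Extension, then copy the chosen survey fields in."""
    all_employees = {
        row["Extension"]: row for row in csv.DictReader(employees_file)
    }
    for row in csv.DictReader(survey_file):
        employee = all_employees.get(row["Extension"])
        if employee is not None:
            for field in fields:
                employee[field] = row[field]
    return all_employees


merged = copy_survey_fields(
    io.StringIO(EMPLOYEES_CSV), io.StringIO(SURVEY_CSV), FIELDS_TO_COPY
)
print(merged["2222"])
```

Employees who did not take the survey (Bill, in the sample) simply keep their original fields, so whatever writes the result out has to tolerate missing keys.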

You can merge two dictionaries in Python 2 using:

dict(d1.items() + d2.items())

In Python 3, items() returns a view that does not support +, so use this instead:

{**d1, **d2}
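
As a small illustration of what such a merge does when both dictionaries share a key, here is a sketch assuming Python 3.5 or later. The two records are made up from the question's sample data, and the conflicting Extension value is deliberately artificial.

```python
# Two records that share the "Extension" key.
d1 = {"Name": "Alice", "Extension": "2222", "Job": "fisherman"}
d2 = {"Extension": "2222-survey", "Favorite Color": "blue"}

# Unpacking builds a new dict; on shared keys the right-hand dict wins.
merged = {**d1, **d2}

# Equivalent spelling: copy the first dict, then update the copy.
merged_via_update = dict(d1)
merged_via_update.update(d2)

print(merged)
```

Neither d1 nor d2 is modified by either form, which matters if the original records are reused later.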

Using a dict, all_employees, keyed on 'Extension' works perfectly to link a "social employee" row with its corresponding "employee" row.

Then you need to go through all the updated employee info and output the fields in a consistent order. Since not every record has every field (and older Python versions do not preserve dictionary order), we keep a list of the headers, output_headers, as we see them.

import csv

# Store all the info about the employees, keyed by extension
all_employees = {}
output_headers = []

# First, get all employee record info
with open('employees.csv', newline='') as npr_employees:
    employees = csv.DictReader(npr_employees)
    for employee in employees:
        ext = employee['Extension']
        all_employees[ext] = employee
    # Add headers from "all employees"
    output_headers.extend(employees.fieldnames)

# Then, get all info from social, and update employee info
with open('social.csv', newline='') as social_file:
    social_employees = csv.DictReader(social_file)
    for social_employee in social_employees:
        ext = social_employee['Extension']

        # Combine the two dictionaries; survey values win on shared keys.
        all_employees[ext] = {**all_employees[ext], **social_employee}

    # Add headers from "social employees", but don't add duplicate fields
    output_headers.extend(
        [field for field in social_employees.fieldnames
         if field not in output_headers]
    )

# Finally, output the records ordered by extension
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(output_headers)

    # Write the new employee rows.  If a field doesn't exist,
    # write an empty string.
    for ext in sorted(all_employees):
        employee = all_employees[ext]
        writer.writerow(
            [employee.get(field, '') for field in output_headers]
        )

Output:

Name,Extension,Job,Favorite Color,Book
Bill,1111,plumber,,
Alice,2222,fisherman,blue,A Secret Garden
Carl,3333,rodeo clown,green,To Kill a Mockingbird
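
A possible variation on the writing step, sketched here under the assumption of Python 3: csv.DictWriter can fill in missing fields itself through its restval argument, so the rows do not need to be rebuilt as lists. The function name merge_csv and the in-memory sample data are hypothetical stand-ins for the real files.

```python
import csv
import io

EMPLOYEES_CSV = """Name,Extension,Job
Bill,1111,plumber
Alice,2222,fisherman
Carl,3333,rodeo clown
"""
SURVEY_CSV = """Extension,Favorite Color,Book
2222,blue,A Secret Garden
3333,green,To Kill a Mockingbird
"""


def merge_csv(employees_file, survey_file, output_file, key="Extension"):
    """Merge survey rows into employee rows on `key` and write the result."""
    employees = csv.DictReader(employees_file)
    records = {row[key]: row for row in employees}
    headers = list(employees.fieldnames or [])

    survey = csv.DictReader(survey_file)
    for row in survey:
        # setdefault also keeps survey rows that have no employee record.
        records.setdefault(row[key], {}).update(row)
    headers += [f for f in (survey.fieldnames or []) if f not in headers]

    # restval supplies the value for any field a record is missing.
    writer = csv.DictWriter(output_file, fieldnames=headers, restval="")
    writer.writeheader()
    for ext in sorted(records):  # note: keys are strings, so this is a text sort
        writer.writerow(records[ext])


out = io.StringIO()
merge_csv(io.StringIO(EMPLOYEES_CSV), io.StringIO(SURVEY_CSV), out)
print(out.getvalue())
```

With real files, the same function can be called on handles opened with newline='', as the csv module documentation recommends.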

Let me know if you have any questions!
