Merge 2 CSV files with Python

I have 2 CSV files as follows:

File1.csv:

Name, Email
Jon, jon@email.com
Roberto, roberto@email.com
Mona, mona@email.com
James, james@email.com

File2.csv:

Email
mona@email.com
james@email.com

What I want is File1.csv without the rows whose email appears in File2.csv, i.e. File3.csv (the output) should look as follows:

File3.csv:

Name, Email
Jon, jon@email.com
Roberto, roberto@email.com

What is the simplest way to code this in Python?

dont_need_em = []
with open("file2.csv", 'r') as fn:
    for line in fn:
        if not line.startswith("Email"):
            dont_need_em.append(line.rstrip())

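with open("file1.csv", 'r') as fn, open("file3.csv", 'w') as fw:
    for line in fn:
        # Compare the email column (second field) against the list to drop
        if line.rstrip().split(", ")[1] not in dont_need_em:
            fw.write(line)  # keep the newline so each row stays on its own line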

This should do it, but I am sure there are much simpler solutions.

EDIT: Create the third file

Using pandas, you can do this:

import pandas as pd
#Read two files into data frame using column names from first row
file1=pd.read_csv('File1.csv',header=0,skipinitialspace=True)
file2=pd.read_csv('File2.csv',header=0,skipinitialspace=True)

#Only return lines in file 1 if the email is not contained in file 2
cleaned=file1[~file1["Email"].isin(file2["Email"])]

#Output file to CSV with original headers
cleaned.to_csv("File3.csv", index=False)

Here's a good way to do that (it's very similar to the above, but writes the remainder to a file rather than printing it):

removed = []
with open("file2.csv", 'r') as f2:
    for line in f2:
        if not line.startswith("Email"):
            removed.append(line.rstrip())


with open("file1.csv", 'r') as f1:
    with open("file3.csv", 'w') as f3:
        for line in f1:
            if line.rstrip().split(", ")[1] not in removed:
                f3.write(line)

How this works: The first block reads all the emails you want to filter out into a list. Next, the second block opens your original file and sets up a new file to write what's left. It reads each line from your first file and writes it to the third file only if the email isn't in your list to filter.
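The same read-then-filter steps can also be sketched with the standard `csv` module, which avoids splitting lines by hand. This is only a minimal sketch: the function name `remove_listed_emails` and its parameters are hypothetical, and it assumes both files have a header row with a column named `Email`, as in the question.

```python
import csv

def remove_listed_emails(source_path, exclude_path, output_path):
    """Copy rows from source_path to output_path, skipping listed emails.

    Hypothetical helper; assumes both CSV files have an "Email" column.
    """
    # Step 1: collect the emails to drop. A set keeps membership tests fast.
    with open(exclude_path, newline="") as f:
        reader = csv.DictReader(f, skipinitialspace=True)
        excluded = {(row.get("Email") or "").strip() for row in reader}

    # Step 2: copy the header and every row whose email is not excluded.
    with open(source_path, newline="") as src, \
            open(output_path, "w", newline="") as dst:
        reader = csv.DictReader(src, skipinitialspace=True)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if (row.get("Email") or "").strip() not in excluded:
                writer.writerow(row)
```

Calling `remove_listed_emails("File1.csv", "File2.csv", "File3.csv")` would then write the filtered rows to File3.csv. Looking the column up by name rather than by position means the sketch should keep working if more columns are added to File1.csv.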

If you are on UNIX:

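#! /usr/bin/env python
import subprocess

def filter_csv(input_file, filter_file, out_file):
    # -v inverts the match (keep lines NOT listed in filter_file),
    # -F treats each pattern as a fixed string, -f reads patterns from a file.
    # Note: the "Email" header in filter_file also matches the header line of
    # input_file, so the header row is dropped from the output.
    with open(out_file, "w") as out:
        subprocess.call(["grep", "-v", "-F", "-f", filter_file, input_file], stdout=out)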

The following should do what you are looking for. First read File2.csv into a set of email addresses to be skipped. Then read File1.csv row by row, writing only rows which are not in the skip list:

import csv

with open('File2.csv', 'r') as file2:
    skip_list = set(line.strip() for line in file2.readlines()[1:])

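# Python 3: open csv files in text mode with newline='' (the 'rb'/'wb' modes only work on Python 2)
with open('File1.csv', 'r', newline='') as file1, open('File3.csv', 'w', newline='') as file3: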
    csv_file1 = csv.reader(file1, skipinitialspace=True)
    csv_file3 = csv.writer(file3)
    csv_file3.writerow(next(csv_file1))    # Write the header line

    for cols in csv_file1:
        if cols[1] not in skip_list:
            csv_file3.writerow(cols)

This would give you the following output in File3.csv:

Name,Email
Jon,jon@email.com
Roberto,roberto@email.com
