
Merge 2 CSV files with Python

I have 2 CSV files, as follows:

File1.csv:

Name, Email
Jon, jon@email.com
Roberto, roberto@email.com
Mona, mona@email.com
James, james@email.com

File2.csv:

Email
mona@email.com
james@email.com

What I want is File1.csv without the rows whose email appears in File2.csv, i.e. File3.csv (the output) should look like this:

File3.csv:

Name, Email
Jon, jon@email.com
Roberto, roberto@email.com

What is the simplest way to write this in Python?

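dont_need_em = set()
with open("File2.csv", 'r') as fn:
    for line in fn:
        line = line.strip()
        # Skip the "Email" header and blank lines
        if line and not line.startswith("Email"):
            dont_need_em.add(line)

with open("File1.csv", 'r') as fn, open("File3.csv", 'w') as fw:
    for line in fn:
        fields = line.rstrip().split(", ")
        # Keep the header and every row whose email is not in File2.csv
        if len(fields) > 1 and fields[1] not in dont_need_em:
            fw.write(line)  # write the line unchanged so its newline is kept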

This should do it, but I'm sure there are simpler solutions.

Edit: now writes the result to a third file.

Using pandas, you can do the following:

import pandas as pd
#Read two files into data frame using column names from first row
file1=pd.read_csv('File1.csv',header=0,skipinitialspace=True)
file2=pd.read_csv('File2.csv',header=0,skipinitialspace=True)

#Only return lines in file 1 if the email is not contained in file 2
cleaned=file1[~file1["Email"].isin(file2["Email"])]

#Output file to CSV with original headers
cleaned.to_csv("File3.csv", index=False)

Here is a nice way to do it (very similar to the one above, but it writes the remaining lines to a file instead of printing them):

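removed = []
with open("File2.csv", 'r') as f2:
    for line in f2:
        # Skip the "Email" header; collect every address to filter out
        if not line.startswith("Email"):
            removed.append(line.rstrip())


with open("File1.csv", 'r') as f1:
    with open("File3.csv", 'w') as f3:
        for line in f1:
            fields = line.rstrip().split(", ")
            # Keep the header and every row whose email is not in the list
            if len(fields) > 1 and fields[1] not in removed:
                f3.write(line)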

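How it works: the first block reads all the emails to be filtered into a list. The second block then opens your original file and creates a new file for the remaining lines. It reads each line of the first file and writes it to the third file only if its email is not in the list of emails to filter.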
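The two steps described above can be sketched as a single function that takes file-like objects, so it can be tried on in-memory data. This is a minimal sketch assuming the ", " separator used in the question; the function name `filter_lines` is hypothetical, and it swaps the list for a set so that each membership check is constant-time instead of a scan of the whole list.

```python
import io


def filter_lines(file1, file2, out):
    """Copy lines of file1 to out unless their email appears in file2.

    Hypothetical helper mirroring the two-block approach.
    Assumes the "Name, Email" layout with ", " as the separator.
    """
    # Block 1: collect the emails to drop, skipping the "Email" header
    # and blank lines. A set makes each lookup constant-time.
    removed = {line.strip() for line in list(file2)[1:] if line.strip()}

    # Block 2: stream file1, keeping the header and every row whose
    # email is not in the removal set.
    for line in file1:
        fields = line.rstrip().split(", ")
        if len(fields) > 1 and fields[1] not in removed:
            out.write(line)


file1 = io.StringIO("Name, Email\nJon, jon@email.com\nRoberto, roberto@email.com\n"
                    "Mona, mona@email.com\nJames, james@email.com\n")
file2 = io.StringIO("Email\nmona@email.com\njames@email.com\n")
out = io.StringIO()
filter_lines(file1, file2, out)
print(out.getvalue(), end="")
```

With real files, pass the objects returned by `open("File1.csv")`, `open("File2.csv")` and `open("File3.csv", "w")` instead of the `io.StringIO` buffers.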

If you are on UNIX:

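#! /usr/bin/env python
import subprocess

def filter(input_file, filter_file, out_file):
    # Drop the "Email" header from the patterns so the header of input_file is kept
    with open(filter_file) as f:
        patterns = "".join(f.readlines()[1:])
    # -v: keep lines that do NOT match; -F: fixed strings (no regex "."); -f -: read patterns from stdin
    with open(out_file, "w") as out:
        subprocess.run(["grep", "-v", "-F", "-f", "-", input_file], input=patterns, stdout=out, text=True)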
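grep reads each line of a pattern file as a regular expression unless `-F` is given, and in a regular expression the "." in an address matches any character. The small Python sketch below illustrates the difference using the `re` module as a stand-in for grep's matching; the row it tests against is hypothetical, not from the question.

```python
import re

pattern = "mona@email.com"       # one line of File2.csv, used as a pattern
row = "Ramona, mona@emailXcom"   # hypothetical row, not from the question

# As a regex, "." matches any character, so this unrelated row matches
# (what grep does without -F).
regex_match = re.search(pattern, row) is not None

# re.escape makes the pattern a literal string (the effect of grep -F).
fixed_match = re.search(re.escape(pattern), row) is not None

# Fixed strings still match as substrings: the pattern occurs inside
# "ramona@email.com".
substring_match = pattern in "Ramona, ramona@email.com"

print(regex_match, fixed_match, substring_match)  # True False True
```

Because fixed-string matches can still hit substrings of longer addresses, grep's `-w` (match whole words) is one way to narrow the match further.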

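The following should do what you want. It first reads File2.csv into a set of email addresses to skip. It then reads File1.csv row by row, writing only the rows whose email is not in the skip list: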

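import csv

with open('File2.csv', 'r') as file2:
    # Skip the "Email" header; strip newlines and surrounding spaces
    skip_list = set(line.strip() for line in file2.readlines()[1:])

# Python 3: open CSV files in text mode with newline='' (not 'rb'/'wb')
with open('File1.csv', newline='') as file1, open('File3.csv', 'w', newline='') as file3:
    csv_file1 = csv.reader(file1, skipinitialspace=True)
    csv_file3 = csv.writer(file3)
    csv_file3.writerow(next(csv_file1))    # Write the header line

    for cols in csv_file1:
        # Skip blank rows (empty lists) and rows whose email is in File2.csv
        if cols and cols[1] not in skip_list:
            csv_file3.writerow(cols)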

This gives the following output in File3.csv:

Name,Email
Jon,jon@email.com
Roberto,roberto@email.com
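A variant of the csv approach, sketched under a few assumptions: the hypothetical helper `remove_emails` below looks the column up by its header name through `csv.DictReader` instead of by position, and compares addresses case-insensitively (the question does not say whether case should matter). It works on any file-like objects, so it can be tried with `io.StringIO`.

```python
import csv
import io


def remove_emails(file1, file2, out, column="Email"):
    """Write the rows of file1 whose `column` value is not listed in file2.

    Hypothetical helper: selects the column by header name and
    compares addresses case-insensitively.
    """
    # skipinitialspace drops the blank after each comma, in the header too,
    # so "Name, Email" yields the field names ["Name", "Email"].
    skip = {row[column].strip().lower()
            for row in csv.DictReader(file2, skipinitialspace=True)}

    reader = csv.DictReader(file1, skipinitialspace=True)
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[column].strip().lower() not in skip:
            writer.writerow(row)


file1 = io.StringIO("Name, Email\nJon, jon@email.com\nRoberto, roberto@email.com\n"
                    "Mona, mona@email.com\nJames, james@email.com\n")
file2 = io.StringIO("Email\nmona@email.com\njames@email.com\n")
out = io.StringIO()
remove_emails(file1, file2, out)
result = out.getvalue()
print(result, end="")
```

For real files, open File1.csv and File2.csv with `newline=''`, open File3.csv with mode `'w'` and `newline=''`, and pass the three file objects to `remove_emails`.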


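Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.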

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM