Merge 2 csv files with python
I have 2 CSV files, as shown below:
File1.csv:
Name, Email
Jon, jon@email.com
Roberto, roberto@email.com
Mona, mona@email.com
James, james@email.com
File2.csv:
Email
mona@email.com
james@email.com
What I want is File1.csv without the rows whose email appears in File2.csv, i.e. File3.csv (the output) should look like this:
File3.csv:
Name, Email
Jon, jon@email.com
Roberto, roberto@email.com
What is the simplest way to write this in Python?
dont_need_em = []
with open("file2.csv", 'r') as fn:
    for line in fn:
        if not line.startswith("Email"):
            dont_need_em.append(line.rstrip())

fw = open("file3.csv", 'w')
with open("file1.csv", 'r') as fn:
    for line in fn:
        if line.rstrip().split(", ")[1] not in dont_need_em:
            # Add the newline back so each row stays on its own line
            fw.write(line.rstrip() + "\n")
fw.close()
This should do it, but I'm sure there are simpler solutions.
Edit: now creates the third file.
With pandas, you can do the following:
import pandas as pd
#Read two files into data frame using column names from first row
file1=pd.read_csv('File1.csv',header=0,skipinitialspace=True)
file2=pd.read_csv('File2.csv',header=0,skipinitialspace=True)
#Only return lines in file 1 if the email is not contained in file 2
cleaned=file1[~file1["Email"].isin(file2["Email"])]
#Output file to CSV with original headers
cleaned.to_csv("File3.csv", index=False)
Here is a good way to do it (very similar to the one above, but it writes the remaining rows to a file instead of printing them):
removed = []
with open("file2.csv", 'r') as f2:
    for line in f2:
        if not line.startswith("Email"):
            removed.append(line.rstrip())

with open("file1.csv", 'r') as f1:
    with open("file3.csv", 'w') as f3:
        for line in f1:
            if line.rstrip().split(", ")[1] not in removed:
                f3.write(line)
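How it works: the first block reads all the emails to filter out into a list. The second block then opens your original file and creates a new file to hold what remains. It reads each line from the first file and writes it to the third file only if the email is not in the list you want to filter out.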
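The two steps described above can also be wrapped in a small function. This is only a sketch under a few assumptions: the function name `remove_listed_emails` is made up for illustration, the email is assumed to be the second comma-separated field, and a set is used instead of a list so membership checks stay fast on large files. The demo writes the sample data from the question into a temporary directory.

```python
import os
import tempfile


def remove_listed_emails(source_path, exclude_path, output_path):
    """Copy source_path to output_path, skipping rows whose email is in exclude_path."""
    with open(exclude_path, "r") as exclude_file:
        next(exclude_file, None)  # skip the "Email" header line
        excluded = {line.strip() for line in exclude_file if line.strip()}

    with open(source_path, "r") as source, open(output_path, "w") as output:
        for line in source:
            fields = line.rstrip("\n").split(",")
            # Keep the row unless its second field is an excluded email.
            # Rows with fewer than two fields are kept unchanged.
            if len(fields) < 2 or fields[1].strip() not in excluded:
                output.write(line)


# Demo with the sample data from the question (hypothetical temp file paths)
with tempfile.TemporaryDirectory() as tmp:
    file1 = os.path.join(tmp, "file1.csv")
    file2 = os.path.join(tmp, "file2.csv")
    file3 = os.path.join(tmp, "file3.csv")
    with open(file1, "w") as f:
        f.write("Name, Email\nJon, jon@email.com\nRoberto, roberto@email.com\n"
                "Mona, mona@email.com\nJames, james@email.com\n")
    with open(file2, "w") as f:
        f.write("Email\nmona@email.com\njames@email.com\n")
    remove_listed_emails(file1, file2, file3)
    with open(file3, "r") as f:
        result = f.read()

print(result)
```

Stripping each field means it does not matter whether the file uses "," or ", " as the separator, which the `split(", ")` version above depends on.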
If you are on UNIX:
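#! /usr/bin/env python
import subprocess

def filter(input_file, filter_file, out_file):
    # -v keeps only the lines that do NOT match; -F treats each pattern as a fixed string.
    # Note: the "Email" header in filter_file also matches the header row of input_file,
    # so the header row is dropped from the output as well.
    subprocess.call("grep -v -F -f '%s' '%s' > '%s' " % (filter_file, input_file, out_file), shell=True)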
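The following should do what you want. First, File2.csv is read into a set of email addresses to skip. Then File1.csv is read line by line, writing only the rows that are not in the skip list: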
import csv

with open('File2.csv', 'r') as file2:
    # Skip the header line, then collect the addresses to exclude
    skip_list = set(line.strip() for line in file2.readlines()[1:])

# Python 3: open csv files in text mode with newline='' (Python 2 used 'rb'/'wb')
with open('File1.csv', 'r', newline='') as file1, open('File3.csv', 'w', newline='') as file3:
    csv_file1 = csv.reader(file1, skipinitialspace=True)
    csv_file3 = csv.writer(file3)
    csv_file3.writerow(next(csv_file1))  # Write the header line
    for cols in csv_file1:
        if cols[1] not in skip_list:
            csv_file3.writerow(cols)
This gives the following output in File3.csv:
Name,Email
Jon,jon@email.com
Roberto,roberto@email.com
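If the position of the Email column is not guaranteed, a variant of the same idea can look the column up by name with `csv.DictReader`. This is a rough sketch, not part of the answer above: the function name `filter_by_email` and its `column` parameter are assumptions for illustration, and it works on open file objects so it can be tried with in-memory data.

```python
import csv
import io


def filter_by_email(source, exclude, output, column="Email"):
    """Write rows from source to output unless their `column` value appears in exclude."""
    # Collect the addresses to skip, reading the column by its header name
    skip = {row[column] for row in csv.DictReader(exclude, skipinitialspace=True)}

    reader = csv.DictReader(source, skipinitialspace=True)
    writer = csv.DictWriter(output, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[column] not in skip:
            writer.writerow(row)


# Demo with the sample data from the question, held in memory
file1 = io.StringIO(
    "Name, Email\nJon, jon@email.com\nRoberto, roberto@email.com\n"
    "Mona, mona@email.com\nJames, james@email.com\n"
)
file2 = io.StringIO("Email\nmona@email.com\njames@email.com\n")
file3 = io.StringIO()
filter_by_email(file1, file2, file3)
result = file3.getvalue()
print(result)
```

With real files, the same function can be called with handles opened via `open(..., newline='')`.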