[英]Compare two columns in two csv files in python
我有兩個具有相同列名的csv文件:
我想比較file1.column1和file2.column1
我不知道該怎么做。 我看過大熊貓的東西,但沒有做任何有效的事情
我所擁有的是:
file1.csv:
name;DOB;service;test status;test date
Smith;12/12/2012;compta;Missed;01/01/2019
foo;02/11/1989;office;Passed;01/01/2019
bar;03/09/1972;sales;Passed;02/03/2018
Doe;25/03/1958;garage;Missed;02/04/2019
Smith;12/12/2012;compta;Passed;04/05/2019
file2.csv:
name;DOB;service;test status;test date
Smith;12/12/2012;compta;Missed;01/01/2019
Doe;25/03/1958;garage;Missed;02/04/2019
我想要得到的是:
file1.csv:
name;DOB;service;test status;test date
Smith;12/12/2012;compta;Missed;01/01/2019
foo;02/11/1989;office;Passed;01/01/2019
bar;03/09/1972;sales;Passed;02/03/2018
Doe;25/03/1958;garage;Missed;02/04/2019
Smith;12/12/2012;compta;Passed;04/05/2019
file2.csv:
name;DOB;service;test status;test date
Doe;25/03/1958;garage;Missed;02/04/2019
因此,首先您必須打開:
import pandas as pd
df1 = pd.read_csv('file1.csv',delimiter=';')
df2 = pd.read_csv('file2.csv',delimiter=';')
處理數據幀,因為發現空白
df1.columns= df1.columns.str.strip()
df2.columns= df2.columns.str.strip()
# Assuming only strings
df1 = df1.apply(lambda column: column.str.strip())
df2 = df2.apply(lambda column: column.str.strip())
預期的解決方案是,假設您的名字是UNIQUE。
合並文件
new_merged_df = df2.merge(df1[['name','test status']],'left',on=['name'],suffixes=('','file1'))
DataFrame結果:
name DOB service test status test date test statusfile1
0 Smith 12/12/2012 compta Missed 01/01/2019 Missed
1 Smith 12/12/2012 compta Missed 01/01/2019 Passed
2 Doe 25/03/1958 garage Missed 02/04/2019 Missed
根據需求進行過濾,並刪除名稱與測試狀態不同的行。
filter = new_merged_df['test status'] != new_merged_df['test statusfile1']
# Check if there is different values
if len(new_merged_df[filter]) > 0:
drop_names = list(new_merged_df[filter]['name'])
# Removing the values that we don't want
new_merged_df = new_merged_df[~new_merged_df['name'].isin(drop_names)]
刪除列並存儲
# Saving as a file with the same schema as file2
new_merged_df.drop(columns=['test statusfile1'],inplace=True)
new_merged_df.to_csv('file2.csv',delimiter=';',index=False)
結果
name DOB service test status test date
2 Doe 25/03/1958 garage Missed 02/04/2019
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.