How to compare 2 CSV files

Question

I have 2 CSV files:

CSV 1 - original_names.csv

Serial,Names
1,James
2,Stephen
3,Ben
4,Harry
5,Jack
6,Peter

CSV 2 - dup_names.csv

Serial,Names
1,James
2,Kate
3,Ben
4,Sara

Desired output - new.csv

Serial,Names,flag
1,0,T
2,Kate,F
3,0,T
4,Sara,F
5,Jack,F
6,Peter,F

As you can see, when a name is the same in both CSV files, it should be replaced with 0 in new.csv.

This is what I've tried:

import pandas as pd

df1 = pd.read_csv('original_names.csv')
df2 = pd.read_csv('dup_names.csv')

out = df1.merge(df2['Names'], how='inner', on='Names')

# some code

out.to_csv("new.csv", index=False)

Thank you for your time :)

Answer 1

Do an outer join on the serial column, then add some logic. If the two name columns match, set the flag to 'T', otherwise 'F'. Then set 'names' to 0 where the flag is 'T', otherwise to the name from the second CSV. If there is no name in the second CSV, fill in the name from the first CSV.

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'serial':[1,2,3,4,5,6],
                     'names':['James','Stephen','Ben','Harry','Jack','Peter']})

df2 = pd.DataFrame({'serial':[1,2,3,4,],
                     'names':['James','Kate','Ben','Sara']})


out = df1.merge(df2, how='outer', on = ['serial'])

out['flag'] = np.where(out.names_x == out.names_y, 'T', 'F')
out['names'] = np.where(out.flag == 'T', 0, out.names_y)
out['names'] = out['names'].fillna(out.names_x)

out = out[['serial', 'names', 'flag']]
out.to_csv("new.csv", index=False)

Output:

print(out)
   serial  names flag
0       1      0    T
1       2   Kate    F
2       3      0    T
3       4   Sara    F
4       5   Jack    F
5       6  Peter    F
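
The same steps can also be run against the files from the question, whose headers are capitalised (`Serial`, `Names`). Below is a minimal sketch under a few assumptions: the CSV contents are copied from the question, `io.StringIO` stands in for the files on disk, `skipinitialspace=True` is passed in case a value has a space after the comma, and the helper name `compare_names` is hypothetical.

```python
import io

import numpy as np
import pandas as pd

# Stand-ins for original_names.csv and dup_names.csv (contents assumed from the question).
ORIGINAL_CSV = "Serial,Names\n1,James\n2,Stephen\n3,Ben\n4,Harry\n5,Jack\n6, Peter\n"
DUP_CSV = "Serial,Names\n1,James\n2,Kate\n3,Ben\n4,Sara\n"


def compare_names(original, dup):
    """Hypothetical helper: flag rows whose name is identical in both frames."""
    out = original.merge(dup, how="outer", on="Serial", suffixes=("_x", "_y"))
    same = out["Names_x"] == out["Names_y"]  # NaN never compares equal
    out["flag"] = np.where(same, "T", "F")
    # Take the second file's name, fall back to the first file's name,
    # then overwrite matching rows with 0 (object dtype allows the mix).
    names = out["Names_y"].fillna(out["Names_x"]).astype(object)
    names[same] = 0
    out["Names"] = names
    return out[["Serial", "Names", "flag"]]


# skipinitialspace=True strips a space that directly follows the delimiter.
df1 = pd.read_csv(io.StringIO(ORIGINAL_CSV), skipinitialspace=True)
df2 = pd.read_csv(io.StringIO(DUP_CSV), skipinitialspace=True)
result = compare_names(df1, df2)
print(result)
```

With real files, the two `io.StringIO(...)` arguments would be replaced by the file paths, and `result.to_csv("new.csv", index=False)` would write the output.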

Answer 2

You could use:

import pandas as pd
import numpy as np

df1 = pd.read_csv('original_names.csv')
df2 = pd.read_csv('dup_names.csv')

out = df1.merge(df2, how='left', on = 'Serial')

out['Names'] = np.where(out['Names_x'] == out['Names_y'], 
                        0, out['Names_y'])
out['Names'] = out['Names'].fillna(out['Names_x'])
out['flag'] = np.where(out['Names'] == 0, 'T', 'F')
out = out.drop(['Names_x', 'Names_y'], axis=1)

out.to_csv('new.csv', index=False)

Output:

   Serial  Names flag
0       1      0    T
1       2   Kate    F
2       3      0    T
3       4   Sara    F
4       5   Jack    F
5       6  Peter    F
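
For comparison, the left-join idea can also be sketched without `merge`, by looking up each `Serial` in the second frame with `Series.map`. This is only an illustration, not part of the answer above: it assumes `Serial` is unique in the second file (otherwise the lookup raises), and the DataFrames are built inline from the data in the question.

```python
import pandas as pd

# Data assumed from the question; in practice these would come from pd.read_csv.
df1 = pd.DataFrame({"Serial": [1, 2, 3, 4, 5, 6],
                    "Names": ["James", "Stephen", "Ben", "Harry", "Jack", "Peter"]})
df2 = pd.DataFrame({"Serial": [1, 2, 3, 4],
                    "Names": ["James", "Kate", "Ben", "Sara"]})

# Look up each Serial of df1 in df2; serials missing from df2 give NaN.
other = df1["Serial"].map(df2.set_index("Serial")["Names"])

# True where both files hold the same name for that Serial.
same = df1["Names"].eq(other)

out = df1[["Serial"]].copy()
# Prefer the second file's name, fall back to the first, then zero out matches.
names = other.fillna(df1["Names"]).astype(object)
names[same] = 0
out["Names"] = names
out["flag"] = same.map({True: "T", False: "F"})
print(out)
```

Like the left join, this keeps only the serials present in the first file; the flag is derived from the comparison itself rather than from the replaced value.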

Answer 1: 2022-01-26 15:10:06
Answer 2: 2022-01-26 15:16:24
