How to compare cell values from two different csv files using python pandas

I have two csv files with the same columns (Filename and MD5), but the values are in different rows: a filename that is in row 2 of csv1 (row 1 is the header) may be in row 5 of csv2.

I've tried the "merge" method with "how" set to right, left, inner, and outer; the results contained additional rows and columns. I also tried the "isin" method.

matchfiles = df1.Filename.isin(df2.Filename)

and

if (df1[['Filename','MD5']]) == (df2[['Filename','MD5']]):
    print(df1[['Filename','MD5']])

I expect the output to print each "Filename" together with its matching "MD5".

The errors are:

TypeError: unsupported operand type(s) for &: 'str' and 'bool' 

and

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

DataSet 1:

Filename             MD5
I417122 - KP -pst    125e46b4477934fa7495f
I417122 - KP - xml   eee4acefced33e6595a32
J944737 - DJ gif     f52483135c9e8f6fb2680
J944737 - DJ txt     c1b76990e2e19a7eb2332
J944737 - DJ doc     b1aa2e981d8c04860810
J944737 - DJ docx    55b325a7ef73ba8a0e2f9
J944737 - JD.zip     47fcccba65018d88a3c7e

DataSet 2:

Filename             MD5
I417122 - KP -pst    125e46b4477934fa7495f
I417122 - KP - xml   47fcccba65018d88a3c7e
J944737 - DJ gif     f52483135c9e8f6fb2680
J944737 - DJ txt     c1b76990e2e19a7eb2856
J944737 - DJ doc     eee4acefced33e6595a32
J944737 - DJ docx    55b325a7ef73ba8a0e2f9
J944737 - JD.zip     47fcccba65018d88a3c7e

Expected Results:

Filename             MD5
I417122 - KP -pst    125e46b4477934fa7495f
J944737 - DJ gif     f52483135c9e8f6fb2680
J944737 - DJ doc     eee4acefced33e6595a32
J944737 - DJ docx    55b325a7ef73ba8a0e2f9
J944737 - JD.zip     47fcccba65018d88a3c7e

This returns a copy of df1 with an extra column, Indf2, that is 1 if the filename from csv1 is also in csv2 and 0 otherwise.

matching_df = df1.assign(Indf2=df1.Filename.isin(df2.Filename).astype(int))
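To make the result of that line concrete, here is a minimal sketch on two tiny made-up frames. The filenames and hash strings are hypothetical placeholders, not the data from the question; only the column names Filename and MD5 come from the post.

```python
import pandas as pd

# Hypothetical stand-ins for the two csv files (not the asker's data)
df1 = pd.DataFrame({"Filename": ["a.txt", "b.txt", "c.txt"],
                    "MD5": ["111", "222", "333"]})
df2 = pd.DataFrame({"Filename": ["c.txt", "a.txt", "d.txt"],
                    "MD5": ["333", "999", "444"]})

# isin() gives a boolean Series aligned with df1's rows, regardless of
# where each filename sits in df2; astype(int) turns True/False into 1/0
matching_df = df1.assign(Indf2=df1.Filename.isin(df2.Filename).astype(int))
print(matching_df)
```

Note that this flag only says the filename exists in both files; it does not check whether the MD5 values agree.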

Then you can remove all the rows where Indf2 is zero and merge based on Filename:

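matching_df = matching_df[matching_df.Indf2 == 1]
# Merge with df2 on Filename; the shared MD5 column becomes MD5_x (df1) and MD5_y (df2)
final_df = matching_df.merge(df2, how="left", on="Filename")
# Keep only rows whose MD5 agrees in both files, then drop the helper columns
final_df = final_df[final_df.MD5_x == final_df.MD5_y]
final_df = final_df.drop(columns=["MD5_y", "Indf2"]).rename(columns={"MD5_x": "MD5"})
print(final_df)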
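If the goal is simply the rows where both Filename and MD5 agree, an inner merge on both columns may be a shorter route. This is a sketch under assumptions, not necessarily what the asker needs: the frames below are hypothetical placeholders, and in practice df1 and df2 would come from pd.read_csv on the two files.

```python
import pandas as pd

# Hypothetical stand-ins for the two csv files (not the asker's data)
df1 = pd.DataFrame({"Filename": ["a.txt", "b.txt", "c.txt"],
                    "MD5": ["111", "222", "333"]})
df2 = pd.DataFrame({"Filename": ["c.txt", "a.txt", "d.txt"],
                    "MD5": ["333", "999", "444"]})

# An inner merge on both columns keeps a row only when the same
# (Filename, MD5) pair occurs in both frames; row position is irrelevant
matches = df1.merge(df2, on=["Filename", "MD5"], how="inner")
print(matches)
```

Because both columns are used as keys, no _x/_y suffix columns are created. This also avoids the `if df1[...] == df2[...]:` pattern: comparing two DataFrames yields a DataFrame of booleans rather than a single True or False, which is why `if` raises the "truth value of a DataFrame is ambiguous" ValueError. If either file can contain the same pair more than once, the merge repeats those rows, so a drop_duplicates() beforehand may be worth considering.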
