How to compare cell values from two different CSV files using Python pandas

Question

I have two CSV files with the same columns (Filename and MD5), but the values are in different rows: a filename that is in row 2 of csv1 (row 1 is the header) may be in row 5 of csv2.

I've tried "merge" with "how" set to right, left, inner, and outer; the results contained additional rows and columns. I also tried "isin":

matchfiles = df1.Filename.isin(df2.Filename)

and

if (df1[['Filename','MD5']]) == (df2[['Filename','MD5']]):
    print(df1[['Filename','MD5']])

I expect the output to print the "Filename" with the matching "MD5".

The errors are:

TypeError: unsupported operand type(s) for &: 'str' and 'bool'

and

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

DataSet 1:

Filename              MD5
I417122 - KP -pst     125e46b4477934fa7495f
I417122 - KP - xml    eee4acefced33e6595a32
J944737 - DJ gif      f52483135c9e8f6fb2680
J944737 - DJ txt      c1b76990e2e19a7eb2332
J944737 - DJ doc      b1aa2e981d8c04860810
J944737 - DJ docx     55b325a7ef73ba8a0e2f9
J944737 - JD.zip      47fcccba65018d88a3c7e

DataSet 2:

Filename              MD5
I417122 - KP -pst     125e46b4477934fa7495f
I417122 - KP - xml    47fcccba65018d88a3c7e
J944737 - DJ gif      f52483135c9e8f6fb2680
J944737 - DJ txt      c1b76990e2e19a7eb2856
J944737 - DJ doc      eee4acefced33e6595a32
J944737 - DJ docx     55b325a7ef73ba8a0e2f9
J944737 - JD.zip      47fcccba65018d88a3c7e

Expected Results:

Filename              MD5
I417122 - KP -pst     125e46b4477934fa7495f
J944737 - DJ gif      f52483135c9e8f6fb2680
J944737 - DJ doc      eee4acefced33e6595a32
J944737 - DJ docx     55b325a7ef73ba8a0e2f9
J944737 - JD.zip      47fcccba65018d88a3c7e

Answer 1

This adds an indicator column, Indf2, to a copy of df1. It is 1 if the filename from csv1 also appears in csv2 and 0 otherwise.

matching_df = df1.assign(Indf2=df1.Filename.isin(df2.Filename).astype(int))

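Then you could remove all the rows where the indicator column is zero, merge with df2 on Filename, and keep only the rows whose MD5 is the same in both files:

matching_df = matching_df[matching_df.Indf2 == 1]
# Merge with df2 (not df1) so the MD5 from each CSV ends up side by side
final_df = matching_df.merge(df2, how="left", on="Filename", suffixes=("", "_df2"))
# Keep rows whose MD5 agrees in both files, then drop the helper columns
final_df = final_df[final_df.MD5 == final_df.MD5_df2]
final_df = final_df.drop(columns=["MD5_df2", "Indf2"])
print(final_df)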
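
As an alternative to the indicator-column approach, the same result can probably be reached in one step with an inner merge on both columns. The sketch below is only an illustration: the two frames are built in memory from a few rows of the question's data (shuffled in df2 to mimic the different row order), whereas in practice they would come from pd.read_csv, and the name "matches" is made up for this example.

```python
import pandas as pd

# A few rows copied from the question's data sets. With real files these frames
# would be loaded with pd.read_csv(...) instead of being typed in by hand.
df1 = pd.DataFrame({
    "Filename": ["I417122 - KP -pst", "I417122 - KP - xml",
                 "J944737 - DJ gif", "J944737 - DJ txt"],
    "MD5": ["125e46b4477934fa7495f", "eee4acefced33e6595a32",
            "f52483135c9e8f6fb2680", "c1b76990e2e19a7eb2332"],
})
# Same files in a different row order, with two MD5 values that differ.
df2 = pd.DataFrame({
    "Filename": ["J944737 - DJ txt", "J944737 - DJ gif",
                 "I417122 - KP - xml", "I417122 - KP -pst"],
    "MD5": ["c1b76990e2e19a7eb2856", "f52483135c9e8f6fb2680",
            "47fcccba65018d88a3c7e", "125e46b4477934fa7495f"],
})

# An inner merge on BOTH columns keeps only the (Filename, MD5) pairs that
# occur in both frames, regardless of which row they sit in.
matches = df1.merge(df2, on=["Filename", "MD5"], how="inner")
print(matches)
```

Because both columns are used as merge keys, no suffixed columns such as MD5_x or MD5_y are created, which avoids the extra columns mentioned in the question. The ValueError in the question comes from a different cause: comparing two DataFrames with == yields a DataFrame of booleans, and an if statement cannot reduce that to a single True or False.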
