What is the best way to compare two dataframes with multiple entries for a key?

Question

I have two dataframes that can contain multiple rows for the same product. What would be the best way to compare their values? I tried the compare function from the csv_diff library, but it requires a unique key, and my dataframes have multiple entries for the same product_name.

from csv_diff import load_csv, compare

diff = compare(
    load_csv(open("df1.csv"), key="product_name"),
    load_csv(open("df2.csv"), key="product_name")
)
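If it is enough to know which whole rows appear in only one of the two files, a key is not needed at all. A minimal sketch, assuming pandas and using the question's data trimmed to the first three value columns: an outer merge on all shared columns with indicator=True labels every row as present in both frames, only in the left one, or only in the right one.

```python
import pandas as pd

# Sample data: the question's frames, trimmed to the first three value columns
df1 = pd.DataFrame({
    "product_name": ["1234PROD", "1234PROD", "1234PROD"],
    "value": [1, 7, 8],
    "value2": [2, 4, 7],
    "value3": [3, 4, 4],
})
df2 = pd.DataFrame({
    "product_name": ["4567PROD", "8767PROD", "1234PROD", "1234PROD"],
    "value": [1, 7, 5, 8],
    "value2": [2, 4, 7, 7],
    "value3": [3, 4, 4, 4],
})

# With no "on" argument, merge joins on every column the frames share,
# so only rows that are identical in all columns are matched.
# indicator=True adds a "_merge" column: "both", "left_only" or "right_only".
merged = df1.merge(df2, how="outer", indicator=True)

only_in_df1 = merged[merged["_merge"] == "left_only"]
only_in_df2 = merged[merged["_merge"] == "right_only"]
print(only_in_df1)
print(only_in_df2)
```

This only tells you which rows differ, not which columns changed. Exact duplicate rows are also matched many-to-many by the merge, so calling drop_duplicates() on both frames first may be needed if such duplicates carry no meaning.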

The dataframes look like below:

df1:
    product_name  value  value2  value3  value4  value5  value6  value7  ...
0   1234PROD      1      2       3       4       5       6       7       ...
1   1234PROD      7      4       4       7       8       7       8       ...
2   1234PROD      8      7       4       7       8       7       8       ...


df2:
    product_name  value  value2  value3  value4  value5  value6  value7  ...
0   4567PROD      1      2       3       4       5       6       9       ...
1   8767PROD      7      4       4       7       8       7       8       ...
2   1234PROD      5      7       4       7       8       7       8       ...
3   1234PROD      8      7       4       7       8       7       8       ...

I would like to obtain a summary of the changes, something like:

changes:
[{'key': '1234PROD',
  'changes': {'value': [1, 5],
              'value2': [2, 7],
              'value3': [3, 4]}}]
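One way to get a summary in roughly that shape is to make the key unique by numbering repeated product names. The sketch below rests on an assumption the question does not confirm: that the n-th row for a product in df1 corresponds to the n-th row for that product in df2. The helper summarize_changes and the extra "occurrence" field are hypothetical, and the data is the question's, trimmed to three value columns.

```python
import pandas as pd

# Sample data: the question's frames, trimmed to the first three value columns
df1 = pd.DataFrame({
    "product_name": ["1234PROD", "1234PROD", "1234PROD"],
    "value": [1, 7, 8],
    "value2": [2, 4, 7],
    "value3": [3, 4, 4],
})
df2 = pd.DataFrame({
    "product_name": ["4567PROD", "8767PROD", "1234PROD", "1234PROD"],
    "value": [1, 7, 5, 8],
    "value2": [2, 4, 7, 7],
    "value3": [3, 4, 4, 4],
})


def summarize_changes(old, new, key="product_name"):
    """Hypothetical helper: pair the n-th row of each key in `old` with the
    n-th row of the same key in `new`, then list the columns that differ.
    Rows without a partner in the other frame are skipped."""
    value_cols = [c for c in old.columns if c != key]
    # cumcount() numbers repeated keys 0, 1, 2, ... so (key, occurrence) is unique
    old = old.assign(occurrence=old.groupby(key).cumcount())
    new = new.assign(occurrence=new.groupby(key).cumcount())
    # Inner merge keeps only (key, occurrence) pairs present in both frames
    paired = old.merge(new, on=[key, "occurrence"], suffixes=("_old", "_new"))

    summary = []
    for _, row in paired.iterrows():
        changes = {
            col: [row[f"{col}_old"], row[f"{col}_new"]]
            for col in value_cols
            if row[f"{col}_old"] != row[f"{col}_new"]
        }
        if changes:
            summary.append({"key": row[key], "occurrence": row["occurrence"], "changes": changes})
    return summary


for entry in summarize_changes(df1, df2):
    print(entry)
```

With this data the first pair gives value [1, 5], value2 [2, 7] and value3 [3, 4]. The third 1234PROD row of df1 has no partner in df2 and is dropped by the inner merge; switching to how="outer" would keep such rows, at the cost of handling the NaN values it introduces.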

Answer 1

I'm not sure what your expected output should be, but you could try the following:

# For each df1 row, compare it with every df2 row that has the same product_name
result = df1.apply(lambda row: row == df2[df2.product_name == row.product_name], axis=1)

The result holds, for each row of df1, a boolean DataFrame that compares that row against every row of df2 with the same product_name. You can inspect it row by row:

result[2]:
index   product_name    value   value2  value3  value4  value5  value6  value7
2       True            False   True    True    True    True    True    True
3       True            True    True    True    True    True    True    True
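To move from those boolean frames toward a list of changed columns, one possible next step (a sketch that is not part of the original answer) is to read off the False entries for every pair of matched rows. The helper mismatched_columns is hypothetical, and the data is the question's, trimmed to three value columns.

```python
import pandas as pd

# Sample data: the question's frames, trimmed to the first three value columns
df1 = pd.DataFrame({
    "product_name": ["1234PROD", "1234PROD", "1234PROD"],
    "value": [1, 7, 8],
    "value2": [2, 4, 7],
    "value3": [3, 4, 4],
})
df2 = pd.DataFrame({
    "product_name": ["4567PROD", "8767PROD", "1234PROD", "1234PROD"],
    "value": [1, 7, 5, 8],
    "value2": [2, 4, 7, 7],
    "value3": [3, 4, 4, 4],
})

# The answer's approach: one boolean frame per df1 row
result = df1.apply(
    lambda row: row == df2[df2.product_name == row.product_name], axis=1
)


def mismatched_columns(result):
    """Hypothetical helper: map (df1 index, df2 index) to the columns whose
    values differ, read from the boolean frames built above."""
    out = {}
    for i, mask in result.items():
        for j, flags in mask.iterrows():
            out[(i, j)] = [col for col, same in flags.items() if not same]
    return out


mismatches = mismatched_columns(result)
print(mismatches[(2, 2)])
```

Every df1 row is compared with every df2 row of the same product, so this reports all candidate pairings; deciding which pairing is the "real" match is still up to you.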

solution1
0 2022-11-25 10:22:34