I have two dataframes. Each can contain multiple rows for the same product name. What would be the best way to compare their values? I have tried compare from the csv_diff library, but it relies on a unique key, and my dataframes don't have one: there are multiple entries for the same product_name.
from csv_diff import load_csv, compare
diff = compare(
    load_csv(open("df1.csv"), key="product_name"),
    load_csv(open("df2.csv"), key="product_name")
)
The dataframes look like below:
df1:
product_name value value2 value3 value4 value5 value6 value7 ...
0 1234PROD 1 2 3 4 5 6 7 ...
1 1234PROD 7 4 4 7 8 7 8 ...
2 1234PROD 8 7 4 7 8 7 8 ...
df2:
product_name value value2 value3 value4 value5 value6 value7 ...
0 4567PROD 1 2 3 4 5 6 9 ...
1 8767PROD 7 4 4 7 8 7 8 ...
2 1234PROD 5 7 4 7 8 7 8 ...
3 1234PROD 8 7 4 7 8 7 8 ...
I would like to obtain the summary of their changes, something similar to:
changes:
[{'key': '1234PROD',
  'changes': {'value': [1, 5],
              'value2': [2, 7],
              'value3': [3, 4]}
}]
I'm not sure what your expected output should be, but you could try the following:
result = df1.apply(lambda row: row == df2[df2.product_name == row.product_name], axis=1)
The result is an object in which each row of df1 is compared against every row of df2 that has the same product name. You can inspect that result per row:
result[2]:
index product_name value value2 value3 value4 value5 value6 value7
2 True False True True True True True True
3 True True True True True True True True
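If the goal is the summary format from the question, one possible sketch is below. Because there is no unique key, it assumes that the n-th occurrence of a product name in df1 should be paired with the n-th occurrence of the same name in df2. That pairing rule is an assumption, not something the question states, and the helper name summarize_changes and the temporary _occ column are made up for illustration. The sample frames use only the first three value columns shown in the question.

```python
import pandas as pd

# Sample data: first three value columns from the question.
df1 = pd.DataFrame({
    "product_name": ["1234PROD", "1234PROD", "1234PROD"],
    "value":  [1, 7, 8],
    "value2": [2, 4, 7],
    "value3": [3, 4, 4],
})
df2 = pd.DataFrame({
    "product_name": ["4567PROD", "8767PROD", "1234PROD", "1234PROD"],
    "value":  [1, 7, 5, 8],
    "value2": [2, 4, 7, 7],
    "value3": [3, 4, 4, 4],
})


def summarize_changes(old, new, key="product_name"):
    """Hypothetical helper: list per-column changes between paired rows."""
    value_cols = [c for c in old.columns if c != key]
    # Assumption: pair the n-th occurrence of a key in `old`
    # with the n-th occurrence of the same key in `new`.
    old = old.assign(_occ=old.groupby(key).cumcount())
    new = new.assign(_occ=new.groupby(key).cumcount())
    # Inner merge keeps only rows that have a counterpart in both frames.
    merged = old.merge(new, on=[key, "_occ"], suffixes=("_old", "_new"))

    changes = []
    for row in merged.to_dict("records"):
        diff = {
            c: [row[f"{c}_old"], row[f"{c}_new"]]
            for c in value_cols
            if row[f"{c}_old"] != row[f"{c}_new"]
        }
        if diff:  # skip paired rows that are identical
            changes.append({"key": row[key], "changes": diff})
    return changes


changes = summarize_changes(df1, df2)
print(changes)
```

Rows without a counterpart (the third 1234PROD row in df1, and 4567PROD and 8767PROD in df2) are dropped by the inner merge. Passing how="outer" and indicator=True to merge would surface them as added or removed rows, but the comparison loop would then need to handle the missing values.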