I'm looking to compare two dataframes in Python and find where they differ. This won't be restricted to duplicates either: it will also involve identifying new entries and recording the date of each change. For example:
df1 =
df2 =
df3 =
Please let me know if you need any more detail. Thank you.
With the dataframes you provided:
import pandas as pd
df1 = pd.DataFrame(
    {
        "item": ["bannana", "orange", "hammer"],
        "size": [10, 5, 25],
        "colour": ["yellow", "orange", "wood"],
    }
)
df2 = pd.DataFrame(
    {
        "item": ["bannana", "orange", "snake", "hammer"],
        "size": [12, 5, 55, 25],
        "colour": ["yellow", "green", "green", "wood"],
    }
)
Here is one way to do it:
TODAY = pd.Timestamp.today().strftime("%d/%m/%Y")
# Merge dataframes
df3 = pd.merge(
    left=df2, right=df1, how="left", on="item", suffixes=["_new", "_old"]
).fillna("")
# Add new empty columns
df3["date_size_changed"] = ""
df3["date_colour_changed"] = ""
# Sort columns
df3 = df3.reindex(
    columns=[
        "item",
        "size_new",
        "size_old",
        "date_size_changed",
        "colour_new",
        "colour_old",
        "date_colour_changed",
    ]
)
# Compare values and add TODAY when different
for col in ["size", "colour"]:
    # An empty "_old" value means the item is new, so it is not flagged as changed
    df3[f"date_{col}_changed"] = df3.apply(
        lambda x: TODAY
        if x[f"{col}_new"] != x[f"{col}_old"] and x[f"{col}_old"] != ""
        else "",
        axis=1,
    )
print(df3)
# Output
      item  size_new size_old date_size_changed colour_new colour_old  \
0  bannana        12     10.0        18/09/2022     yellow     yellow
1   orange         5      5.0                        green     orange
2    snake        55                                 green
3   hammer        25     25.0                         wood       wood

  date_colour_changed
0
1          18/09/2022
2
3
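Since the question also asks about identifying new entries, here is a possible variation on the same idea. It is only a sketch: it assumes "item" uniquely identifies a row in both dataframes, and the names `merged`, `is_new` and `date_added` are made up for illustration. It uses the `indicator=True` option of `merge` to mark rows that exist only in df2, and compares whole columns instead of calling `apply` row by row.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        "item": ["bannana", "orange", "hammer"],
        "size": [10, 5, 25],
        "colour": ["yellow", "orange", "wood"],
    }
)
df2 = pd.DataFrame(
    {
        "item": ["bannana", "orange", "snake", "hammer"],
        "size": [12, 5, 55, 25],
        "colour": ["yellow", "green", "green", "wood"],
    }
)

today = pd.Timestamp.today().strftime("%d/%m/%Y")

# indicator=True adds a "_merge" column holding "both" or "left_only"
merged = df2.merge(
    df1, how="left", on="item", suffixes=("_new", "_old"), indicator=True
)

# Rows found only in df2 are new entries (hypothetical column names)
merged["is_new"] = merged["_merge"] == "left_only"
merged["date_added"] = merged["is_new"].map({True: today, False: ""})

# Flag a change only for items that already existed in df1
for col in ["size", "colour"]:
    changed = ~merged["is_new"] & (merged[f"{col}_new"] != merged[f"{col}_old"])
    merged[f"date_{col}_changed"] = changed.map({True: today, False: ""})

merged = merged.drop(columns="_merge")
print(merged)
```

With the sample data, "snake" should be the only row marked as new, while the size change for "bannana" and the colour change for "orange" get today's date. Note that the "_old" columns keep NaN for new rows here, because `fillna("")` is not applied.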
This is not possible, very sorry