I have data in two Excel files, as shown below:
df1 = {'Transaction_Name':['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [1, 1, 2]}
df1 = pd.DataFrame(df1, columns=df1.keys())
df2 = {'Transaction_Name':['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [2, 1, 2]}
df2 = pd.DataFrame(df2, columns=df2.keys())
Please help me get the difference between the two files, as below:
Transaction_Name Count_df1 Count_df2
SC-001_Homepage 1 2
SC-002_Homepage 1 1
SC-001_Signinlink 2 2
The counts in the first row of the output do not match. Is it possible to highlight that row in a different color? My sample code is below:
import pandas as pd
# Compare both files
df1 = pd.read_csv(r"WLMOUTPUT.csv", dtype=object)
df2 = pd.read_csv(r"results.csv", dtype=object)
print('\n', df1)
print('\n',df2)
df1['Compare'] = df1['Transaction_Name'] + df1['Count'].astype(str)
df2['Compare'] = df2['Transaction_Name'] + df2['Count'].astype(str)
print('\n', df1.loc[~df1['Compare'].isin(df2['Compare'])])
Thanks in advance
You can use the merge function.
import pandas as pd
df1 = pd.DataFrame({'Transaction_Name':['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [1, 1, 2]})
df2 = pd.DataFrame({'Transaction_Name':['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [2, 1, 2]})
merged_df = pd.merge(df1, df2, on='Transaction_Name', suffixes=('_df1', '_df2'))
This will give you this DataFrame:
print(merged_df)
Count_df1 Transaction_Name Count_df2
0 1 SC-001_Homepage 2
1 1 SC-002_Homepage 1
2 2 SC-001_Signinlink 2
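One caveat: pd.merge performs an inner join by default, so a Transaction_Name that appears in only one of the two files is silently dropped from the result. If that matters for your comparison, a possible variation is an outer merge with indicator=True. The sketch below uses made-up sample data (the 'SC-003_Logout' row and the name only_one_side are hypothetical, added only to show the effect):

```python
import pandas as pd

# Hypothetical sample data: df2 lacks 'SC-001_Signinlink' and has an extra row.
df1 = pd.DataFrame({'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'],
                    'Count': [1, 1, 2]})
df2 = pd.DataFrame({'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage', 'SC-003_Logout'],
                    'Count': [2, 1, 5]})

# how='outer' keeps names found in either frame; indicator=True adds a
# '_merge' column holding 'both', 'left_only' or 'right_only'.
outer_df = pd.merge(df1, df2, on='Transaction_Name', how='outer',
                    suffixes=('_df1', '_df2'), indicator=True)

# Rows that are present in only one of the two frames.
only_one_side = outer_df[outer_df['_merge'] != 'both']
print(only_one_side)
```

Rows marked 'left_only' or 'right_only' have NaN in the count column of the file they are missing from, so they can be reported separately from genuine count mismatches.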
And then you can just use subsetting to see which rows have different counts:
diff = merged_df[merged_df['Count_df1'] != merged_df['Count_df2']]
And you will get this:
print(diff)
Count_df1 Transaction_Name Count_df2
0 1 SC-001_Homepage 2
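As for highlighting the mismatched rows in a different color, one option is the pandas Styler. The following is only a sketch: the helper names highlight_mismatch and style_mismatches and the yellow color are my own choices, and DataFrame.style needs the optional jinja2 package, so the styling call is wrapped in a function rather than run directly.

```python
import pandas as pd

df1 = pd.DataFrame({'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'],
                    'Count': [1, 1, 2]})
df2 = pd.DataFrame({'Transaction_Name': ['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'],
                    'Count': [2, 1, 2]})
merged_df = pd.merge(df1, df2, on='Transaction_Name', suffixes=('_df1', '_df2'))


def highlight_mismatch(row):
    """Return one CSS string per cell: yellow background when the counts differ."""
    css = 'background-color: yellow' if row['Count_df1'] != row['Count_df2'] else ''
    return [css] * len(row)


def style_mismatches(frame):
    """Build a Styler that colors mismatched rows (requires jinja2)."""
    return frame.style.apply(highlight_mismatch, axis=1)


# Preview of the CSS each row would receive, without needing jinja2.
css_preview = [highlight_mismatch(row) for _, row in merged_df.iterrows()]
print(css_preview)
```

In a Jupyter notebook, style_mismatches(merged_df) renders the colored table directly. Assuming jinja2 and openpyxl are installed, style_mismatches(merged_df).to_excel('diff.xlsx', index=False) should write the same highlighting to a workbook ('diff.xlsx' is a placeholder filename). Note that if the files are read with dtype=object, as in the question, the counts are compared as strings.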