I have two different dataframes which i need to compare.
These two dataframes are having different number of rows and doesnt have a Pk its Composite primarykey of (id||ver||name||prd||loc)
df1:
id ver name prd loc
a 1 surya 1a x
a 1 surya 1a y
a 2 ram 1a x
b 1 alex 1b z
b 1 alex 1b y
b 2 david 1b z
df2:
id ver name prd loc
a 1 surya 1a x
a 1 surya 1a y
a 2 ram 1a x
b 1 alex 1b z
I tried the below code and this workingif there are same number of rows, but if its like the above case its not working.
df1 = pd.DataFrame(Source)
df1 = df1.astype(str) #converting all elements as objects for easy comparison
df2 = pd.DataFrame(Target)
df2 = df2.astype(str) #converting all elements as objects for easy comparison
header_list = df1.columns.tolist() #creating a list of column names from df1 as the both df has same structure
df3 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)
for x in range(len(header_list)) :
df3[header_list[x]] = np.where(df1[header_list[x]] == df2[header_list[x]], 'True', 'False')
df3.to_csv('Output', index=False)
Please leet me know how to compare the datasets if there are different number od rows.
You can try this:
~df1.isin(df2)
# df1[~df1.isin(df2)].dropna()
Lets consider a quick example:
df1 = pd.DataFrame({
'Buyer': ['Carl', 'Carl', 'Carl'],
'Quantity': [18, 3, 5, ]})
# Buyer Quantity
# 0 Carl 18
# 1 Carl 3
# 2 Carl 5
df2 = pd.DataFrame({
'Buyer': ['Carl', 'Mark', 'Carl', 'Carl'],
'Quantity': [2, 1, 18, 5]})
# Buyer Quantity
# 0 Carl 2
# 1 Mark 1
# 2 Carl 18
# 3 Carl 5
~df2.isin(df1)
# Buyer Quantity
# 0 False True
# 1 True True
# 2 False True
# 3 True True
df2[~df2.isin(df1)].dropna()
# Buyer Quantity
# 1 Mark 1
# 3 Carl 5
Another idea can be merge on the same column names.
Sure, tweak the code to your needs. Hope this helped:)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.