I am exporting HDFS query output to a CSV file using the INSERT OVERWRITE LOCAL DIRECTORY command, which writes the data without a header. I also have a dataframe from an Oracle query, with column headers, that I need to compare against the HDFS output.
df1 = pd.read_csv('/home/User/hdfs_result.csv', header = None)
print(df1)
      0  1                    2
0  XPRN  A  2019-12-16 00:00:00
1  XPRW  I  2019-12-16 00:00:00
2  XPS2  I  2003-09-30 00:00:00
df = pd.read_sql(sqlquery, sqlconn)
   UNIT STATUS                 Date
0  XPRN      A  2019-12-16 00:00:00
1  XPRW      A  2019-12-16 00:00:00
2  XPS2      I  2003-09-30 00:00:00
Since df1 has no header, I can't use merge or join to compare the data, though I can do df - df1.
Please suggest how I can compare the two and print the differences.
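You can compare against the underlying NumPy array, which matches values by position rather than by column label (df2 here is the Oracle dataframe, called df in the question):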
df2.where(df2==df1.values)
Output (differences are masked as NaN):
   UNIT STATUS                 Date
0  XPRN      A  2019-12-16 00:00:00
1  XPRW    NaN  2019-12-16 00:00:00
2  XPS2      I  2003-09-30 00:00:00
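To try the masking approach without the real HDFS file or Oracle connection, here is a minimal self-contained sketch. The inline CSV text and the hand-built df2 are hypothetical stand-ins for the question's data, and the comma delimiter is an assumption. It also assumes both frames have the same shape and row order, since the comparison is purely positional.

```python
import io
import pandas as pd

# Hypothetical stand-in for the headerless HDFS export;
# header=None gives the integer column labels 0, 1, 2.
hdfs_csv = io.StringIO(
    "XPRN,A,2019-12-16 00:00:00\n"
    "XPRW,I,2019-12-16 00:00:00\n"
    "XPS2,I,2003-09-30 00:00:00\n"
)
df1 = pd.read_csv(hdfs_csv, header=None)

# Hypothetical stand-in for the Oracle result, which has real column names.
df2 = pd.DataFrame({
    "UNIT": ["XPRN", "XPRW", "XPS2"],
    "STATUS": ["A", "A", "I"],
    "Date": ["2019-12-16 00:00:00", "2019-12-16 00:00:00", "2003-09-30 00:00:00"],
})

# A positional comparison only makes sense when the shapes match.
assert df1.shape == df2.shape

# Comparing against a plain array ignores df1's column labels entirely.
equal = df2 == df1.values

# Keep values that match; anything that differs becomes NaN.
masked = df2.where(equal)
print(masked)
```

If the Oracle query returns Date as a datetime column while the CSV holds strings, the two columns may need to be brought to the same dtype first (for example with pd.to_datetime) before the comparison is meaningful.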
For the non-matching rows:
df2[(df2 != df1.values).any(axis=1)]
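Another option, not part of the answer above, is to give df1 the same column names as df2 and then use DataFrame.compare, which reports only the cells that differ. This is a sketch under a few assumptions: pandas 1.1 or later (where compare was added), columns in the same order in both frames, and identical row indexes. The sample frames are hypothetical stand-ins for the question's data.

```python
import pandas as pd

# Hypothetical stand-ins for the question's data.
df1 = pd.DataFrame([
    ["XPRN", "A", "2019-12-16 00:00:00"],
    ["XPRW", "I", "2019-12-16 00:00:00"],
    ["XPS2", "I", "2003-09-30 00:00:00"],
])  # no header: columns are 0, 1, 2
df2 = pd.DataFrame({
    "UNIT": ["XPRN", "XPRW", "XPS2"],
    "STATUS": ["A", "A", "I"],
    "Date": ["2019-12-16 00:00:00", "2019-12-16 00:00:00", "2003-09-30 00:00:00"],
})

# Assumption: the columns line up by position, so df2's names can be reused.
df1_named = df1.copy()
df1_named.columns = df2.columns

# compare() needs identically labelled frames; "self" is df2, "other" is df1.
diff = df2.compare(df1_named)
print(diff)
```

Once df1 has column names, merge and join also become available, which may help when the two results are not in the same row order.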