
How to compare two massive spark dataframes on row level and print the difference
How to compare Two dataframes row by row?
I have DataFrames of shape 152431 × 15, and I want to find the rows that differ between the two frames.
# df1:
Date Fruit Num Color
2013-11-24 Banana 22.1 Yellow
2013-11-24 Orange 8.6 Orange
2013-11-24 Apple 7.6 Green
2013-11-24 Celery 10.2 Green
# df2:
Date Fruit Num Color
2013-11-24 Banana 22.1 Yellow
2013-11-24 Orange 8.6 Orange
2013-11-24 Apple 7.6 Green
2013-11-24 Celery 10.2 Green
2013-11-25 Apple 22.1 Red
2013-11-25 Orange 8.6 Orange
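If both frames are already loaded in pandas, one common way to get the row-level difference is an outer merge on all shared columns with `indicator=True`. The merge adds a `_merge` column that labels each row `left_only`, `right_only` or `both`, so keeping everything except `both` leaves the differing rows. The sketch below rebuilds df1 and df2 by hand from the tables above; the helper name `row_diff` is hypothetical, and the approach assumes both frames have the same column names.

```python
import pandas as pd

# Example frames copied from the question
df1 = pd.DataFrame({
    "Date": ["2013-11-24"] * 4,
    "Fruit": ["Banana", "Orange", "Apple", "Celery"],
    "Num": [22.1, 8.6, 7.6, 10.2],
    "Color": ["Yellow", "Orange", "Green", "Green"],
})
df2 = pd.DataFrame({
    "Date": ["2013-11-24"] * 4 + ["2013-11-25"] * 2,
    "Fruit": ["Banana", "Orange", "Apple", "Celery", "Apple", "Orange"],
    "Num": [22.1, 8.6, 7.6, 10.2, 22.1, 8.6],
    "Color": ["Yellow", "Orange", "Green", "Green", "Red", "Orange"],
})


def row_diff(left, right):
    """Hypothetical helper: rows present in only one of the two frames.

    With no `on=` argument, merge joins on every column the frames share.
    """
    merged = left.merge(right, how="outer", indicator=True)
    # _merge is 'left_only', 'right_only' or 'both'
    return merged[merged["_merge"] != "both"]


diff = row_diff(df1, df2)
print(diff)
```

For this data, `diff` should contain only the two 2013-11-25 rows, both tagged `right_only`. Two caveats: values are compared exactly, so floats that differ only by rounding count as different rows, and duplicated rows are paired up by the merge, so a row that appears twice in one frame and once in the other is not reported as a difference.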
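If your DataFrames are stored in two files, I would read each file line by line and build a list of the lines in the second file that do not appear in the first: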
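# Placeholder paths: replace with the real file locations
old_file_path = 'INSERT_FILE_PATH_OF_FILE_A'
new_file_path = 'INSERT_FILE_PATH_OF_FILE_B'

with open(old_file_path, 'r', encoding='utf-8') as old, open(new_file_path, 'r', encoding='utf-8') as new:
    # A set makes each membership test O(1) on average instead of scanning a list
    fileone = set(old.readlines())
    filetwo = new.readlines()

# Collect every line of file B that does not appear anywhere in file A
total_of_changes = []
for line in filetwo:
    if line not in fileone:
        total_of_changes.append(line)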
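The same line-by-line idea can be tried without files on disk by passing `io.StringIO` objects, which iterate line by line just like open file handles. This sketch assumes both files share the same whitespace layout and that line order does not matter; `added_lines` is a hypothetical helper name. It also strips the trailing newline before comparing, so a file whose last line lacks a final newline does not produce a spurious difference.

```python
import io


def added_lines(old_file, new_file):
    """Hypothetical helper: lines of new_file that never appear in old_file.

    Works with open file handles or any iterable of lines; lines are
    compared without their trailing newline and returned in file order.
    """
    seen = {line.rstrip("\n") for line in old_file}
    return [line.rstrip("\n") for line in new_file if line.rstrip("\n") not in seen]


old_text = (
    "Date Fruit Num Color\n"
    "2013-11-24 Banana 22.1 Yellow\n"
    "2013-11-24 Orange 8.6 Orange\n"
    "2013-11-24 Apple 7.6 Green\n"
    "2013-11-24 Celery 10.2 Green\n"
)
# The new file repeats the old rows and adds two more (no final newline)
new_text = old_text + (
    "2013-11-25 Apple 22.1 Red\n"
    "2013-11-25 Orange 8.6 Orange"
)

result = added_lines(io.StringIO(old_text), io.StringIO(new_text))
print(result)
```

With real files, the same call works on the handles from `open(...)`, for example `added_lines(old, new)` inside the `with` block.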
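Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.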