Merging isn't working as expected for a column of dtype float64
I am working with two dataframes that look like the ones below (this is just a small chunk of the whole data). The problem is that when I do an inner join using the merge function, I get an empty result.
DF1
COURSE_KEY  CORP_ID
2.97E+11    23096
2.97E+11    23097
2.97E+11    10987
2.97E+11    560989
2.97E+11    34678

DF2
COURSE_KEY  COURSE_UNIQUE_KEY  COURSE_ID  CERTIFICATION_ID
2.97E+11    4077               WW_13456   WFT-CK-027
2.97E+11    5789               ww_13456   NL-WFT-12121
df3 = pd.merge(Df1, Df2, on='COURSE_KEY', how='inner')
Merging on float values may not be the best option: a merge matches keys by exact equality, and a float column can hold values that differ only in digits the display does not show.
For practical use we can convert the key to int:
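import numpy as np
import pandas as pd

n = 100  # scale factor: keeps two decimal places of the key
# Scale, round, and cast to 64-bit integers so that the keys compare exactly
df1['COURSE_KEY'] = np.round(df1.COURSE_KEY * n).astype('int64')
df2['COURSE_KEY'] = np.round(df2.COURSE_KEY * n).astype('int64')
df = pd.merge(df1, df2, how='inner', on='COURSE_KEY')
# Convert the key back to its original scale
df['COURSE_KEY'] = df.COURSE_KEY / n
df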
Output
COURSE_KEY CORP_ID COURSE_UNIQUE_KEY COURSE_ID CERTIFICATION_ID
0 2.970000e+11 23096 4077 WW_13456 WFT-CK-027
1 2.970000e+11 23096 5789 ww_13456 NL-WFT-12121
2 2.970000e+11 23097 4077 WW_13456 WFT-CK-027
3 2.970000e+11 23097 5789 ww_13456 NL-WFT-12121
4 2.970000e+11 10987 4077 WW_13456 WFT-CK-027
5 2.970000e+11 10987 5789 ww_13456 NL-WFT-12121
6 2.970000e+11 560989 4077 WW_13456 WFT-CK-027
7 2.970000e+11 560989 5789 ww_13456 NL-WFT-12121
8 2.970000e+11 34678 4077 WW_13456 WFT-CK-027
9 2.970000e+11 34678 5789 ww_13456 NL-WFT-12121
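For a self-contained version of the steps above, here is a minimal sketch. The sample frames and their key values are made up for illustration (the question only shows the keys rounded to 2.97E+11), and the names exact and merged are hypothetical. The cast asks for int64 explicitly as a precaution, because the scaled keys are larger than a 32-bit integer can hold.

```python
import numpy as np
import pandas as pd

# Hypothetical data: every key displays as 2.97E+11, but the df2 keys
# are off by 0.001, so an exact comparison does not match them.
df1 = pd.DataFrame({
    "COURSE_KEY": [297000000001.0, 297000000002.0],
    "CORP_ID": [23096, 23097],
})
df2 = pd.DataFrame({
    "COURSE_KEY": [297000000001.001, 297000000002.001],
    "COURSE_ID": ["WW_13456", "ww_13456"],
})

# Plain merge on the float key: no key is exactly equal to another.
exact = pd.merge(df1, df2, on="COURSE_KEY", how="inner")

# Scale, round, and cast both keys to int64, then merge on the integer key.
n = 100
left = df1.assign(COURSE_KEY=np.round(df1["COURSE_KEY"] * n).astype("int64"))
right = df2.assign(COURSE_KEY=np.round(df2["COURSE_KEY"] * n).astype("int64"))
merged = pd.merge(left, right, on="COURSE_KEY", how="inner")

# Bring the key back to its original scale.
merged["COURSE_KEY"] = merged["COURSE_KEY"] / n

print(len(exact), len(merged))
```

Using assign leaves df1 and df2 untouched, which makes it easy to compare the two merges side by side. The choice of n decides how many decimal places count as "the same key", so it should be picked from what the keys actually mean.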
It is not entirely clear what the issue is, but a good guess is that the float values differ slightly. Floats should be compared for closeness rather than exact equality, because they often differ in their decimal digits.
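To illustrate the difference between sameness and closeness, here is a small sketch. The two key values are hypothetical and chosen only so that both format as 2.97E+11. Note that np.isclose uses a relative tolerance by default, which is very loose for numbers this large, so the sketch sets rtol=0 and passes an absolute tolerance instead; the value 0.01 is an assumption, not something taken from the question.

```python
import numpy as np

key_a = 297000000001.0
key_b = 297000000001.001  # hypothetical: differs only in the third decimal

# Both values look identical when shown with two decimals in scientific notation.
shown_a = f"{key_a:.2E}"
shown_b = f"{key_b:.2E}"

# Exact comparison, which is what a merge on the key column relies on.
same = key_a == key_b

# Closeness check with an explicit absolute tolerance.
close = bool(np.isclose(key_a, key_b, rtol=0, atol=0.01))

print(shown_a, shown_b, same, close)
```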
Depending on the context, you might need to use a better key datatype, such as an int or a string, or check for closeness, for example by doing something like what [this answer][1] does.
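One possible way to check for closeness, sketched here under stated assumptions, is to pair every row of one frame with every row of the other and keep the pairs whose keys agree within a tolerance. This is not necessarily what the linked answer does. The sample data, the suffixes, the tolerance of 0.01, and the names pairs and close_matches are all made up for illustration, and how="cross" needs pandas 1.2 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical data with keys that are close but not exactly equal.
df1 = pd.DataFrame({
    "COURSE_KEY": [297000000001.0, 297000000002.0],
    "CORP_ID": [23096, 23097],
})
df2 = pd.DataFrame({
    "COURSE_KEY": [297000000001.001, 297000000002.001],
    "COURSE_ID": ["WW_13456", "ww_13456"],
})

# Pair every row of df1 with every row of df2.
pairs = df1.merge(df2, how="cross", suffixes=("_1", "_2"))

# Keep only the pairs whose keys agree within an absolute tolerance.
mask = np.isclose(pairs["COURSE_KEY_1"], pairs["COURSE_KEY_2"], rtol=0, atol=0.01)
close_matches = pairs[mask].reset_index(drop=True)

print(close_matches)
```

A cross join builds len(df1) * len(df2) rows before filtering, so this is only reasonable for small frames. For larger data, converting the key to an int or a string first is likely the more practical route.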