
Merging isn't working as expected for a column of dtype float64

I am working with two dataframes that look like the ones below (this is just a small chunk of the whole data). The problem is that if I do an inner join using the merge function, I get an empty result.

DF1

COURSE_KEY  CORP_ID
2.97E+11    23096
2.97E+11    23097
2.97E+11    10987
2.97E+11    560989
2.97E+11    34678

DF2

COURSE_KEY  COURSE_UNIQUE_KEY  COURSE_ID  CERTIFICATION_ID
2.97E+11    4077               WW_13456   WFT-CK-027
2.97E+11    5789               ww_13456   NL-WFT-12121

df3 = pd.merge(Df1, Df2, on='COURSE_KEY', how='inner')

Merging on float values may not be the best option. A merge matches keys by exact equality, and two floats that both display as 2.97E+11 can still differ in digits that the short display hides.
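To illustrate the point, here is a minimal sketch. The two key values are made up for the example (the question does not show the full-precision keys); they are chosen so that both print as 2.97E+11 while being different numbers.

```python
import pandas as pd

# Hypothetical keys: both show as 2.97E+11 in a short scientific format,
# but they differ in the lower-order digits.
left = pd.DataFrame({"COURSE_KEY": [297000000001.0], "CORP_ID": [23096]})
right = pd.DataFrame({"COURSE_KEY": [297000000002.0], "COURSE_UNIQUE_KEY": [4077]})

# What a truncated display would show for each key.
shown_left = left["COURSE_KEY"].map("{:.2E}".format).tolist()
shown_right = right["COURSE_KEY"].map("{:.2E}".format).tolist()
print(shown_left, shown_right)

# merge compares the keys for exact equality, so nothing matches.
merged = pd.merge(left, right, on="COURSE_KEY", how="inner")
print(len(merged))

# Formatting with more digits reveals the difference.
print(left["COURSE_KEY"].map("{:.2f}".format).tolist())
print(right["COURSE_KEY"].map("{:.2f}".format).tolist())
```

If the real keys turn out to differ like this, the empty result is expected behaviour rather than a bug in merge.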

For practical usage we can use int:

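import numpy as np
import pandas as pd

n = 100  # scale factor: keeps two decimal places of the key
# Scale, round, and cast to 64-bit integers so the keys compare exactly
df1['COURSE_KEY'] = np.round(df1.COURSE_KEY * n).astype('int64')
df2['COURSE_KEY'] = np.round(df2.COURSE_KEY * n).astype('int64')

df = pd.merge(df1, df2, how='inner', on='COURSE_KEY')
df['COURSE_KEY'] = df.COURSE_KEY / n  # restore the original scale
df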

Output

    COURSE_KEY  CORP_ID COURSE_UNIQUE_KEY   COURSE_ID   CERTIFICATION_ID
0   2.970000e+11    23096   4077    WW_13456    WFT-CK-027
1   2.970000e+11    23096   5789    ww_13456    NL-WFT-12121
2   2.970000e+11    23097   4077    WW_13456    WFT-CK-027
3   2.970000e+11    23097   5789    ww_13456    NL-WFT-12121
4   2.970000e+11    10987   4077    WW_13456    WFT-CK-027
5   2.970000e+11    10987   5789    ww_13456    NL-WFT-12121
6   2.970000e+11    560989  4077    WW_13456    WFT-CK-027
7   2.970000e+11    560989  5789    ww_13456    NL-WFT-12121
8   2.970000e+11    34678   4077    WW_13456    WFT-CK-027
9   2.970000e+11    34678   5789    ww_13456    NL-WFT-12121

It is not clear what the exact issue is, but a good guess is that the float values differ slightly. Float comparisons should be done by checking closeness rather than equality, because there are often small decimal variations.
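One way to test that guess is an outer merge with `indicator=True`, which labels each row as `left_only`, `right_only`, or `both`. The sketch below uses toy frames (not the asker's data) with the classic `0.1 + 0.2` versus `0.3` mismatch, then prints the unmatched keys with 17 significant digits.

```python
import pandas as pd

# Toy data: 0.1 + 0.2 is not exactly equal to 0.3 in binary floating point.
left = pd.DataFrame({"COURSE_KEY": [0.1 + 0.2, 2.0]})
right = pd.DataFrame({"COURSE_KEY": [0.3, 2.0]})

# The outer merge keeps every key; the _merge column says where each came from.
check = pd.merge(left, right, on="COURSE_KEY", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]

# Show the unmatched keys with enough digits to see how they differ.
print(unmatched["COURSE_KEY"].map("{:.17g}".format).tolist())
print(unmatched["_merge"].astype(str).tolist())
```

Pairs of nearly identical keys, one `left_only` and one `right_only`, are a strong hint that float precision is the cause.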

Depending on the context, you might need to use a better key datatype, say an int or a string, or check for closeness, for example by doing something like what [this answer][1] does.
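As a rough sketch of the closeness option (not necessarily what the linked answer does), one can pair every row of one frame with every row of the other and keep the pairs whose keys pass `np.isclose`. The helper name `merge_close` and its tolerance defaults are assumptions made for this example, and `how="cross"` needs pandas 1.2 or later. The cross join grows as rows(left) × rows(right), so this suits small frames only.

```python
import numpy as np
import pandas as pd

def merge_close(left, right, key, rtol=1e-9, atol=0.0):
    """Inner-join rows whose `key` values are approximately equal (hypothetical helper)."""
    right_key = key + "_right"
    # Rename the right key so both key columns survive the cross join.
    pairs = left.merge(right.rename(columns={key: right_key}), how="cross")
    close = np.isclose(
        pairs[key].to_numpy(), pairs[right_key].to_numpy(), rtol=rtol, atol=atol
    )
    return pairs[close].drop(columns=right_key).reset_index(drop=True)

# Toy data: the first keys differ only by floating-point rounding error.
left = pd.DataFrame({"COURSE_KEY": [0.1 + 0.2, 2.0], "CORP_ID": [23096, 23097]})
right = pd.DataFrame({"COURSE_KEY": [0.3, 5.0], "COURSE_UNIQUE_KEY": [4077, 5789]})

exact = pd.merge(left, right, on="COURSE_KEY", how="inner")
approx = merge_close(left, right, "COURSE_KEY")
print(len(exact), len(approx))
```

The result keeps the left frame's key value. For larger data, converting the key to an int or a string is likely to scale better than comparing every pair.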
