How can I specify floating point precision in Apache Spark?
Is there a way to specify the precision of floating point numbers in Spark, ideally just before writing the RDD to a file, so that no precision is lost during the computation itself?

Minimal working example:
```python
from pyspark.sql import HiveContext

# sc is the SparkContext provided by the pyspark shell
sqlCtxt = HiveContext(sc)
fulldata = sqlCtxt.jsonFile(DATA_FILE)
fulldata.registerTempTable("fulldata")
newcpulists = sqlCtxt.sql('SELECT xxx FROM fulldata')

def reduceSumPerc(x, y):
    # some reduce function
    pass

def mapfunc(x):
    # some map function
    pass

reducedresult = newcpulists.map(mapfunc).reduceByKey(reduceSumPerc)

# I want to reduce the precision just at this line, before writing to file.
reducedresult.coalesce(1, True).saveAsTextFile(RESULT_PATH)
```
This kind of operation is not Spark's job. Since `saveAsTextFile` simply calls `unicode` on non-unicode data (and `.encode` on unicode data), all you have to do is format the output strings yourself with the standard Python formatting tools, for example:
rdd = sc.parallelize([("foo", 0.123123132), ("bar", 0.00000001)])
rdd.map(lambda x: "{0}, {1:0.2f}".format(*x)).saveAsTextFile(...)
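```

Applied to the pipeline in the question, the same idea would look roughly like the sketch below; it assumes `reducedresult` holds `(key, float)` pairs and the four-decimal width is an arbitrary choice. Because the formatting runs only in the final `map`, all upstream arithmetic still happens at full double precision.

```python
# A minimal sketch, assuming reducedresult contains (key, float) pairs
# and RESULT_PATH is the output path from the question.
formatted = reducedresult.map(lambda kv: "{0}, {1:0.4f}".format(kv[0], kv[1]))
formatted.coalesce(1, True).saveAsTextFile(RESULT_PATH)
```

If you need a numeric type downstream rather than a string, calling `round(kv[1], 4)` in the same `map` is an alternative, though the rounded value is still a binary double and may not print exactly as written.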