Passing Spark DataFrame columns to a geohash function in PySpark: Cannot convert column into bool
import pygeohash as pgh
pgh.encode(45,55)
'tpzpgxczbzur'
The above steps work great. Below I'm trying to create a DataFrame:
from pyspark.sql import Row

l = [(45, 25), (75, 22), (85, 20), (89, 26)]
rdd = sc.parallelize(l)
# build Row objects so createDataFrame can infer the schema
geoCords = rdd.map(lambda x: Row(lat=x[0], long=int(x[1])))
geoCordsSchema = sqlContext.createDataFrame(geoCords)
geoCordsSchema.show()
+---+----+
|lat|long|
+---+----+
| 45|  25|
| 75|  22|
| 85|  20|
| 89|  26|
+---+----+
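(As an aside, not part of the original steps: the same DataFrame can be built in one call by passing the list of tuples and the column names straight to createDataFrame.)

sqlContext.createDataFrame(l, ['lat', 'long']).show()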
This successfully creates a Spark DataFrame. Now, when I use pygeohash's encode on the DataFrame columns, it throws the error below:
pgh.encode(geoCordsSchema.lat, geoCordsSchema.long, precision = 7)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Python/2.7/site-packages/pygeohash/geohash.py", line 96, in encode
if longitude > mid:
File "/usr/local/spark/python/pyspark/sql/column.py", line 427, in __nonzero__
raise ValueError("Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
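The traceback shows the root cause: inside pygeohash, encode runs a plain Python comparison (longitude > mid), and since longitude is now a PySpark Column, Python calls Column.__nonzero__, which always raises. A minimal sketch of the same failure, assuming the geoCordsSchema DataFrame above:

bool(geoCordsSchema.long > 50)  # raises ValueError: Cannot convert column into bool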
You can't pass columns directly to an ordinary Python function to transform them; the function would receive Column objects instead of values. You can use a UDF to achieve it:
from pyspark.sql import functions as F

# wrap pygeohash.encode so Spark applies it to each row's lat/long values
udf1 = F.udf(lambda x, y: pgh.encode(x, y, precision=7))
geoCordsSchema.select('lat', 'long', udf1('lat', 'long').alias('encodedVal')).show()
+---+----+----------+
|lat|long|encodedVal|
+---+----+----------+
| 45|  25|   sxczbzu|
| 75|  22|   umrdst7|
| 85|  20|   urn5x1g|
| 89|  26|   uxf6r9u|
+---+----+----------+
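As a small variant (my addition, not part of the original answer), you can declare the UDF's return type explicitly and attach the result with withColumn; geohash_udf is just an illustrative name:

from pyspark.sql.types import StringType

# same UDF, with the StringType return type spelled out
geohash_udf = F.udf(lambda lat, lng: pgh.encode(lat, lng, precision=7), StringType())
geoCordsSchema.withColumn('encodedVal', geohash_udf('lat', 'long')).show()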