
Passing spark dataframe columns to geohash function - pyspark. Cannot convert column into bool:

import pygeohash as pgh

pgh.encode(45,55)

'tpzpgxczbzur'

The above steps work as expected. Below I'm trying to create a DataFrame:

from pyspark.sql import Row  # Row must be imported for the map below

l = [(45,25),(75,22),(85,20),(89,26)]

rdd = sc.parallelize(l)
geoCords = rdd.map(lambda x: Row(lat=x[0], long=int(x[1])))
geoCordsSchema = sqlContext.createDataFrame(geoCords)
geoCordsSchema.show()

+---+----+
|lat|long|
+---+----+
| 45|  25|
| 75|  22|
| 85|  20|
| 89|  26|
+---+----+

This successfully creates a Spark DataFrame. Now when I call pygeohash's encode on its columns, it throws the error below:

pgh.encode(geoCordsSchema.lat, geoCordsSchema.long, precision = 7)

Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/Library/Python/2.7/site-packages/pygeohash/geohash.py", line 96, in encode
   if longitude > mid:
   File "/usr/local/spark/python/pyspark/sql/column.py", line 427, in __nonzero__
   raise ValueError("Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
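The traceback shows what goes wrong: `pgh.encode` runs ordinary Python comparisons such as `if longitude > mid:`, but comparing a pyspark `Column` produces another `Column` expression rather than `True`/`False`, and a `Column` deliberately raises when Python tries to coerce it to a bool. A stripped-down sketch of that mechanism (a toy stand-in class, not pyspark's actual code):

```python
class FakeColumn:
    """Mimics pyspark.sql.Column: comparisons build expressions, not booleans."""
    def __gt__(self, other):
        return FakeColumn()   # `col > mid` yields a new expression object
    def __bool__(self):       # Python calls this to get a truth value in `if ...:`
        raise ValueError("Cannot convert column into bool: please use '&' for 'and', ...")
    __nonzero__ = __bool__    # the Python 2 name, matching the traceback

longitude, mid = FakeColumn(), 0
try:
    if longitude > mid:       # same shape as pygeohash's line 96
        pass
except ValueError as e:
    print(e)
```

Any plain-Python function that branches on its arguments will fail the same way when handed `Column` objects, which is why the fix below wraps the call so it runs per row instead.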

You can't pass a Column directly to an arbitrary Python function to transform it. You can use a UDF to achieve it:

from pyspark.sql import functions as F
udf1 = F.udf(lambda x,y: pgh.encode(x,y,precision=7))
geoCordsSchema.select('lat','long',udf1('lat','long').alias('encodedVal')).show()
+---+----+----------+
|lat|long|encodedVal|
+---+----+----------+
| 45|  25|   sxczbzu|
| 75|  22|   umrdst7|
| 85|  20|   urn5x1g|
| 89|  26|   uxf6r9u|
+---+----+----------+
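As an aside, the encoding the UDF performs per row is easy to sketch: a geohash interleaves binary bisections of longitude and latitude and packs the resulting bits into base-32 characters, 5 bits per character. The following standalone toy encoder (written for illustration here, not pygeohash's actual source) reproduces the values above:

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=12):
    """Toy geohash encoder: interleave lon/lat bisection bits, 5 bits per char."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    is_lon = True                        # a geohash starts with a longitude bit
    while len(bits) < precision * 5:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon > mid:
                bits.append(1)
                lon_lo = mid             # keep the upper half of the interval
            else:
                bits.append(0)
                lon_hi = mid             # keep the lower half
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat > mid:
                bits.append(1)
                lat_lo = mid
            else:
                bits.append(0)
                lat_hi = mid
        is_lon = not is_lon
    chars = []
    for i in range(0, len(bits), 5):     # pack each group of 5 bits into one char
        n = 0
        for b in bits[i:i + 5]:
            n = (n << 1) | b
        chars.append(_BASE32[n])
    return "".join(chars)

print(geohash_encode(45, 55))               # tpzpgxczbzur, as in the question
print(geohash_encode(45, 25, precision=7))  # sxczbzu, first row of the table
```

Because each encoding touches only one row's values, this kind of function fits the UDF model directly, at the cost of Python serialization overhead per row.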

