
Passing spark dataframe columns to geohash function - pyspark. Cannot convert column into bool:

import pygeohash as pgh

pgh.encode(45,55)

'tpzpgxczbzur'

The above steps work as expected. Below I'm trying to create a DataFrame:

from pyspark.sql import Row  # Row must be imported for the map below

l = [(45,25),(75,22),(85,20),(89,26)]

rdd = sc.parallelize(l)
geoCords = rdd.map(lambda x: Row(lat=x[0], long=int(x[1])))
geoCordsSchema = sqlContext.createDataFrame(geoCords)
geoCordsSchema.show()

+---+----+
|lat|long|
+---+----+
| 45|  25|
| 75|  22|
| 85|  20|
| 89|  26|
+---+----+

This successfully creates a Spark DataFrame. Now when I call pygeohash's encode on its columns, it throws the error below:

pgh.encode(geoCordsSchema.lat, geoCordsSchema.long, precision = 7)

Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "/Library/Python/2.7/site-packages/pygeohash/geohash.py", line 96, in encode
   if longitude > mid:
   File "/usr/local/spark/python/pyspark/sql/column.py", line 427, in __nonzero__
   raise ValueError("Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
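The traceback shows what goes wrong: `pgh.encode` runs ordinary Python comparisons such as `if longitude > mid:`, but comparing a pyspark `Column` produces another `Column` expression rather than `True`/`False`, and a `Column` deliberately raises when Python tries to coerce it to a bool. A stripped-down sketch of that mechanism (a toy stand-in class, not pyspark's actual code):

```python
class FakeColumn:
    """Mimics pyspark.sql.Column: comparisons build expressions, not booleans."""
    def __gt__(self, other):
        return FakeColumn()   # `col > mid` yields a new expression object
    def __bool__(self):       # Python calls this to get a truth value in `if ...:`
        raise ValueError("Cannot convert column into bool: please use '&' for 'and', ...")
    __nonzero__ = __bool__    # the Python 2 name, matching the traceback

longitude, mid = FakeColumn(), 0
try:
    if longitude > mid:       # same shape as pygeohash's line 96
        pass
except ValueError as e:
    print(e)
```

Any plain-Python function that branches on its arguments will fail the same way when handed `Column` objects, which is why the fix below wraps the call so it runs per row instead.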

You can't pass a Column directly to an arbitrary Python function to transform it. You can use a UDF to achieve it:

from pyspark.sql import functions as F
udf1 = F.udf(lambda x,y: pgh.encode(x,y,precision=7))
geoCordsSchema.select('lat','long',udf1('lat','long').alias('encodedVal')).show()
+---+----+----------+
|lat|long|encodedVal|
+---+----+----------+
| 45|  25|   sxczbzu|
| 75|  22|   umrdst7|
| 85|  20|   urn5x1g|
| 89|  26|   uxf6r9u|
+---+----+----------+
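As an aside, the encoding the UDF performs per row is easy to sketch: a geohash interleaves binary bisections of longitude and latitude and packs the resulting bits into base-32 characters, 5 bits per character. The following standalone toy encoder (written for illustration here, not pygeohash's actual source) reproduces the values above:

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=12):
    """Toy geohash encoder: interleave lon/lat bisection bits, 5 bits per char."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    is_lon = True                        # a geohash starts with a longitude bit
    while len(bits) < precision * 5:
        if is_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon > mid:
                bits.append(1)
                lon_lo = mid             # keep the upper half of the interval
            else:
                bits.append(0)
                lon_hi = mid             # keep the lower half
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat > mid:
                bits.append(1)
                lat_lo = mid
            else:
                bits.append(0)
                lat_hi = mid
        is_lon = not is_lon
    chars = []
    for i in range(0, len(bits), 5):     # pack each group of 5 bits into one char
        n = 0
        for b in bits[i:i + 5]:
            n = (n << 1) | b
        chars.append(_BASE32[n])
    return "".join(chars)

print(geohash_encode(45, 55))               # tpzpgxczbzur, as in the question
print(geohash_encode(45, 25, precision=7))  # sxczbzu, first row of the table
```

Because each encoding touches only one row's values, this kind of function fits the UDF model directly, at the cost of Python serialization overhead per row.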

