contains pyspark SQL: TypeError: 'Column' object is not callable

I'm using Spark 2.0.1.

 df.show()
+--------+------+---+-----+-----+----+
|Survived|Pclass|Sex|SibSp|Parch|Fare|
+--------+------+---+-----+-----+----+
|     0.0|   3.0|1.0|  1.0|  0.0| 7.3|
|     1.0|   1.0|0.0|  1.0|  0.0|71.3|
|     1.0|   3.0|0.0|  0.0|  0.0| 7.9|
|     1.0|   1.0|0.0|  1.0|  0.0|53.1|
|     0.0|   3.0|1.0|  0.0|  0.0| 8.1|
|     0.0|   3.0|1.0|  0.0|  0.0| 8.5|
|     0.0|   1.0|1.0|  0.0|  0.0|51.9|

I have a data frame and I want to add a new column to df using withColumn, where the value of the new column is based on another column's value. I used something like this:

>>> dfnew = df.withColumn('AddCol' , when(df.Pclass.contains('3.0'),'three').otherwise('notthree'))

It gives this error:

TypeError: 'Column' object is not callable

Can anyone help me overcome this error?

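It's because you are trying to call contains on the column. The Column method contains does not exist in PySpark 2.0.1 (it was only added to the Python API in Spark 2.2), so df.Pclass.contains is treated as a nested-field lookup that returns another Column, and calling that Column raises the TypeError. You should try like instead. Try this: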

import pyspark.sql.functions as F

# like follows SQL LIKE rules: it needs % wildcards to behave like contains
df = df.withColumn("AddCol", F.when(F.col("Pclass").like("%3%"), "three").otherwise("notthree"))
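A note on the pattern: like follows SQL LIKE rules, so the pattern has to cover the whole value. % stands for any run of characters and _ for exactly one. A bare "3" therefore only matches the exact string "3", and a double such as 3.0 is compared as the string "3.0" (assuming Spark's usual implicit cast to string). The sketch below is a toy re-implementation in plain Python, only to illustrate those rules. The helper name sql_like is made up for this example, it is not Spark code, and it ignores escape characters.

```python
import re


def sql_like(value: str, pattern: str) -> bool:
    """Toy version of SQL LIKE for illustration only (hypothetical helper)."""
    # Translate the pattern character by character:
    # % -> any run of characters, _ -> exactly one character, rest literal.
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    # fullmatch: the pattern must cover the entire string, as LIKE does.
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


print(sql_like("3.0", "3"))    # False: no wildcards, so only "3" itself matches
print(sql_like("3.0", "%3%"))  # True: behaves like "contains 3"
print(sql_like("13.0", "3%"))  # False: "3%" means "starts with 3"
```

So if the goal is a "contains" style test, wrap the search text in % on both sides.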

Or, if you just want it to be exactly the number 3, you should do:

import pyspark.sql.functions as F

# If the column Pclass is numeric
df = df.withColumn("AddCol",F.when(F.col("Pclass") == F.lit(3),"three").otherwise("notthree"))

# If the column Pclass is string
df = df.withColumn("AddCol",F.when(F.col("Pclass") == F.lit("3"),"three").otherwise("notthree"))
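As for why the message says 'Column' object is not callable rather than something like "no attribute contains": as far as I can tell, Column defines __getattr__ so that an unknown attribute name is read as a nested field and comes back as another Column. The sketch below mimics that with a stand-in class. ToyColumn and everything in it are hypothetical and only imitate the lookup behaviour; this is not the real pyspark class.

```python
class ToyColumn:
    """Hypothetical stand-in for a Spark Column; not the real pyspark class."""

    def __init__(self, name):
        self.name = name

    def like(self, pattern):
        # A real method: found by normal attribute lookup, so it is callable.
        return ToyColumn(f"({self.name} LIKE {pattern!r})")

    def __getattr__(self, item):
        # Only reached when normal lookup fails: treat the unknown name as a
        # nested field and hand back another column object.
        return ToyColumn(f"{self.name}.{item}")


pclass = ToyColumn("Pclass")

# "contains" is not defined, so this is a column object, not a method.
print(type(pclass.contains).__name__)  # ToyColumn

try:
    pclass.contains("3.0")  # calling a column object
except TypeError as exc:
    print(exc)  # 'ToyColumn' object is not callable
```

The same reasoning applies to any method name that is misspelled or missing in the installed version: the attribute access succeeds quietly and the failure only shows up at the call.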

You should use df.col(colName) instead of df.colName.

Example using Java 8 and Spark 2.1:

df.show();

+--------+------+---+-----+-----+----+
|Survived|Pclass|Sex|SibSp|Parch|Fare|
+--------+------+---+-----+-----+----+
|       0|     3|  1|    1|    0|   3|
|       1|     1|  0|    1|    0|   2|
+--------+------+---+-----+-----+----+

// needs: import static org.apache.spark.sql.functions.when;
df = df.withColumn("AddCol", when(df.col("Pclass").contains("3"), "three").otherwise("notthree"));

df.show();

+--------+------+---+-----+-----+----+--------+
|Survived|Pclass|Sex|SibSp|Parch|Fare|  AddCol|
+--------+------+---+-----+-----+----+--------+
|       0|     3|  1|    1|    0|   3|   three|
|       1|     1|  0|    1|    0|   2|notthree|
+--------+------+---+-----+-----+----+--------+
