Spark DF pivot error: Method pivot([class java.lang.String, class java.lang.String]) does not exist

I am a newbie at using Spark dataframes. I am trying to use the pivot method with Spark (version 2.x) and am running into the following error:

Py4JError: An error occurred while calling o387.pivot. Trace: py4j.Py4JException: Method pivot([class java.lang.String, class java.lang.String]) does not exist

Even though I use the agg function with first here, I really do not need to apply any aggregation.

My dataframe looks like this:

+-----+-----+----------+-----+
| name|value|      date| time|
+-----+-----+----------+-----+
|name1|100.0|2017-12-01|00:00|
|name1|255.5|2017-12-01|00:15|
|name1|333.3|2017-12-01|00:30|

Expected:

+-----+----------+-----+-----+-----+
| name|      date|00:00|00:15|00:30|
+-----+----------+-----+-----+-----+
|name1|2017-12-01|100.0|255.5|333.3|

The way I am trying:

df = df.groupBy(["name","date"]).pivot(pivot_col="time",values="value").agg(first("value")).show

What is my mistake here?

The problem is the values="value" parameter in the pivot function. It should be given a list of actual values to pivot on, not a column name. From the documentation:

values – List of values that will be translated to columns in the output DataFrame.

and an example:

df4.groupBy("year").pivot("course", ["dotNET", "Java"]).sum("earnings").collect()
[Row(year=2012, dotNET=15000, Java=20000), Row(year=2013, dotNET=48000, Java=30000)]

For the example in the question, values should be set to ["00:00", "00:15", "00:30"]. However, the values argument is often not necessary (although supplying it makes the pivot more efficient, since Spark then does not have to compute the distinct values of the pivot column first), so you can simply change to:

df = df.groupBy(["name","date"]).pivot("time").agg(first("value"))
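
Putting the fix together, here is a minimal end-to-end sketch; the import of first is required (it was only implied by the question), and the explicit-values variant is optional:

from pyspark.sql.functions import first

# Without explicit values: Spark first scans the distinct entries of "time"
# to determine the output columns.
pivoted = df.groupBy(["name", "date"]).pivot("time").agg(first("value"))

# With explicit values: skips the distinct-value scan, so it is more efficient.
pivoted = df.groupBy(["name", "date"]) \
    .pivot("time", ["00:00", "00:15", "00:30"]) \
    .agg(first("value"))

pivoted.show()
# +-----+----------+-----+-----+-----+
# | name|      date|00:00|00:15|00:30|
# +-----+----------+-----+-----+-----+
# |name1|2017-12-01|100.0|255.5|333.3|
# +-----+----------+-----+-----+-----+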
