pySpark: How can I get all element names in structType in arrayType column in a dataframe?
I have a dataframe that looks something like this:
|-- name: string (nullable = true)
|-- age: string (nullable = true)
|-- job: string (nullable = true)
|-- hobbies: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- favorite: string (nullable = true)
| | |-- non-favorite: string (nullable = true)
And I'm trying to get this information:
['favorite', 'non-favorite']
However, the closest solution I found was using the explode function with withColumn, but it assumes that I already know the names of the elements. What I want is to get the element names given only the column name, in this case 'hobbies', without knowing them in advance. Is there a good way to get all the element names in any given column?
For a given dataframe with this schema:
df.printSchema()
root
|-- hobbies: array (nullable = false)
| |-- element: struct (containsNull = false)
| | |-- favorite: string (nullable = false)
| | |-- non-favorite: string (nullable = false)
You can select the field names of the struct as:
struct_fields = df.schema['hobbies'].dataType.elementType.fieldNames()
# output: ['favorite', 'non-favorite']
pyspark.sql.types.StructType.fieldNames() should get you what you want.
fieldNames() returns all field names in a list:
>>> struct = StructType([StructField("f1", StringType(), True)])
>>> struct.fieldNames()
['f1']
Note that fieldNames() lives on the schema types, not on the Column object (and the casing is fieldNames, not fieldnames), so in your case you reach it through the schema:
df.schema['hobbies'].dataType.elementType.fieldNames()