pySpark: How can I get all element names in structType in arrayType column in a dataframe?
I have a dataframe that looks something like this:
|-- name: string (nullable = true)
|-- age: string (nullable = true)
|-- job: string (nullable = true)
|-- hobbies: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- favorite: string (nullable = true)
| | |-- non-favorite: string (nullable = true)
And I'm trying to get this information:
['favorite', 'non-favorite']
However, the closest solution I found was using the explode function with withColumn, but it assumes that I already know the names of the elements. What I want is to get the element names given only the column name, in this case 'hobbies', without knowing them in advance. Is there a good way to get all the element names in any given column?
For a given dataframe with this schema:
df.printSchema()
root
|-- hobbies: array (nullable = false)
| |-- element: struct (containsNull = false)
| | |-- favorite: string (nullable = false)
| | |-- non-favorite: string (nullable = false)
You can select the field names of the struct as:
struct_fields = df.schema['hobbies'].dataType.elementType.fieldNames()
# output: ['favorite', 'non-favorite']
pyspark.sql.types.StructType.fieldNames() should get you what you want.
fieldNames() returns all field names in a list:
>>> struct = StructType([StructField("f1", StringType(), True)])
>>> struct.fieldNames()
['f1']
Note that fieldNames() lives on the schema types, not on the Column object (and the casing is fieldNames, not fieldnames), so in your case you reach it through the schema:
df.schema['hobbies'].dataType.elementType.fieldNames()