
pySpark: How can I get all element names in structType in arrayType column in a dataframe?

I have a dataframe that looks something like this:

 |-- name: string (nullable = true)
 |-- age: string (nullable = true)
 |-- job: string (nullable = true)
 |-- hobbies: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- favorite: string (nullable = true)
 |    |    |-- non-favorite: string (nullable = true)

And I'm trying to get this information:

['favorite', 'non-favorite']

However, the closest solution I found was to use the explode function with withColumn, but it assumes that I already know the element names. What I want is to get the element names given only the column name, in this case 'hobbies', without knowing them in advance. Is there a good way to get all the element names in any given column?

For a given dataframe with this schema:

df.printSchema()

root
 |-- hobbies: array (nullable = false)
 |    |-- element: struct (containsNull = false)
 |    |    |-- favorite: string (nullable = false)
 |    |    |-- non-favorite: string (nullable = false)

You can select the field names of the struct like this:

struct_fields = df.schema['hobbies'].dataType.elementType.fieldNames()

# output: ['favorite', 'non-favorite']

pyspark.sql.types.StructType.fieldNames should get you what you want.

fieldNames() — Returns all field names in a list.

>>> struct = StructType([StructField("f1", StringType(), True)])
>>> struct.fieldNames()
['f1']

So in your case, something like:

dataframe.schema['hobbies'].dataType.elementType.fieldNames()

(Note that fieldNames belongs to the schema types, not to Column objects, so dataframe.hobbies.getItem(0).fieldNames() would not work; you have to go through dataframe.schema.)


 