
pyspark - get value of array type from dataframe

My data frame is shown below. I need to extract the values from the input array-type column. Could you let me know how I can achieve this in PySpark?

root
 |-- input: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: map (valueContainsNull = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: double (valueContainsNull = true)
 |-- A: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: map (valueContainsNull = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: double (valueContainsNull = true)
 |-- B: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: map (valueContainsNull = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: double (valueContainsNull = true)
 |-- C: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: map (valueContainsNull = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: double (valueContainsNull = true)
 |-- D: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: map (valueContainsNull = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: double (valueContainsNull = true)
 |-- E: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: map (valueContainsNull = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: double (valueContainsNull = true)
 |-- timestamp: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: string
 |    |    |-- value: map (valueContainsNull = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: double (valueContainsNull = true)
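
In case you want to try this locally, here is a minimal sketch that builds a tiny DataFrame whose input column matches the schema above. The sample row, inner key names, and values are made up purely for illustration, and only the input column is created since that is all the snippet below touches.

from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, MapType, StringType, DoubleType, StructType, StructField

spark = SparkSession.builder.getOrCreate()

# 'input' is an array of maps whose values are themselves string -> double maps.
schema = StructType([
    StructField("input", ArrayType(MapType(StringType(), MapType(StringType(), DoubleType()))))
])

# Made-up sample row matching that schema (hypothetical keys and values).
sample = [([{"k1": {"a": 1.0, "b": 2.0}}, {"k2": {"c": 3.0}}],)]
df = spark.createDataFrame(sample, schema)
df.printSchema()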

Hope this helps!

from itertools import chain
# Flatten each row's 'input' array into its maps, then pull out the inner {key: double} dicts;
# wrapping in list() keeps the values picklable when they are collected to the driver.
df.select('input').rdd.flatMap(lambda row: chain(*row)).map(lambda m: list(m.values())).collect()
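
If you prefer to stay in the DataFrame API, an equivalent sketch (assuming Spark 2.3+, where map_values is available) explodes the array and pulls out the inner maps; the alias names m and inner are just illustrative:

from pyspark.sql.functions import explode, map_values

(df.select(explode("input").alias("m"))      # one row per map in the array
   .select(map_values("m").alias("inner"))   # array of inner {key: double} maps
   .show(truncate=False))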

