How to process PySpark DataFrame columns

Question

I have a PySpark DataFrame with more than 4,000 columns and no labels/headers. Based on the column values, I need to apply specific operations to each column.

I did the same thing using pandas, but I don't want to use pandas; I would like to apply the column-wise transformations directly on the Spark DataFrame. Any idea how I can apply column-wise transformations when the DataFrame has more than 4,000 columns without any labels? I also don't want to apply the transformations by specific column index.

Answer 1

According to the Spark documentation, a DataFrame does have headers (column names), contrary to what you said, much like a database table.

In any case, a simple for loop should do the trick:

for column in spark_dataframe.columns:
    # do whatever you want to do with this column
    pass
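To make the loop above more concrete, here is a minimal sketch of one way it could be filled in. It assumes the operation is chosen from each column's data type as reported by `df.dtypes`; the question only says "based on the column values", so the rule itself, the helper names `pick_operation` and `transform_all_columns`, and the operation labels are all hypothetical. A rule that depends on the actual values would first need an aggregation over the data.

```python
def pick_operation(dtype: str) -> str:
    """Map a Spark dtype string (as found in df.dtypes) to an operation label.

    The labels and the rule are illustrative assumptions, not part of Spark.
    """
    if dtype in ("int", "bigint", "float", "double"):
        return "fill_zero"
    if dtype == "string":
        return "trim"
    return "keep"


def transform_all_columns(df):
    """Apply one operation per column without referring to fixed column indices."""
    # Imported here so pick_operation can be used without a Spark installation.
    from pyspark.sql import functions as F

    exprs = []
    for name, dtype in df.dtypes:  # list of (column name, dtype string) pairs
        op = pick_operation(dtype)
        col = F.col(name)
        if op == "fill_zero":
            # Replace nulls in numeric columns with 0.
            col = F.coalesce(col, F.lit(0))
        elif op == "trim":
            # Strip surrounding whitespace from string columns.
            col = F.trim(col)
        exprs.append(col.alias(name))
    # Build all expressions first and run them in a single select.
    return df.select(*exprs)
```

If the data were loaded with something like `spark.read.csv("data.csv", header=False, inferSchema=True)` (the path is a placeholder), Spark would name the columns `_c0`, `_c1`, and so on, so `df.dtypes` still gives the loop a name for every column. Collecting the expressions into one `select` is a design choice here: with several thousand columns it avoids chaining thousands of `withColumn` calls.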

Answer 1
0 2017-02-08 08:50:27
