PySpark: Select all columns except specific columns

Question

I have a large number of columns in a PySpark DataFrame, say 200. I want to select all of them except 3 or 4. How do I select these columns without manually typing the names of every column I want to keep?

Answer 1

In the end, I settled on the following:

Drop:
df.drop('column_1', 'column_2', 'column_3')
Select:
df.select([c for c in df.columns if c not in {'column_1', 'column_2', 'column_3'}])
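One thing to keep in mind: both variants silently ignore a name that is not actually a column (df.drop is documented as a no-op for unknown column names), so a typo goes unnoticed. A minimal sketch of a guard for that, where the helper name columns_except is hypothetical and not part of PySpark:

```python
def columns_except(all_columns, excluded):
    """Return all_columns minus excluded, in the original order.

    Raises ValueError if an excluded name is not an existing column,
    so that a typo does not pass silently.
    """
    excluded = set(excluded)
    missing = excluded - set(all_columns)
    if missing:
        raise ValueError(f"Unknown columns: {sorted(missing)}")
    return [c for c in all_columns if c not in excluded]


# Assumed usage with a PySpark DataFrame named df:
# df.select(columns_except(df.columns, ['column_1', 'column_2', 'column_3']))
```

The helper works on plain lists of names, so it can be checked without a Spark session.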

Answer 2

cols_to_drop = ['column_1', 'column_2', 'column_3']  # list of columns to drop
df.drop(*cols_to_drop)

Useful if the list of columns to drop is huge, or if the list can be derived programmatically.
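As an illustration of deriving the list programmatically, here is a rough sketch that picks names by prefix or by data type from df.dtypes, which is a list of (name, type) pairs. The helper names_to_drop, the 'tmp_' prefix, and the example column names are assumptions made up for this example:

```python
def names_to_drop(dtypes, prefix=None, type_name=None):
    """Pick column names from (name, type) pairs such as df.dtypes.

    A column is selected if its name starts with prefix or if its
    type string equals type_name. Criteria left as None are skipped.
    """
    selected = []
    for name, dtype in dtypes:
        has_prefix = prefix is not None and name.startswith(prefix)
        has_type = type_name is not None and dtype == type_name
        if has_prefix or has_type:
            selected.append(name)
    return selected


# Assumed usage with a PySpark DataFrame named df:
# df.drop(*names_to_drop(df.dtypes, prefix='tmp_'))
```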

Answer 4

This might be helpful:

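# Add further column names to the set as needed; a set difference does not keep the original column order.
df_cols = list(set(df.columns) - {'<col1>', '<col2>'})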

df.select(df_cols).show()
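Because Python sets are unordered, the columns selected this way may come back in a different order than in the original DataFrame. If the order matters, one possible fix is to sort the remaining names by their position in df.columns. This is only a sketch, and ordered_difference is a hypothetical helper name:

```python
def ordered_difference(all_columns, excluded):
    """Set difference of column names, restored to the original order.

    all_columns must be a list (as df.columns is), since list.index
    is used as the sort key.
    """
    remaining = set(all_columns) - set(excluded)
    return sorted(remaining, key=all_columns.index)


# Assumed usage with a PySpark DataFrame named df:
# df.select(ordered_difference(df.columns, {'<col1>', '<col2>'})).show()
```

For around 200 columns the repeated index lookups are negligible; for very wide tables a plain list comprehension over df.columns avoids them.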

Answer details (score, status, date posted):

Answer 1: 30, accepted, 2018-09-04 07:05:44
Answer 2: 1, 2021-09-13 17:04:51
Answer 3: 0, 2021-05-18 13:38:15
Answer 4: 0, 2022-09-09 15:51:15
