
Pyspark: Select all columns except particular columns

I have a large number of columns in a PySpark dataframe, say 200. I want to select all the columns except, say, 3-4 of the columns. How do I select these columns without having to manually type the names of all the columns I want to select?

In the end, I settled for the following (a runnable sketch of both follows the list):

  • Drop:

    df.drop('column_1', 'column_2', 'column_3')

  • Select:

    df.select([c for c in df.columns if c not in {'column_1', 'column_2', 'column_3'}])
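Here is a minimal runnable sketch of both approaches, assuming a local SparkSession and a hypothetical five-column DataFrame standing in for the 200-column one:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()

    # Hypothetical 5-column DataFrame standing in for the 200-column one.
    df = spark.createDataFrame(
        [(1, 2, 3, 4, 5)],
        ["column_1", "column_2", "column_3", "column_4", "column_5"],
    )

    # Option 1: drop the unwanted columns.
    df.drop("column_1", "column_2", "column_3").show()

    # Option 2: select everything that is not in the exclusion set.
    excluded = {"column_1", "column_2", "column_3"}
    df.select([c for c in df.columns if c not in excluded]).show()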

cols_to_drop = [...]  # list of column names to drop
df.drop(*cols_to_drop)

Useful if the list of columns to drop is huge, or if the list can be derived programmatically.
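For instance, a sketch that derives the drop list programmatically (assuming the hypothetical rule of removing every column whose name starts with "tmp_"):

    # Hypothetical rule: drop every column whose name starts with "tmp_".
    cols_to_drop = [c for c in df.columns if c.startswith("tmp_")]
    df_clean = df.drop(*cols_to_drop)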


This might be helpful:

df_cols = list(set(df.columns) - {'<col1>','<col2>',....})

df.select(df_cols).show()
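One caveat with the set-difference version: Python sets are unordered, so the selected columns may not come out in their original order. An order-preserving sketch of the same idea:

    excluded = {'<col1>', '<col2>'}
    df_cols = [c for c in df.columns if c not in excluded]  # keeps original column order
    df.select(df_cols).show()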
