
Pyspark: Select all columns except particular columns

I have a large number of columns in a PySpark dataframe, say 200. I want to select all the columns except, say, 3-4 of the columns. How do I select these columns without having to manually type the names of all the columns I want to select?

In the end, I settled for the following (a runnable sketch of both follows the list):

  • Drop:

    df.drop('column_1', 'column_2', 'column_3')

  • Select:

    df.select([c for c in df.columns if c not in {'column_1', 'column_2', 'column_3'}])
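Here is a minimal runnable sketch of both approaches, assuming a local SparkSession and a hypothetical five-column DataFrame standing in for the 200-column one:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()

    # Hypothetical 5-column DataFrame standing in for the 200-column one.
    df = spark.createDataFrame(
        [(1, 2, 3, 4, 5)],
        ["column_1", "column_2", "column_3", "column_4", "column_5"],
    )

    # Option 1: drop the unwanted columns.
    df.drop("column_1", "column_2", "column_3").show()

    # Option 2: select everything that is not in the exclusion set.
    excluded = {"column_1", "column_2", "column_3"}
    df.select([c for c in df.columns if c not in excluded]).show()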

cols_to_drop = [...]  # list of column names to drop
df.drop(*cols_to_drop)

Useful if the list of columns to drop is huge, or if the list can be derived programmatically.
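For instance, a sketch that derives the drop list programmatically (assuming the hypothetical rule of removing every column whose name starts with "tmp_"):

    # Hypothetical rule: drop every column whose name starts with "tmp_".
    cols_to_drop = [c for c in df.columns if c.startswith("tmp_")]
    df_clean = df.drop(*cols_to_drop)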


This might be helpful:

df_cols = list(set(df.columns) - {'<col1>','<col2>',....})

df.select(df_cols).show()
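One caveat with the set-difference version: Python sets are unordered, so the selected columns may not come out in their original order. An order-preserving sketch of the same idea:

    excluded = {'<col1>', '<col2>'}
    df_cols = [c for c in df.columns if c not in excluded]  # keeps original column order
    df.select(df_cols).show()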
