
How to use multiple columns in filter and lambda functions in PySpark

I have a DataFrame from which I want to drop the columns whose names start with "test", "id_1", "vehicle", and so on.

I use the code below to drop one column:

df1.drop(*filter(lambda col: 'test' in col, df1.columns))

How can I specify all the columns at once in this line? This doesn't work:

df1.drop(*filter(lambda col: 'test','id_1' in col, df1.columns))

You can do something like the following:

# True if the column name starts with any of the prefixes
expression = lambda col: any(col.startswith(p) for p in ['test', 'id_1', 'vehicle'])
df1.drop(*filter(expression, df1.columns))
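As a minimal sketch of the prefix check on its own, the snippet below runs on a plain list that stands in for df1.columns (the sample column names are hypothetical, not from the question). It relies on str.startswith accepting a tuple of prefixes, and it contrasts that with the substring test `'test' in col` from the question, which matches more names.

```python
# Hypothetical column names standing in for df1.columns.
columns = ["test_a", "id_1_x", "vehicle_type", "price", "contest"]

# Prefixes from the question.
prefixes = ("test", "id_1", "vehicle")

# str.startswith accepts a tuple and is True if any prefix matches.
cols_to_drop = [c for c in columns if c.startswith(prefixes)]

# Substring matching (as in `'test' in col`) is broader:
# it also catches "contest", which merely contains "test".
cols_containing = [c for c in columns if any(p in c for p in prefixes)]

print(cols_to_drop)
print(cols_containing)

# With a real DataFrame this would become:
# df1 = df1.drop(*cols_to_drop)
```

Since drop returns a new DataFrame rather than modifying df1 in place, the result has to be assigned to a variable to have any effect.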

In PySpark version 2.1.0, drop can remove several columns at once if you pass the column names as separate string arguments, not as a single list. (See the documentation: http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html?highlight=drop#pyspark.sql.DataFrame.drop ).

In your case, you may create a list containing the names of the columns you want to drop. For example:

cols_to_drop = [x for x in df1.columns if (x.startswith('test') or x.startswith('id_1') or x.startswith('vehicle'))]

Then call drop, unpacking the list:

df1.drop(*cols_to_drop)

Alternatively, you can achieve the same result with select. For example:

# Define the columns you want to keep
cols_to_keep = [x for x in df1.columns if x not in cols_to_drop]

# Create a new DataFrame, df2, that keeps only the desired columns from df1
df2 = df1.select(cols_to_keep)

Note that select accepts a list directly, so you don't need to unpack it.
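The keep-list can also be built in one pass, without first computing the columns to drop. The following is only a sketch under the same assumptions as the answer: the sample names are hypothetical stand-ins for df1.columns, and the select call is shown as a comment because it needs a live DataFrame.

```python
# Hypothetical column names standing in for df1.columns.
columns = ["test_a", "id_1_x", "price", "vehicle_type", "owner"]

# Prefixes from the question.
prefixes = ("test", "id_1", "vehicle")

# Keep every column that does NOT start with one of the prefixes.
# A list comprehension preserves the original column order.
cols_to_keep = [c for c in columns if not c.startswith(prefixes)]

print(cols_to_keep)

# With a real DataFrame this would become:
# df2 = df1.select(cols_to_keep)
```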

Please note that this question also addresses a similar issue.

I hope this helps.

Well, it seems you can use a regular column filter, as follows (in Scala):

val forColumns = df.columns.filter(x => x.startsWith("test") || x.startsWith("id_1") || x.startsWith("vehicle")) ++ Array("c_007")

df.drop(forColumns: _*)
