
How to drop multiple column names given in a list from Spark DataFrame?

I have a dynamic list which is created based on the value of n.

n = 3
drop_lst = ['a' + str(i) for i in range(n)]
df.drop(drop_lst)

But the above is not working.

Note: my use case requires a dynamic list.

If I pass the column names directly, without a list, it works:

df.drop('a0','a1','a2')

How do I make the drop function work with a list?

Spark 2.2 doesn't seem to have this capability. Is there a way to make it work without using select()?

You can use the * operator to pass the contents of the list as arguments to drop():

df.drop(*drop_lst)
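To see why the list has to be unpacked, here is a minimal plain-Python sketch. The drop function below is a hypothetical stand-in, not Spark code; it only assumes a varargs signature like the drop(*cols) mentioned in this thread and returns its arguments exactly as it received them.

```python
# Hypothetical stand-in for a varargs method such as drop(*cols).
# This is NOT Spark code; it only shows how the arguments arrive.
def drop(*cols):
    """Return the positional arguments exactly as received."""
    return cols


n = 3
drop_lst = ['a' + str(i) for i in range(n)]

# Passing the list itself: the function receives ONE argument, a list.
as_list = drop(drop_lst)

# Unpacking with *: the function receives three separate strings,
# the same as writing drop('a0', 'a1', 'a2') by hand.
unpacked = drop(*drop_lst)

print(as_list)   # (['a0', 'a1', 'a2'],)
print(unpacked)  # ('a0', 'a1', 'a2')
```

So the unpacked call is equivalent to the hand-written df.drop('a0','a1','a2') from the question, while the list is still built dynamically from n.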

You can pass the column names as a comma-separated list of arguments, for example:

df.drop("col1","col11","col21")

This is how to drop a specified number of consecutive columns in Scala:

val ll = dfwide.schema.names.slice(1,5)
dfwide.drop(ll:_*).show

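slice takes two parameters: the start index (inclusive) and the end index (exclusive), so slice(1,5) selects the column names at positions 1 through 4.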

Use a simple loop:

for c in drop_lst:
    df = df.drop(c)

You can use drop(*cols) in two ways:

  1. By column name: df.drop('age').collect()
  2. By Column object: df.drop(df.age).collect()

Check the official documentation for DataFrame.drop.
