Drop every nth column in pandas dataframe

I have a pandas dataframe whose columns are named like this:

0,1,2,3,4,.....,n

I would like to drop every 3rd column, so that I get a new dataframe with columns like this:

0,1,3,4,6,7,9,.....,n

I have tried this:

shape = df.shape[1]
for i in range(2,shape,3):
    df = df.drop(df.columns[i], axis=1) 

But I get an error saying the index is out of bounds. I assume this happens because the shape of the dataframe changes as I drop columns. If I don't store the result inside the "for" loop, the code runs, but I don't get my new dataframe.

How do I solve this? Thanks.

Here is a solution with inverted logic: instead of dropping every 3rd column, select all the columns except every 3rd one.

Build a helper array of column positions, add 1 to it, keep the positions where the result modulo 3 is not equal to 0, and pass that boolean mask to DataFrame.loc:

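import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': list('abcdef'),
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'D': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb')
})

# Keep only the columns whose 1-based position is not a multiple of 3
df = df.loc[:, (np.arange(len(df.columns)) + 1) % 3 != 0]
print(df)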
   A  B  D  E
0  a  4  1  5
1  b  5  3  3
2  c  4  5  6
3  d  5  7  9
4  e  5  1  2
5  f  4  0  4
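
The same mask works for any n, not just 3. The following is a minimal sketch of that idea, assuming a hypothetical helper named drop_every_nth (it is not part of pandas) and using iloc so that the selection is purely positional and independent of the column names:

```python
import numpy as np
import pandas as pd

def drop_every_nth(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Hypothetical helper: return a new frame without the n-th, 2n-th, ... columns."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    positions = np.arange(df.shape[1])
    keep = (positions + 1) % n != 0   # True for the columns to keep
    return df.iloc[:, keep]

demo = pd.DataFrame([[10, 11, 12, 13, 14, 15, 16]], columns=list("abcdefg"))
result = drop_every_nth(demo, 3)
print(list(result.columns))  # ['a', 'b', 'd', 'e', 'g']
```

Because the mask is computed once from the original number of columns, nothing shifts while the columns are being removed.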

The issue with your code is that each time you drop a column in the loop, you end up with a different set of columns, because you overwrite df after each iteration. When you then try to drop the next 3rd column of that new set of columns, you not only drop the wrong one, you eventually run out of columns. That is why you get the error.

iter1 -> 0,1,3,4,5,6,7,8,9,10 ... n #first you drop 2 which is 3rd col
iter2 -> 0,1,3,4,5,7,8,9,10 ... n   #next you drop 6 which is 6th col (should be 5)
iter3 -> 0,1,3,4,5,7,8,9, ... n     #next you drop 10 which is 9th col (should be 8)
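
The trace above can be reproduced on a small frame. This is only an illustrative sketch, assuming a 10-column frame with integer labels 0 to 9; the names df_demo, work, dropped and error are made up for the example:

```python
import pandas as pd

df_demo = pd.DataFrame([list(range(10))], columns=range(10))

dropped = []   # labels the loop actually removes
error = None   # the exception that stops the loop, if any
work = df_demo
for i in range(2, df_demo.shape[1], 3):   # positions 2, 5, 8 of the ORIGINAL frame
    try:
        label = work.columns[i]           # position i of the SHRUNKEN frame
    except IndexError as exc:
        error = exc
        break
    dropped.append(int(label))
    work = work.drop(label, axis=1)

print(dropped)                 # [2, 6]
print(work.columns.tolist())   # [0, 1, 3, 4, 5, 7, 8, 9]
print(type(error).__name__)    # IndexError
```

With 10 columns the loop removes 2, then 6 instead of 5, and fails at position 8 because only 8 columns are left by then.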

What you want to do is calculate the indexes beforehand and then remove them all in one go.


You can get the indexes of the columns you want to remove with range and then drop them.

drop_idx = list(range(2,df.shape[1],3)) #Indexes to drop
df2 = df.drop(drop_idx, axis=1)         #Drop them at once over axis=1


print('old columns->', list(df.columns))
print('idx to drop->', drop_idx)
print('new columns->',list(df2.columns))
old columns-> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
idx to drop-> [2, 5, 8]
new columns-> [0, 1, 3, 4, 6, 7, 9]

Note: this works only because your column names are the same as their positional indexes. If your column names are different, you need an extra step that fetches the column names at the positions you want to drop.

drop_idx = list(range(2,df.shape[1],3))
drop_cols = [j for i,j in enumerate(df.columns) if i in drop_idx] #<--
df2 = df.drop(drop_cols, axis=1)   # drop by name, not by position
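
A shorter way to fetch the names is to slice df.columns positionally. The sketch below assumes string column names that are unique; with duplicated names, drop would remove every column sharing a dropped name. The frame df_named is invented for the example:

```python
import pandas as pd

df_named = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7]],
    columns=["a", "b", "c", "d", "e", "f", "g"],
)

drop_cols = df_named.columns[2::3]           # names at positions 2, 5, ...
df_kept = df_named.drop(columns=drop_cols)   # drop them all at once

print(list(drop_cols))        # ['c', 'f']
print(list(df_kept.columns))  # ['a', 'b', 'd', 'e', 'g']
```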

You can use a list comprehension to filter the columns:

df = df[[k for k in df.columns if (k + 1) % 3 != 0]]

If the names are different (e.g. strings) and you want to discard every 3rd column regardless of its name, then:

df = df[[k for i, k in enumerate(df.columns, 1) if i % 3 != 0]]
