
python dataframe pandas drop column using int

I understand that to drop a column you use df.drop('column name', axis=1). Is there a way to drop a column using a numerical index instead of the column name?

You can delete the column at index i like this:

df.drop(df.columns[i], axis=1)

This can behave unexpectedly if there are duplicate column names, because drop removes every column whose label matches. To avoid that, first rename the column you want to delete to a unique name. Or you can reassign the DataFrame like this:

df = df.iloc[:, [j for j, c in enumerate(df.columns) if j != i]]

Drop multiple columns like this:

cols = [1, 2, 4, 5, 12]
df.drop(df.columns[cols], axis=1, inplace=True)

inplace=True applies the change to the DataFrame itself instead of returning a modified copy. If you need to keep your original intact, use:

df_after_dropping = df.drop(df.columns[cols],axis=1)
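To make the difference concrete, here is a minimal sketch on a small made-up frame. The column names a to f and the variables remaining, untouched and result are illustrative assumptions, not part of the question:

```python
import pandas as pd

# Hypothetical six-column frame, used only for illustration.
df = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=list("abcdef"))
cols = [1, 2, 4]  # integer positions of the columns to drop

# Without inplace=True, drop returns a new frame and leaves df untouched.
df_after_dropping = df.drop(df.columns[cols], axis=1)
remaining = list(df_after_dropping.columns)  # labels left in the new frame
untouched = list(df.columns)                 # df still has all six columns

# With inplace=True, df itself is modified and the call returns None.
result = df.drop(df.columns[cols], axis=1, inplace=True)
```

Because the in-place call returns None, writing df = df.drop(..., inplace=True) would overwrite df with None, so use one style or the other.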

If there are multiple columns with identical names, the solutions given so far will remove all of them, which may not be what you want, for example when you are trying to remove duplicate columns but keep one instance. The example below illustrates this:

import pandas as pd

# make a df with duplicate columns 'x' (built from rows, since a dict cannot hold the key 'x' twice)
df = pd.DataFrame(list(zip(range(5), range(5), range(6, 11))), columns=['x', 'x', 'y'])


df
Out[495]: 
   x  x   y
0  0  0   6
1  1  1   7
2  2  2   8
3  3  3   9
4  4  4  10

# attempting to drop the first column according to the solution offered so far
df.drop(df.columns[0], axis=1)
    y
0   6
1   7
2   8
3   9
4  10

As you can see, both 'x' columns were dropped. Alternative solution:

column_numbers = list(range(df.shape[1]))  # list of the columns' integer indices
column_numbers.remove(0)  # remove column integer index 0
df.iloc[:, column_numbers]  # return all columns except the 0th column

   x   y
0  0   6
1  1   7
2  2   8
3  3   9
4  4  10

As you can see, this removed only the 0th column (the first 'x').
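The same idea can be wrapped in a small helper. This is only a sketch: drop_by_position is a hypothetical name, not a pandas function, and it assumes the positions are valid integer indices for the frame.

```python
import numpy as np
import pandas as pd


def drop_by_position(frame, positions):
    """Return a new frame without the columns at the given integer positions."""
    keep = np.ones(frame.shape[1], dtype=bool)  # start by keeping every column
    keep[list(positions)] = False               # mark the positions to drop
    # Select by position, so duplicate column names are not a problem.
    return frame.iloc[:, np.flatnonzero(keep)]


# Frame with two columns named 'x', as in the example above.
df = pd.DataFrame(list(zip(range(5), range(5), range(6, 11))), columns=["x", "x", "y"])
out = drop_by_position(df, [0])
```

Since the selection goes through iloc, only the column at position 0 is removed and the second 'x' stays.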

You need to identify the columns by their position in the DataFrame. For example, to drop the columns at positions 2, 3 and 5:

df.drop(df.columns[[2,3,5]], axis = 1)

If you have two columns with the same name, one simple way is to rename the columns manually like this:

df.columns = ['column1', 'column2', 'column3']

Then you can drop via column index as you requested, like this:

df.drop(df.columns[1], axis=1, inplace=True)

df.columns[1] selects the label of the column at index 1, and drop then removes that column.

Remember axis 1 = columns and axis 0 = rows.
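As a quick illustration of the two axes, here is a sketch on a made-up three-column frame (the frame and the names without_col and without_row are assumptions for the example):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# axis=1: drop the column at position 1 (label 'b').
without_col = df.drop(df.columns[1], axis=1)

# axis=0: drop the row at position 1 (index label 1 in this frame).
without_row = df.drop(df.index[1], axis=0)
```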

You can simply pass the columns parameter to df.drop, so you don't have to specify axis in that case, like so:

columns_list = [1, 2, 4] # index numbers of columns you want to delete
df = df.drop(columns=df.columns[columns_list])
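A small check, assuming a made-up five-column frame, suggests that the columns= form and the axis=1 form give the same result:

```python
import pandas as pd

# Hypothetical frame, used only for illustration.
df = pd.DataFrame([[10, 20, 30, 40, 50]], columns=["a", "b", "c", "d", "e"])
columns_list = [1, 2, 4]  # index numbers of columns you want to delete

via_columns = df.drop(columns=df.columns[columns_list])
via_axis = df.drop(df.columns[columns_list], axis=1)

identical = via_columns.equals(via_axis)
```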

For reference, see the columns parameter here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html?highlight=drop#pandas.DataFrame.drop

If you really want to do it with integers (but why?), you could build a dictionary.

col_dict = {x: col for x, col in enumerate(df.columns)}

Then df = df.drop(col_dict[0], axis=1) will work as desired.

Edit: you can put it in a function that does this for you, though this way it creates the dictionary every time you call it.

def drop_col_n(df, col_n_to_drop):
    # map each integer position to its column label
    col_dict = {x: col for x, col in enumerate(df.columns)}
    return df.drop(col_dict[col_n_to_drop], axis=1)

df = drop_col_n(df, 2)
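For what it's worth, the dictionary holds the same labels that df.columns already returns by position, so the lookup can probably be skipped. A minimal sketch on a made-up frame (same_label and dropped are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})

col_dict = {x: col for x, col in enumerate(df.columns)}

# The dictionary lookup and direct positional indexing give the same label.
same_label = col_dict[2] == df.columns[2]

# So the column can be dropped without building the dictionary.
dropped = df.drop(df.columns[2], axis=1)
```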

You can use the following line to drop the first two columns (or any column you don't need):

df.drop([df.columns[0], df.columns[1]], axis=1)


A good way to keep only the columns you want (duplicate names don't matter).

For example, suppose the column indices you want to drop are stored in a list-like variable:

unnecessary_cols = [1, 4, 5, 6]

then

import numpy as np
df.iloc[:, np.setdiff1d(np.arange(len(df.columns)), unnecessary_cols)]
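Here is a worked sketch of that approach on a made-up eight-column frame with two columns named 'dup' (the frame and the names keep and kept are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: positions 1 and 2 share the name 'dup'.
df = pd.DataFrame([list(range(8))], columns=["a", "dup", "dup", "b", "c", "d", "e", "f"])
unnecessary_cols = [1, 4, 5, 6]

# setdiff1d returns the sorted positions that are NOT in unnecessary_cols.
keep = np.setdiff1d(np.arange(len(df.columns)), unnecessary_cols)
kept = df.iloc[:, keep]
```

Only the 'dup' at position 1 is removed; the one at position 2 is kept, because the selection is purely positional.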

Since there can be multiple columns with the same name, we should first rename the columns. Here is the code for the solution.

df.columns = list(range(len(df.columns)))
df.drop(columns=[1, 2])  # drop second and third columns
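Note that renaming to integers discards the original column names. If you want them back afterwards, one possible sketch is to save them first and restore the survivors (original_names and the sample frame are assumptions for illustration):

```python
import pandas as pd

# Hypothetical frame with a duplicated name 'b'.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["a", "b", "b", "c"])

original_names = list(df.columns)     # remember the labels before renaming
df.columns = range(len(df.columns))   # temporary unique labels 0..n-1
df = df.drop(columns=[1, 2])          # drop second and third columns

# Restore the original names for the columns that survived.
df.columns = [original_names[i] for i in df.columns]
```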
