How to select all columns except one in pandas?
I have a dataframe that looks like this:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(4,4), columns=list('abcd'))
df
a b c d
0 0.418762 0.042369 0.869203 0.972314
1 0.991058 0.510228 0.594784 0.534366
2 0.407472 0.259811 0.396664 0.894202
3 0.726168 0.139531 0.324932 0.906575
How can I get all columns except b?
When the columns are not a MultiIndex, df.columns is just an array of column names, so you can do:
df.loc[:, df.columns != 'b']
a c d
0 0.561196 0.013768 0.772827
1 0.882641 0.615396 0.075381
2 0.368824 0.651378 0.397203
3 0.788730 0.568099 0.869127
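To see why this works, it can help to look at the boolean mask on its own. The following is a minimal sketch, assuming the same 4x4 frame as in the question; the names mask and result are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 4), columns=list('abcd'))

# Comparing the column Index against a label gives one boolean per column.
mask = df.columns != 'b'

# .loc keeps every row (:) and only the columns where the mask is True.
result = df.loc[:, mask]
print(list(result.columns))
```

The mask has one entry per column, so the same pattern works with any other boolean condition on the column names.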
Don't use ix. It's deprecated (and has been removed as of pandas 1.0). The most readable and idiomatic way of doing this is df.drop():
>>> df
a b c d
0 0.175127 0.191051 0.382122 0.869242
1 0.414376 0.300502 0.554819 0.497524
2 0.142878 0.406830 0.314240 0.093132
3 0.337368 0.851783 0.933441 0.949598
>>> df.drop('b', axis=1)
a c d
0 0.175127 0.382122 0.869242
1 0.414376 0.554819 0.497524
2 0.142878 0.314240 0.093132
3 0.337368 0.933441 0.949598
Note that by default, .drop() does not operate in place; despite the ominous name, df is unharmed by this process. If you want to permanently remove b from df, do df.drop('b', axis=1, inplace=True).
df.drop() also accepts a list of labels, e.g. df.drop(['a', 'b'], axis=1) will drop columns a and b.
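The non-destructive behaviour of drop() can be checked directly. This is a small sketch, assuming the same example frame; the names without_b and without_ab are made up for illustration, and the columns= keyword shown here is an alternative spelling of axis=1 that, as far as I know, is available from pandas 0.21 onward.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 4), columns=list('abcd'))

# drop() returns a new DataFrame; df itself still has all four columns.
without_b = df.drop('b', axis=1)

# columns= is an equivalent spelling of axis=1 for dropping columns.
without_ab = df.drop(columns=['a', 'b'])

print(list(df.columns))
print(list(without_b.columns))
print(list(without_ab.columns))
```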
df[df.columns.difference(['b'])]
Out:
a c d
0 0.427809 0.459807 0.333869
1 0.678031 0.668346 0.645951
2 0.996573 0.673730 0.314911
3 0.786942 0.719665 0.330833
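One thing to watch with this approach: by default, Index.difference returns the remaining labels in sorted order, which may not match the original column order. The sketch below assumes a frame whose columns are deliberately out of alphabetical order, and uses the sort=False argument (present in recent pandas versions) to keep the original order.

```python
import numpy as np
import pandas as pd

# Columns deliberately not in alphabetical order.
df = pd.DataFrame(np.random.rand(4, 4), columns=list('dcba'))

# By default, difference() tries to sort the remaining labels.
sorted_cols = df.columns.difference(['b'])

# sort=False keeps the labels in their original order instead.
ordered_cols = df.columns.difference(['b'], sort=False)

print(list(sorted_cols))
print(list(ordered_cols))
```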
You can use df.columns.isin():
df.loc[:, ~df.columns.isin(['b'])]
When you want to drop multiple columns, it is as simple as:
df.loc[:, ~df.columns.isin(['col1', 'col2'])]
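A practical difference from df.drop() is how missing labels are handled. The sketch below is only an illustration: the list to_exclude and the label 'not_a_column' are hypothetical, chosen so that one label does not exist in the frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(4, 4), columns=list('abcd'))

# Hypothetical exclusion list; the second label is not a column of df.
to_exclude = ['b', 'not_a_column']

# isin() silently ignores labels that are not in df.columns.
kept = df.loc[:, ~df.columns.isin(to_exclude)]

# drop() raises KeyError for a missing label unless errors='ignore' is passed.
try:
    df.drop(to_exclude, axis=1)
    drop_raised = False
except KeyError:
    drop_raised = True

print(list(kept.columns), drop_raised)
```

So the isin() form is convenient when the exclusion list may contain columns that are not always present.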
Here is another way:
df[[i for i in list(df.columns) if i != '<your column>']]
You just pass all the columns to be shown, except for the one you do not want.
Here is a one-line lambda:
df.loc[:, list(map(lambda x: x not in ['b'], df.columns))]
Before:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(4,4), columns = list('abcd'))
df
a b c d
0 0.774951 0.079351 0.118437 0.735799
1 0.615547 0.203062 0.437672 0.912781
2 0.804140 0.708514 0.156943 0.104416
3 0.226051 0.641862 0.739839 0.434230
After:
df.loc[:, list(map(lambda x: x not in ['b'], df.columns))]
a c d
0 0.774951 0.118437 0.735799
1 0.615547 0.437672 0.912781
2 0.804140 0.156943 0.104416
3 0.226051 0.739839 0.434230
I think the best way to do this is the one mentioned by @Salvador Dali. Not that the others are wrong. When you have a data set where you want to put one column into one variable and the rest of the columns into another, for comparison or computational purposes, dropping the column from the data set might not help. Of course, there are use cases for that as well.
x_cols = [x for x in data.columns if x != 'name of column to be excluded']
Then you can use the collection of column names in x_cols to select those columns into another variable, such as x_cols1, for other computation.
e.g.: x_cols1 = data[x_cols]
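The split described above could look like the following. It is only a sketch under assumptions: the frame data, the column name in target_col, and the variable names features and target are all hypothetical and not from the original answer.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.rand(4, 4), columns=list('abcd'))

# Hypothetical column to set aside in its own variable.
target_col = 'b'

# Names of every column except the one being set aside.
x_cols = [c for c in data.columns if c != target_col]

features = data[x_cols]     # DataFrame with the remaining columns
target = data[target_col]   # the excluded column as a Series

print(list(features.columns), target.name)
```

Since nothing is dropped, data still holds all four columns and both pieces can be used side by side.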
Another slight modification to @Salvador Dali's answer allows a list of columns to be excluded:
df[[i for i in df.columns if i not in list_of_columns_to_exclude]]
or
df.loc[:, [i for i in df.columns if i not in list_of_columns_to_exclude]]
I think a nice solution is to use the pandas filter function with a regular expression (matching everything except "b"):
df.filter(regex="^(?!b$)")
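The $ inside the lookahead matters: it restricts the exclusion to the column named exactly "b". The sketch below assumes a frame with an extra column 'bb' (added here purely for illustration) to show the difference.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 4), columns=['a', 'b', 'bb', 'c'])

# The negative lookahead (?!b$) rejects only the label that is exactly "b".
exact = df.filter(regex='^(?!b$)')

# Without the $ anchor, every label starting with "b" is rejected.
prefix = df.filter(regex='^(?!b)')

print(list(exact.columns))
print(list(prefix.columns))
```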
Similar to @Toms's answer, it is also possible to select all columns except "b" without using .loc, like so:
df[df.columns[~df.columns.isin(['b'])]]