Pandas filter columns of a DataFrame with bool
For a DataFrame (df) with multiple columns and rows:
   A  B  C  D
0  1  4  2  6
1  2  5  7  4
2  3  6  5  6
and another DataFrame (dfBool) of dtype bool:
0     True
1    False
2    False
3     True
What is the easiest way to split this DataFrame by columns into two DataFrames by transposing dfBool, so that you get the desired output?
   A  D
0  1  6
1  2  4
2  3  6
   B  C
0  4  2
1  5  7
2  6  5
With my limited experience, I cannot understand why dfTrue = df[dfBool.transpose() == True] does not work.
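One likely explanation, offered as a sketch rather than a definitive diagnosis: indexing a DataFrame with a boolean DataFrame acts as an element-wise mask aligned on row and column labels (much like df.where), not as a column selector. The transposed dfBool is labelled 0 to 3, which never matches A to D. The snippet below assumes dfBool has a single column named 'a' (the question does not name it) and uses a plain positional mask instead.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [2, 7, 5], 'D': [6, 4, 6]})
# Assumption: the single boolean column is called 'a'.
dfBool = pd.DataFrame({'a': [True, False, False, True]})

# The transposed mask is labelled 0..3 while df is labelled A..D,
# so a label-aligned mask has no columns in common with df.
print(list(dfBool.transpose().columns))
print(list(df.columns))

# Dropping the labels leaves a positional mask: one flag per column of df.
mask = dfBool['a'].to_numpy()
dfTrue = df.loc[:, mask]    # columns flagged True
dfFalse = df.loc[:, ~mask]  # columns flagged False
print(dfTrue)
print(dfFalse)
```

The mask must have exactly as many entries as df has columns, in the same order.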
I would like to modify EdChum's comment, because if dfBool is a DataFrame, you have to select the column first:
import pandas as pd
# keys are listed in A, B, C, D order: current pandas keeps dict insertion order
df = pd.DataFrame({'A': {0: 1, 1: 2, 2: 3},
                   'B': {0: 4, 1: 5, 2: 6},
                   'C': {0: 2, 1: 7, 2: 5},
                   'D': {0: 6, 1: 4, 2: 6}})
print(df)
   A  B  C  D
0  1  4  2  6
1  2  5  7  4
2  3  6  5  6
dfBool = pd.DataFrame({'a': [True, False, False, True]})
print(dfBool)
       a
0   True
1  False
2  False
3   True
# select the first column of dfBool as a boolean Series
df2 = dfBool.iloc[:, 0]
# or select column a by name
# df2 = dfBool.a
print(df2)
0     True
1    False
2    False
3     True
Name: a, dtype: bool
print(df[df.columns[df2]])
   A  D
0  1  6
1  2  4
2  3  6
print(df[df.columns[~df2]])
   B  C
0  4  2
1  5  7
2  6  5
Another very nice solution from ayhan, thank you:
print(df.loc[:, dfBool.a.values])
   A  D
0  1  6
1  2  4
2  3  6
print(df.loc[:, ~dfBool.a.values])
   B  C
0  4  2
1  5  7
2  6  5
But if dfBool is a Series, the same approach works without selecting a column first:
dfBool = pd.Series([True, False, False, True])
print(dfBool)
0     True
1    False
2    False
3     True
dtype: bool
print(df[df.columns[dfBool]])
   A  D
0  1  6
1  2  4
2  3  6
print(df[df.columns[~dfBool]])
   B  C
0  4  2
1  5  7
2  6  5
And the .loc variant for a Series:
print(df.loc[:, dfBool.values])
   A  D
0  1  6
1  2  4
2  3  6
print(df.loc[:, ~dfBool.values])
   B  C
0  4  2
1  5  7
2  6  5
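The DataFrame and Series cases above can be folded into one small function. This is only a sketch: split_columns is a hypothetical helper name, not part of pandas, and it assumes that a DataFrame mask keeps its flags in the first column.

```python
import numpy as np
import pandas as pd


def split_columns(frame, flags):
    """Hypothetical helper: split `frame` using one boolean flag per column."""
    # Accept a one-column DataFrame, a Series, or any sequence of booleans.
    if isinstance(flags, pd.DataFrame):
        flags = flags.iloc[:, 0]
    mask = np.asarray(flags, dtype=bool)
    # Fail early if the mask does not line up with the columns.
    if mask.shape != (frame.shape[1],):
        raise ValueError(
            f'expected {frame.shape[1]} flags, got shape {mask.shape}'
        )
    return frame.loc[:, mask], frame.loc[:, ~mask]


df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [2, 7, 5], 'D': [6, 4, 6]})
dfBool = pd.DataFrame({'a': [True, False, False, True]})

dfTrue, dfFalse = split_columns(df, dfBool)
print(dfTrue)
print(dfFalse)
```

Converting the flags to a NumPy array drops their index, so the split is purely positional and does not depend on how dfBool is labelled.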
Timings:
In [277]: %timeit (df[df.columns[dfBool.a]])
1000 loops, best of 3: 769 µs per loop
In [278]: %timeit (df.loc[:, dfBool1.a.values])
The slowest run took 9.08 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 380 µs per loop
In [279]: %timeit (df.transpose()[dfBool1.a.values].transpose())
The slowest run took 5.04 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 550 µs per loop
Code for timings:
import pandas as pd
# keys are listed in A, B, C, D order: current pandas keeps dict insertion order
df = pd.DataFrame({'A': {0: 1, 1: 2, 2: 3},
                   'B': {0: 4, 1: 5, 2: 6},
                   'C': {0: 2, 1: 7, 2: 5},
                   'D': {0: 6, 1: 4, 2: 6}})
print(df)
df = pd.concat([df]*1000, axis=1).reset_index(drop=True)
dfBool = pd.DataFrame({'a': [True, False, False, True]})
dfBool1 = pd.concat([dfBool]*1000).reset_index(drop=True)
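The output differs slightly. The first expression uses the 4-element dfBool rather than dfBool1, which appears to be why it returns every duplicated A column followed by every duplicated D column, while the other two apply the 4000-element mask by position and keep the original column order: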
print(df[df.columns[dfBool.a]])
   A  A  A  A  A  A  A  A  A  A ...  D  D  D  D  D  D  D  D  D  D
0  1  1  1  1  1  1  1  1  1  1 ...  6  6  6  6  6  6  6  6  6  6
1  2  2  2  2  2  2  2  2  2  2 ...  4  4  4  4  4  4  4  4  4  4
2  3  3  3  3  3  3  3  3  3  3 ...  6  6  6  6  6  6  6  6  6  6
[3 rows x 2000 columns]
print(df.loc[:, dfBool1.a.values])
   A  D  A  D  A  D  A  D  A  D ...  A  D  A  D  A  D  A  D  A  D
0  1  6  1  6  1  6  1  6  1  6 ...  1  6  1  6  1  6  1  6  1  6
1  2  4  2  4  2  4  2  4  2  4 ...  2  4  2  4  2  4  2  4  2  4
2  3  6  3  6  3  6  3  6  3  6 ...  3  6  3  6  3  6  3  6  3  6
[3 rows x 2000 columns]
print(df.transpose()[dfBool1.a.values].transpose())
   A  D  A  D  A  D  A  D  A  D ...  A  D  A  D  A  D  A  D  A  D
0  1  6  1  6  1  6  1  6  1  6 ...  1  6  1  6  1  6  1  6  1  6
1  2  4  2  4  2  4  2  4  2  4 ...  2  4  2  4  2  4  2  4  2  4
2  3  6  3  6  3  6  3  6  3  6 ...  3  6  3  6  3  6  3  6  3  6
[3 rows x 2000 columns]
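For anyone who wants to repeat the comparison outside IPython, here is a minimal sketch using the standard timeit module. It departs from the timing code above in two ways, both of them assumptions of this sketch: the column names are made unique (A0, B0, ... via add_suffix) so that label-based selection does not multiply duplicate columns, and all three candidates receive the same full-length mask. The names wide, mask and candidates are illustrative only, and no timings are claimed here since they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 3 rows and 4000 uniquely named columns.
n_blocks = 1000
base = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [2, 7, 5], 'D': [6, 4, 6]})
wide = pd.concat([base.add_suffix(str(i)) for i in range(n_blocks)], axis=1)

# One boolean flag per column, repeating the True/False/False/True pattern.
mask = np.tile([True, False, False, True], n_blocks)

candidates = {
    'columns[mask]': lambda: wide[wide.columns[mask]],
    'loc[:, mask]': lambda: wide.loc[:, mask],
    'transpose twice': lambda: wide.transpose()[mask].transpose(),
}

calls = 20
for name, func in candidates.items():
    # Take the best of three repeats to reduce noise from other processes.
    seconds = min(timeit.repeat(func, number=calls, repeat=3))
    print(f'{name}: {seconds / calls * 1e6:.0f} µs per call')
```

Because every candidate sees the same mask, the three results are identical and only the running time differs.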
Maybe something like the following?
import pandas as pd
totalDF = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [2, 7, 5], 'D': [6, 4, 8]})
dfBool = pd.DataFrame(data=[True, False, False, True])
totalDF.transpose()[dfBool.values].transpose()
   A  D
0  1  6
1  2  4
2  3  8
totalDF.transpose()[~dfBool.values].transpose()
   B  C
0  4  2
1  5  7
2  6  5
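One thing to keep in mind with the double transpose, shown as a small sketch on made-up data (the mixed frame below is hypothetical and not from this answer): transposing a frame whose columns have different dtypes forces a common dtype, so the selected columns come back as object, whereas .loc keeps each column's own dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with integer, string and float columns.
mixed = pd.DataFrame({'A': [1, 2, 3],
                      'B': ['x', 'y', 'z'],
                      'C': [0.5, 1.5, 2.5],
                      'D': [6, 4, 8]})
mask = np.array([True, False, False, True])

via_transpose = mixed.transpose()[mask].transpose()
via_loc = mixed.loc[:, mask]

print(via_transpose.dtypes)  # A and D are object after the round trip
print(via_loc.dtypes)        # A and D stay int64
```

With an all-integer frame like totalDF the two approaches give the same result, so this only matters for mixed data.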