Pandas filter DataFrame with Series
I have a pandas Series with the following content.
$ import pandas as pd
$ filter = pd.Series(
    data = [True, False, True, True],
    index = ['A', 'B', 'C', 'D']
)
$ filter.index.name = 'my_id'
$ print(filter)
my_id
A     True
B    False
C     True
D     True
dtype: bool
And a DataFrame like this.
$ df = pd.DataFrame({
    'A': [1, 2, 9, 4],
    'B': [9, 6, 7, 8],
    'C': [10, 91, 32, 13],
    'D': [43, 12, 7, 9],
    'E': [65, 12, 3, 8]
})
$ print(df)
   A  B   C   D   E
0  1  9  10  43  65
1  2  6  91  12  12
2  9  7  32   7   3
3  4  8  13   9   8
filter has A, B, C, and D as its indices. df has A, B, C, D, and E as its column names.
True in filter means that the corresponding column in df will be preserved. False in filter means that the corresponding column in df will be removed. Column E in df should be removed because filter doesn't contain E.
How can I generate another DataFrame with columns B and E removed using filter?
I mean I want to create the following DataFrame using filter and df.
   A   C   D
0  1  10  43
1  2  91  12
2  9  32   7
3  4  13   9
df.loc[:, filter] generates the following error.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/username/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1494, in __getitem__
return self._getitem_tuple(key)
File "/Users/username/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 888, in _getitem_tuple
retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
File "/Users/username/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1869, in _getitem_axis
return self._getbool_axis(key, axis=axis)
File "/Users/username/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1515, in _getbool_axis
key = check_bool_indexer(labels, key)
File "/Users/username/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 2486, in check_bool_indexer
raise IndexingError('Unalignable boolean Series provided as '
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match
df.loc[:, filter] works if df doesn't contain column E.
In my real case, the DataFrame has about 2000 columns (len(df.columns)) and the Series has about 1999 elements (len(filter)). This makes it difficult for me to determine which elements are in df but not in filter.
This should give you what you need:
df.loc[:, filter[filter].index]
Explanation: You select the rows in filter which contain True and take their index labels to pick the columns from df.
You cannot use the boolean values in filter directly because it contains fewer values than there are columns in df, so its index cannot be aligned with the columns of df.
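If you would rather keep using a boolean mask with loc, one possible workaround is to align filter to the columns of df first. The following is only a sketch using the example data from the question; it assumes that any column missing from filter should be treated as False, and the names mask and result are made up for illustration.

```python
import pandas as pd

# Example data from the question.
filter = pd.Series(
    data=[True, False, True, True],
    index=['A', 'B', 'C', 'D'],
)
df = pd.DataFrame({
    'A': [1, 2, 9, 4],
    'B': [9, 6, 7, 8],
    'C': [10, 91, 32, 13],
    'D': [43, 12, 7, 9],
    'E': [65, 12, 3, 8],
})

# Reindex the mask onto df's columns. Columns that filter does not
# mention (here 'E') are filled with False, i.e. they get dropped.
mask = filter.reindex(df.columns, fill_value=False)

# The mask now has exactly one boolean per column, so loc accepts it.
result = df.loc[:, mask]
print(list(result.columns))  # ['A', 'C', 'D']
```

Whether missing columns should default to False or True depends on your use case; change fill_value accordingly.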
You don't need loc:
df_filtered=df[filter.index[filter]]
print(df_filtered)
   A   C   D
0  1  10  43
1  2  91  12
2  9  32   7
3  4  13   9
print(filter.index[filter])
#Index(['A', 'C', 'D'], dtype='object', name='my_id')
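With roughly 2000 columns it can also help to check which labels differ between df and filter before selecting. A minimal sketch with the example data from the question, using Index.difference and Index.intersection; the names missing_from_filter, missing_from_df, keep and df_filtered_safe are hypothetical.

```python
import pandas as pd

# Example data from the question.
filter = pd.Series(
    data=[True, False, True, True],
    index=['A', 'B', 'C', 'D'],
)
df = pd.DataFrame({
    'A': [1, 2, 9, 4],
    'B': [9, 6, 7, 8],
    'C': [10, 91, 32, 13],
    'D': [43, 12, 7, 9],
    'E': [65, 12, 3, 8],
})

# Columns that exist in df but have no entry in filter.
missing_from_filter = df.columns.difference(filter.index)
print(list(missing_from_filter))  # ['E']

# Labels that exist in filter but are not columns of df.
missing_from_df = filter.index.difference(df.columns)
print(list(missing_from_df))  # []

# Keep only the True labels that df actually has, so the selection
# does not raise a KeyError if filter mentions an unknown column.
keep = filter.index[filter].intersection(df.columns)
df_filtered_safe = df[keep]
print(list(df_filtered_safe.columns))  # ['A', 'C', 'D']
```

The intersection step is optional here because every True label in filter is a column of df, but it guards against labels that only appear in filter.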