Pandas filter DataFrame with Series

I have a pandas Series with the following content.

$ import pandas as pd
$ filter = pd.Series(
    data = [True, False, True, True],
    index = ['A', 'B', 'C', 'D']
    )
$ filter.index.name = 'my_id'

$ print(filter)

my_id
A     True
B    False
C     True
D     True
dtype: bool

And a DataFrame like this.

$ df = pd.DataFrame({
    'A': [1, 2, 9, 4],
    'B': [9, 6, 7, 8],
    'C': [10, 91, 32, 13],
    'D': [43, 12, 7, 9],
    'E': [65, 12, 3, 8]
})

$ print(df)

   A  B   C   D   E
0  1  9  10  43  65
1  2  6  91  12  12
2  9  7  32   7   3
3  4  8  13   9   8

filter has A, B, C, and D as its index labels. df has A, B, C, D, and E as its column names.

True in filter means that the corresponding column in df is kept. False in filter means that the corresponding column in df is removed. Column E in df should also be removed because filter doesn't contain E.

How can I generate another DataFrame with columns B and E removed, using filter?

That is, I want to create the following DataFrame from filter and df.

   A   C   D
0  1  10  43
1  2  91  12
2  9  32   7
3  4  13   9

df.loc[:, filter] raises the following error.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/username/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1494, in __getitem__
    return self._getitem_tuple(key)
  File "/Users/username/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 888, in _getitem_tuple
    retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
  File "/Users/username/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1869, in _getitem_axis
    return self._getbool_axis(key, axis=axis)
  File "/Users/username/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1515, in _getbool_axis
    key = check_bool_indexer(labels, key)
  File "/Users/username/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 2486, in check_bool_indexer
    raise IndexingError('Unalignable boolean Series provided as '
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match

df.loc[:, filter] works if df doesn't contain column E.

In my real case the DataFrame has about 2000 columns (len(df.columns)) and the Series has about 1999 entries (len(filter)). This makes it difficult for me to determine which labels are in df but not in filter.

This should give you what you need:

df.loc[:, filter[filter].index]

Explanation: You select the rows in filter which contain True and take their index labels to pick the columns from df.

You cannot use the boolean values in filter directly because its index does not cover every column in df (column E has no entry), so pandas cannot align the boolean Series with the columns.
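For illustration, here is a self-contained sketch of that explanation using the data from the question. The Series is called mask here (a rename made for this sketch, to avoid shadowing Python's built-in filter), and the second option, which extends the mask with reindex, is an alternative not given in the original answer.

```python
import pandas as pd

# Same data as in the question; `mask` is an illustrative rename of `filter`.
mask = pd.Series(
    [True, False, True, True],
    index=pd.Index(['A', 'B', 'C', 'D'], name='my_id'),
)
df = pd.DataFrame({
    'A': [1, 2, 9, 4],
    'B': [9, 6, 7, 8],
    'C': [10, 91, 32, 13],
    'D': [43, 12, 7, 9],
    'E': [65, 12, 3, 8],
})

# Columns of df that have no entry in mask; these cause the alignment error.
missing = df.columns.difference(mask.index)

# Option 1: select by label, as in the answer above.
by_label = df.loc[:, mask[mask].index]

# Option 2 (alternative): extend the mask to every column of df,
# treating labels that are absent from the mask as False.
full_mask = mask.reindex(df.columns, fill_value=False)
by_mask = df.loc[:, full_mask]

print(list(missing))
print(by_label)
print(by_mask)
```

With about 2000 columns, the missing index is also a quick way to see which labels are in df but not in the Series.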

You don't need loc:

df_filtered=df[filter.index[filter]]
print(df_filtered)

   A   C   D
0  1  10  43
1  2  91  12
2  9  32   7
3  4  13   9

print(filter.index[filter])
#Index(['A', 'C', 'D'], dtype='object', name='my_id')
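A related case, not covered by the question: if the Series also held a label that df lacks, selecting by label would fail with a KeyError in recent pandas versions. The sketch below assumes a hypothetical extra label 'Z' and builds the boolean mask over df's own columns with Index.isin, so labels missing on either side are simply ignored.

```python
import pandas as pd

# Hypothetical variation of the question's data: 'Z' exists only in the mask.
mask = pd.Series(
    [True, False, True, True, True],
    index=['A', 'B', 'C', 'D', 'Z'],
)
df = pd.DataFrame({
    'A': [1, 2, 9, 4],
    'B': [9, 6, 7, 8],
    'C': [10, 91, 32, 13],
    'D': [43, 12, 7, 9],
    'E': [65, 12, 3, 8],
})

# Labels marked True in the mask.
wanted = mask[mask].index

# Boolean array aligned with df.columns: True only where the column is wanted.
keep = df.columns.isin(wanted)
df_filtered = df.loc[:, keep]

print(df_filtered)
```

Because keep is computed from df.columns, the result keeps the column order of df.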
