
How to let numpy.where() return only the elements satisfying the condition?

I want to go through a Pandas DataFrame using numpy.where() and get a list that contains only the elements that satisfy the condition.

For example, let's say I have the following pandas DataFrame:

df = pd.DataFrame({"A": [1, 2, 3, 5, 3, 7, 3],
                   "B": [0, 1, 6, 4, 9, 8, 2],
                   "id": [0, 1, 2, 3, 4, 5, 6]
                  })

I would like to return a list of those id values for which column A is equal to 3 and column B is greater than or equal to 5.

I tried:

ids = np.where((df["A"] == 3) & (df["B"] >= 5), df["id"])

But that gives the following error:

ValueError: either both or neither of x and y should be given

I realise I could solve this by returning some default value like -1 in the else part of the where and later removing all occurrences of -1 from ids, but that is inefficient for my huge DataFrame and does not seem like the most elegant approach.

How can I solve this in the most efficient (least time-consuming) way? If where is not the most efficient solution, I'm open to other suggestions.

You can do this within Pandas itself by using either boolean indexing or the query method on the DataFrame.

In [4]: import pandas as pd

In [5]: df = pd.DataFrame({"A": [1, 2, 3, 5, 3, 7, 3],
   ...:                    "B": [0, 1, 6, 4, 9, 8, 2],
   ...:                    "id": [0, 1, 2, 3, 4, 5, 6]
   ...:                   })

In [6]: df
Out[6]:
   A  B  id
0  1  0   0
1  2  1   1
2  3  6   2
3  5  4   3
4  3  9   4
5  7  8   5
6  3  2   6

In [7]: df[(df["A"] == 3) & (df["B"] >= 5)]['id'].to_list()
Out[7]: [2, 4]

In [8]: df.query("A == 3 and B >= 5")['id'].to_list()
Out[8]: [2, 4]
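
To make the boolean indexing step above more concrete, here is a minimal sketch as a standalone script, using the same df as in the question. The names mask and ids are only illustrative and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 5, 3, 7, 3],
                   "B": [0, 1, 6, 4, 9, 8, 2],
                   "id": [0, 1, 2, 3, 4, 5, 6]})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than == and >=.
mask = (df["A"] == 3) & (df["B"] >= 5)
print(mask.tolist())  # [False, False, True, False, True, False, False]

# Indexing the "id" column with the mask keeps only the rows where it is True.
ids = df["id"][mask].tolist()
print(ids)  # [2, 4]
```

Building the mask once and reusing it can be handy when the same condition is needed to select several columns.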

Use:

In [1225]: df.loc[(df["A"] == 3) & (df["B"] >= 5), 'id'].to_numpy()
Out[1225]: array([2, 4])
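
If you would still like to use numpy.where(), one possible approach (a sketch, not taken from the answers above) is its condition-only form. Called with just a condition, np.where returns a tuple of index arrays for the positions where the condition is True, and those positions can then be used to index the id column. The error in the question comes from passing x without y. The names cond, positions, ids_where and ids_mask are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 5, 3, 7, 3],
                   "B": [0, 1, 6, 4, 9, 8, 2],
                   "id": [0, 1, 2, 3, 4, 5, 6]})

# Convert the combined condition to a plain boolean NumPy array.
cond = ((df["A"] == 3) & (df["B"] >= 5)).to_numpy()

# With only a condition, np.where returns a tuple of index arrays
# (one per dimension) marking where the condition is True.
(positions,) = np.where(cond)

# Use those positions to pick the matching id values.
ids_where = df["id"].to_numpy()[positions]

# Plain boolean masking on the NumPy array gives the same values
# without calling np.where at all.
ids_mask = df["id"].to_numpy()[cond]

print(ids_where.tolist())  # [2, 4]
print(ids_mask.tolist())   # [2, 4]
```

Which variant is fastest on a large DataFrame is best checked by timing it on your own data.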
