Conditional DataFrame filter on boolean columns?

Question

If I have a DataFrame:

| id     | attribute_1 | attribute_2 |
|--------|-------------|-------------|
| 123abc | TRUE        | TRUE        |
| 123abc | TRUE        | FALSE       |
| 456def | TRUE        | FALSE       |
| 789ghi | TRUE        | TRUE        |
| 789ghi | FALSE       | FALSE       |
| 789ghi | FALSE       | FALSE       |
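
For anyone who wants to reproduce this, the table above could be built roughly as follows. This is a minimal sketch that assumes the attribute columns hold real booleans rather than the strings 'TRUE' and 'FALSE'; the variable name df is just a convention.

```python
import pandas as pd

# Sample data mirroring the table above; booleans are assumed, not strings.
df = pd.DataFrame({
    "id": ["123abc", "123abc", "456def", "789ghi", "789ghi", "789ghi"],
    "attribute_1": [True, True, True, True, False, False],
    "attribute_2": [True, False, False, True, False, False],
})

# Both attribute columns should report a boolean dtype.
print(df.dtypes)
```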

How can I apply a groupby, or some equivalent filter, to count the number of unique id values in a subset of the DataFrame that looks like this:

| id     | attribute_1 | attribute_2 |
|--------|-------------|-------------|
| 123abc | TRUE        | TRUE        |
| 123abc | TRUE        | FALSE       |

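In other words, I want the number of unique id values where attribute_1 == True for all rows of a given id, and attribute_2 has at least one True.

So 456def would not be included in the filter, because it does not have at least one True in attribute_2.

789ghi would not be included either, because not all of its attribute_1 entries are True.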

Answer 1

You need to groupby twice: once on "attribute_1" with transform('all'), and a second time on "attribute_2" with transform('any').

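# Keep only the ids whose attribute_1 is True in every row
i = df[df.groupby('id').attribute_1.transform('all')]
# Of those, keep only the ids with at least one True in attribute_2
j = i[i.groupby('id').attribute_2.transform('any')]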

print (j)
       id  attribute_1  attribute_2
0  123abc         True         True
1  123abc         True        False
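
Because the first filter drops whole id groups, the two steps can also be folded into a single boolean mask. The following is a sketch of that variant, not a replacement for the approach above; it assumes boolean columns and rebuilds the sample data so it runs on its own. The names g, mask, and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["123abc", "123abc", "456def", "789ghi", "789ghi", "789ghi"],
    "attribute_1": [True, True, True, True, False, False],
    "attribute_2": [True, False, False, True, False, False],
})

g = df.groupby("id")
# Row-aligned mask: every attribute_1 in the group is True
# AND at least one attribute_2 in the group is True.
mask = g["attribute_1"].transform("all") & g["attribute_2"].transform("any")
result = df[mask]
print(result)
```

This groups once instead of twice, at the cost of computing the 'any' check for groups that the 'all' check would have discarded anyway.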

Finally, to get the number of unique ids that satisfy this condition, call nunique:

print (j['id'].nunique())
1
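
If only the count is needed and not the filtered rows, one possible alternative is to aggregate to one row per id and count the ids that pass both checks. This sketch assumes boolean columns; the names per_id, all_attr1, any_attr2, and qualifying are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["123abc", "123abc", "456def", "789ghi", "789ghi", "789ghi"],
    "attribute_1": [True, True, True, True, False, False],
    "attribute_2": [True, False, False, True, False, False],
})

# One row per id: is attribute_1 always True, is attribute_2 ever True?
per_id = df.groupby("id").agg(
    all_attr1=("attribute_1", "all"),
    any_attr2=("attribute_2", "any"),
)

# Ids that satisfy both conditions, and how many there are.
qualifying = per_id.index[per_id["all_attr1"] & per_id["any_attr2"]]
count = len(qualifying)
print(list(qualifying), count)
```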

This is easiest when your attribute_* columns are boolean. If they are strings, fix them first:

df = df.replace({'TRUE': True, 'FALSE': False})
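
Depending on the pandas version, replace may leave the converted columns with object dtype rather than bool. A more explicit conversion could look like the sketch below, which assumes the only values present are the strings 'TRUE' and 'FALSE'; the frame raw and the list bool_cols are hypothetical names for the example.

```python
import pandas as pd

# Hypothetical string-typed input, as it might arrive from a CSV export.
raw = pd.DataFrame({
    "id": ["123abc", "456def"],
    "attribute_1": ["TRUE", "TRUE"],
    "attribute_2": ["TRUE", "FALSE"],
})

bool_cols = ["attribute_1", "attribute_2"]
mapping = {"TRUE": True, "FALSE": False}

for col in bool_cols:
    converted = raw[col].map(mapping)
    # Values missing from the mapping become NaN; fail loudly instead of
    # letting astype(bool) silently turn them into True.
    if converted.isna().any():
        raise ValueError(f"unexpected value in column {col!r}")
    raw[col] = converted.astype(bool)

print(raw.dtypes)
```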

Answer 1
2, accepted, 2018-09-11 21:26:49