Python Pandas - Filter a df by the number of unique values within a group

Question

Here is an example of the data I'm working on (as a pandas df).

    index   inv Rev_stream  Bill_type   Net_rev
       1    1   A           Original    -24.77
       2    1   B           Original    -24.77
       3    2   A           Original    -409.33
       4    2   B           Original    -409.33
       5    2   C           Original    -409.33
       6    2   D           Original    -409.33
       7    3   A           Original    -843.11
       8    3   A           Rebill       279.5
       9    3   B           Original    -843.11
      10    4   A           Rebill       279.5
      11    4   B           Original    -843.11
      12    5   B           Rebill       279.5

How can I filter this df to get only the rows where the invoice/Rev_stream combination has both Original and Rebill kinds of Net_rev? In the example above, that would be only the rows with index 7 and 8.

Is there an easy way to do it, without iterating over the whole dataframe and building dictionaries of invoice+RevStream : Bill_type?

What I'm looking for is something like:

df = df[df[['inv','Rev_stream']]['Bill_type'].unique().len() == 2]

Unfortunately, the code above doesn't work.

Thanks in advance.

Answer 1

You can group your data by the inv and Rev_stream columns, check for each group whether both Original and Rebill appear in its Bill_type values, and filter on that condition:

(df.groupby(['inv', 'Rev_stream'])
   .filter(lambda g: 'Original' in g.Bill_type.values and 'Rebill' in g.Bill_type.values))
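
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the filter above. It also shows a count-based variant closer to what the question was reaching for, using `transform("nunique")`. The variable names (`both`, `n_types`, `by_count`) are made up for illustration, and the count-based variant assumes Bill_type only ever takes the two values Original and Rebill; if other values can occur, the membership check is the safer choice.

```python
import pandas as pd

# Sample data from the question; the "index" column is used as the row index.
df = pd.DataFrame(
    {
        "inv": [1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5],
        "Rev_stream": ["A", "B", "A", "B", "C", "D", "A", "A", "B", "A", "B", "B"],
        "Bill_type": [
            "Original", "Original", "Original", "Original", "Original", "Original",
            "Original", "Rebill", "Original", "Rebill", "Original", "Rebill",
        ],
        "Net_rev": [
            -24.77, -24.77, -409.33, -409.33, -409.33, -409.33,
            -843.11, 279.5, -843.11, 279.5, -843.11, 279.5,
        ],
    },
    index=pd.RangeIndex(1, 13, name="index"),
)

# Membership check: keep groups whose Bill_type contains both values.
both = df.groupby(["inv", "Rev_stream"]).filter(
    lambda g: {"Original", "Rebill"} <= set(g["Bill_type"])
)

# Count-based variant: number of distinct Bill_type values per group,
# broadcast back to every row, then used as a boolean mask.
# Assumes Bill_type has no values other than "Original" and "Rebill".
n_types = df.groupby(["inv", "Rev_stream"])["Bill_type"].transform("nunique")
by_count = df[n_types == 2]

print(both)
```

On the sample data both variants keep the same two rows (index 7 and 8). The `transform` version avoids calling a Python lambda per group, which may matter on larger frames.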

Answer 1
2 Accepted 2016-10-17 14:36:24
