Python Pandas - filtering df by the number of unique values within a group

Here is an example of the data I'm working on (as a pandas df):

    index   inv Rev_stream  Bill_type   Net_rev
       1    1   A           Original    -24.77
       2    1   B           Original    -24.77
       3    2   A           Original    -409.33
       4    2   B           Original    -409.33
       5    2   C           Original    -409.33
       6    2   D           Original    -409.33
       7    3   A           Original    -843.11
       8    3   A           Rebill       279.5
       9    3   B           Original    -843.11
      10    4   A           Rebill       279.5
      11    4   B           Original    -843.11
      12    5   B           Rebill       279.5

How can I filter this df so that I only get the rows where the inv/Rev_stream combination has both an Original and a Rebill Bill_type? In the example above, that would be only the rows with index 7 and 8.

Is there an easy way to do this without iterating over the whole dataframe and building dictionaries of invoice+Rev_stream : Bill_type?

What I'm looking for is something like:

df = df[df[['inv','Rev_stream']]['Bill_type'].unique().len() == 2]

Unfortunately, the code above doesn't work.

Thanks in advance.

You can group your data by the inv and Rev_stream columns, then check for each group whether both Original and Rebill appear among its Bill_type values, and filter on that condition:

(df.groupby(['inv', 'Rev_stream'])
   .filter(lambda g: 'Original' in g.Bill_type.values and 'Rebill' in g.Bill_type.values))
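
For anyone who wants to try this out, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand and applies the same groupby/filter call. It also shows a variant closer to the question's "number of unique values" idea, using `transform("nunique")`. The variable names `by_membership`, `n_types` and `by_count` are just illustrative, and the count-based variant assumes `Bill_type` only ever takes the two values Original and Rebill.

```python
import pandas as pd

# Sample data typed in from the question's table.
df = pd.DataFrame({
    "index": range(1, 13),
    "inv": [1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5],
    "Rev_stream": ["A", "B", "A", "B", "C", "D", "A", "A", "B", "A", "B", "B"],
    "Bill_type": ["Original", "Original", "Original", "Original",
                  "Original", "Original", "Original", "Rebill",
                  "Original", "Rebill", "Original", "Rebill"],
    "Net_rev": [-24.77, -24.77, -409.33, -409.33, -409.33, -409.33,
                -843.11, 279.5, -843.11, 279.5, -843.11, 279.5],
}).set_index("index")

# Approach from the answer: keep only the groups whose Bill_type
# values contain both 'Original' and 'Rebill'.
by_membership = df.groupby(["inv", "Rev_stream"]).filter(
    lambda g: "Original" in g.Bill_type.values and "Rebill" in g.Bill_type.values
)

# Variant (assumes Bill_type has exactly two possible values):
# count the distinct Bill_type values per group, broadcast that count
# back to every row, and keep the rows whose group has two of them.
n_types = df.groupby(["inv", "Rev_stream"])["Bill_type"].transform("nunique")
by_count = df[n_types == 2]

print(by_membership)
print(by_count)
```

On this sample both selections should contain only the rows with index 7 and 8. The `transform` version avoids calling a Python lambda per group, which may matter on larger frames, but it would also keep groups with two distinct values other than Original and Rebill if such values existed.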

