Python Pandas - filtering df by the number of unique values within a group
Here is an example of the data I'm working on (as a pandas df):
index inv Rev_stream Bill_type Net_rev
1 1 A Original -24.77
2 1 B Original -24.77
3 2 A Original -409.33
4 2 B Original -409.33
5 2 C Original -409.33
6 2 D Original -409.33
7 3 A Original -843.11
8 3 A Rebill 279.5
9 3 B Original -843.11
10 4 A Rebill 279.5
11 4 B Original -843.11
12 5 B Rebill 279.5
How can I filter this df to keep only the rows where the inv/Rev_stream combination has both an Original and a Rebill Bill_type? In the example above, that would be only the rows with index 7 and 8.
Is there an easy way to do this without iterating over the whole dataframe and building dictionaries of invoice+RevStream : Bill_type?
What I'm looking for is something like:
df = df[df[['inv','Rev_stream']]['Bill_type'].unique().len() == 2]
Unfortunately, the code above doesn't work.
Thanks in advance.
You can group your data by the inv and Rev_stream columns, then check for each group whether both Original and Rebill appear in the Bill_type values, and filter on that condition:
(df.groupby(['inv', 'Rev_stream'])
.filter(lambda g: 'Original' in g.Bill_type.values and 'Rebill' in g.Bill_type.values))
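For reference, here is a minimal, self-contained sketch that rebuilds the sample data from the question and runs the filter above. It also shows a count-based variant closer to what the question was reaching for, using transform("nunique"). The variable names (both, n_types, by_count) are only illustrative, and the variant assumes Bill_type takes no values other than Original and Rebill; with more bill types, the membership check above is the safer one.

```python
import pandas as pd

# Sample data copied from the question; the index runs 1..12 as in the post.
df = pd.DataFrame(
    {
        "inv": [1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5],
        "Rev_stream": ["A", "B", "A", "B", "C", "D", "A", "A", "B", "A", "B", "B"],
        "Bill_type": ["Original"] * 7 + ["Rebill", "Original", "Rebill", "Original", "Rebill"],
        "Net_rev": [-24.77, -24.77, -409.33, -409.33, -409.33, -409.33,
                    -843.11, 279.5, -843.11, 279.5, -843.11, 279.5],
    },
    index=range(1, 13),
)

# Approach from the answer: keep groups that contain both bill types.
both = df.groupby(["inv", "Rev_stream"]).filter(
    lambda g: "Original" in g.Bill_type.values and "Rebill" in g.Bill_type.values
)

# Count-based variant (assumes only two possible Bill_type values):
# broadcast the number of distinct bill types per group back onto each row.
n_types = df.groupby(["inv", "Rev_stream"])["Bill_type"].transform("nunique")
by_count = df[n_types == 2]

print(both)
```

On the sample data both approaches keep only the rows with index 7 and 8. The transform variant avoids calling a Python lambda once per group, which may matter on large frames.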