Efficient/Pythonic way to filter a pandas DataFrame based on priority
I have the DataFrame below.
+-----------+----------+-----+
| InvoiceNo | ItemCode | Qty |
+-----------+----------+-----+
| Inv-001 | A | 2 |
+-----------+----------+-----+
| Inv-001 | B | 3 |
+-----------+----------+-----+
| Inv-001 | C | 1 |
+-----------+----------+-----+
| Inv-002 | B | 3 |
+-----------+----------+-----+
| Inv-002 | D | 4 |
+-----------+----------+-----+
| Inv-003 | C | 3 |
+-----------+----------+-----+
| Inv-003 | D | 9 |
+-----------+----------+-----+
| Inv-004 | D | 5 |
+-----------+----------+-----+
| Inv-004 | E | 8 |
+-----------+----------+-----+
| Inv-005 | X | 2 |
+-----------+----------+-----+
My task is to create an additional column, Type, based on the priority of the items an invoice contains. For example, ItemCode A has 1st priority, B has 2nd priority and C has 3rd priority. All other items have the lowest priority and are classified as Other.
So, if an invoice contains item A, its type should be Type - A, regardless of which other items are present. Of the remaining invoices, those containing item B should get Type - B. The same goes for C. If an invoice contains none of A, B or C, its type should be Type - Other.
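The rule above amounts to "take the highest-priority item per invoice". A minimal sketch of that idea, assuming the priorities are kept in a hypothetical `priority` dict: give each item a numeric rank (unknown items get the lowest rank), take the smallest rank per invoice with `groupby(...).transform('min')`, and map it back to a label. This is a plain numeric-rank technique, not the questioner's code; the sample data is copied from the input table.

```python
import pandas as pd

# Sample data copied from the question's input table
df = pd.DataFrame({
    "InvoiceNo": ["Inv-001", "Inv-001", "Inv-001", "Inv-002", "Inv-002",
                  "Inv-003", "Inv-003", "Inv-004", "Inv-004", "Inv-005"],
    "ItemCode": ["A", "B", "C", "B", "D", "C", "D", "D", "E", "X"],
    "Qty": [2, 3, 1, 3, 4, 3, 9, 5, 8, 2],
})

# Hypothetical priority table: a lower number means a higher priority
priority = {"A": 0, "B": 1, "C": 2}
labels = {0: "Type - A", 1: "Type - B", 2: "Type - C", 3: "Type - Other"}

# Items missing from the table get the lowest priority (rank 3)
rank = df["ItemCode"].map(priority).fillna(3).astype(int)

# Best (smallest) rank within each invoice, broadcast back to every row
best = rank.groupby(df["InvoiceNo"]).transform("min")

df["Type"] = best.map(labels)
print(df)
```

Adding a new priority level only means extending the two dicts; no extra filtering pass is needed.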
Below is my desired output.
+-----------+----------+-----+--------------+
| InvoiceNo | ItemCode | Qty | Type |
+-----------+----------+-----+--------------+
| Inv-001 | A | 2 | Type - A |
+-----------+----------+-----+--------------+
| Inv-001 | B | 3 | Type - A |
+-----------+----------+-----+--------------+
| Inv-001 | C | 1 | Type - A |
+-----------+----------+-----+--------------+
| Inv-002 | B | 3 | Type - B |
+-----------+----------+-----+--------------+
| Inv-002 | D | 4 | Type - B |
+-----------+----------+-----+--------------+
| Inv-003 | C | 3 | Type - C |
+-----------+----------+-----+--------------+
| Inv-003 | D | 9 | Type - C |
+-----------+----------+-----+--------------+
| Inv-004 | D | 5 | Type - Other |
+-----------+----------+-----+--------------+
| Inv-004 | E | 8 | Type - Other |
+-----------+----------+-----+--------------+
| Inv-005 | X | 2 | Type - Other |
+-----------+----------+-----+--------------+
Below is my code, and it works. But it is cumbersome and not Pythonic at all.
import pandas as pd

# load the DataFrame (the file path is not given in the question)
df = pd.read_excel()
# filter invoices containing `A`
mask_A = (df['ItemCode'] == 'A').groupby(df['InvoiceNo']).transform('any')
df_A = df[mask_A].copy()  # .copy() avoids SettingWithCopyWarning on the next line
df_A['Type'] = 'Type - A'
# from the rest of the data, filter invoices containing `B`
df = df[~mask_A]
mask_B = (df['ItemCode'] == 'B').groupby(df['InvoiceNo']).transform('any')
df_B = df[mask_B].copy()
df_B['Type'] = 'Type - B'
# from the rest of the data, filter invoices containing `C`
df = df[~mask_B]
mask_C = (df['ItemCode'] == 'C').groupby(df['InvoiceNo']).transform('any')
df_C = df[mask_C].copy()
df_C['Type'] = 'Type - C'
# the rest of the data contains none of `A`, `B` or `C`
df_Other = df[~mask_C].copy()
df_Other['Type'] = 'Type - Other'
# concatenate all the DataFrames
df = pd.concat([df_A, df_B, df_C, df_Other], axis=0, sort=False)
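The three-step filter can be collapsed, as a sketch: compute every invoice-level mask on the full frame and let `numpy.select` apply them in priority order, because it picks the first condition that is True for each row. The helper `has_item` is a hypothetical name, and the sample data is copied from the question's table. Unlike the concat version, this keeps the rows in their original order instead of regrouping them by type.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's input table
df = pd.DataFrame({
    "InvoiceNo": ["Inv-001", "Inv-001", "Inv-001", "Inv-002", "Inv-002",
                  "Inv-003", "Inv-003", "Inv-004", "Inv-004", "Inv-005"],
    "ItemCode": ["A", "B", "C", "B", "D", "C", "D", "D", "E", "X"],
    "Qty": [2, 3, 1, 3, 4, 3, 9, 5, 8, 2],
})


def has_item(frame, code):
    """Hypothetical helper: True on every row whose invoice contains `code`."""
    return (frame["ItemCode"] == code).groupby(frame["InvoiceNo"]).transform("any")


# Conditions listed in priority order: A, then B, then C
conditions = [has_item(df, code) for code in ("A", "B", "C")]
choices = ["Type - A", "Type - B", "Type - C"]

# np.select uses the first True condition per row, so A wins over B, B over C
df["Type"] = np.select(conditions, choices, default="Type - Other")
print(df)
```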
Now, what is the most efficient and Pythonic way to do this?
I feel like we can use a Categorical and then transform:
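df['Type'] = pd.Categorical(df.ItemCode, categories=['A', 'B', 'C'], ordered=True)
# convert to object first: a categorical only accepts values from its category list
df['Type'] = 'Type_' + df.groupby('InvoiceNo')['Type'].transform('min').astype(object).fillna('other')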
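Update
df['Type'] = pd.Categorical(df.ItemCode, categories=['A', 'B', 'C'], ordered=True)
df = df.sort_values('Type')
# convert to object first: a categorical only accepts values from its category list
df['Type'] = 'Type_' + df.groupby('InvoiceNo')['Type'].transform('first').astype(object).fillna('other')
df = df.sort_index()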
df
Out[32]:
InvoiceNo ItemCode Qty Type
0 Inv-001 A 2 Type_A
1 Inv-001 B 3 Type_A
2 Inv-001 C 1 Type_A
3 Inv-002 B 3 Type_B
4 Inv-002 D 4 Type_B
5 Inv-003 C 3 Type_C
6 Inv-003 D 9 Type_C
7 Inv-004 D 5 Type_other
8 Inv-004 E 8 Type_other
9 Inv-005 X 2 Type_other
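The output above uses labels like Type_A and Type_other, while the question asks for Type - A and Type - Other. Below is a self-contained sketch of the same sort-then-`first` approach that produces the question's labels. The `.astype(object)` step is an added precaution (a categorical column only accepts values from its category list, so filling in "Other" directly could fail), and the sample data is copied from the question's table.

```python
import pandas as pd

# Sample data copied from the question's input table
df = pd.DataFrame({
    "InvoiceNo": ["Inv-001", "Inv-001", "Inv-001", "Inv-002", "Inv-002",
                  "Inv-003", "Inv-003", "Inv-004", "Inv-004", "Inv-005"],
    "ItemCode": ["A", "B", "C", "B", "D", "C", "D", "D", "E", "X"],
    "Qty": [2, 3, 1, 3, 4, 3, 9, 5, 8, 2],
})

# Ordered categorical A < B < C; any other item becomes missing (NaN)
df["Type"] = pd.Categorical(df["ItemCode"], categories=["A", "B", "C"], ordered=True)

# Sorting puts A before B before C, with NaN last; "first" then picks the
# highest-priority item in each invoice (it skips NaN values)
first = df.sort_values("Type").groupby("InvoiceNo")["Type"].transform("first")

# "first" is indexed like the sorted frame; assignment realigns on the index
df["Type"] = "Type - " + first.astype(object).fillna("Other")
print(df)
```

Because the result is assigned back by index, the original row order is kept without a separate `sort_index` call.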