Pandas DataFrame iteration and selecting rows based on a condition - Change in Requirements
I have a sorted DataFrame (the input dataframe below). I need to iterate over its rows and select and retrieve rows into an output DataFrame based on the conditions below.
• Condition 1: For a given R1, R2, W, if we have two records with TYPE 'A' and TYPE 'B':
a) If (amount1 & amount2) of TYPE 'A' is > (amount1 & amount2) of TYPE 'B', we need to bring the TYPE 'A' record into the output.
b) If (amount1 & amount2) of TYPE 'B' is > (amount1 & amount2) of TYPE 'A', we need to bring the TYPE 'B' record into the output.
c) If (amount1 & amount2) of TYPE 'A' is = (amount1 & amount2) of TYPE 'B', we need to bring the TYPE 'A' record into the output.
• Condition 2: For a given R1, R2, W, if we have only a record with TYPE 'A', we need to bring the TYPE 'A' record into the output.
• Condition 3: For a given R1, R2, W, if we have only a record with TYPE 'B', we need to bring the TYPE 'B' record into the output.
Input dataframe
R1 R2 W TYPE amount1 amount2
0 123 12 1 A 111 222
1 123 12 1 B 111 222
2 123 12 2 A 222 222
3 123 12 2 B 333 333
4 123 12 3 A 444 444
5 123 12 3 B 333 333
6 123 34 1 A 111 222
7 123 34 2 A 333 444
8 123 34 2 B 333 444
9 123 34 3 B 444 555
10 123 34 4 A 555 666
11 123 34 4 B 666 777
Output dataframe
R1 R2 W TYPE amount1 amount2
0 123 12 1 A 111 222
3 123 12 2 B 333 333
4 123 12 3 A 444 444
6 123 34 1 A 111 222
7 123 34 2 A 333 444
9 123 34 3 B 444 555
11 123 34 4 B 666 777
Selection based on your criteria:
def my_selection(idf):
    # If both 'A' and 'B' are in 'TYPE', return only the row with 'A'
    if idf['TYPE'].unique().shape[0] == 2:
        return idf[idf['TYPE'] == 'A']
    else:
        return idf

df2 = df.groupby(['R1', 'R2', 'W'], as_index=False).apply(my_selection)
# Drop the innermost index level (the original row labels)
df2.index = df2.index.droplevel(-1)
# R1 R2 W TYPE amount1 amount2
# 0 123 12 1 A 111 222
# 1 123 12 2 A 333 444
# 2 123 12 3 A 555 666
# 3 123 34 1 A 111 222
# 4 123 34 2 A 222 333
# 5 123 34 3 B 444 555
# 6 123 34 4 A 555 666
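Note that the function above always returns the 'A' row when both types are present, so it does not cover the amount comparison in Conditions 1a to 1c. Here is a minimal sketch of one way to handle it. It assumes that "(amount1 & amount2) is greater" means comparing amount1 first and amount2 second, which the question does not spell out. It also swaps the groupby/apply for a sort followed by drop_duplicates. The names keys, ranked and out are only illustrative.

```python
import pandas as pd

# Input rows copied from the question.
df = pd.DataFrame({
    "R1": [123] * 12,
    "R2": [12] * 6 + [34] * 6,
    "W": [1, 1, 2, 2, 3, 3, 1, 2, 2, 3, 4, 4],
    "TYPE": ["A", "B", "A", "B", "A", "B", "A", "A", "B", "B", "A", "B"],
    "amount1": [111, 111, 222, 333, 444, 333, 111, 333, 333, 444, 555, 666],
    "amount2": [222, 222, 222, 333, 444, 333, 222, 444, 444, 555, 666, 777],
})

keys = ["R1", "R2", "W"]

# Assumption: amounts are compared amount1 first, then amount2.
# Within each group the largest amounts come first; on a tie 'A' sorts before 'B'.
ranked = df.sort_values(
    keys + ["amount1", "amount2", "TYPE"],
    ascending=[True, True, True, False, False, True],
)

# Keep the first (winning) row per group, then restore the original row order.
out = ranked.drop_duplicates(subset=keys, keep="first").sort_index()
print(out)
```

On the question's input this should keep rows 0, 3, 4, 6, 7, 9 and 11, which matches the requested output dataframe. Groups with a single record pass through unchanged, so Conditions 2 and 3 need no special handling.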
All you have to do is group by R1, R2, W and operate on the TYPE column as follows:
data.groupby(['R1','R2','W']).apply(lambda x: 'A' if 'A' in x['TYPE'].values else 'B').reset_index()
You can then merge this output with the original DataFrame on the columns obtained above to get the corresponding 'amount1' and 'amount2' values.
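The merge step is only described in words, so here is a rough sketch of how it might look on the question's input. It selects the TYPE column before calling apply and names the result column with reset_index(name="TYPE") so that the merge keys line up. Those two details, and the names keys, chosen and result, are assumptions of this sketch rather than part of the answer. Like the answer, it prefers 'A' whenever an 'A' row exists and does not compare the amounts.

```python
import pandas as pd

# Input rows copied from the question.
df = pd.DataFrame({
    "R1": [123] * 12,
    "R2": [12] * 6 + [34] * 6,
    "W": [1, 1, 2, 2, 3, 3, 1, 2, 2, 3, 4, 4],
    "TYPE": ["A", "B", "A", "B", "A", "B", "A", "A", "B", "B", "A", "B"],
    "amount1": [111, 111, 222, 333, 444, 333, 111, 333, 333, 444, 555, 666],
    "amount2": [222, 222, 222, 333, 444, 333, 222, 444, 444, 555, 666, 777],
})

keys = ["R1", "R2", "W"]

# One label per group: 'A' when the group contains an 'A' row, otherwise 'B'.
chosen = (
    df.groupby(keys)["TYPE"]
    .apply(lambda s: "A" if "A" in s.values else "B")
    .reset_index(name="TYPE")
)

# Merge back on the group keys plus the chosen TYPE to recover the amounts.
result = chosen.merge(df, on=keys + ["TYPE"], how="left")
print(result)
```

The merged frame gets a fresh 0..n-1 index, so the original row labels are not preserved with this approach.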
This is what I would do:
categories = ['B','A'] #create a list of categories in ascending order of precedence
d={i:e for e,i in enumerate(categories)} #create a dictionary:{'B': 0, 'A': 1}
s=df['TYPE'].map(d) #map to df['TYPE'] and create a helper series
Then assign this series to the dataframe, use groupby + transform('max'), check where the group maximum equals the helper series, and return the rows where both values match:
out = df[s.eq(df.assign(TYPE=s).groupby(['R1','R2','W'])['TYPE'].transform('max'))]
print(out)
R1 R2 W TYPE amount1 amount2
0 123 12 1 A 111 222
2 123 12 2 A 333 444
4 123 12 3 A 555 666
6 123 34 1 A 111 222
7 123 34 2 A 222 333
9 123 34 3 B 444 555
10 123 34 4 A 555 666
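To make the one-liner easier to follow, here is the same idea split into steps and run on the input from the question. This is a sketch, not the answerer's exact code: it groups the helper series directly instead of going through df.assign, and the names rank, group_max and out are illustrative. As in the answer, 'A' wins whenever it is present, so the amount comparison from Condition 1 is not applied.

```python
import pandas as pd

# Input rows copied from the question.
df = pd.DataFrame({
    "R1": [123] * 12,
    "R2": [12] * 6 + [34] * 6,
    "W": [1, 1, 2, 2, 3, 3, 1, 2, 2, 3, 4, 4],
    "TYPE": ["A", "B", "A", "B", "A", "B", "A", "A", "B", "B", "A", "B"],
    "amount1": [111, 111, 222, 333, 444, 333, 111, 333, 333, 444, 555, 666],
    "amount2": [222, 222, 222, 333, 444, 333, 222, 444, 444, 555, 666, 777],
})

# Higher number means higher precedence, so 'A' outranks 'B'.
rank = {"B": 0, "A": 1}
s = df["TYPE"].map(rank)

# Broadcast the highest rank found in each (R1, R2, W) group back to every row.
group_max = s.groupby([df["R1"], df["R2"], df["W"]]).transform("max")

# Keep the rows whose own rank equals the best rank in their group.
out = df[s.eq(group_max)]
print(out)
```

Because transform returns a series aligned with the original index, the boolean mask can be applied to df directly and the original row labels are kept.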