[英]Loop through rows of a dataframe, create a new column and store the result based on condition in another column
I have a df as follows:我有一个 df 如下:
Name Reference Efficiency
TargetA Yes 13
Target_1 No 12
Target_2 No 13
Target_3 No 10
Target_4 No 8
TargetB Yes 14
Target_4 No 12
Target_5 No 11
Target_6 No 10
TargetC Yes 15
Target_6 No 11
Target_7 No 13
Target_8 No 12
Target_9 No 14
Target_10 No 10
I want to loop through all the rows, wherever there is 'Yes' in Reference column, it will create another column called 'Check' and subtract Efficiency values(13,12,13,10,8) from 13(which is the corresponding value of 'Yes'. Next it will subtract Efficiency values(14,12,11,10) from 14(which is the corresponding value of 'Yes' for the next 'Yes') and so on.我想遍历所有行,只要参考列中有“是”,它将创建另一个名为“检查”的列,并从 13 中减去效率值(13、12、13、10、8)(这是相应的'Yes'的值。接下来它将从14中减去效率值(14,12,11,10)(这是下一个'Yes'对应的'Yes'值)等等。
Expected output:预期 output:
Name Reference Efficiency Check
TargetA Yes 13 0
Target_1 No 12 1
Target_2 No 13 0
Target_3 No 10 3
Target_4 No 8 5
TargetB Yes 14 0
Target_4 No 12 2
Target_5 No 11 3
Target_6 No 10 4
TargetC Yes 15 0
Target_6 No 11 4
Target_7 No 13 2
Target_8 No 12 3
Target_9 No 14 1
Target_10 No 10 5
I have tried the following codes:我尝试了以下代码:
for i, row in df.iterrows():
i = 0
val = row['Reference']
if val == 'Yes':
df['check'] = df.loc[i,'Efficiency'] - df['Efficiency'].shift(0)
I got the following result:我得到以下结果:
Name Reference Efficiency Check
0 TargetA Yes 13 0
1 Target_1 No 12 1
2 Target_2 No 13 0
3 Target_3 No 10 3
4 Target_4 No 8 5
5 TargetB Yes 14 -1
6 Target_4 No 12 1
7 Target_5 No 11 2
8 Target_6 No 10 3
9 TargetC Yes 15 -2
10 Target_6 No 11 2
11 Target_7 No 13 0
12 Target_8 No 12 1
13 Target_9 No 14 -1
14 Target_10 No 10 3
I got the result correctly in the first 'Yes' Please can someone help me我在第一个“是”中得到了正确的结果请有人帮助我
Create an auxillary / helper column, only containing the Efficiencies where "Yes" was found.创建一个辅助/辅助列,仅包含找到“是”的效率。 Then replace missing values with the previous valid entries, go through the example step by step:
然后通过示例逐步将缺失值替换为之前的有效条目 go:
Sample data:样本数据:
import pandas as pd
data = {'Name': {0: 'TargetA',
1: 'Target_1',
2: 'Target_2',
3: 'Target_3',
4: 'Target_4',
5: 'TargetB',
6: 'Target_4',
7: 'Target_5',
8: 'Target_6',
9: 'TargetC',
10: 'Target_6',
11: 'Target_7',
12: 'Target_8',
13: 'Target_9',
14: 'Target_10'},
'Reference': {0: 'Yes',
1: 'No',
2: 'No',
3: 'No',
4: 'No',
5: 'Yes',
6: 'No',
7: 'No',
8: 'No',
9: 'Yes',
10: 'No',
11: 'No',
12: 'No',
13: 'No',
14: 'No'},
'Efficiency': {0: 13,
1: 12,
2: 13,
3: 10,
4: 8,
5: 14,
6: 12,
7: 11,
8: 10,
9: 15,
10: 11,
11: 13,
12: 12,
13: 14,
14: 10}}
df = pd.DataFrame(data)
Code:代码:
mask = df['Reference'].eq('Yes')
df['Check'] = pd.NA
df.loc[mask, 'Check'] = df['Efficiency'].loc[mask].copy()
df['Check'] = df['Check'].ffill()
df['Check'] = df['Check'] - df['Efficiency']
I also used to code below to create an auxiliary / helper column, only containing the Efficiencies where "Yes" was found:我还使用下面的代码来创建一个辅助/帮助列,仅包含找到“是”的效率:
for i, row in df.iterrows():
val = row['Reference']
if val == 'Yes':
df['check'] = df[df['Reference']=='Yes']['Efficiency']
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.