
Loop through rows of a dataframe, create a new column and store the result based on condition in another column

I have a df as follows:

Name    Reference   Efficiency
TargetA    Yes      13
Target_1    No      12
Target_2    No      13
Target_3    No      10
Target_4    No      8
TargetB     Yes     14
Target_4    No      12
Target_5    No      11
Target_6    No     10
TargetC     Yes    15
Target_6    No      11
Target_7    No      13
Target_8    No      12
Target_9    No      14
Target_10   No     10

I want to loop through all the rows. Wherever there is 'Yes' in the Reference column, a new column called 'Check' should hold the result of subtracting the Efficiency values (13, 12, 13, 10, 8) from 13 (the Efficiency value of that 'Yes' row). Next, the Efficiency values (14, 12, 11, 10) should be subtracted from 14 (the Efficiency value of the next 'Yes' row), and so on.

Expected output:

Name    Reference   Efficiency  Check

TargetA    Yes            13    0
Target_1    No            12    1
Target_2    No            13    0
Target_3    No            10    3
Target_4    No             8    5
TargetB     Yes           14    0
Target_4    No            12    2
Target_5    No            11    3
Target_6    No            10    4
TargetC     Yes           15    0
Target_6    No            11    4
Target_7    No            13    2
Target_8    No            12    3
Target_9    No            14    1
Target_10   No            10    5

I have tried the following code:

for i, row in df.iterrows():
    i = 0
    val = row['Reference']
    if val == 'Yes':
        df['check'] = df.loc[i,'Efficiency'] - df['Efficiency'].shift(0)

I got the following result:

Name    Reference   Efficiency  Check
0   TargetA     Yes           13    0
1   Target_1    No            12    1
2   Target_2    No            13    0
3   Target_3    No            10    3
4   Target_4    No             8    5
5   TargetB     Yes           14    -1
6   Target_4    No            12    1
7   Target_5    No            11    2
8   Target_6    No            10    3
9   TargetC     Yes           15    -2
10  Target_6    No            11    2
11  Target_7    No            13    0
12  Target_8    No            12    1
13  Target_9    No            14    -1
14  Target_10   No            10    3

I got the correct result only for the first 'Yes'. Can someone please help me?

Create an auxiliary (helper) column containing only the Efficiency values from the rows where "Yes" was found. Then replace the missing values with the previous valid entry (forward fill) and subtract the Efficiency column. Going through the example step by step:

Sample data:

import pandas as pd
data = {'Name': {0: 'TargetA',
  1: 'Target_1',
  2: 'Target_2',
  3: 'Target_3',
  4: 'Target_4',
  5: 'TargetB',
  6: 'Target_4',
  7: 'Target_5',
  8: 'Target_6',
  9: 'TargetC',
  10: 'Target_6',
  11: 'Target_7',
  12: 'Target_8',
  13: 'Target_9',
  14: 'Target_10'},
 'Reference': {0: 'Yes',
  1: 'No',
  2: 'No',
  3: 'No',
  4: 'No',
  5: 'Yes',
  6: 'No',
  7: 'No',
  8: 'No',
  9: 'Yes',
  10: 'No',
  11: 'No',
  12: 'No',
  13: 'No',
  14: 'No'},
 'Efficiency': {0: 13,
  1: 12,
  2: 13,
  3: 10,
  4: 8,
  5: 14,
  6: 12,
  7: 11,
  8: 10,
  9: 15,
  10: 11,
  11: 13,
  12: 12,
  13: 14,
  14: 10}}
df = pd.DataFrame(data)

Code:

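# Rows that carry a reference value
mask = df['Reference'].eq('Yes')
# Helper column: Efficiency on the 'Yes' rows, missing everywhere else
df['Check'] = pd.NA
df.loc[mask, 'Check'] = df['Efficiency'].loc[mask].copy()
# Forward-fill so every row sees the most recent 'Yes' value
df['Check'] = df['Check'].ffill()
# Reference value minus the row's own Efficiency
df['Check'] = df['Check'] - df['Efficiency']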
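
The same result can probably be reached without a helper column by labelling each block of rows and broadcasting the block's first Efficiency value. The following is only a sketch of that alternative, not the approach used above: the names `block` and `ref` are made up for illustration, the data is a shortened copy of the question's sample, and it assumes the very first row is a 'Yes' row.

```python
import pandas as pd

# Shortened copy of the question's data (first two blocks only).
df = pd.DataFrame({
    'Reference': ['Yes', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No', 'No'],
    'Efficiency': [13, 12, 13, 10, 8, 14, 12, 11, 10],
})

# Each 'Yes' starts a new block; the running count of 'Yes' rows labels the blocks 1, 2, ...
block = df['Reference'].eq('Yes').cumsum()

# Broadcast the first Efficiency of every block (its 'Yes' row) to all rows of that block.
ref = df['Efficiency'].groupby(block).transform('first')

# Reference value minus the row's own Efficiency.
df['Check'] = ref - df['Efficiency']
print(df)
```

Because no missing values are introduced along the way, 'Check' keeps an integer dtype here. If rows could appear before the first 'Yes', they would form a block of their own and be compared against their own first row, so that case would need separate handling.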

I also used the code below to create an auxiliary (helper) column containing only the Efficiency values where "Yes" was found:

for i, row in df.iterrows():   
    val = row['Reference']
    if val == 'Yes':
        df['check'] = df[df['Reference']=='Yes']['Efficiency']
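
The loop above assigns the whole column again on every 'Yes' row, so a single vectorized statement should give the same helper column. A minimal sketch, assuming a small made-up frame with the same column names (the variable `helper` is hypothetical) and a 'Yes' in the first row so that nothing stays missing after the forward fill:

```python
import pandas as pd

# Tiny illustrative frame; column names follow the question.
df = pd.DataFrame({
    'Reference': ['Yes', 'No', 'No', 'Yes', 'No'],
    'Efficiency': [13, 12, 10, 14, 11],
})

# Keep Efficiency where Reference is 'Yes'; every other row becomes NaN.
helper = df['Efficiency'].where(df['Reference'].eq('Yes'))

# Carry each reference value forward, then subtract the row's own Efficiency.
# The cast back to int is safe only because no NaN remains after ffill.
df['Check'] = (helper.ffill() - df['Efficiency']).astype(int)
print(df)
```

`Series.where` keeps the values where the condition holds and inserts NaN elsewhere, which is why the intermediate column is floating point until the final cast.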


