

Create new columns based on other columns and update the value of row of new column based on the previous row while creating the new column

I have been learning pandas for the past couple of months. I have a data frame like this:

       index  Random  id  diff  pct
0 2018-01-01      31   1     3    1
1 2018-01-02      11   1     2    2
2 2018-01-03      21   1     4    0
3 2018-01-04      23   2     1    0
4 2018-01-05      43   2     6    3
5 2018-01-06      42   2     1    1
6 2018-01-07      51   3     2    5
7 2018-01-08      47   3     2    0
8 2018-01-09      49   3     3    2
9 2018-01-10      22   3     1    3

I want to create a 'recommend' column holding 'Yes' or 'No' based on a condition over other columns, which I can already do. I also need to update the value of the 'Random' column on each row (or create a new column) when 'recommend' is 'Yes'. For instance, the condition is: if pct < diff, then 'recommend' is 'Yes' and 'Random'/'Random_new' becomes Random + diff; otherwise 'recommend' is 'No' and 'Random'/'Random_new' takes the value of the previous row. Note that when 'recommend' is 'Yes' for a row, 'Random'/'Random_new' has to be updated for that row and for the later rows of the same id. The expected output should look like this:

       index  Random  id  diff  pct recommend  Random_new
0 2018-01-01      31   1     3    1         Y          32
1 2018-01-02      31   1     2    2         N          32
2 2018-01-03      31   1     4    0         Y          36
3 2018-01-04      23   2     1    0         Y          24
4 2018-01-05      23   2     6    3         Y          27
5 2018-01-06      23   2     1    1         N          27
6 2018-01-07      51   3     2    5         N          51
7 2018-01-08      51   3     2    0         Y          53
8 2018-01-09      51   3     3    2         Y          56
9 2018-01-10      51   3     1    3         N          56

I have tried np.where, which only creates the column but doesn't update the row values for 'Random_new'. I feel like I need a for loop with an if/else condition, but I have not managed to write it so far. The conditions as bullet points:

  • If pct < diff, 'Random_new'[i] = 'Random'[i] + 'diff'[i]; else 'Random_new'[i] = 'Random_new'[i-1]
  • Updating that row should also update the later rows of 'Random_new'
  • This needs to be done for each id separately (probably using groupby)
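
Read literally, the bullets above describe a single pass over the rows that remembers the last value written for each id. Below is a minimal sketch of that loop, not a definitive implementation. It assumes the sample data shown earlier, and it assumes that the first row of an id with no earlier 'Yes' falls back to its own Random value, which the bullets do not specify. The name `last_value` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "index": pd.date_range("2018-01-01", periods=10),
    "Random": [31, 11, 21, 23, 43, 42, 51, 47, 49, 22],
    "id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "diff": [3, 2, 4, 1, 6, 1, 2, 2, 3, 1],
    "pct": [1, 2, 0, 0, 3, 1, 5, 0, 2, 3],
})

df["recommend"] = np.where(df["pct"] < df["diff"], "Y", "N")

last_value = {}   # most recent Random_new seen for each id
new_values = []
for row_id, random, diff, pct in zip(df["id"], df["Random"], df["diff"], df["pct"]):
    if pct < diff:
        # 'Yes' row: Random_new[i] = Random[i] + diff[i]
        last_value[row_id] = random + diff
    elif row_id not in last_value:
        # Assumption: no previous row in this id, so keep the row's own Random.
        last_value[row_id] = random
    # Otherwise the previous Random_new of the same id carries forward.
    new_values.append(last_value[row_id])

df["Random_new"] = new_values
print(df)
```

Because the dictionary is keyed by id, the rows do not have to be sorted by id for the carry-forward to stay within each group.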

First, I'm not sure how you filled in the values in your example, but shouldn't the first Random_new be 31+3=34 instead of 32?

Anyway, you can first create your recommend column (a boolean seems better suited than Y/N), then create Random_new with apply (only where recommend is True), and finally forward-fill (ffill) the values when grouped by id:

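# True where pct < diff
df['recommend'] = df['pct'] < df['diff']
# Random + diff on recommended rows, missing (NaN) elsewhere
df['Random_new'] = df.apply(lambda x: x['Random'] + x['diff'] if x['recommend'] else None, axis=1)
# Forward-fill the missing values within each id (this drops the id column)
df = df.groupby('id').ffill()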

Output:

        index  Random  diff  pct  recommend  Random_new
0  2018-01-01      31     3    1       True        34.0
1  2018-01-02      11     2    2      False        34.0
2  2018-01-03      21     4    0       True        25.0
3  2018-01-04      23     1    0       True        24.0
4  2018-01-05      43     6    3       True        49.0
5  2018-01-06      42     1    1      False        49.0
6  2018-01-07      51     2    5      False         NaN
7  2018-01-08      47     2    0       True        49.0
8  2018-01-09      49     3    2       True        52.0
9  2018-01-10      22     1    3      False        52.0

Edit: if you want to keep the id column, replace the last line with:

df = pd.concat([df['id'], df.groupby('id').ffill()], axis=1)

(the as_index=False keyword argument doesn't help in this case)
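
For reference, here is a self-contained sketch of the same idea that can be run as is. It rebuilds the sample frame from the question and swaps in `Series.where` for the row-wise apply, then forward-fills only the Random_new column so that id is kept without a concat. This is one possible variant under those assumptions, not the only way to write it.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "index": pd.date_range("2018-01-01", periods=10),
    "Random": [31, 11, 21, 23, 43, 42, 51, 47, 49, 22],
    "id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "diff": [3, 2, 4, 1, 6, 1, 2, 2, 3, 1],
    "pct": [1, 2, 0, 0, 3, 1, 5, 0, 2, 3],
})

# Boolean flag: True where pct < diff.
df["recommend"] = df["pct"] < df["diff"]

# Keep Random + diff on recommended rows; other rows become NaN.
df["Random_new"] = (df["Random"] + df["diff"]).where(df["recommend"])

# Forward-fill within each id. Selecting the single column before ffill
# returns a Series aligned to the original index, so id stays in the frame.
df["Random_new"] = df.groupby("id")["Random_new"].ffill()

print(df)
```

The first row of id 3 stays NaN, as in the output above, because there is no earlier recommended row in that group to fill from.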
