Add a column to a Python pandas data frame based on the value of another column

Question

I have a pandas data frame, and I would like to add a column that holds the row-to-row difference of one column, computed based on the value of a third column. Here is a toy example:

    import pandas as pd
    import numpy as np

    d = {'one': pd.Series(range(4), index=['a', 'b', 'c', 'd']),
         'two': pd.Series(range(4), index=['a', 'b', 'c', 'd'])}

    df = pd.DataFrame(d)

    df['three'] = [2,2,3,3]


    four = []
    for i in set(df['three']):
        for j in range(len(df) -1):
            four.append(df[df['three'] == i]['two'][j + 1] - df[df['three']==i]['two'][j])
    four.append(0)

    df['four'] = four

The final column should be [1, 1, 1, NaN], since that is the difference between each of the rows in the 'two' column.

This makes more sense in the context of my original code: my data frame is organized by some IDs, and then by time, so when I take the subset of the data frame for one ID, I'm left with the time series evolution of the variables for that individual ID. However, I keep either getting a KeyError or ending up editing a copy of the original data frame. What is the right way to go about this?

Answer 1

You could replace df[df['three'] == i] with a groupby on column three, and perhaps replace ['two'][j + 1] - ['two'][j] with df['two'].shift(-1) - df['two'].

I think that would be identical to what you are doing now within the nested loop. How you implement this depends a bit on the format you want for the result. One way would be:

    df.groupby('three').apply(lambda grp: pd.Series(grp['two'].shift(-1) - grp['two']))

Which would result in:

    two    a   b
    three
    2      1 NaN
    3      1 NaN

The column names become a bit meaningless after this operation.
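If the goal is to get the per-group difference back into the original frame as a new column, one possible sketch is to shift within each group and subtract. This is not from the answer above; it assumes a pandas version where the groupby `shift` method returns a Series aligned to the original index, and it rebuilds the toy frame from the question so it can be run on its own.

```python
import pandas as pd

# Rebuild the toy frame from the question.
df = pd.DataFrame(
    {"one": range(4), "two": range(4), "three": [2, 2, 3, 3]},
    index=["a", "b", "c", "d"],
)

# Shift 'two' up by one row within each 'three' group, then subtract.
# The last row of each group has no successor, so its difference is NaN.
df["four"] = df.groupby("three")["two"].shift(-1) - df["two"]

print(df)
```

With this grouping, rows 'b' and 'd' are the last rows of their groups, so they get NaN rather than a difference that crosses a group boundary. The original row labels are kept, which avoids the meaningless column names mentioned above.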

Answer 2

If all you want is the difference between consecutive rows of column two, use the shift method.

    df['four'] = df.two.shift(-1) - df.two
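For reference, here is a small self-contained sketch of the ungrouped version on the toy data from the question. The `alt` variable is only an illustrative name; it shows that the same result can, as far as I can tell, also be written with `Series.diff`, which computes the current row minus the shifted row and therefore needs its sign flipped.

```python
import pandas as pd

df = pd.DataFrame(
    {"two": range(4), "three": [2, 2, 3, 3]},
    index=["a", "b", "c", "d"],
)

# Next row's value minus the current row's value, ignoring column 'three'.
df["four"] = df["two"].shift(-1) - df["two"]

# diff(-1) is current minus next, so negate it to get next minus current.
alt = -df["two"].diff(-1)

print(df["four"].tolist())  # [1.0, 1.0, 1.0, nan]
```

This gives the [1, 1, 1, NaN] the question asks for, but note that it takes differences across the boundary between the 'three' groups.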

Answer 1
0 2014-08-26 15:20:42

Answer 2
0 2014-08-26 21:04:42