
Adding a column to a python pandas data frame based on the value of another column

I have a pandas data frame, and I would like to add a column that holds the difference between consecutive values of one column, computed separately for each value of a third column. Here is a toy example:

    import pandas as pd
    import numpy as np

    d = {'one' : pd.Series(range(4), index=['a', 'b', 'c', 'd']),
         'two' : pd.Series(range(4), index=['a', 'b', 'c', 'd'])}

    df = pd.DataFrame(d)

    df['three'] = [2,2,3,3]


    four = []
    for i in set(df['three']):
        for j in range(len(df) -1):
            four.append(df[df['three'] == i]['two'][j + 1] - df[df['three']==i]['two'][j])
    four.append(0)

    df['four'] = four

The final column should be [1, 1, 1, NaN], since that is the difference between consecutive rows of the 'two' column.

This makes more sense in the context of my original code: my data frame is organized by some IDs and then by time, so when I take the subset of the data frame for one ID, I'm left with the time-series evolution of the variables for that ID. However, I keep either getting a KeyError or ending up editing a copy of the original data frame. What is the right way to go about this?
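For the layout described above (rows grouped by ID and ordered by time within each ID), here is a minimal sketch of the per-ID difference. The column names 'id', 'time' and 'value' and the sample data are hypothetical, not from the original post, and the sketch assumes the goal is "next value minus current value" within each ID, as in the toy example's loop.

```python
import pandas as pd

# Hypothetical long-format data: several IDs, each observed at several times,
# deliberately stored out of time order.
df = pd.DataFrame({
    'id':    [1, 1, 1, 2, 2],
    'time':  [2, 0, 1, 0, 1],
    'value': [12.0, 10.0, 11.0, 5.0, 7.0],
})

# Sort so each ID's rows run forward in time; otherwise "next row" has no meaning.
df = df.sort_values(['id', 'time'])

# Per-ID forward difference (next value minus current value), NaN on each
# ID's last row. The result is aligned on df's index and assigned straight
# into df, so no filtered copy of the frame is ever modified.
df['delta'] = df.groupby('id')['value'].shift(-1) - df['value']

print(df)
```

Because the new column is computed from df itself and assigned with df['delta'] = ..., nothing is written into a filtered subset, which sidesteps the "editing a copy" problem.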

You could replace df[df['three'] == i] with a groupby on column three, and perhaps replace ['two'][j + 1] - ['two'][j] with df['two'].shift(-1) - df['two'].
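Applied to the toy frame, the two suggested replacements can be combined into a single assignment. This is a sketch rather than the answerer's exact code; it uses a grouped shift so the new column stays aligned with df's index.

```python
import pandas as pd

# Rebuild the toy frame from the question.
df = pd.DataFrame({'one': range(4), 'two': range(4)}, index=['a', 'b', 'c', 'd'])
df['three'] = [2, 2, 3, 3]

# Group on 'three' instead of filtering with df[df['three'] == i], and use
# shift(-1) instead of indexing [j + 1] and [j]. Within each group, shift(-1)
# moves the next value up one row, so the subtraction is "next minus current",
# with NaN on each group's last row.
df['four'] = df.groupby('three')['two'].shift(-1) - df['two']

print(df)
```

Note that the difference restarts in each group of 'three', so this yields [1, NaN, 1, NaN] rather than [1, 1, 1, NaN]; the ungrouped shift reproduces the latter.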

I think that would be identical to what you are doing now within the nested loop. How you implement it depends a bit on the format you want for the result. One way would be:

    df.groupby('three').apply(lambda grp: pd.Series(grp['two'].shift(-1) - grp['two']))

Which would result in:

    two    a   b
    three       
    2      1 NaN
    3      1 NaN

The column names become somewhat meaningless after this operation.
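One way to keep meaningful labels is to stop apply from prepending the group key, so each group's result keeps the original row labels. A minimal sketch, assuming a reasonably recent pandas; depending on the version, the default apply result can carry the group key as an extra index level, which is part of why its labels stop matching df.

```python
import pandas as pd

df = pd.DataFrame({'one': range(4), 'two': range(4)}, index=['a', 'b', 'c', 'd'])
df['three'] = [2, 2, 3, 3]

# With group_keys=False, apply concatenates the per-group results without
# adding the group key to the index, so the result keeps df's own row labels
# ('a'..'d') and can be assigned back as an ordinary column.
diffs = df.groupby('three', group_keys=False)['two'].apply(lambda s: s.shift(-1) - s)
df['four'] = diffs

print(diffs.index.tolist())
```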

If all you want is the difference between consecutive rows of column 'two', use the shift method:

    df['four'] = df.two.shift(-1) - df.two
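A quick check of the ungrouped version on the toy frame, together with an equivalent spelling via Series.diff. The diff variant is an addition for comparison, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'one': range(4), 'two': range(4)}, index=['a', 'b', 'c', 'd'])
df['three'] = [2, 2, 3, 3]

# Ungrouped forward difference: next row minus current row, NaN on the last row.
df['four'] = df.two.shift(-1) - df.two

# Series.diff(-1) computes current minus next, so negating it gives the same
# forward difference.
alt = -df['two'].diff(-1)

print(df['four'].tolist())  # [1.0, 1.0, 1.0, nan]
```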


