
Add a new column containing the difference between each two rows of another column of a data frame

I would like to get the difference between each pair of rows in the column Duration and then store the values in a new column difference, or print them. So basically I want: row(1)-row(2)=difference1, row(3)-row(4)=difference2, row(5)-row(6)=difference3, and so on. Example code:

import pandas as pd

data = {'Profession':['Teacher', 'Banker', 'Teacher', 'Judge','lawyer','Teacher'], 'Gender':['Male','Male', 'Female', 'Male','Male','Female'],'Size':['M','M','L','S','S','M'],'Duration':['5','6','2','3','4','7']}
data2 = {'Profession':['Doctor', 'Scientist', 'Scientist', 'Banker','Judge','Scientist'], 'Gender':['Male','Male', 'Female','Female','Male','Male'],'Size':['L','M','L','M','L','L'],'Duration':['1','2','9','10','1','17']}
data3 = {'Profession':['Banker', 'Banker', 'Doctor', 'Doctor','lawyer','Teacher'], 'Gender':['Male','Male', 'Female', 'Female','Female','Male'],'Size':['S','M','S','M','L','S'],'Duration':['15','8','5','2','11','10']}
data4 = {'Profession':['Judge', 'Judge', 'Scientist', 'Banker','Judge','Scientist'], 'Gender':['Female','Female', 'Female','Female','Female','Female'],'Size':['M','S','L','S','M','S'],'Duration':['1','2','9','10','1','17']}
df = pd.DataFrame(data)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)
df4 = pd.DataFrame(data4)
DATA = pd.concat([df, df2, df3, df4])
# note: the grouped result is not assigned, so this line does not change DATA
DATA.groupby(['Profession','Size','Gender']).agg('sum')
D = DATA.reset_index()
# Duration holds strings; convert it to numbers so that diff() can subtract
D['Duration'] = pd.to_numeric(D['Duration'])
D['difference'] = D['Duration'].diff(-1)

I tried using diff(-1), but it's not exactly what I'm looking for. Any ideas?


Is this what you wanted?

# convert "Duration" to numeric
D["Duration"] = pd.to_numeric(D["Duration"])
# value of the next row; the last row has no next row, so fill it with 0
D["Neighbour"] = D["Duration"].shift(-1).fillna(0)
# difference between each row and the row after it
D["difference"] = D["Duration"] - D["Neighbour"]
# remove the helper column "Neighbour"
D = D.drop(columns=["Neighbour"])
# blank out every second row (index 1, 3, 5, ...) so that only the
# first row of each pair keeps its difference
D.loc[1::2, "difference"] = None
# show D
D
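
For comparison, here is a minimal sketch of the same pairing idea on a small, hypothetical six-row frame (the first six Duration values from the question, not the full DATA frame). It assumes rows are paired by position (0 with 1, 2 with 3, ...) and that the number of rows is even; the names is_pair_start and per_pair are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: only the first six Duration values from the question.
D = pd.DataFrame({"Duration": ["5", "6", "2", "3", "4", "7"]})
D["Duration"] = pd.to_numeric(D["Duration"])

# diff(-1) computes row(i) - row(i+1) for every row.
pairwise = D["Duration"].diff(-1)

# Keep that value only on the first row of each pair (positions 0, 2, 4, ...);
# every other row becomes NaN.
is_pair_start = np.arange(len(D)) % 2 == 0
D["difference"] = pairwise.where(is_pair_start)

# Alternative: one value per pair, computed by position.
# This assumes an even number of rows.
per_pair = D["Duration"].iloc[::2].to_numpy() - D["Duration"].iloc[1::2].to_numpy()

print(D)
print(per_pair)
```

With an odd number of rows the last row has no partner, so the per_pair subtraction would need that row dropped or handled separately first.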


