Subtracting over a range of columns for two rows in pandas dataframe python

I have a dataframe with six columns: presence, x, y, vx, vy and lane. I would like to take the difference between two rows, selected by index, over a range of columns [x, y, vx, vy]. However, subtracting gives me NaN. Thanks for the help.

import pandas as pd
data = {'presence': [1, 1, 0, 1],
        'x': [17, 35, 46, 57],
        'y': [4, 4, 8, 0],
        'vx': [2, 5, 9, 12],
        'vy': [0.3, 0.5, 0.2, 0], 
        'lane': [0, 1, 2, 0]}
df = pd.DataFrame(data)
a = df.iloc[[2]]
b = df.iloc[[1]]
diff_x = b[['x','y']] - a[['x','y']] # Gives two rows and two columns of nan
# Expected output: 11  4 

You can use .loc style indexing to get a pandas.Series for a given row index and list of column names. Then you can subtract those two Series.

If you expect to get 11 and 4 as your output, you will have to reverse the subtraction from your post (row 2 minus row 1, not row 1 minus row 2).

diff_x = df.loc[2, ["x", "y"]] - df.loc[1, ["x", "y"]]

# x    11.0
# y     4.0
# dtype: float64
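The question asks for the full range [x, y, vx, vy], so here is a minimal sketch of the same .loc approach over all four columns. The names cols and diff_all are only illustrative; the data is the one from the question.

```python
import pandas as pd

data = {'presence': [1, 1, 0, 1],
        'x': [17, 35, 46, 57],
        'y': [4, 4, 8, 0],
        'vx': [2, 5, 9, 12],
        'vy': [0.3, 0.5, 0.2, 0],
        'lane': [0, 1, 2, 0]}
df = pd.DataFrame(data)

# Illustrative name: the columns to subtract over.
cols = ['x', 'y', 'vx', 'vy']

# Each .loc call returns a Series indexed by the column names,
# so the two Series align on 'x', 'y', 'vx', 'vy' and subtract cleanly.
diff_all = df.loc[2, cols] - df.loc[1, cols]
print(diff_all)
```

Because the columns sit next to each other in this frame, a label slice such as df.loc[2, 'x':'vy'] should select the same range; label slices include both endpoints.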

This is because you pulled out a and b as DataFrames, not Series:

a
Out[312]: 
   presence   x  y  vx   vy  lane
2         0  46  8   9  0.2     2
b
Out[313]: 
   presence   x  y  vx   vy  lane
1         1  35  4   5  0.5     1

The two DataFrames above have different indexes (2 and 1). When pandas does the calculation it aligns on the index first; where the index labels do not match, the output is NaN.
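To see the alignment in isolation, here is a small sketch with two hand-built one-row frames that mimic a and b from the question (the variable names misaligned and aligned are just for illustration). Dropping the row labels with reset_index(drop=True) is one way to make the rows line up while staying in pandas.

```python
import pandas as pd

# One-row frames with the same columns but different row labels,
# mirroring df.iloc[[2]] and df.iloc[[1]] from the question.
a = pd.DataFrame({'x': [46], 'y': [8]}, index=[2])
b = pd.DataFrame({'x': [35], 'y': [4]}, index=[1])

# Labels 1 and 2 never match, so the result has both labels
# as rows and every cell is NaN.
misaligned = a - b
print(misaligned)

# After resetting, both frames have row label 0 and align as intended.
aligned = a.reset_index(drop=True) - b.reset_index(drop=True)
print(aligned)
```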

Quick fix:

diff_x = b[['x','y']].values - a[['x','y']].values
diff_x
Out[311]: array([[-11,  -4]], dtype=int64)

pandas is index-oriented; convert to NumPy arrays and then subtract:

a = df.iloc[[2]]
b = df.iloc[[1]]
diff_x = a[['x','y']].to_numpy() - b[['x','y']].to_numpy()
#array([[11,  4]], dtype=int64)
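Converting to arrays throws away the column labels. If you want them back, one option (a sketch, not the only way) is to wrap the array in a new DataFrame; cols and result are illustrative names.

```python
import pandas as pd

data = {'presence': [1, 1, 0, 1],
        'x': [17, 35, 46, 57],
        'y': [4, 4, 8, 0],
        'vx': [2, 5, 9, 12],
        'vy': [0.3, 0.5, 0.2, 0],
        'lane': [0, 1, 2, 0]}
df = pd.DataFrame(data)

cols = ['x', 'y']
a = df.iloc[[2]]
b = df.iloc[[1]]

# Plain arrays carry no index, so the subtraction is purely positional.
arr = a[cols].to_numpy() - b[cols].to_numpy()

# Re-attach the column names; the single row gets a fresh label 0.
result = pd.DataFrame(arr, columns=cols)
print(result)
```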

Alternatively, for two consecutive rows, you can use diff:

df[['x','y']].diff().iloc[2]

x    11.0
y     4.0
Name: 2, dtype: float64
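diff also takes a periods argument, so the same idea can be stretched to rows that are a fixed number of positions apart. A short sketch on the question's data (step1 and step2 are illustrative names):

```python
import pandas as pd

data = {'presence': [1, 1, 0, 1],
        'x': [17, 35, 46, 57],
        'y': [4, 4, 8, 0],
        'vx': [2, 5, 9, 12],
        'vy': [0.3, 0.5, 0.2, 0],
        'lane': [0, 1, 2, 0]}
df = pd.DataFrame(data)

# Default periods=1: each row minus the row directly above it.
step1 = df[['x', 'y']].diff().iloc[2]           # row 2 minus row 1

# periods=2: each row minus the row two positions above it.
step2 = df[['x', 'y']].diff(periods=2).iloc[3]  # row 3 minus row 1

print(step1)
print(step2)
```

Note that diff works by position, so for two arbitrary rows that are not a fixed distance apart the .loc approach is probably simpler.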
