Calculating difference between two rows in Python / Pandas

In Python, how can I reference the previous row and calculate something against it? Specifically, I am working with DataFrames in pandas: I have a data frame full of stock price information that looks like this:

           Date   Close  Adj Close
251  2011-01-03  147.48     143.25
250  2011-01-04  147.64     143.41
249  2011-01-05  147.05     142.83
248  2011-01-06  148.66     144.40
247  2011-01-07  147.93     143.69

Here is how I created this dataframe:

import pandas

url = 'http://ichart.finance.yahoo.com/table.csv?s=IBM&a=00&b=1&c=2011&d=11&e=31&f=2011&g=d&ignore=.csv'
data = pandas.read_csv(url)

# now sort the data frame ascending by date
# (sort_values replaces the older data.sort(columns='Date'), which newer pandas versions no longer provide)
data = data.sort_values(by='Date')

Starting with row number 2, or in this case, I guess it's 250 (PS - is that the index?), I want to calculate the difference between each row and the one before it (for example, between 2011-01-03 and 2011-01-04), for every entry in this dataframe. I believe the appropriate way is to write a function that takes the current row, figures out the previous row, and calculates the difference between them, and then to use the pandas apply function to update the dataframe with the value.

Is that the right approach? If so, should I be using the index to determine the difference? (Note: I'm still in Python beginner mode, so "index" may not be the right term, nor even the correct way to implement this.)

I think you want to do something like this:

In [26]: data
Out[26]: 
           Date   Close  Adj Close
251  2011-01-03  147.48     143.25
250  2011-01-04  147.64     143.41
249  2011-01-05  147.05     142.83
248  2011-01-06  148.66     144.40
247  2011-01-07  147.93     143.69

In [27]: data.set_index('Date').diff()
Out[27]: 
            Close  Adj Close
Date                        
2011-01-03    NaN        NaN
2011-01-04   0.16       0.16
2011-01-05  -0.59      -0.58
2011-01-06   1.61       1.57
2011-01-07  -0.73      -0.71
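
On the side question about the index: `diff()` works by row position, not by index label, so the leftover labels 251, 250, ... from the original ordering do not affect the result. A minimal sketch, assuming the five sample rows from the question are rebuilt by hand (the names `close_diff` and `close_shift` are only illustrative), suggests that `diff()` is equivalent to subtracting `shift(1)`:

```python
import pandas as pd

# Rebuild the five sample rows from the question by hand, keeping the
# descending index labels (251..247) left over from the original ordering.
data = pd.DataFrame(
    {
        "Date": ["2011-01-03", "2011-01-04", "2011-01-05", "2011-01-06", "2011-01-07"],
        "Close": [147.48, 147.64, 147.05, 148.66, 147.93],
        "Adj Close": [143.25, 143.41, 142.83, 144.40, 143.69],
    },
    index=[251, 250, 249, 248, 247],
)

# diff() subtracts the previous row by position, regardless of index labels.
close_diff = data["Close"].diff()

# The same calculation spelled out: shift(1) moves every value down one row,
# so each row lines up with its predecessor.
close_shift = data["Close"] - data["Close"].shift(1)

print(close_diff.round(2))
print(close_shift.round(2))
```

Either form avoids writing a row-by-row function with `apply`; the first row is NaN in both because it has no predecessor.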

To calculate the difference for a single column, here is what you can do.

df=
      A      B
0     10     56
1     45     48
2     26     48
3     32     65

We want to compute the row-to-row difference in A only, and keep just the rows where that difference is less than 15.

df['A_dif'] = df['A'].diff()
df=
          A      B      A_dif
    0     10     56      NaN
    1     45     48      35.0
    2     26     48     -19.0
    3     32     65      6.0
df = df[df['A_dif']<15]

df=
          A      B      A_dif
    2     26     48     -19.0
    3     32     65      6.0
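
One detail worth checking when filtering on a diff column: the first row's difference is NaN, and a comparison against NaN evaluates to False, so that row is dropped by the filter, while a negative difference passes a `< 15` test. A sketch on the same four-row frame follows. The `abs()` variant is an assumption about the intent (treating a drop of 19 the same as a rise of 19), and the names `signed` and `small_moves` are illustrative:

```python
import pandas as pd

# The same four-row frame as above.
df = pd.DataFrame({"A": [10, 45, 26, 32], "B": [56, 48, 48, 65]})
df["A_dif"] = df["A"].diff()

# Signed filter: NaN < 15 is False, so row 0 is dropped,
# and -19 < 15 is True, so row 2 is kept.
signed = df[df["A_dif"] < 15]

# Assumed variant: filter on the magnitude of the change instead.
small_moves = df[df["A_dif"].abs() < 15]

print(signed)
print(small_moves)
```

If the first row should survive the filter, one option is to combine the condition with `df["A_dif"].isna()` using `|`.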

I don't know pandas, and I'm pretty sure it has something specific for this; however, I'll give you a pure-Python solution, which might be of some help even if you need to use pandas:

import csv
import urllib.request

# Retrieve the CSV file and load it into a list, converting
# all numeric values to floats
url = 'http://ichart.finance.yahoo.com/table.csv?s=IBM&a=00&b=1&c=2011&d=11&e=31&f=2011&g=d&ignore=.csv'
lines = urllib.request.urlopen(url).read().decode('utf-8').splitlines()
reader = csv.reader(lines, delimiter=',')
# Skip the header row, then sort the records so they are ordered by date
cleaned = sorted([r[0]] + [float(x) for x in r[1:]] for r in list(reader)[1:])

for i, row in enumerate(cleaned):  # enumerate() yields two-tuples: (<index>, <item>)
    # Row 0 has no previous row. cleaned[-1] would not raise IndexError;
    # it would silently wrap around to the last row, so skip it explicitly.
    if i == 0:
        continue
    # Calculate the difference of each numeric field with the same field
    # in the row before this one
    print(row[0], [row[j] - cleaned[i - 1][j] for j in range(1, 7)])
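
The same previous-row pattern can be tried without a network call. A minimal sketch, assuming the five sample rows from the question hardcoded as `(date, close, adj_close)` tuples (the names `rows` and `diffs` are illustrative), pairs each row with its predecessor using `zip`:

```python
# Sample rows from the question, already sorted by date:
# (date, close, adj_close)
rows = [
    ("2011-01-03", 147.48, 143.25),
    ("2011-01-04", 147.64, 143.41),
    ("2011-01-05", 147.05, 142.83),
    ("2011-01-06", 148.66, 144.40),
    ("2011-01-07", 147.93, 143.69),
]

diffs = []
# zip(rows, rows[1:]) yields (previous, current) pairs, so the first row
# is never compared against anything and no index arithmetic is needed.
for prev, curr in zip(rows, rows[1:]):
    # Subtract field by field, skipping the date in position 0.
    deltas = [round(c - p, 2) for p, c in zip(prev[1:], curr[1:])]
    diffs.append((curr[0], deltas))

for date, deltas in diffs:
    print(date, deltas)
```

Pairing with `zip` sidesteps the negative-index pitfall, since there is no `i - 1` lookup that could wrap around to the end of the list.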
