
Pandas: Calculate the percentage between two rows and add the value as a column

I have a dataset structured like this:

"Date","Time","Open","High","Low","Close","Volume"

This time series represents the values of a generic stock market.

I want to calculate the percentage difference between consecutive rows of the "Close" column (in other words, I want to know how much the value of the stock increased or decreased; each row represents a day).

I've done this with a for loop (which is a terrible way to use pandas on a big-data problem), and it produces the right results, but in a different DataFrame:

import pandas as pd

rows_number = df_stock.shape[0]

# The first row has no previous day to compare against, so it gets the placeholder value 1
rows = [{'Date': df_stock.iloc[0]['Date'], 'Percentage': 1}]

# For each day, calculate the market trend in percentage
for index in range(1, rows_number):

    # n_yesterday : 100 = (n_today - n_yesterday) : x
    n_today = df_stock.iloc[index]['Close']
    n_yesterday = df_stock.iloc[index - 1]['Close']
    difference = n_today - n_yesterday
    percentage = (100 * difference) / n_yesterday

    rows.append({'Date': df_stock.iloc[index]['Date'], 'Percentage': percentage})

# DataFrame.append was removed in pandas 2.0, so the rows are collected in a list first
percentage_df = pd.DataFrame(rows)

How could I refactor this to take advantage of the DataFrame API, removing the for loop and creating the new column in place?

I would suggest first making the Date column a DatetimeIndex. For this you can use:

df_stock = df_stock.set_index(['Date'])
df_stock.index = pd.to_datetime(df_stock.index, dayfirst=True)
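
As a rough sketch of what this conversion does, assuming the dates are day-first strings such as '14-07-2019' (the sample values are made up for illustration). Passing an explicit `format` is used here as an alternative to `dayfirst=True`:

```python
import pandas as pd

# Hypothetical sample data; the dates are day-first strings (DD-MM-YYYY)
df_stock = pd.DataFrame({
    'Date': ['14-07-2019', '15-07-2019', '16-07-2019'],
    'Close': [100.0, 110.0, 99.0],
})

df_stock = df_stock.set_index('Date')
# An explicit format removes any day/month ambiguity
df_stock.index = pd.to_datetime(df_stock.index, format='%d-%m-%Y')

print(df_stock.index)
```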

Then you can access any row and column using datetime indexing and perform whatever operation you want, for example calculating the percentage difference between two rows of the "Close" column:

# Look up the two closing prices by date, then compute the change as a single number
percentage = ((df_stock.loc['2019-07-15', 'Close'] - df_stock.loc['2019-07-14', 'Close']) / df_stock.loc['2019-07-14', 'Close']) * 100
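
A minimal sketch of such a lookup with `.loc`, assuming the index is a `DatetimeIndex` with exactly one row per day; the prices are invented for illustration:

```python
import pandas as pd

# Hypothetical data with one closing price per day
df_stock = pd.DataFrame(
    {'Close': [100.0, 110.0, 99.0]},
    index=pd.to_datetime(['2019-07-14', '2019-07-15', '2019-07-16']),
)

yesterday = df_stock.loc[pd.Timestamp('2019-07-14'), 'Close']
today = df_stock.loc[pd.Timestamp('2019-07-15'), 'Close']

# Percentage change from 14 July to 15 July
percentage = (today - yesterday) / yesterday * 100
print(percentage)
```

This yields one number for one pair of dates, so it does not by itself fill a whole column.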

You can also use a for loop to do the operation for each date or row:

for Dt in df_stock.index:

df['Change'] = df['Close'].pct_change()
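
Note that `pct_change()` returns a fraction and leaves `NaN` in the first row, whereas the loop in the question produces percentages and puts 1 in the first row. A sketch that reproduces the loop's output as a new column, on made-up prices (the column name `Percentage` is taken from the question):

```python
import pandas as pd

# Hypothetical closing prices
df = pd.DataFrame({'Close': [100.0, 110.0, 99.0]})

# Fraction -> percentage, then the question's placeholder of 1 where there is no previous day.
# Caveat: fillna(1) would also fill any other NaN, e.g. from missing Close values.
df['Percentage'] = (df['Close'].pct_change() * 100).fillna(1)
print(df)
```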

Or, if you want the change calculated in the reverse order:

df['Change'] = df['Close'].pct_change(-1)
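
With `periods=-1`, each row is compared with the row after it rather than the one before, so the last row becomes `NaN` instead of the first. A small illustration with invented values:

```python
import pandas as pd

# Hypothetical closing prices
close = pd.Series([100.0, 110.0, 99.0])

forward = close.pct_change(-1)
# The same thing written out: (current - next) / next
manual = (close - close.shift(-1)) / close.shift(-1)

print(pd.concat([forward, manual], axis=1, keys=['pct_change(-1)', 'manual']))
```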

Using diff:

df['Close'].diff() / df['Close'].shift()
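
For reference, `diff()` is the current value minus the previous one and `shift()` is the previous value, so their plain ratio is the same quantity that `pct_change()` computes. A quick check on invented prices:

```python
import pandas as pd

# Hypothetical closing prices
df = pd.DataFrame({'Close': [100.0, 110.0, 99.0]})

via_diff = df['Close'].diff() / df['Close'].shift()
via_pct = df['Close'].pct_change()

print(pd.concat([via_diff, via_pct], axis=1, keys=['diff/shift', 'pct_change']))
```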
