
Pandas: Calculate the percentage between two rows and add the value as a column

I have a dataset structured like this:

"Date","Time","Open","High","Low","Close","Volume"

This time series represents the values of a generic stock market.

I want to calculate the percentage difference between consecutive rows of the "Close" column (in other words, I want to know how much the value of the stock increased or decreased; each row represents a day).

I've done this with a for loop (which is a terrible approach with pandas on a large dataset), and it produces the right results, but in a separate DataFrame:

import pandas as pd

rows_number = df_stock.shape[0]

# The first row has no previous day to compare with, so its value is set to 1
rows = [{'Date': df_stock.iloc[0]['Date'], 'Percentage': 1}]

# For each day, calculate the market trend as a percentage
for index in range(1, rows_number):

    # n_yesterday : 100 = (n_today - n_yesterday) : x
    n_today = df_stock.iloc[index]['Close']
    n_yesterday = df_stock.iloc[index - 1]['Close']
    difference = n_today - n_yesterday
    percentage = (100 * difference) / n_yesterday

    rows.append({'Date': df_stock.iloc[index]['Date'], 'Percentage': percentage})

# DataFrame.append was removed in pandas 2.0, so build the result from the list
percentage_df = pd.DataFrame(rows)

How could I refactor this to take advantage of the DataFrame API, removing the for loop and creating the new column in place?

I would suggest first turning the Date column into a DatetimeIndex. For this you can use:

df_stock = df_stock.set_index(['Date'])
df_stock.index = pd.to_datetime(df_stock.index, dayfirst=True)

Then you can access any row and column through the datetime index and perform whatever operation you want, for example calculating the percentage difference between two rows of the "Close" column:

# .loc selects one row by date; the result here is a single number, not a column
percentage = ((df_stock.loc['2019-07-15', 'Close'] - df_stock.loc['2019-07-14', 'Close']) / df_stock.loc['2019-07-14', 'Close']) * 100
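The date-based lookup described above can be sketched end to end as follows. The three rows of sample data are hypothetical, and the sketch assumes one row per calendar day with dates written day-first; it uses `pd.Timestamp` keys with `.loc` so the lookup does not depend on how a date string would be parsed.

```python
import pandas as pd

# Hypothetical sample data; dates are written day-first, as in the answer above.
df_stock = pd.DataFrame({
    "Date": ["14-07-2019", "15-07-2019", "16-07-2019"],
    "Close": [100.0, 110.0, 99.0],
})

# Use the parsed dates as the index so rows can be selected by date.
df_stock = df_stock.set_index("Date")
df_stock.index = pd.to_datetime(df_stock.index, dayfirst=True)

# .loc with a Timestamp key and a column name returns a single value.
today = pd.Timestamp("2019-07-15")
yesterday = pd.Timestamp("2019-07-14")
change_pct = (
    (df_stock.loc[today, "Close"] - df_stock.loc[yesterday, "Close"])
    / df_stock.loc[yesterday, "Close"]
    * 100
)
print(change_pct)
```

This yields one number for one pair of dates, so it suits spot checks rather than filling a whole column.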

You can also use a for loop to perform the operation for each date or row:

for Dt in df_stock.index:

df['Change'] = df['Close'].pct_change()
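To connect this one-liner back to the question, here is a minimal sketch on hypothetical sample data. `pct_change()` returns a fraction rather than a percentage and leaves the first row as NaN, so the sketch multiplies by 100 and, assuming you want to mirror the question's choice of 1 for the first row, fills that value in. The column name `Percentage` is taken from the question's loop.

```python
import pandas as pd

# Hypothetical sample data with one row per day.
df = pd.DataFrame({
    "Date": ["2019-07-14", "2019-07-15", "2019-07-16"],
    "Close": [100.0, 110.0, 99.0],
})

# pct_change() returns a fraction: (today - yesterday) / yesterday.
# Multiply by 100 to get a percentage, as in the question's loop.
df["Percentage"] = df["Close"].pct_change() * 100

# The first row has no previous day, so pct_change() leaves NaN there.
# The question's loop stored 1 for that row; fillna(1) reproduces that choice.
df["Percentage"] = df["Percentage"].fillna(1)
print(df)
```

The new column is added to the same DataFrame, so no separate result frame or explicit loop is needed.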

Or, if you want to calculate the change in the opposite order:

df['Change'] = df['Close'].pct_change(-1)
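The effect of the negative period is easy to misread, so here is a small sketch on hypothetical values. With `pct_change(-1)` each row is compared with the row after it, which is the same as dividing the difference by `shift(-1)`; the last row then has no successor and becomes NaN.

```python
import pandas as pd

# Hypothetical closing prices for three consecutive days.
close = pd.Series([100.0, 110.0, 99.0], name="Close")

backward = close.pct_change()    # row i compared with row i-1
forward = close.pct_change(-1)   # row i compared with row i+1

# Explicit form of pct_change(-1): shift(-1) fetches the next row's value.
manual = (close - close.shift(-1)) / close.shift(-1)

print(pd.DataFrame({"Close": close, "backward": backward, "forward": forward}))
```

If each row should show the change relative to the previous day, as in the question, the default `pct_change()` is the one to use.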

Using diff:

# (today - yesterday) / yesterday, matching the sign used in the question
df['Close'].diff() / df['Close'].shift()
