Proper way to alter data on a pandas dataframe slice

I have a pandas dataframe of EOD stock data that looks like this:

         Date       High        Low       Open      Close      Volume  Adj Close Symbol  Pct_Change
0  1999-11-18  35.765381  28.612303  32.546494  31.473534  62546300.0  27.369196      A           0
1  1999-11-19  30.758226  28.478184  30.713520  28.880543  15234100.0  25.114351      A           0
2  1999-11-22  31.473534  28.657009  29.551144  31.473534   6577800.0  27.369196      A           0
3  1999-11-23  31.205294  28.612303  30.400572  28.612303   5975600.0  24.881086      A           0
4  1999-11-24  29.998211  28.612303  28.701717  29.372318   4843200.0  25.541994      A           0

I would like to add a Pct_Change column that calculates the percent change at each closing price from the Adj Close column. I could just do something like this:

df.Pct_Change = df['Adj Close'].pct_change()

This gets close, but it overlaps between stocks: the first row of each symbol would be computed against the last row of the previous symbol, which I do not want.
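To make the overlap concrete, here is a minimal sketch on a hypothetical two-symbol frame (the symbols and prices are made up for illustration, not taken from the data above). A plain pct_change ignores Symbol, so the first row of the second symbol is compared against the last row of the first one:

```python
import pandas as pd

# Hypothetical two-symbol frame; the prices are invented for illustration.
df = pd.DataFrame({
    "Symbol": ["A", "A", "B", "B"],
    "Adj Close": [10.0, 11.0, 100.0, 110.0],
})

# Plain pct_change knows nothing about Symbol, so row 2 (the first "B" row)
# is computed against row 1 (the last "A" row): 100 / 11 - 1.
naive = df["Adj Close"].pct_change()
print(naive)
```

Only the very first row comes out as NaN here; the jump at row 2 is the cross-symbol overlap described above.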

So here is the solution that I am trying, but it does not set the data in the original df, so everything in the Pct_Change column is still 0.

# first set everything equal to 0
df['Pct_Change'] = 0

for stock in all_data.Symbol.unique():
    subset = df.loc[all_data.Symbol == stock]
    subset.Pct_Change = subset['Adj Close'].pct_change()

Edit: I haven't been able to get these solutions to work so below I have put a minimal dataset to work with that might help with testing.

import pandas as pd
from pandas import Timestamp

df = pd.DataFrame({'Date': {0: Timestamp('1988-01-04 00:00:00'),
  1: Timestamp('1988-01-05 00:00:00'),
  2: Timestamp('1988-01-06 00:00:00'),
  3: Timestamp('1988-01-07 00:00:00'),
  4: Timestamp('1988-01-08 00:00:00'),
  5: Timestamp('1988-01-04 00:00:00'),
  6: Timestamp('1988-01-05 00:00:00'),
  7: Timestamp('1988-01-06 00:00:00'),
  8: Timestamp('1988-01-07 00:00:00'),
  9: Timestamp('1988-01-08 00:00:00')},
 'High': {0: 1.5982142686843872,
  1: 1.6517857313156128,
  2: 1.6071428060531616,
  3: 1.5982142686843872,
  4: 1.6160714626312256,
  5: 10.15625,
  6: 10.34375,
  7: 10.25,
  8: 10.375,
  9: 10.28125},
 'Low': {0: 1.5089285373687744,
  1: 1.5803571939468384,
  2: 1.5625,
  3: 1.5178571939468384,
  4: 1.4107142686843872,
  5: 9.6875,
  6: 10.09375,
  7: 10.09375,
  8: 10.0,
  9: 9.15625},
 'Open': {0: 1.5267857313156128,
  1: 1.6428571939468384,
  2: 1.6071428060531616,
  3: 1.5535714626312256,
  4: 1.5892857313156128,
  5: 9.71875,
  6: 10.1875,
  7: 10.21875,
  8: 10.0625,
  9: 10.21875},
 'Close': {0: 1.5982142686843872,
  1: 1.59375,
  2: 1.5625,
  3: 1.5892857313156128,
  4: 1.4285714626312256,
  5: 10.125,
  6: 10.1875,
  7: 10.09375,
  8: 10.28125,
  9: 9.5},
 'Volume': {0: 82600000.0,
  1: 77280000.0,
  2: 67200000.0,
  3: 53200000.0,
  4: 121520000.0,
  5: 5674400.0,
  6: 8926800.0,
  7: 4974800.0,
  8: 7011200.0,
  9: 7753200.0},
 'Adj Close': {0: 0.08685751259326935,
  1: 0.08661489188671112,
  2: 0.08491652458906174,
  3: 0.0863722488284111,
  4: 0.07763802260160446,
  5: 0.9220728874206543,
  6: 0.9277651309967041,
  7: 0.9192269444465637,
  8: 0.9363031387329102,
  9: 0.8651551008224487},
 'Symbol': {0: 'AAPL',
  1: 'AAPL',
  2: 'AAPL',
  3: 'AAPL',
  4: 'AAPL',
  5: 'XOM',
  6: 'XOM',
  7: 'XOM',
  8: 'XOM',
  9: 'XOM'}})

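Using groupby.pct_change (the sample frame printed below names the column Adj_Close; use 'Adj Close' instead if your column name contains a space, as in the question):

df['Pct_Change'] = df.groupby('Symbol', sort=False)['Adj_Close'].pct_change()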

print(df)
         Date       High        Low       Open      Close      Volume  \
0  1999-11-18  35.765381  28.612303  32.546494  31.473534  62546300.0   
1  1999-11-19  30.758226  28.478184  30.713520  28.880543  15234100.0   
2  1999-11-22  31.473534  28.657009  29.551144  31.473534   6577800.0   
3  1999-11-23  31.205294  28.612303  30.400572  28.612303   5975600.0   
4  1999-11-24  29.998211  28.612303  28.701717  29.372318   4843200.0   

   Adj_Close Symbol  Pct_Change  
0  27.369196      A         NaN  
1  25.114351      A   -0.082386  
2  27.369196      A    0.089783  
3  24.881086      A   -0.090909  
4  25.541994      A    0.026563  
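The printed output covers a single symbol only. As a rough check on more than one symbol, the sketch below applies the same groupby call to a small frame whose values are rounded from the question's AAPL/XOM sample (the rounding and the reduced column set are assumptions made for brevity). It uses the question's 'Adj Close' column name:

```python
import pandas as pd

# Values rounded from the question's minimal dataset; only the two
# columns needed for the calculation are kept.
df = pd.DataFrame({
    "Symbol": ["AAPL", "AAPL", "AAPL", "XOM", "XOM", "XOM"],
    "Adj Close": [0.0869, 0.0866, 0.0849, 0.9221, 0.9278, 0.9192],
})

# pct_change is computed within each Symbol group, and the result is
# aligned back to the original row index.
df["Pct_Change"] = df.groupby("Symbol", sort=False)["Adj Close"].pct_change()
print(df)
```

The first row of each symbol should be NaN, since it has no earlier row in its own group to compare against.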

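This can be done using groupby, as @Sandeep has pointed out. However, using your solution:

df['Pct_Change'] = 0.0

for stock in df.Symbol.unique():
    mask = df.Symbol == stock
    # assign through df.loc so the rows of the original frame are written
    df.loc[mask, 'Pct_Change'] = df.loc[mask, 'Adj_Close'].pct_change()

Note that you were assigning subset['Adj_Close'].pct_change() to the new subset dataframe, which df.loc[...] returns as a copy, instead of to the original one; hence the original dataframe was not altered. Assigning through df.loc with the row mask and the column name writes only that stock's rows in df.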

print(df)
     Date       High        Low       Open      Close      Volume  \
0  1999-11-18  35.765381  28.612303  32.546494  31.473534  62546300.0   
1  1999-11-19  30.758226  28.478184  30.713520  28.880543  15234100.0   
2  1999-11-22  31.473534  28.657009  29.551144  31.473534   6577800.0   
3  1999-11-23  31.205294  28.612303  30.400572  28.612303   5975600.0   
4  1999-11-24  29.998211  28.612303  28.701717  29.372318   4843200.0   

   Adj_Close Symbol  Pct_Change  
0  27.369196      A         NaN  
1  25.114351      A   -0.082386  
2  27.369196      A    0.089783  
3  24.881086      A   -0.090909  
4  25.541994      A    0.026563  
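Since the output above contains one symbol only, here is a minimal sketch of a per-symbol loop on a hypothetical two-symbol frame (symbols and prices are invented for illustration). It writes through df.loc with a boolean mask and a column label, and then compares the result with the groupby approach:

```python
import pandas as pd

# Hypothetical data: two symbols, three rows each.
df = pd.DataFrame({
    "Symbol": ["A", "A", "A", "B", "B", "B"],
    "Adj_Close": [10.0, 11.0, 12.1, 50.0, 45.0, 54.0],
})

# Start from a float column so the per-symbol results fit without a dtype change.
df["Pct_Change"] = float("nan")

for stock in df["Symbol"].unique():
    mask = df["Symbol"] == stock
    # Writing through df.loc[mask, column] modifies df itself, and only
    # the rows that belong to the current symbol.
    df.loc[mask, "Pct_Change"] = df.loc[mask, "Adj_Close"].pct_change()

# The groupby version should give the same values.
expected = df.groupby("Symbol", sort=False)["Adj_Close"].pct_change()
print(df)
```

The loop is kept here only to mirror the approach in the question; the groupby call does the same work in one step.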
