
Select rows from DataReader based on value and transfer to DataFrame

I am working on a project where I read in the historical prices for a given stock. I then want to copy the days on which the price jumped by +5% or -5% into a different DataFrame.

But I am struggling with transferring the rows.

import pandas_datareader as web
import pandas as pd
import datetime

start = datetime.datetime(2015, 9, 1)
end = datetime.datetime(2019, 11, 2)

df1 = pd.DataFrame()
df = web.DataReader("amd", 'yahoo', start, end)

df['Close'] = df['Close'].astype(float)
df['Open'] = df['Open'].astype(float)

for row in df:
    df['perchange'] = ((df['Close']-df['Open'])/df['Open'])*100
    df['perchange'] = df['perchange'].astype(float)

    if df['perchange'] >= 5.0:
        df1 += df

    if ['perchange'] <= -5.0:
        df1 += df

df.to_csv('amd_volume_price_history.csv')
df1.to_csv('amd_5_to_5.csv')

You can do this to create a new DataFrame containing only the rows where the percentage change is greater than 5% in absolute value. Here Series.between is used to build the mask for boolean indexing:

not_significant=((df['Close']-df['Open'])/df['Open']).between(-0.05,0.05)
df_filtered=df[~not_significant]
print(df_filtered)

Output

                 High        Low       Open      Close     Volume  Adj Close
Date                                                                        
2015-09-11   2.140000   1.810000   1.880000   2.010000   31010300   2.010000
2015-09-14   2.000000   1.810000   2.000000   1.820000   16458500   1.820000
2015-10-19   2.010000   1.910000   1.910000   2.010000   10670800   2.010000
2015-10-23   2.210000   2.100000   2.100000   2.210000    9564200   2.210000
2015-11-03   2.290000   2.160000   2.160000   2.280000    8705800   2.280000
...               ...        ...        ...        ...        ...        ...
2019-06-06  31.980000  29.840000  29.870001  31.820000  131267800  31.820000
2019-07-31  32.299999  30.299999  32.080002  30.450001  119190000  30.450001
2019-08-08  34.270000  31.480000  31.530001  33.919998  167278800  33.919998
2019-08-12  34.650002  32.080002  34.160000  32.430000  106936000  32.430000
2019-08-23  31.830000  29.400000  31.299999  29.540001   83681100  29.540001

[123 rows x 6 columns]
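One detail worth checking: Series.between includes both endpoints by default, so ~between(-0.05, 0.05) keeps only moves strictly larger than 5% in absolute value, and a day that moved exactly 5% is left out. The sketch below uses a small made-up price table (not real AMD data, and the variable names are only illustrative) to show the difference. If the boundary should count, comparing the absolute change with >= is one option.

```python
import pandas as pd

# Made-up prices for illustration only; not real market data.
prices = pd.DataFrame(
    {"Open": [100.0, 100.0, 100.0, 100.0],
     "Close": [102.0, 105.0, 93.0, 110.0]}
)
change = (prices["Close"] - prices["Open"]) / prices["Open"]

# between() is inclusive on both ends by default, so a move of exactly
# +5% counts as "not significant" and is dropped by the negated mask.
strict = prices[~change.between(-0.05, 0.05)]

# Assumed alternative: treat a move of exactly 5% as significant too.
inclusive = prices[change.abs() >= 0.05]

print(strict)
print(inclusive)
```

With this data, the first filter keeps the -7% and +10% rows, while the second also keeps the row that moved exactly +5%.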

If you really need the Perchange column, you can create it by changing the code like this:

df['Perchange']=(df['Close']-df['Open'])/df['Open']*100
not_significant=(df['Perchange']).between(-5,5)
df_filtered=df[~not_significant]
print(df_filtered)

You can also use DataFrame.pct_change:

df['Perchange']=df[['Open','Close']].pct_change(axis=1).Close*100

Output

                 High        Low       Open      Close     Volume  Adj Close  \
Date                                                                           
2015-09-11   2.140000   1.810000   1.880000   2.010000   31010300   2.010000   
2015-09-14   2.000000   1.810000   2.000000   1.820000   16458500   1.820000   
2015-10-19   2.010000   1.910000   1.910000   2.010000   10670800   2.010000   
2015-10-23   2.210000   2.100000   2.100000   2.210000    9564200   2.210000   
2015-11-03   2.290000   2.160000   2.160000   2.280000    8705800   2.280000   
...               ...        ...        ...        ...        ...        ...   
2019-06-06  31.980000  29.840000  29.870001  31.820000  131267800  31.820000   
2019-07-31  32.299999  30.299999  32.080002  30.450001  119190000  30.450001   
2019-08-08  34.270000  31.480000  31.530001  33.919998  167278800  33.919998   
2019-08-12  34.650002  32.080002  34.160000  32.430000  106936000  32.430000   
2019-08-23  31.830000  29.400000  31.299999  29.540001   83681100  29.540001   

            Perchange  
Date                   
2015-09-11   6.914893  
2015-09-14  -8.999997  
2015-10-19   5.235603  
2015-10-23   5.238102  
2015-11-03   5.555550  
...               ...  
2019-06-06   6.528285  
2019-07-31  -5.081050  
2019-08-08   7.580074  
2019-08-12  -5.064401  
2019-08-23  -5.622998  

[123 rows x 7 columns]
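To see why the pct_change variant gives the same column, here is a minimal sketch on made-up numbers. With axis=1 the change is computed across columns, so each value is compared with the column to its left: Open has nothing to its left and becomes NaN, while Close holds Close / Open - 1.

```python
import numpy as np
import pandas as pd

# Made-up prices for illustration only.
prices = pd.DataFrame({"Open": [2.0, 4.0, 10.0], "Close": [2.5, 3.0, 10.0]})

# Explicit formula: (Close - Open) / Open * 100
explicit = (prices["Close"] - prices["Open"]) / prices["Open"] * 100

# Row-wise percentage change between the two selected columns.
row_wise = prices[["Open", "Close"]].pct_change(axis=1)
via_pct_change = row_wise["Close"] * 100

# Both routes should agree up to floating-point rounding.
print(np.allclose(explicit, via_pct_change))
```

The column order in the selection matters here: selecting ['Close', 'Open'] instead would measure the change from Close to Open.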

Your code would look like this:

#Libraries
import pandas_datareader as web
import pandas as pd
import datetime

#Getting data
start = datetime.datetime(2015, 9, 1)
end = datetime.datetime(2019, 11, 2)
df = web.DataReader("amd", 'yahoo', start, end)

#Converting to float for the calculation and filtering
df['Close'] = df['Close'].astype(float)
df['Open'] = df['Open'].astype(float)

#Creating Perchange column.
df['Perchange']=(df['Close']-df['Open'])/df['Open']*100
#df['Perchange']=df[['Open','Close']].pct_change(axis=1).Close*100

#Filtering
not_significant=(df['Perchange']).between(-5,5)
df_filtered=df[~not_significant]

#Saving data.
df.to_csv('amd_volume_price_history.csv')
df_filtered.to_csv('amd_5_to_5.csv')
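For context on why the loop in the question does not work, the sketch below reproduces the two main problems on a tiny made-up table: iterating over a DataFrame yields column labels rather than rows, and an if statement cannot evaluate a whole boolean Series.

```python
import pandas as pd

# Made-up prices for illustration only.
df = pd.DataFrame({"Open": [100.0, 100.0], "Close": [106.0, 101.0]})
df["perchange"] = (df["Close"] - df["Open"]) / df["Open"] * 100

# Iterating over a DataFrame yields its column labels, not its rows.
labels = [label for label in df]

# A whole-column comparison returns a boolean Series, and `if` cannot
# reduce that to a single True/False, so pandas raises ValueError.
try:
    if df["perchange"] >= 5.0:
        pass
    error_name = None
except ValueError as exc:
    error_name = type(exc).__name__

print(labels)      # ['Open', 'Close', 'perchange']
print(error_name)  # ValueError
```

This is why the vectorized mask is the natural fit: the comparison is applied to every row at once and the resulting boolean Series selects the rows.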

EDIT: to also keep the four rows that follow each jump:

df['Perchange']=(df['Close']-df['Open'])/df['Open']*100
significant=~(df['Perchange']).between(-5,5)
group_by_jump=significant.cumsum()
jump_and_4=group_by_jump.groupby(group_by_jump,sort=False).cumcount().le(4)&group_by_jump.ne(0)
df_filtered=df[jump_and_4]
print(df_filtered.head(50))

            High   Low  Open  Close    Volume  Adj Close  Perchange
Date                                                               
2015-09-11  2.14  1.81  1.88   2.01  31010300       2.01   6.914893
2015-09-14  2.00  1.81  2.00   1.82  16458500       1.82  -8.999997
2015-09-15  1.87  1.81  1.84   1.86   6524400       1.86   1.086955
2015-09-16  1.90  1.85  1.87   1.89   4928300       1.89   1.069518
2015-09-17  1.94  1.87  1.90   1.89   5831600       1.89  -0.526315
2015-09-18  1.92  1.85  1.87   1.87  11814000       1.87   0.000000
2015-10-19  2.01  1.91  1.91   2.01  10670800       2.01   5.235603
2015-10-20  2.03  1.97  2.00   2.02   5584200       2.02   0.999999
2015-10-21  2.12  2.01  2.02   2.10  14944100       2.10   3.960392
2015-10-22  2.16  2.09  2.10   2.14   8208400       2.14   1.904772
2015-10-23  2.21  2.10  2.10   2.21   9564200       2.21   5.238102
2015-10-26  2.21  2.12  2.21   2.15   6313500       2.15  -2.714929
2015-10-27  2.16  2.10  2.12   2.15   5755600       2.15   1.415104
2015-10-28  2.20  2.12  2.14   2.18   6950600       2.18   1.869157
2015-10-29  2.18  2.11  2.15   2.13   4500400       2.13  -0.930232
2015-11-03  2.29  2.16  2.16   2.28   8705800       2.28   5.555550
2015-11-04  2.30  2.18  2.27   2.20   8205300       2.20  -3.083698
2015-11-05  2.24  2.17  2.21   2.20   4302200       2.20  -0.452488
2015-11-06  2.21  2.13  2.19   2.15   8997100       2.15  -1.826482
2015-11-09  2.18  2.10  2.15   2.11   6231200       2.11  -1.860474
2015-11-18  2.15  1.98  1.99   2.12   9384700       2.12   6.532657
2015-11-19  2.16  2.09  2.10   2.14   4704300       2.14   1.904772
2015-11-20  2.25  2.13  2.14   2.22  10727100       2.22   3.738314
2015-11-23  2.24  2.18  2.22   2.22   4863200       2.22   0.000000
2015-11-24  2.40  2.17  2.20   2.34  15859700       2.34   6.363630
2015-11-25  2.40  2.31  2.36   2.38   6914800       2.38   0.847467
2015-11-27  2.38  2.32  2.37   2.33   2606600       2.33  -1.687762
2015-11-30  2.37  2.25  2.34   2.36   9924400       2.36   0.854700
2015-12-01  2.37  2.31  2.36   2.34   5646400       2.34  -0.847457
2015-12-16  2.55  2.37  2.39   2.54  19543600       2.54   6.276144
2015-12-17  2.60  2.52  2.52   2.56  11374100       2.56   1.587300
2015-12-18  2.55  2.42  2.51   2.45  17988100       2.45  -2.390436
2015-12-21  2.53  2.43  2.47   2.53   6876600       2.53   2.429147
2015-12-22  2.78  2.54  2.55   2.77  24893200       2.77   8.627452
2015-12-23  2.94  2.75  2.76   2.83  30365300       2.83   2.536229
2015-12-24  3.00  2.86  2.88   2.92  11890900       2.92   1.388888
2015-12-28  3.02  2.86  2.91   3.00  16050500       3.00   3.092780
2015-12-29  3.06  2.97  3.04   3.00  15300900       3.00  -1.315788
2016-01-06  2.71  2.47  2.66   2.51  23759400       2.51  -5.639101
2016-01-07  2.48  2.26  2.43   2.28  22203500       2.28  -6.172843
2016-01-08  2.42  2.10  2.36   2.14  31822400       2.14  -9.322025
2016-01-11  2.36  2.12  2.16   2.34  19629300       2.34   8.333325
2016-01-12  2.46  2.28  2.40   2.39  17986100       2.39  -0.416666
2016-01-13  2.45  2.21  2.40   2.25  12749700       2.25  -6.250004
2016-01-14  2.35  2.21  2.29   2.21  15666600       2.21  -3.493447
2016-01-15  2.13  1.99  2.10   2.03  21199300       2.03  -3.333330
2016-01-19  2.11  1.90  2.08   1.95  18978900       1.95  -6.249994
2016-01-20  1.95  1.75  1.81   1.80  29243600       1.80  -0.552486
2016-01-21  2.18  1.81  1.82   2.09  26387900       2.09  14.835157
2016-01-22  2.17  1.98  2.11   2.02  16245500       2.02  -4.265399
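The grouping trick is easier to follow on a short made-up series. In this sketch the intermediate name days_since_jump is introduced only for illustration: the cumulative sum of the mask gives every jump its own group number, cumcount numbers the rows inside each group starting from 0 at the jump day, and rows before the first jump (group 0) are discarded.

```python
import pandas as pd

# Made-up percentage changes for illustration only.
perchange = pd.Series([0.0, 6.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, -7.0, 0.0])

significant = ~perchange.between(-5, 5)
# Each jump starts a new group; rows before the first jump are group 0.
group_by_jump = significant.cumsum()
# Position of each row inside its group: 0 is the jump day itself.
days_since_jump = group_by_jump.groupby(group_by_jump, sort=False).cumcount()
# Keep the jump day and the next four rows, and drop group 0.
jump_and_4 = days_since_jump.le(4) & group_by_jump.ne(0)

kept_positions = list(perchange[jump_and_4].index)
print(kept_positions)  # [1, 2, 3, 4, 5, 8, 9]
```

Because every jump opens a new group, a second jump inside the window restarts the count, which matches the output above where 2015-10-23 begins a fresh window shortly after 2015-10-19.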

Try to integrate these modifications into your code:

1) You probably don't need a loop to calculate the new column:

df['perchange'] = ((df['Close']-df['Open'])/df['Open'])*100
df['perchange'] = df['perchange'].astype(float)

2) Define an empty DataFrame:

df1=pd.DataFrame([])

3) Filter the old DataFrame with the loc method (its notation is well worth getting used to) and add the result to the empty DataFrame. This transfers the rows that satisfy the condition:

# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
df1=pd.concat([df1, df.loc[(df['perchange'] <= -5.0) | (df['perchange'] >= 5.0)]])
print(df1)
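As a side note, the empty DataFrame is optional: the result of loc can be assigned directly. The sketch below shows this on made-up prices. The .copy() call is an addition of mine so that later changes to df1 do not refer back to df.

```python
import pandas as pd

# Made-up prices for illustration only.
df = pd.DataFrame({"Open": [100.0, 100.0, 100.0, 100.0],
                   "Close": [101.0, 110.0, 90.0, 103.0]})
df["perchange"] = (df["Close"] - df["Open"]) / df["Open"] * 100

# Each comparison needs its own parentheses because | binds more
# tightly than <= and >= in Python.
mask = (df["perchange"] <= -5.0) | (df["perchange"] >= 5.0)

# Assign the filtered rows directly instead of filling an empty frame.
df1 = df.loc[mask].copy()
print(df1)
```

Only the +10% and -10% rows end up in df1; the +1% and +3% rows stay behind in df.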

Hope it helps.
