
Max number of stocks held on given date range

I have historical trade data with the entry date, exit date, and stock name.

I want to know the maximum number of stocks that were held in any given date range. I have this data in MySQL and in Python pandas.

My dataset:

    entry_time  exit_time         stk
272 2020-06-10 2020-06-23    SHANKARA
197 2020-06-11 2020-06-25    PNCINFRA
85  2020-06-11 2020-06-25  DYNAMATECH
171 2020-06-15 2020-06-29     MANINDS
199 2020-06-16 2020-06-24     ASTERDM
241 2020-06-18 2020-06-29     JKPAPER
130 2020-06-18 2020-06-23  SOMANYCERA
159 2020-06-18 2020-06-25  EVERESTIND
212 2020-06-18 2020-07-01    JSLHISAR
295 2020-06-19 2020-06-25  IBVENTURES
133 2020-06-19 2020-07-02     FIEMIND
123 2020-06-19 2020-06-23    SUPRAJIT
118 2020-06-19 2020-07-01  NRBBEARING
97  2020-06-19 2020-06-24        SPAL
261 2020-06-19 2020-06-29   DALBHARAT
50  2020-06-22 2020-07-06   SANGINITA
150 2020-06-22 2020-07-06         BBL
55  2020-06-22 2020-07-06  SHARDAMOTR
169 2020-06-22 2020-07-06   BALAMINES
12  2020-06-22 2020-06-25   KIRLOSIND
284 2020-06-22 2020-07-06  NATCOPHARM
236 2020-06-23 2020-07-06   QUICKHEAL
69  2020-06-23 2020-07-07        HMVL
220 2020-06-24 2020-07-08      ASTRAL
42  2020-06-26 2020-07-10     MENONBE
260 2020-07-06 2020-07-16         BSE
105 2020-07-07 2020-07-20   GARFIBRES
35  2020-07-16 2020-07-30       SATIA
218 2020-07-17 2020-07-31   THYROCARE
8   2020-08-04 2020-08-07    CREATIVE

The output should give me the maximum number of stocks held on any day within a given date range.

For example, from the data above we can see that on 2020-06-15 we had 4 stocks in our portfolio. Hence, if 4 is the maximum number of stocks held, the output should show 4.

In SQL, you can handle this by keeping track of ins and outs and then doing a cumulative sum:

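-- Requires MySQL 8.0+ (CTEs and window functions).
with io as (
      -- +1 on the day a position is opened
      select entry_time as dt, 1 as inc
      from t
      union all
      -- -1 on the day after exit, so the exit day still counts as held
      select exit_time + interval 1 day as dt, -1
      from t
     )
select dt, sum(inc) as change_on_day,
       sum(sum(inc)) over (order by dt) as active_on_day
from io
group by dt
order by active_on_day desc
limit 1;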

Here is a db<>fiddle.
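Since the data is also available in pandas, the same ins-and-outs idea can be sketched there. This is a minimal sketch, not part of the original answer: `daily_holdings` and `max_held` are hypothetical helper names, the small `trades` frame is made-up illustration data, and the exit day is assumed to still count as held (matching the `+ interval 1 day` in the query above).

```python
import pandas as pd

def daily_holdings(df):
    """Number of open positions on each calendar day (exit day included)."""
    entry = pd.DatetimeIndex(pd.to_datetime(df["entry_time"]))
    exit_ = pd.DatetimeIndex(pd.to_datetime(df["exit_time"]))
    # +1 on each entry day, -1 on the day after each exit
    events = pd.concat([
        pd.Series(1, index=entry),
        pd.Series(-1, index=exit_ + pd.Timedelta(days=1)),
    ])
    change_on_day = events.groupby(level=0).sum().sort_index()
    active_on_day = change_on_day.cumsum()
    # Carry the running total across days without any entry or exit
    return active_on_day.asfreq("D").ffill().astype(int)

def max_held(df, start=None, end=None):
    """(day, count) of the peak holding between start and end, both inclusive.

    Assumes the range overlaps the days covered by the data.
    """
    window = daily_holdings(df).loc[start:end]
    return window.idxmax(), int(window.max())

# Made-up example data, only to show the call
trades = pd.DataFrame({
    "entry_time": ["2020-06-10", "2020-06-11", "2020-06-12", "2020-06-20"],
    "exit_time":  ["2020-06-12", "2020-06-15", "2020-06-13", "2020-06-21"],
    "stk":        ["A", "B", "C", "D"],
})
print(max_held(trades))                              # peak over all days
print(max_held(trades, "2020-06-14", "2020-06-19"))  # peak within a range
```

Under the same exit-day-inclusive assumption, the dataset in the question peaks at 23 stocks on 2020-06-23, because the first exits only take effect the following day.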

You can:

  1. Count the rows per exit_time with a groupby, then merge that count back onto the dataframe (matching exit_time to entry_time) to get the value to subtract: the number of stocks that exited on each entry_time.
  2. However, you cannot simply subtract that row's value from the row number. The row number is df.index + 1, which requires running df = df.reset_index(drop=True) first, as in the first line of the code. You then need to subtract the cumulative sum, using .cumsum().

import pandas as pd

df = df.reset_index(drop=True)
s = (pd.merge(df,
              df[['exit_time']].assign(exit_count=df.groupby('exit_time')['entry_time'].transform('count'))
                                                   .rename({'exit_time':'entry_time'}, axis= 1)
                                                   .drop_duplicates(subset='entry_time')
              ,on='entry_time', how='left'))['exit_count'].fillna(0)
df['COUNT'] = df.index + 1 - s.cumsum()
df
Out[1]: 
    entry_time   exit_time         stk  COUNT
0   2020-06-10  2020-06-23    SHANKARA    1.0
1   2020-06-11  2020-06-25    PNCINFRA    2.0
2   2020-06-11  2020-06-25  DYNAMATECH    3.0
3   2020-06-15  2020-06-29     MANINDS    4.0
4   2020-06-16  2020-06-24     ASTERDM    5.0
5   2020-06-18  2020-06-29     JKPAPER    6.0
6   2020-06-18  2020-06-23  SOMANYCERA    7.0
7   2020-06-18  2020-06-25  EVERESTIND    8.0
8   2020-06-18  2020-07-01    JSLHISAR    9.0
9   2020-06-19  2020-06-25  IBVENTURES   10.0
10  2020-06-19  2020-07-02     FIEMIND   11.0
11  2020-06-19  2020-06-23    SUPRAJIT   12.0
12  2020-06-19  2020-07-01  NRBBEARING   13.0
13  2020-06-19  2020-06-24        SPAL   14.0
14  2020-06-19  2020-06-29   DALBHARAT   15.0
15  2020-06-22  2020-07-06   SANGINITA   16.0
16  2020-06-22  2020-07-06         BBL   17.0
17  2020-06-22  2020-07-06  SHARDAMOTR   18.0
18  2020-06-22  2020-07-06   BALAMINES   19.0
19  2020-06-22  2020-06-25   KIRLOSIND   20.0
20  2020-06-22  2020-07-06  NATCOPHARM   21.0
21  2020-06-23  2020-07-06   QUICKHEAL   19.0
22  2020-06-23  2020-07-07        HMVL   17.0
23  2020-06-24  2020-07-08      ASTRAL   16.0
24  2020-06-26  2020-07-10     MENONBE   17.0
25  2020-07-06  2020-07-16         BSE   12.0
26  2020-07-07  2020-07-20   GARFIBRES   12.0
27  2020-07-16  2020-07-30       SATIA   12.0
28  2020-07-17  2020-07-31   THYROCARE   13.0
29  2020-08-04  2020-08-07    CREATIVE   14.0

Then, to summarize by the maximum stock count per entry_time, you can use .groupby():

df = df.groupby('entry_time')['COUNT'].max().reset_index()
df
Out[2]: 
    entry_time  COUNT
0   2020-06-10    1.0
1   2020-06-11    3.0
2   2020-06-15    4.0
3   2020-06-16    5.0
4   2020-06-18    9.0
5   2020-06-19   15.0
6   2020-06-22   21.0
7   2020-06-23   19.0
8   2020-06-24   16.0
9   2020-06-26   17.0
10  2020-07-06   12.0
11  2020-07-07   12.0
12  2020-07-16   12.0
13  2020-07-17   13.0
14  2020-08-04   14.0
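The merge above only picks up exits whose exit_time coincides with some entry_time, so it may be worth cross-checking the counts against a direct day-by-day count over the date range of interest. The following is a minimal sketch under stated assumptions: `held_per_day` is a hypothetical helper name, the exit day is assumed to still count as held, and the loop is a plain brute-force check rather than an optimized method.

```python
import pandas as pd

def held_per_day(df, start, end):
    """Count open positions on each calendar day in [start, end]."""
    entry = pd.to_datetime(df["entry_time"])
    exit_ = pd.to_datetime(df["exit_time"])
    days = pd.date_range(start, end, freq="D")
    # A position is held on `day` if it was entered on or before it
    # and exited on or after it (exit day included by assumption).
    counts = [int(((entry <= day) & (exit_ >= day)).sum()) for day in days]
    return pd.Series(counts, index=days, name="held")
```

Calling `held_per_day(df, "2020-06-10", "2020-08-07").max()` on the original dataframe (before it is overwritten by the groupby result) would give the maximum for that range, and `.idxmax()` the first day on which it occurs.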
