Maximum number of stocks held in a given date range

Question

I have historical trade data with the entry date, exit date, and stock name.

I want to know the maximum number of stocks that were held in any given date range. I have this data in MySQL and in Python pandas.

My dataset:

    entry_time  exit_time         stk
272 2020-06-10 2020-06-23    SHANKARA
197 2020-06-11 2020-06-25    PNCINFRA
85  2020-06-11 2020-06-25  DYNAMATECH
171 2020-06-15 2020-06-29     MANINDS
199 2020-06-16 2020-06-24     ASTERDM
241 2020-06-18 2020-06-29     JKPAPER
130 2020-06-18 2020-06-23  SOMANYCERA
159 2020-06-18 2020-06-25  EVERESTIND
212 2020-06-18 2020-07-01    JSLHISAR
295 2020-06-19 2020-06-25  IBVENTURES
133 2020-06-19 2020-07-02     FIEMIND
123 2020-06-19 2020-06-23    SUPRAJIT
118 2020-06-19 2020-07-01  NRBBEARING
97  2020-06-19 2020-06-24        SPAL
261 2020-06-19 2020-06-29   DALBHARAT
50  2020-06-22 2020-07-06   SANGINITA
150 2020-06-22 2020-07-06         BBL
55  2020-06-22 2020-07-06  SHARDAMOTR
169 2020-06-22 2020-07-06   BALAMINES
12  2020-06-22 2020-06-25   KIRLOSIND
284 2020-06-22 2020-07-06  NATCOPHARM
236 2020-06-23 2020-07-06   QUICKHEAL
69  2020-06-23 2020-07-07        HMVL
220 2020-06-24 2020-07-08      ASTRAL
42  2020-06-26 2020-07-10     MENONBE
260 2020-07-06 2020-07-16         BSE
105 2020-07-07 2020-07-20   GARFIBRES
35  2020-07-16 2020-07-30       SATIA
218 2020-07-17 2020-07-31   THYROCARE
8   2020-08-04 2020-08-07    CREATIVE

The output should give me the maximum number of stocks held at any one time between any given dates.

For example, from the data above we can see that on 2020-06-15 we had 4 stocks in our portfolio. So if 4 is the maximum number of stocks held, the output should show 4.

Answer 1

In SQL, you can handle this by keeping track of ins-and-outs, and then doing a cumulative sum:

with io as (
      select entry_time as dt, 1 as inc
      from t
      union all
      select exit_time  + interval 1 day as dt, -1
      from t
     )
select dt, sum(inc) as change_on_day,
       sum(sum(inc)) over (order by dt) as active_on_day
from io
group by dt
order by active_on_day desc
limit 1;

Here is a db<>fiddle.
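
For readers who keep the data in Python rather than MySQL, the same ins-and-outs idea can be sketched with the standard library alone. This is only an illustration, not part of the original answer: the function name active_by_day and the five sample trades (the first five rows of the question's data) are made up for the example, and it assumes, like the "+ interval 1 day" in the query, that a stock still counts as held on its exit date.

```python
from collections import Counter
from datetime import date, timedelta
from itertools import accumulate


def active_by_day(trades):
    """Return sorted (day, active_count) pairs for each day the count changes.

    trades is an iterable of (entry_date, exit_date) pairs. A position is
    treated as held on its exit date (assumption mirroring the SQL above).
    """
    changes = Counter()
    for entry, exit_ in trades:
        changes[entry] += 1                      # position opens
        changes[exit_ + timedelta(days=1)] -= 1  # position is gone the next day
    days = sorted(changes)
    running = accumulate(changes[d] for d in days)  # cumulative sum of changes
    return list(zip(days, running))


# Hypothetical sample: the first five rows of the dataset in the question.
trades = [
    (date(2020, 6, 10), date(2020, 6, 23)),
    (date(2020, 6, 11), date(2020, 6, 25)),
    (date(2020, 6, 11), date(2020, 6, 25)),
    (date(2020, 6, 15), date(2020, 6, 29)),
    (date(2020, 6, 16), date(2020, 6, 24)),
]

timeline = active_by_day(trades)
peak_day, peak = max(timeline, key=lambda pair: pair[1])
print(peak_day, peak)
```

The Counter plays the role of the "group by dt" step, and accumulate plays the role of the windowed sum; taking the max over the pairs corresponds to "order by active_on_day desc limit 1".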

Answer 2

You can:

Group by exit_time and count the rows in each group, then merge that result back onto the dataframe (matching exit_time against entry_time) to find the value to subtract: the number of stocks that exited on each entry_time.
You cannot simply subtract that row's value from the row number, because exits from earlier rows have to be counted as well. The row number is df.index + 1 (after first running df = df.reset_index(drop=True), as in the first line of the code), and from it you subtract the cumulative sum of the exit counts, obtained with .cumsum().

df = df.reset_index(drop=True)
s = (pd.merge(df,
              df[['exit_time']].assign(exit_count=df.groupby('exit_time')['entry_time'].transform('count'))
                                                   .rename({'exit_time':'entry_time'}, axis= 1)
                                                   .drop_duplicates(subset='entry_time')
              ,on='entry_time', how='left'))['exit_count'].fillna(0)
df['COUNT'] = df.index + 1 - s.cumsum()
df
Out[1]: 
    entry_time   exit_time         stk  COUNT
0   2020-06-10  2020-06-23    SHANKARA    1.0
1   2020-06-11  2020-06-25    PNCINFRA    2.0
2   2020-06-11  2020-06-25  DYNAMATECH    3.0
3   2020-06-15  2020-06-29     MANINDS    4.0
4   2020-06-16  2020-06-24     ASTERDM    5.0
5   2020-06-18  2020-06-29     JKPAPER    6.0
6   2020-06-18  2020-06-23  SOMANYCERA    7.0
7   2020-06-18  2020-06-25  EVERESTIND    8.0
8   2020-06-18  2020-07-01    JSLHISAR    9.0
9   2020-06-19  2020-06-25  IBVENTURES   10.0
10  2020-06-19  2020-07-02     FIEMIND   11.0
11  2020-06-19  2020-06-23    SUPRAJIT   12.0
12  2020-06-19  2020-07-01  NRBBEARING   13.0
13  2020-06-19  2020-06-24        SPAL   14.0
14  2020-06-19  2020-06-29   DALBHARAT   15.0
15  2020-06-22  2020-07-06   SANGINITA   16.0
16  2020-06-22  2020-07-06         BBL   17.0
17  2020-06-22  2020-07-06  SHARDAMOTR   18.0
18  2020-06-22  2020-07-06   BALAMINES   19.0
19  2020-06-22  2020-06-25   KIRLOSIND   20.0
20  2020-06-22  2020-07-06  NATCOPHARM   21.0
21  2020-06-23  2020-07-06   QUICKHEAL   19.0
22  2020-06-23  2020-07-07        HMVL   17.0
23  2020-06-24  2020-07-08      ASTRAL   16.0
24  2020-06-26  2020-07-10     MENONBE   17.0
25  2020-07-06  2020-07-16         BSE   12.0
26  2020-07-07  2020-07-20   GARFIBRES   12.0
27  2020-07-16  2020-07-30       SATIA   12.0
28  2020-07-17  2020-07-31   THYROCARE   13.0
29  2020-08-04  2020-08-07    CREATIVE   14.0

Then, to summarize by the maximum stock count per entry_time, you can use .groupby():

df = df.groupby('entry_time')['COUNT'].max().reset_index()
df
Out[2]: 
    entry_time  COUNT
0   2020-06-10    1.0
1   2020-06-11    3.0
2   2020-06-15    4.0
3   2020-06-16    5.0
4   2020-06-18    9.0
5   2020-06-19   15.0
6   2020-06-22   21.0
7   2020-06-23   19.0
8   2020-06-24   16.0
9   2020-06-26   17.0
10  2020-07-06   12.0
11  2020-07-07   12.0
12  2020-07-16   12.0
13  2020-07-17   13.0
14  2020-08-04   14.0
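
As far as I can tell, the merge above only picks up exits whose exit_time equals some entry_time, so exits on other dates (for example 2020-06-25) are never subtracted. A possible alternative, sketched below under stated assumptions, applies the ins-and-outs cumulative sum directly in pandas and then slices the requested date range. The function name max_held and the small sample frame are hypothetical, and the sketch assumes a stock still counts as held on its exit_time.

```python
import pandas as pd


def max_held(df, start, end):
    """Peak number of open positions on any day in [start, end], inclusive."""
    # +1 on each entry date, -1 on the day after each exit date
    # (assumption: the exit day itself still counts as held).
    changes = pd.concat([
        pd.Series(1, index=pd.DatetimeIndex(df["entry_time"])),
        pd.Series(-1, index=pd.DatetimeIndex(df["exit_time"]) + pd.Timedelta(days=1)),
    ])
    # Net change per date, then a running total of open positions.
    daily = changes.groupby(level=0).sum().sort_index().cumsum()
    # Expand to one row per calendar day so that any range can be sliced.
    days = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
    daily = daily.reindex(days).ffill()
    window = daily.loc[start:end]
    # Nothing is held outside the span covered by the data.
    return int(window.max()) if not window.empty else 0


# Hypothetical sample: the first five rows of the dataset in the question.
sample = pd.DataFrame({
    "entry_time": pd.to_datetime(
        ["2020-06-10", "2020-06-11", "2020-06-11", "2020-06-15", "2020-06-16"]),
    "exit_time": pd.to_datetime(
        ["2020-06-23", "2020-06-25", "2020-06-25", "2020-06-29", "2020-06-24"]),
})

print(max_held(sample, "2020-06-10", "2020-06-15"))
print(max_held(sample, "2020-06-10", "2020-06-30"))
```

Reindexing to a daily calendar is a design choice that trades a little memory for simple range slicing; for very long histories, slicing the change points directly would avoid it.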
