Max number of stocks held on given date range
I have historical trade data with the entry date, exit date, and stock name.
I want to know the maximum number of stocks that were held in any given date range. I have this data in MySQL and in Python pandas.
My dataset:
entry_time exit_time stk
272 2020-06-10 2020-06-23 SHANKARA
197 2020-06-11 2020-06-25 PNCINFRA
85 2020-06-11 2020-06-25 DYNAMATECH
171 2020-06-15 2020-06-29 MANINDS
199 2020-06-16 2020-06-24 ASTERDM
241 2020-06-18 2020-06-29 JKPAPER
130 2020-06-18 2020-06-23 SOMANYCERA
159 2020-06-18 2020-06-25 EVERESTIND
212 2020-06-18 2020-07-01 JSLHISAR
295 2020-06-19 2020-06-25 IBVENTURES
133 2020-06-19 2020-07-02 FIEMIND
123 2020-06-19 2020-06-23 SUPRAJIT
118 2020-06-19 2020-07-01 NRBBEARING
97 2020-06-19 2020-06-24 SPAL
261 2020-06-19 2020-06-29 DALBHARAT
50 2020-06-22 2020-07-06 SANGINITA
150 2020-06-22 2020-07-06 BBL
55 2020-06-22 2020-07-06 SHARDAMOTR
169 2020-06-22 2020-07-06 BALAMINES
12 2020-06-22 2020-06-25 KIRLOSIND
284 2020-06-22 2020-07-06 NATCOPHARM
236 2020-06-23 2020-07-06 QUICKHEAL
69 2020-06-23 2020-07-07 HMVL
220 2020-06-24 2020-07-08 ASTRAL
42 2020-06-26 2020-07-10 MENONBE
260 2020-07-06 2020-07-16 BSE
105 2020-07-07 2020-07-20 GARFIBRES
35 2020-07-16 2020-07-30 SATIA
218 2020-07-17 2020-07-31 THYROCARE
8 2020-08-04 2020-08-07 CREATIVE
The output should give me the maximum number of stocks held on any day within a given date range.
For example, from the data above we can see that on 2020-06-15 we had 4 stocks in our portfolio. Hence, if 4 is the maximum number of stocks held, the output should show 4.
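To make the expected result concrete, here is a minimal sketch that counts the positions open on a single day. It assumes that a stock counts as held on both its entry and its exit date, and it uses only the first five rows of the dataset, typed in by hand; the helper name held_on is hypothetical.

```python
import pandas as pd

# First five rows of the dataset above, typed in by hand for illustration.
df = pd.DataFrame(
    {
        "entry_time": ["2020-06-10", "2020-06-11", "2020-06-11", "2020-06-15", "2020-06-16"],
        "exit_time": ["2020-06-23", "2020-06-25", "2020-06-25", "2020-06-29", "2020-06-24"],
        "stk": ["SHANKARA", "PNCINFRA", "DYNAMATECH", "MANINDS", "ASTERDM"],
    }
)
df["entry_time"] = pd.to_datetime(df["entry_time"])
df["exit_time"] = pd.to_datetime(df["exit_time"])


def held_on(frame, day):
    """Count positions open on `day` (assumption: entry and exit days both count)."""
    day = pd.Timestamp(day)
    is_open = (frame["entry_time"] <= day) & (frame["exit_time"] >= day)
    return int(is_open.sum())


print(held_on(df, "2020-06-15"))  # 4
```

This reproduces the 4 stocks quoted for 2020-06-15; the answer to the question is then the largest such count over the days of interest.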
In SQL, you can handle this by keeping track of ins-and-outs, and then doing a cumulative sum (the CTE and window function used here require MySQL 8.0 or later):
with io as (
    select entry_time as dt, 1 as inc
    from t
    union all
    select exit_time + interval 1 day as dt, -1
    from t
)
select dt, sum(inc) as change_on_day,
       sum(sum(inc)) over (order by dt) as active_on_day
from io
group by dt
order by active_on_day desc
limit 1;
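The same ins-and-outs idea can be sketched in pandas. This is only an illustration, not part of the SQL answer: it assumes the dataset from the question is loaded from the hand-copied text below, and it follows the query's convention of booking the -1 on the day after exit_time, so a stock still counts as held on its exit date.

```python
import io

import pandas as pd

# Dataset from the question, copied as whitespace-separated text.
RAW = """\
272 2020-06-10 2020-06-23 SHANKARA
197 2020-06-11 2020-06-25 PNCINFRA
85 2020-06-11 2020-06-25 DYNAMATECH
171 2020-06-15 2020-06-29 MANINDS
199 2020-06-16 2020-06-24 ASTERDM
241 2020-06-18 2020-06-29 JKPAPER
130 2020-06-18 2020-06-23 SOMANYCERA
159 2020-06-18 2020-06-25 EVERESTIND
212 2020-06-18 2020-07-01 JSLHISAR
295 2020-06-19 2020-06-25 IBVENTURES
133 2020-06-19 2020-07-02 FIEMIND
123 2020-06-19 2020-06-23 SUPRAJIT
118 2020-06-19 2020-07-01 NRBBEARING
97 2020-06-19 2020-06-24 SPAL
261 2020-06-19 2020-06-29 DALBHARAT
50 2020-06-22 2020-07-06 SANGINITA
150 2020-06-22 2020-07-06 BBL
55 2020-06-22 2020-07-06 SHARDAMOTR
169 2020-06-22 2020-07-06 BALAMINES
12 2020-06-22 2020-06-25 KIRLOSIND
284 2020-06-22 2020-07-06 NATCOPHARM
236 2020-06-23 2020-07-06 QUICKHEAL
69 2020-06-23 2020-07-07 HMVL
220 2020-06-24 2020-07-08 ASTRAL
42 2020-06-26 2020-07-10 MENONBE
260 2020-07-06 2020-07-16 BSE
105 2020-07-07 2020-07-20 GARFIBRES
35 2020-07-16 2020-07-30 SATIA
218 2020-07-17 2020-07-31 THYROCARE
8 2020-08-04 2020-08-07 CREATIVE
"""

df = pd.read_csv(io.StringIO(RAW), sep=r"\s+", header=None,
                 names=["id", "entry_time", "exit_time", "stk"])
df["entry_time"] = pd.to_datetime(df["entry_time"])
df["exit_time"] = pd.to_datetime(df["exit_time"])

# +1 on every entry date, -1 on the day after every exit date.
events = pd.concat([
    pd.DataFrame({"dt": df["entry_time"], "inc": 1}),
    pd.DataFrame({"dt": df["exit_time"] + pd.Timedelta(days=1), "inc": -1}),
])

# Net change per day (groupby sorts the dates), then the running total.
change_on_day = events.groupby("dt")["inc"].sum()
active_on_day = change_on_day.cumsum()

peak_day = active_on_day.idxmax()
peak = int(active_on_day.max())
print(peak_day.date(), peak)  # 2020-06-23 23
```

If the exit date should not count as a holding day, dropping the one-day offset on exit_time changes the convention, and the peak may then fall on a different date.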
You can:
1. Count the rows per exit_time with a groupby, then merge the dataframe back on itself (matching exit_time to entry_time) to figure out the value to be subtracted (the count of stocks that exited on each entry_time).
2. However, you can't just subtract that row's value from the row number (the row number is df.index + 1, but you first need to do df = df.reset_index(drop=True), as I have done in the first line of code). Instead, you need to subtract the cumulative sum, using .cumsum().
df = df.reset_index(drop=True)
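# Number of stocks exiting on each exit_time, relabelled so it can be matched against entry_time.
exits = (df[['exit_time']]
         .assign(exit_count=df.groupby('exit_time')['entry_time'].transform('count'))
         .rename({'exit_time': 'entry_time'}, axis=1)
         .drop_duplicates(subset='entry_time'))
# Left merge keeps every original row; entry dates with no matching exits get 0.
s = pd.merge(df, exits, on='entry_time', how='left')['exit_count'].fillna(0)
# Row number minus the running total of matched exits.
df['COUNT'] = df.index + 1 - s.cumsum()
df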
Out[1]:
entry_time exit_time stk COUNT
0 2020-06-10 2020-06-23 SHANKARA 1.0
1 2020-06-11 2020-06-25 PNCINFRA 2.0
2 2020-06-11 2020-06-25 DYNAMATECH 3.0
3 2020-06-15 2020-06-29 MANINDS 4.0
4 2020-06-16 2020-06-24 ASTERDM 5.0
5 2020-06-18 2020-06-29 JKPAPER 6.0
6 2020-06-18 2020-06-23 SOMANYCERA 7.0
7 2020-06-18 2020-06-25 EVERESTIND 8.0
8 2020-06-18 2020-07-01 JSLHISAR 9.0
9 2020-06-19 2020-06-25 IBVENTURES 10.0
10 2020-06-19 2020-07-02 FIEMIND 11.0
11 2020-06-19 2020-06-23 SUPRAJIT 12.0
12 2020-06-19 2020-07-01 NRBBEARING 13.0
13 2020-06-19 2020-06-24 SPAL 14.0
14 2020-06-19 2020-06-29 DALBHARAT 15.0
15 2020-06-22 2020-07-06 SANGINITA 16.0
16 2020-06-22 2020-07-06 BBL 17.0
17 2020-06-22 2020-07-06 SHARDAMOTR 18.0
18 2020-06-22 2020-07-06 BALAMINES 19.0
19 2020-06-22 2020-06-25 KIRLOSIND 20.0
20 2020-06-22 2020-07-06 NATCOPHARM 21.0
21 2020-06-23 2020-07-06 QUICKHEAL 19.0
22 2020-06-23 2020-07-07 HMVL 17.0
23 2020-06-24 2020-07-08 ASTRAL 16.0
24 2020-06-26 2020-07-10 MENONBE 17.0
25 2020-07-06 2020-07-16 BSE 12.0
26 2020-07-07 2020-07-20 GARFIBRES 12.0
27 2020-07-16 2020-07-30 SATIA 12.0
28 2020-07-17 2020-07-31 THYROCARE 13.0
29 2020-08-04 2020-08-07 CREATIVE 14.0
Then, to summarize by max stock per entry_time, you can use .groupby():
df = df.groupby('entry_time')['COUNT'].max().reset_index()
df
Out[2]:
entry_time COUNT
0 2020-06-10 1.0
1 2020-06-11 3.0
2 2020-06-15 4.0
3 2020-06-16 5.0
4 2020-06-18 9.0
5 2020-06-19 15.0
6 2020-06-22 21.0
7 2020-06-23 19.0
8 2020-06-24 16.0
9 2020-06-26 17.0
10 2020-07-06 12.0
11 2020-07-07 12.0
12 2020-07-16 12.0
13 2020-07-17 13.0
14 2020-08-04 14.0
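Note that the COUNT column above appears to subtract exits only on dates that also occur as an entry_time, so positions closed on any other date are never removed (the last row still shows 14, although CREATIVE is the only position open on 2020-08-04). As a cross-check, a direct per-day count restricted to a date range can be sketched as follows. The function name max_held is hypothetical, only the first five rows of the dataset are used, and the exit date is assumed to count as a holding day.

```python
import pandas as pd

# First five rows of the dataset, typed in by hand for illustration.
df = pd.DataFrame(
    {
        "entry_time": ["2020-06-10", "2020-06-11", "2020-06-11", "2020-06-15", "2020-06-16"],
        "exit_time": ["2020-06-23", "2020-06-25", "2020-06-25", "2020-06-29", "2020-06-24"],
        "stk": ["SHANKARA", "PNCINFRA", "DYNAMATECH", "MANINDS", "ASTERDM"],
    }
)
df["entry_time"] = pd.to_datetime(df["entry_time"])
df["exit_time"] = pd.to_datetime(df["exit_time"])


def max_held(frame, start, end):
    """Return (day, count) for the busiest calendar day in [start, end]."""
    days = pd.date_range(start, end, freq="D")
    # Column vectors of entry/exit dates broadcast against the row of days,
    # giving a (positions x days) boolean matrix of "open on that day".
    entry = frame["entry_time"].to_numpy()[:, None]
    exit_ = frame["exit_time"].to_numpy()[:, None]
    is_open = (entry <= days.to_numpy()) & (exit_ >= days.to_numpy())
    per_day = pd.Series(is_open.sum(axis=0), index=days)
    # idxmax returns the first day on which the peak is reached.
    return per_day.idxmax(), int(per_day.max())


day, count = max_held(df, "2020-06-10", "2020-06-15")
print(day.date(), count)  # 2020-06-15 4
```

The broadcast builds one cell per position per day, which is fine for a few hundred trades over a few months; for much larger inputs a cumulative sum over entry and exit events scales better.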