Grouping values in dataframe by timerange Python Pandas

I tried solving this on my own and just can't figure it out. I have a number of transactions which are themselves made up of more transactions. I am trying to group them together by 10-second intervals.

This is where I'm at. I've laid what I believe is the foundation, but the most recent attempt (the very last line) comes back with "AttributeError: 'Int64Index' object has no attribute 'to_period'".

import pandas as pd
from datetime import timedelta

"""
read csv file
clean date column
convert date str to datetime
sort for equity options
replace date str column with datetime column
"""
trade_reader = pd.read_csv('TastyTrades.csv')
trade_reader['Date'] = trade_reader['Date'].replace({'T': ' ', '-0500': ''}, regex=True)
date_converter = pd.to_datetime(trade_reader['Date'], format="%Y-%m-%d %H:%M:%S")
options_frame = trade_reader.loc[(trade_reader['Instrument Type'] == 'Equity Option')]
clean_frame = options_frame.replace(to_replace=['Date'], value='date_converter')

# Separate opening transaction from closing transactions, combine frames
opens = clean_frame[clean_frame['Action'].isin(['BUY_TO_OPEN', 'SELL_TO_OPEN'])]
closes = clean_frame[clean_frame['Action'].isin(['BUY_TO_CLOSE', 'SELL_TO_CLOSE'])]
open_close_set = set(opens['Symbol']) & set(closes['Symbol'])
open_close_frame = clean_frame[clean_frame['Symbol'].isin(open_close_set)]

# convert Value to float, sort, write
ocf_float = open_close_frame['Value'].astype(float)
ocf_sorted = open_close_frame.sort_values(by=['Symbol', 'Call or Put', 'Date'], ascending=True)
ocf_sorted.to_csv('Sorted.csv')

BTO_frame = opens[opens['Action'].isin(['BUY_TO_OPEN'])]
STO_frame = opens[opens['Action'].isin(['SELL_TO_OPEN'])]
debit_single = []
vertical = []
iron_condor = []
delta = timedelta(seconds=10)

temp_list = BTO_frame.groupby(BTO_frame['Date'].index.to_period(second=10))

A sample of what I am working with:

361,2020-01-15 15:27:18,Trade,BUY_TO_OPEN,QQQ   200221P00218000,Equity Option,Bought 1 QQQ 02/21/20 Put 218.00 @ 3.44,-344.00,1.0,-344.00,-1.0,-0.14,100.0,QQQ,2/21/20,218.0,PUT
356,2020-01-17 10:10:27,Trade,SELL_TO_CLOSE,QQQ   200221P00218000,Equity Option,Sold 1 QQQ 02/21/20 Put 218.00 @ 2.26,226.00,1.0,226.00,0.0,-0.15,100.0,QQQ,2/21/20,218.0,PUT
360,2020-01-15 15:27:18,Trade,SELL_TO_OPEN,QQQ   200221P00219000,Equity Option,Sold 1 QQQ 02/21/20 Put 219.00 @ 3.77,377.00,1.0,377.00,-1.0,-0.15,100.0,QQQ,2/21/20,219.0,PUT
357,2020-01-17 10:10:27,Trade,BUY_TO_CLOSE,QQQ   200221P00219000,Equity Option,Bought 1 QQQ 02/21/20 Put 219.00 @ 2.49,-249.00,1.0,-249.00,0.0,-0.14,100.0,QQQ,2/21/20,219.0,PUT
347,2020-01-24 12:28:19,Trade,BUY_TO_OPEN,QQQ   200221P00223000,Equity Option,Bought 1 QQQ 02/21/20 Put 223.00 @ 3.95,-395.00,1.0,-395.00,-1.0,-0.14,100.0,QQQ,2/21/20,223.0,PUT
299,2020-01-30 16:02:56,Trade,SELL_TO_CLOSE,QQQ   200221P00223000,Equity Option,Sold 1 QQQ 02/21/20 Put 223.00 @ 2.91,291.00,1.0,291.00,0.0,-0.15,100.0,QQQ,2/21/20,223.0,PUT
346,2020-01-24 12:28:19,Trade,SELL_TO_OPEN,QQQ   200221P00224000,Equity Option,Sold 1 QQQ 02/21/20 Put 224.00 @ 4.34,434.00,1.0,434.00,-1.0,-0.15,100.0,QQQ,2/21/20,224.0,PUT
300,2020-01-30 16:02:55,Trade,BUY_TO_CLOSE,QQQ   200221P00224000,Equity Option,Bought 1 QQQ 02/21/20 Put 224.00 @ 3.26,-326.00,1.0,-326.00,0.0,-0.14,100.0,QQQ,2/21/20,224.0,PUT
339,2020-01-27 09:56:51,Trade,SELL_TO_OPEN,QQQ   200320C00219000,Equity Option,Sold 1 QQQ 03/20/20 Call 219.00 @ 6.24,624.00,1.0,624.00,-1.0,-0.16,100.0,QQQ,3/20/20,219.0,CALL
15,2020-02-27 15:59:01,Trade,BUY_TO_CLOSE,QQQ   200320C00219000,Equity Option,Bought 1 QQQ 03/20/20 Call 219.00 @ 2.31,-231.00,1.0,-231.00,0.0,-0.14,100.0,QQQ,3/20/20,219.0,CALL
340,2020-01-27 09:56:51,Trade,BUY_TO_OPEN,QQQ   200320C00220000,Equity Option,Bought 1 QQQ 03/20/20 Call 220.00 @ 5.66,-566.00,1.0,-566.00,-1.0,-0.14,100.0,QQQ,3/20/20,220.0,CALL
14,2020-02-27 15:59:01,Trade,SELL_TO_CLOSE,QQQ   200320C00220000,Equity Option,Sold 1 QQQ 03/20/20 Call 220.00 @ 2.01,201.00,1.0,201.00,0.0,-0.15,100.0,QQQ,3/20/20,220.0,CALL

The end result would put these 12 transactions into 3 trades, grouped together by a 10-second time range starting from the date found on the opening side, per underlying symbol.

Edit:

A sample of the raw dataset:

Date,Type,Action,Symbol,Instrument Type,Description,Value,Quantity,Average Price,Commissions,Fees,Multiplier,Underlying Symbol,Expiration Date,Strike Price,Call or Put
2020-02-29T10:09:05-0500,Money Movement,,,,Regulatory fee adjustment,-0.28,0.0,,,0.00,,,,,
2020-02-28T16:00:00-0500,Receive Deliver,,M     200228C00019500,Equity Option,Removal of 3 M 02/28/20 Call 19.50 due to expiration.,0.00,3.0,0.00,,0.00,100,M,2/28/20,19.5,CALL
2020-02-28T15:36:34-0500,Trade,BUY_TO_OPEN,SVXY  200619C00085000,Equity Option,Bought 1 SVXY 06/19/20 Call 85.00 @ 0.06,-6.00,1.0,-6.00,-1.00,-0.14,100,SVXY,6/19/20,85.0,CALL
2020-02-28T15:33:32-0500,Trade,BUY_TO_OPEN,SVXY  200320C00069000,Equity Option,Bought 1 SVXY 03/20/20 Call 69.00 @ 0.15,-15.00,1.0,-15.00,-1.00,-0.14,100,SVXY,3/20/20,69.0,CALL
2020-02-28T12:06:13-0500,Trade,BUY_TO_OPEN,GME   200417C00010000,Equity Option,Bought 10 GME 04/17/20 Call 10.00 @ 0.01,-10.00,10.0,-1.00,-10.00,-1.39,100,GME,4/17/20,10.0,CALL
2020-02-28T12:05:54-0500,Trade,BUY_TO_OPEN,GME   200417C00004500,Equity Option,Bought 1 GME 04/17/20 Call 4.50 @ 0.23,-23.00,1.0,-23.00,-1.00,-0.14,100,GME,4/17/20,4.5,CALL
2020-02-28T10:23:57-0500,Trade,SELL_TO_OPEN,VXX   200417C00025000,Equity Option,Sold 1 VXX 04/17/20 Call 25.00 @ 3.39,339.00,1.0,339.00,-1.00,-0.15,100,VXX,4/17/20,25.0,CALL
2020-02-28T10:23:57-0500,Trade,BUY_TO_OPEN,VXX   200417C00026000,Equity Option,Bought 1 VXX 04/17/20 Call 26.00 @ 3.02,-302.00,1.0,-302.00,-1.00,-0.14,100,VXX,4/17/20,26.0,CALL

Please try this and see if I understood you correctly. I saved your data as a CSV and loaded it into a dataframe named opens. The new dataframe has the following columns:

['Date', 'Type', 'Action', 'Symbol', 'Instrument Type', 'Description',
       'Value', 'Quantity', 'Average Price', 'Commissions', 'Fees',
       'Multiplier', 'Underlying Symbol', 'Expiration Date', 'Strike Price',
       'Call or Put']

I convert the Date column to datetime:

opens['Date'] = pd.to_datetime(opens['Date'])
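As a side note on the raw format: the timestamps in the raw sample look like `2020-02-29T10:09:05-0500`, and the question strips the `T` and the offset with a regex before parsing. A minimal sketch of parsing them directly follows, assuming every row carries the same `-0500` offset as the sample (the names `raw`, `parsed`, and `naive` are made up for illustration). If a file mixes offsets, passing `utc=True` to `pd.to_datetime` is one way to still end up with a single datetime column.

```python
import pandas as pd

# Two timestamps copied from the raw sample in the question.
raw = pd.Series(["2020-02-29T10:09:05-0500", "2020-02-28T15:36:34-0500"])

# %z consumes the "-0500" offset, so the result is timezone-aware.
parsed = pd.to_datetime(raw, format="%Y-%m-%dT%H:%M:%S%z")

# Optional: drop the offset and keep the local wall-clock time,
# which matches what the regex cleanup in the question produces.
naive = parsed.dt.tz_localize(None)
```

Either form can then be used as the Date column; the timezone-naive one is probably the simpler choice if all trades come from the same account and offset.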

I set the Date column as the dataframe index:

opens.set_index('Date', inplace=True)
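Setting the index this way is also what removes the error from the question. There, `BTO_frame['Date'].index` refers to the integer row labels rather than the dates, so time-based methods are not available on it, and `to_period` expects a frequency string rather than a `second=` keyword. The sketch below uses a tiny made-up frame (three rows borrowed from the sample; `frame`, `indexed`, `buckets`, and `sizes` are illustrative names) to show that once the index is a `DatetimeIndex`, each row can be labelled with the start of its 10-second bucket via `floor`.

```python
import pandas as pd

# Three opening rows borrowed from the sample in the question.
frame = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2020-01-15 15:27:18", "2020-01-15 15:27:18", "2020-01-24 12:28:19"]
        ),
        "Action": ["BUY_TO_OPEN", "SELL_TO_OPEN", "BUY_TO_OPEN"],
    }
)

# frame["Date"].index is still the integer row index, not the dates.
row_index = frame["Date"].index

# After set_index the index holds the timestamps themselves.
indexed = frame.set_index("Date")

# floor("10s") rounds each timestamp down to the start of its 10-second bucket.
buckets = indexed.index.floor("10s")

# Group rows by bucket and count them; only buckets that contain rows appear.
sizes = indexed.groupby(buckets).size()
```

Here the two fills at 15:27:18 share the bucket starting at 15:27:10, while the fill on 2020-01-24 sits in its own bucket.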

I group the dataframe by Action and Symbol while binning the index into 10-second intervals. At the same time, I count the Type values in each group within each interval:

opens.groupby(["Action","Symbol"]).resample("10S").apply(lambda x: x['Type'].count())

Or did you want this?

opens.groupby(["Action","Symbol"]).resample("10S").apply(lambda x: x['Type'].count()).unstack()
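If the goal is the one stated in the question (12 transactions collapsed into 3 trades), a fixed 10-second grid may not be ideal, since two fills a second apart can land in neighbouring bins. The following is only a sketch of an alternative, not the asker's method: it starts a new trade whenever an opening fill is more than 10 seconds after the previous opening fill for the same underlying, then attaches closing fills through their contract Symbol. It assumes each contract is opened only once; `trade_id`, `gap`, and the other helper names are hypothetical.

```python
import pandas as pd

# Date, Action, Symbol for the 12 sample rows in the question.
rows = [
    ("2020-01-15 15:27:18", "BUY_TO_OPEN", "QQQ   200221P00218000"),
    ("2020-01-17 10:10:27", "SELL_TO_CLOSE", "QQQ   200221P00218000"),
    ("2020-01-15 15:27:18", "SELL_TO_OPEN", "QQQ   200221P00219000"),
    ("2020-01-17 10:10:27", "BUY_TO_CLOSE", "QQQ   200221P00219000"),
    ("2020-01-24 12:28:19", "BUY_TO_OPEN", "QQQ   200221P00223000"),
    ("2020-01-30 16:02:56", "SELL_TO_CLOSE", "QQQ   200221P00223000"),
    ("2020-01-24 12:28:19", "SELL_TO_OPEN", "QQQ   200221P00224000"),
    ("2020-01-30 16:02:55", "BUY_TO_CLOSE", "QQQ   200221P00224000"),
    ("2020-01-27 09:56:51", "SELL_TO_OPEN", "QQQ   200320C00219000"),
    ("2020-02-27 15:59:01", "BUY_TO_CLOSE", "QQQ   200320C00219000"),
    ("2020-01-27 09:56:51", "BUY_TO_OPEN", "QQQ   200320C00220000"),
    ("2020-02-27 15:59:01", "SELL_TO_CLOSE", "QQQ   200320C00220000"),
]
trades = pd.DataFrame(rows, columns=["Date", "Action", "Symbol"])
trades["Date"] = pd.to_datetime(trades["Date"])
trades["Underlying Symbol"] = "QQQ"

# Keep only the opening side, ordered by underlying and time.
is_open = trades["Action"].isin(["BUY_TO_OPEN", "SELL_TO_OPEN"])
opens = trades[is_open].sort_values(["Underlying Symbol", "Date"]).copy()

# Time since the previous opening fill of the same underlying (NaT for the first).
gap = opens.groupby("Underlying Symbol")["Date"].diff()

# A trade starts at the first fill of an underlying or after a gap above 10 seconds.
starts_new_trade = gap.isna() | (gap > pd.Timedelta(seconds=10))
opens["trade_id"] = starts_new_trade.cumsum() - 1

# Assumption: each contract Symbol is opened once, so Symbol identifies the trade.
symbol_to_trade = opens.set_index("Symbol")["trade_id"]
trades["trade_id"] = trades["Symbol"].map(symbol_to_trade)
```

On the sample this yields three trade ids with four rows each. Note that consecutive fills chain together: a run of fills each within 10 seconds of the previous one is treated as a single trade, even if the first and last are further apart.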

What does pushing to vertical mean? From your code, vertical is a dictionary. What are its keys and what should the values be?
