Inside a loop, I generate a DataFrame on each iteration that looks like this:
RIC RICRoot ISIN ExpirationDate Exchange ... OpenInterest BlockVolume TotalVolume2 SecurityDescription SecurityLongDescription
closingDate ...
2018-03-15 SPH0 SP 2020-03-20 CME:Index and Options Market ... NaN None None SP500 IDX MAR0 None
2018-03-16 SPH0 SP 2020-03-20 CME:Index and Options Market ... NaN None None SP500 IDX MAR0 None
2018-03-19 SPH0 SP 2020-03-20 CME:Index and Options Market ... NaN None None SP500 IDX MAR0 None
2018-03-20 SPH0 SP 2020-03-20 CME:Index and Options Market ... NaN None None SP500 IDX MAR0 None
2018-03-21 SPH0 SP 2020-03-20 CME:Index and Options Market ... NaN None None SP500 IDX MAR0 None
I turn this into a multi-indexed DF:
tmp.columns = pd.MultiIndex.from_arrays( [ [contract]*len(tmp.columns), tmp.columns.tolist() ] )
where contract is just the reference name for that data, which you can see in the output below as SPH0:
SPH0 ...
RIC RICRoot ISIN ExpirationDate Exchange ... OpenInterest BlockVolume TotalVolume2 SecurityDescription SecurityLongDescription
closingDate ...
2018-03-15 SPH0 SP 2020-03-20 CME:Index and Options Market ... NaN None None SP500 IDX MAR0 None
2018-03-16 SPH0 SP 2020-03-20 CME:Index and Options Market ... NaN None None SP500 IDX MAR0 None
2018-03-19 SPH0 SP 2020-03-20 CME:Index and Options Market ... NaN None None SP500 IDX MAR0 None
2018-03-20 SPH0 SP 2020-03-20 CME:Index and Options Market ... NaN None None SP500 IDX MAR0 None
2018-03-21 SPH0 SP 2020-03-20 CME:Index and Options Market ... NaN None None SP500 IDX MAR0 None
I currently have a very inefficient way of merging these DataFrames:
if df is None:
    df = tmp
else:
    df = df.merge(tmp, how='outer', left_index=True, right_index=True)
This is incredibly slow. I want to store all of these temporary DataFrames in a mapping keyed by their respective contract names, and be able to reference their data easily and in a vectorized manner. What is the best way to do this? Does it matter whether the combined DataFrame grows horizontally or vertically?
IIUC, you can just use pd.concat() and pass it your list of DataFrames along with the keys for your resulting MultiIndex DataFrame. Take the following sample DataFrames:
import pandas as pd
df1 = pd.DataFrame([
    ['2018-03-11', 'SPH0', 'SP', '2020-03-20', 'CME:Index and Options Market'],
    ['2018-03-12', 'SPH0', 'SP', '2020-03-20', 'CME:Index and Options Market'],
    ['2018-03-15', 'SPH0', 'SP', '2020-03-20', 'CME:Index and Options Market'],
    ['2018-03-23', 'SPH0', 'SP', '2020-03-20', 'CME:Index and Options Market'],
    ['2018-03-24', 'SPH0', 'SP', '2020-03-20', 'CME:Index and Options Market']],
    columns=['closingDate', 'RIC', 'RICRoot', 'ExpirationDate', 'Exchange'])
df2 = pd.DataFrame([
    ['2018-03-15', 'HAB3', 'HA', '2020-03-20', 'CME:Index and Options Market'],
    ['2018-03-16', 'HAB3', 'HA', '2020-03-20', 'CME:Index and Options Market'],
    ['2018-03-22', 'HAB3', 'HA', '2020-03-20', 'CME:Index and Options Market'],
    ['2018-03-24', 'HAB3', 'HA', '2020-03-20', 'CME:Index and Options Market'],
    ['2018-03-20', 'HAB3', 'HA', '2020-03-20', 'CME:Index and Options Market']],
    columns=['closingDate', 'RIC', 'RICRoot', 'ExpirationDate', 'Exchange'])
df3 = pd.DataFrame([
    ['2018-03-15', 'UHA6', 'UH', '2020-03-20', 'CME:Index and Options Market'],
    ['2018-03-16', 'UHA6', 'UH', '2020-03-20', 'CME:Index and Options Market'],
    ['2018-03-18', 'UHA6', 'UH', '2020-03-20', 'CME:Index and Options Market'],
    ['2018-03-20', 'UHA6', 'UH', '2020-03-20', 'CME:Index and Options Market'],
    ['2018-03-21', 'UHA6', 'UH', '2020-03-20', 'CME:Index and Options Market']],
    columns=['closingDate', 'RIC', 'RICRoot', 'ExpirationDate', 'Exchange'])
Now call pd.concat():
pd.concat([df1, df2, df3], keys=['SPH0','HAB3','UHA6'])
Yields:
       closingDate  ...                      Exchange
SPH0 0  2018-03-11  ...  CME:Index and Options Market
     1  2018-03-12  ...  CME:Index and Options Market
     2  2018-03-15  ...  CME:Index and Options Market
     3  2018-03-23  ...  CME:Index and Options Market
     4  2018-03-24  ...  CME:Index and Options Market
HAB3 0  2018-03-15  ...  CME:Index and Options Market
     1  2018-03-16  ...  CME:Index and Options Market
     2  2018-03-22  ...  CME:Index and Options Market
     3  2018-03-24  ...  CME:Index and Options Market
     4  2018-03-20  ...  CME:Index and Options Market
UHA6 0  2018-03-15  ...  CME:Index and Options Market
     1  2018-03-16  ...  CME:Index and Options Market
     2  2018-03-18  ...  CME:Index and Options Market
     3  2018-03-20  ...  CME:Index and Options Market
     4  2018-03-21  ...  CME:Index and Options Market
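Since the question also asks about referencing each contract's data easily, here is a minimal sketch of how the outer index level can be used for lookups once the frames are stacked. The two small frames are stand-in data, and the level names 'contract' and 'row' are my own labels, not something from the question:

```python
import pandas as pd

# Small stand-in frames (hypothetical data, same idea as the samples above).
df1 = pd.DataFrame({'closingDate': ['2018-03-11', '2018-03-12'],
                    'RIC': ['SPH0', 'SPH0']})
df2 = pd.DataFrame({'closingDate': ['2018-03-15', '2018-03-16'],
                    'RIC': ['HAB3', 'HAB3']})

# names= labels the two index levels: the outer key and the original row index.
# The labels 'contract' and 'row' are illustrative choices.
stacked = pd.concat([df1, df2], keys=['SPH0', 'HAB3'], names=['contract', 'row'])

# Select every row belonging to one contract via the outer level.
sph0 = stacked.loc['SPH0']

# xs() does the same lookup and makes the level explicit.
hab3 = stacked.xs('HAB3', level='contract')
```

Both lookups return an ordinary DataFrame indexed by the original row labels, so the usual vectorized operations apply to the result.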
You can also use a list comprehension to create a list of DataFrames to pass to pd.concat(), for example:
my_keys = ['SPH0','HAB3','UHA6']
dfs = [create_df(key) for key in my_keys]
pd.concat(dfs, keys=my_keys)
where the function create_df() returns a DataFrame.
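If you want the layout from the question, with the contract name as the outer column level and closingDate as the row index, the same idea should work with axis=1. The sketch below is only an illustration: this create_df() is a hypothetical stand-in that I made up, and it assumes each frame is indexed by closingDate. Note that pd.concat() also accepts a dict, in which case the dict keys are used as the keys:

```python
import pandas as pd

def create_df(contract):
    """Hypothetical stand-in for whatever builds one contract's frame."""
    dates = {
        'SPH0': ['2018-03-15', '2018-03-16'],
        'HAB3': ['2018-03-16', '2018-03-19'],
    }[contract]
    df = pd.DataFrame({'closingDate': pd.to_datetime(dates),
                       'RIC': contract,
                       'RICRoot': contract[:2]})
    return df.set_index('closingDate')

# Collect the frames in a dict inside the loop; nothing is merged yet.
frames = {}
for contract in ['SPH0', 'HAB3']:
    frames[contract] = create_df(contract)

# One concat at the end. axis=1 places the frames side by side and uses the
# dict keys as the outer column level. Rows are aligned on closingDate with
# an outer join, so dates missing from a contract are filled with NaN.
wide = pd.concat(frames, axis=1)

# All columns for one contract, or one field across all contracts.
sph0 = wide['SPH0']
ric_by_contract = wide.xs('RIC', axis=1, level=1)
```

Whichever axis you choose, the main saving comes from concatenating once at the end: calling merge() inside the loop copies the accumulated data on every iteration, whereas the dict only holds references until the single pd.concat() call.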