简体   繁体   中英

How to merge DataFrame with Multi-Index

I want to create a multi-index based on Dates and Times from my DataFrame - df. This actually works and indexed returns the correct Multiindex (continuous dates from 02/10/2017 until 31/03/2018 with continuous times from 08:00:00 - 21:55:00 in 5 Min Intervals).

The problem is that my dataset doesn't have complete observations. Means, for a random day there could be missing observations for 08:05:00 until 08:40:00. My Multiindex has a length of 20496, while my DataFrame has a length of 20486.

When I want to merge the index with my DataFrame all observations are NAN . Does someone now how to merge the multi index with my initial DataFrame - df ?

df = pd.read_csv(file_name, parse_dates=[0], 
index_col=0, sep=',')
df['Date'] = df.index.date
df['Time'] = df.index.time
df['Time'] = df['Time'].astype(str)
df = df[df['Time'] != '22:00:00']

list_date = set(df['Date'])
list_time = set(df['Time'])

list_date = sorted(list_date)
list_time = sorted(list_time)

iterables = [list_date, list_time]
indexed = pd.MultiIndex.from_product(iterables, 
names=['date', 'time'])
df_new = pd.DataFrame(df, index=indexed)

df output:

                            r2     var_v2x        Date      Time
TIME1                                                            
2017-10-02 08:00:00         NaN  0.00008784  2017-10-02  08:00:00
2017-10-02 08:05:00  0.00000000  0.00008784  2017-10-02  08:05:00
2017-10-02 08:10:00  0.00000008  0.00008784  2017-10-02  08:10:00

Found a solution (probably not an elegant one).

df = pd.read_csv(file_name, parse_dates=[0], index_col=0, sep=',')
df['Date'] = df.index.date
df['Time'] = df.index.time
df['Time'] = df['Time'].astype(str)

df = df[df['Time'] != '22:00:00']

list_date = set(df['Date'])
list_time = set(df['Time'])

list_date = sorted(list_date)
list_time = sorted(list_time)

iterables = [list_date, list_time]
indexed = pd.MultiIndex.from_product(iterables, names=['date', 'time'])

df_index_date = indexed.get_level_values(0)
df_index_time = indexed.get_level_values(1)

df_joined = pd.DataFrame(df_index_date.astype(str) + ' ' + df_index_time.astype(str))
df_joined = df_joined.reset_index()
df_joined = df_joined.set_index(df_joined[0])
del df_joined['index']
del df_joined[0]

df_final = df_joined.join(df)
df_final = df_final.reset_index(drop=True)
df_final = df_final.set_index(indexed)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM