I have a DataFrame with timestamped data like the one below:
Date Time Name Value
0 20210322 08:00:00 A 1
1 20210322 08:00:00 B 2
2 20210322 08:28:00 A 3
3 20210322 08:30:00 B 4
4 20210322 09:00:00 A 5
5 20210322 09:30:00 B 6
6 20210322 09:55:00 A 7
7 20210322 10:30:00 B 8
8 20210322 10:30:00 A 9
9 20210322 11:58:00 A 10
10 20210322 12:00:00 B 11
...
The time series has a value every 30 seconds, but not every name is present at every timestamp.
The final goal is to merge it with another DataFrame that has Name and Datetime columns:
Name Datetime
A 2021-03-22 08:00:00
B 2021-03-22 09:25:15
A 2021-03-22 09:30:00
A 2021-03-22 10:30:00
...
and fill the columns +0h, +1h, +2h and so on, where +1h contains the value from the first DataFrame at timestamp Datetime + 1h for the same Name. If no value is available at exactly that timestamp, ideally the next available one should be taken:
Name Datetime +0h +1h +2h +3h ....
A 2021-03-22 08:00:00 1 5 7 9
B 2021-03-22 09:25:15 6 8 11 NA
A 2021-03-22 09:30:00 7 9 10 NA
A 2021-03-22 10:30:00 9 10 NA NA
...
Is there a pandas way to do that kind of trick?
Don't hesitate to ask for clarification if I have not been clear enough. If you have a suggestion for a better title for this post, please share it; I don't know if the current one is precise enough.
Thanks in advance, Irpie
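I think you need merge_asof first:
import pandas as pd
#convert both datetime columns to datetimes
df1['Datetime'] = pd.to_datetime(df1['Datetime'])
df['Datetime1'] = pd.to_datetime(df['Date'].astype(str) + ' ' + df['Time'])
#both frames must be sorted by their datetime keys
#for each row of df, attach the most recent df1 Datetime with the same Name
df = pd.merge_asof(df, df1, left_on='Datetime1', right_on='Datetime', by='Name')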
print (df)
Date Time Name Value Datetime1 Datetime
0 20210322 08:00:00 A 1 2021-03-22 08:00:00 2021-03-22 08:00:00
1 20210322 08:00:00 B 2 2021-03-22 08:00:00 NaT
2 20210322 08:28:00 A 3 2021-03-22 08:28:00 2021-03-22 08:00:00
3 20210322 08:30:00 B 4 2021-03-22 08:30:00 NaT
4 20210322 09:00:00 A 5 2021-03-22 09:00:00 2021-03-22 08:00:00
5 20210322 09:30:00 B 6 2021-03-22 09:30:00 2021-03-22 09:25:15
6 20210322 09:55:00 A 7 2021-03-22 09:55:00 2021-03-22 09:30:00
7 20210322 10:30:00 B 8 2021-03-22 10:30:00 2021-03-22 09:25:15
8 20210322 10:30:00 A 9 2021-03-22 10:30:00 2021-03-22 10:30:00
9 20210322 11:58:00 A 10 2021-03-22 11:58:00 2021-03-22 10:30:00
10 20210322 12:00:00 B 11 2021-03-22 12:00:00 2021-03-22 09:25:15
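The NaT entries for the B rows at 08:00 and 08:30 come from the default matching direction: merge_asof looks backward, so a row only gets a match if the other frame has a key at or before it. A minimal sketch on toy data (the frames `events` and `marks` are hypothetical, not the question's data) showing the difference between the default and direction="forward":

```python
import pandas as pd

# Hypothetical toy frames, only to illustrate the matching direction.
events = pd.DataFrame({"t": pd.to_datetime(["2021-03-22 08:00:00",
                                            "2021-03-22 09:30:00"])})
marks = pd.DataFrame({
    "t": pd.to_datetime(["2021-03-22 09:00:00", "2021-03-22 10:00:00"]),
    "mark": ["m1", "m2"],
})

# Default: last mark at or before each event (none exists for 08:00).
backward = pd.merge_asof(events, marks, on="t")
# Forward: first mark at or after each event.
forward = pd.merge_asof(events, marks, on="t", direction="forward")

print(backward)
print(forward)
```

With the backward default, the 08:00 event has no earlier mark and stays empty; with direction="forward" it picks up the 09:00 mark.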
#subtract the datetimes, floor to whole hours and convert to a number of hours
df['hour'] = (df['Datetime1'].sub(df['Datetime']).dt.floor('h')
                .dt.total_seconds()
                .div(3600)
                #.astype(int) is left out because unmatched rows (NaT) give NaN
             )
#pivot with sum aggregation and rename the columns
f = lambda x: f'+{x}h'
df = (df.pivot_table(index=['Name','Datetime'],
columns='hour',
values='Value',
aggfunc='sum',
fill_value=0)
.rename(columns=f))
print (df)
hour +0.0h +1.0h +2.0h
Name Datetime
A 2021-03-22 08:00:00 4 5 0
2021-03-22 09:30:00 7 0 0
2021-03-22 10:30:00 9 10 0
B 2021-03-22 09:25:15 6 8 11
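Note that the pivot above sums every value falling into each elapsed-hour bucket, which is not quite the "first next available value" asked for in the question. A possible alternative, sketched here under assumptions rather than taken from the answer: shift each Datetime by k hours and run merge_asof with direction="forward" once per offset. The number of offsets (4) and the helper names `values`, `base`, `left` and `out` are choices made for this illustration.

```python
import pandas as pd

# Rows copied from the question's sample (the real data continues past these).
df = pd.DataFrame({
    "Date": [20210322] * 11,
    "Time": ["08:00:00", "08:00:00", "08:28:00", "08:30:00", "09:00:00", "09:30:00",
             "09:55:00", "10:30:00", "10:30:00", "11:58:00", "12:00:00"],
    "Name": list("ABABABABAAB"),
    "Value": list(range(1, 12)),
})
df1 = pd.DataFrame({
    "Name": ["A", "B", "A", "A"],
    "Datetime": pd.to_datetime(["2021-03-22 08:00:00", "2021-03-22 09:25:15",
                                "2021-03-22 09:30:00", "2021-03-22 10:30:00"]),
})

# Build one sorted lookup table with a real datetime key.
ts = pd.to_datetime(df["Date"].astype(str) + " " + df["Time"],
                    format="%Y%m%d %H:%M:%S")
values = (df.assign(ts=ts.astype("datetime64[ns]"))[["Name", "ts", "Value"]]
            .sort_values("ts"))

out = df1.sort_values("Datetime").reset_index(drop=True)
base = out["Datetime"].astype("datetime64[ns]")
for k in range(4):  # assumption: offsets +0h .. +3h
    # Shift the lookup time by k hours, then take the first value at or after it.
    left = pd.DataFrame({"Name": out["Name"], "ts": base + pd.Timedelta(hours=k)})
    matched = pd.merge_asof(left, values, on="ts", by="Name", direction="forward")
    out[f"+{k}h"] = matched["Value"].to_numpy()

print(out)
```

On the sample rows this gives 9 for A 08:00 at +2h, because the next A observation at or after 10:00 is the one at 10:30; passing direction="nearest" or a tolerance to merge_asof would change which neighbour is picked.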