I have a very large DataFrame with a datetime column called dt, and the DataFrame is already sorted by dt. I want to split it into several DataFrames based on dt, so that each one contains the rows that fall within a 1-hour range.
Split
dt text
0 20160811 11:05 a
1 20160811 11:35 b
2 20160811 12:03 c
3 20160811 12:36 d
4 20160811 12:52 e
5 20160811 14:32 f
into
dt text
0 20160811 11:05 a
1 20160811 11:35 b
2 20160811 12:03 c
dt text
0 20160811 12:36 d
1 20160811 12:52 e
dt text
0 20160811 14:32 f
Use groupby on the difference between each dt value and the first dt value, converted to whole hours with astype:
S = pd.to_datetime(df.dt)
for i, g in df.groupby([(S - S[0]).astype('timedelta64[h]')]):
    print(g.reset_index(drop=True))
dt text
0 20160811 11:05 a
1 20160811 11:35 b
2 20160811 12:03 c
dt text
0 20160811 12:36 d
1 20160811 12:52 e
dt text
0 20160811 14:32 f
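For reference, here is a minimal self-contained sketch of the same idea. It rebuilds the sample frame from the question and computes the hour key by floor-dividing by pd.Timedelta(hours=1) rather than casting with astype('timedelta64[h]'), on the assumption that newer pandas releases may reject that cast. The explicit format string and the name elapsed_hours are my own additions, not part of the original answer, and the key comes out as integers rather than floats.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "dt": ["20160811 11:05", "20160811 11:35", "20160811 12:03",
           "20160811 12:36", "20160811 12:52", "20160811 14:32"],
    "text": list("abcdef"),
})

# Explicit format (an assumption matching the sample strings) avoids
# relying on automatic format inference.
S = pd.to_datetime(df["dt"], format="%Y%m%d %H:%M")

# Whole hours elapsed since the first row. Floor division by a Timedelta
# gives an integer key per row.
elapsed_hours = (S - S.iloc[0]) // pd.Timedelta(hours=1)

groups = [g.reset_index(drop=True) for _, g in df.groupby(elapsed_hours)]
for g in groups:
    print(g)
```

S.iloc[0] is used instead of S[0] so the sketch also works when the index does not start at 0.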
List comprehension solution:
S = pd.to_datetime(df.dt)
print ((S - S[0]).astype('timedelta64[h]'))
0 0.0
1 0.0
2 0.0
3 1.0
4 1.0
5 3.0
Name: dt, dtype: float64
L = [g.reset_index(drop=True) for i, g in df.groupby([(S - S[0]).astype('timedelta64[h]')])]
print (L[0])
dt text
0 20160811 11:05 a
1 20160811 11:35 b
2 20160811 12:03 c
print (L[1])
dt text
0 20160811 12:36 d
1 20160811 12:52 e
print (L[2])
dt text
0 20160811 14:32 f
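If the window width needs to vary, the list comprehension can be wrapped in a small function. This is only a sketch: split_by_window and its parameters col, width and fmt are hypothetical names introduced for illustration, and the window is always measured from the first row, as in the answer above.

```python
import pandas as pd

def split_by_window(df, col="dt", width="1h", fmt="%Y%m%d %H:%M"):
    """Hypothetical helper: split df into frames of `width`, measured from the first row."""
    s = pd.to_datetime(df[col], format=fmt)
    # Integer number of complete windows elapsed since the first row.
    key = (s - s.iloc[0]) // pd.Timedelta(width)
    return [g.reset_index(drop=True) for _, g in df.groupby(key)]

# Sample data copied from the question.
df = pd.DataFrame({
    "dt": ["20160811 11:05", "20160811 11:35", "20160811 12:03",
           "20160811 12:36", "20160811 12:52", "20160811 14:32"],
    "text": list("abcdef"),
})

hourly = split_by_window(df)                     # 1-hour windows
half_hourly = split_by_window(df, width="30min")  # 30-minute windows
print(len(hourly), len(half_hourly))
```

Windows that contain no rows simply produce no frame, so the returned list can be shorter than the number of elapsed windows.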
Old solution, which splits by hour of day:
You can use groupby on dt.hour, but first you need to convert dt with to_datetime:
for i, g in df.groupby([pd.to_datetime(df.dt).dt.hour]):
    print(g.reset_index(drop=True))
dt text
0 20160811 11:05 a
1 20160811 11:35 b
dt text
0 20160811 12:03 c
1 20160811 12:36 d
2 20160811 12:52 e
dt text
0 20160811 14:32 f
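The difference between the two keys is easiest to see side by side. The sketch below, which assumes the sample data from the question and an explicit format string that I added, counts the rows per group for the clock-hour key (dt.hour) and for the key based on hours elapsed since the first row.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "dt": ["20160811 11:05", "20160811 11:35", "20160811 12:03",
           "20160811 12:36", "20160811 12:52", "20160811 14:32"],
    "text": list("abcdef"),
})
S = pd.to_datetime(df["dt"], format="%Y%m%d %H:%M")

# Key 1: hour of day, so boundaries fall at 12:00, 13:00, ...
by_clock_hour = df.groupby(S.dt.hour).size()

# Key 2: whole hours since the first row, so boundaries fall at 12:05, 13:05, ...
by_elapsed_hour = df.groupby((S - S.iloc[0]) // pd.Timedelta(hours=1)).size()

print(by_clock_hour.to_dict())
print(by_elapsed_hour.to_dict())
```

With the clock-hour key, row c (12:03) lands with d and e. With the elapsed-hour key it stays with a and b, which is what the question asks for. Note also that dt.hour on its own would merge the same hour from different days.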
List comprehension solution:
L = [g.reset_index(drop=True) for i, g in df.groupby([pd.to_datetime(df.dt).dt.hour])]
print (L[0])
dt text
0 20160811 11:05 a
1 20160811 11:35 b
print (L[1])
dt text
0 20160811 12:03 c
1 20160811 12:36 d
2 20160811 12:52 e
print (L[2])
dt text
0 20160811 14:32 f
Or convert the dt column to datetime first and then use a list comprehension:
df['dt'] = pd.to_datetime(df['dt'])
L = [g.reset_index(drop=True) for i, g in df.groupby([df['dt'].dt.hour])]
print (L[1])
dt text
0 2016-08-11 12:03:00 c
1 2016-08-11 12:36:00 d
2 2016-08-11 12:52:00 e
print (L[2])
dt text
0 2016-08-11 14:32:00 f
If you need to split by both date and hour:
#changed dataframe for testing
print (df)
dt text
0 20160811 11:05 a
1 20160812 11:35 b
2 20160813 12:03 c
3 20160811 12:36 d
4 20160811 12:52 e
5 20160811 14:32 f
serie = pd.to_datetime(df.dt)
for i, g in df.groupby([serie.dt.date, serie.dt.hour]):
    print(g.reset_index(drop=True))
dt text
0 20160811 11:05 a
dt text
0 20160811 12:36 d
1 20160811 12:52 e
dt text
0 20160811 14:32 f
dt text
0 20160812 11:35 b
dt text
0 20160813 12:03 c
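To look groups up afterwards instead of only printing them, the same date-and-hour grouping can be collected into a dict keyed by (date, hour). This is a sketch under my own assumptions: the name frames, the explicit format string and the renamed key Series are additions for illustration.

```python
import datetime
import pandas as pd

# Modified sample data from the example above (rows span three dates).
df = pd.DataFrame({
    "dt": ["20160811 11:05", "20160812 11:35", "20160813 12:03",
           "20160811 12:36", "20160811 12:52", "20160811 14:32"],
    "text": list("abcdef"),
})
serie = pd.to_datetime(df["dt"], format="%Y%m%d %H:%M")

# Give the two keys distinct names so they are not both called "dt".
date_key = serie.dt.date.rename("date")
hour_key = serie.dt.hour.rename("hour")

# Each group key is a (date, hour) tuple.
frames = {key: g.reset_index(drop=True)
          for key, g in df.groupby([date_key, hour_key])}

print(frames[(datetime.date(2016, 8, 11), 12)])
```

Because groupby sorts the keys by default, the dict is filled in date order and then hour order, matching the printed output above.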