pandas - how to organise a dataframe by date and assign new values to columns
I have a dataframe covering one month, excluding Saturdays and Sundays, logged every 1 minute.
v1 v2
2017-04-03 09:15:00 35.7 35.4
2017-04-03 09:16:00 28.7 28.5
... ... ...
2017-04-03 16:29:00 81.7 81.5
2017-04-03 16:30:00 82.7 82.6
... ... ...
2017-04-04 09:15:00 24.3 24.2
2017-04-04 09:16:00 25.6 25.5
... ... ...
2017-04-04 16:29:00 67.0 67.2
2017-04-04 16:30:00 70.2 70.6
... ... ...
2017-04-28 09:15:00 31.7 31.4
2017-04-28 09:16:00 31.5 31.0
... ... ...
2017-04-28 16:29:00 33.2 33.5
2017-04-28 16:30:00 33.0 30.7
I have resampled the dataframe to get the first and last value of each day.
res = df.groupby(df.index.date).apply(lambda x: x.iloc[[0, -1]])
res.index = res.index.droplevel(0)
print(res)
v1 v2
2017-04-03 09:15:00 35.7 35.4
2017-04-03 16:30:00 82.7 82.6
2017-04-04 09:15:00 24.3 24.2
2017-04-04 16:30:00 70.2 70.6
... .. ..
2017-04-28 09:15:00 31.7 31.4
2017-04-28 16:30:00 33.0 30.7
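The first/last-row-per-day step above can be reproduced end to end on a small frame. The timestamps and values below are hypothetical, chosen only to show the shape of `res`; this is a sketch, not the asker's real data.

```python
import pandas as pd

# Hypothetical minute-level data for two trading days
df = pd.DataFrame(
    {"v1": [35.7, 28.7, 82.7, 24.3, 25.6, 70.2],
     "v2": [35.4, 28.5, 82.6, 24.2, 25.5, 70.6]},
    index=pd.to_datetime(
        ["2017-04-03 09:15", "2017-04-03 09:16", "2017-04-03 16:30",
         "2017-04-04 09:15", "2017-04-04 09:16", "2017-04-04 16:30"]
    ),
)

# Keep the first and last row (by position) of every calendar day
res = df.groupby(df.index.date).apply(lambda x: x.iloc[[0, -1]])
# apply() prepends the date as an extra index level; drop it to keep the timestamps
res.index = res.index.droplevel(0)
print(res)
```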
Now I want the dataframe organised by date, with v1 taken from the earliest timestamp and v2 from the latest timestamp of each date.
Desired output:
v1 v2
2017-04-03 35.7 82.6
2017-04-04 24.3 70.6
... .. ..
2017-04-28 31.7 30.7
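A minimal sketch of getting the desired shape in one step from the minute-level frame, assuming pandas 0.25 or later (named aggregation) and using made-up data. Note that GroupBy's "first" and "last" return the first and last non-null value of each group, so they only match "earliest/latest timestamp" when the rows are sorted by time and contain no NaNs.

```python
import pandas as pd

# Hypothetical minute-level data (not the asker's real frame)
df = pd.DataFrame(
    {"v1": [35.7, 28.7, 82.7, 24.3, 25.6, 70.2],
     "v2": [35.4, 28.5, 82.6, 24.2, 25.5, 70.6]},
    index=pd.to_datetime(
        ["2017-04-03 09:15", "2017-04-03 09:16", "2017-04-03 16:30",
         "2017-04-04 09:15", "2017-04-04 09:16", "2017-04-04 16:30"]
    ),
)

# Sort by time so "first"/"last" correspond to the earliest/latest timestamp
df = df.sort_index()
# Named aggregation: v1 from the earliest row, v2 from the latest row, per date
out = df.groupby(df.index.date).agg(v1=("v1", "first"), v2=("v2", "last"))
print(out)
```

This skips the intermediate `res` frame entirely.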
Try this:
df_result = pd.DataFrame()
# group by calendar date so each day's two rows collapse into one
df_result['v1'] = res.groupby(res.index.date)['v1'].first()
df_result['v2'] = res.groupby(res.index.date)['v2'].last()
You can group by the date part of the index and use groupby.agg with a custom function:
df1 = res.groupby(res.index.date).agg({'v1': lambda x: x[min(x.index)], 'v2':lambda x: x[max(x.index)]})
print (df1)
v1 v2
2017-04-03 35.7 82.6
2017-04-04 24.3 70.6
2017-04-28 31.7 30.7
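The lambdas above look values up by timestamp label rather than by row position, so they should not depend on the physical row order. A small sketch illustrating that, with a hypothetical `res` whose rows are deliberately out of time order:

```python
import pandas as pd

# Hypothetical two-rows-per-day frame, deliberately out of time order
res = pd.DataFrame(
    {"v1": [82.7, 35.7, 24.3, 70.2],
     "v2": [82.6, 35.4, 24.2, 70.6]},
    index=pd.to_datetime(
        ["2017-04-03 16:30", "2017-04-03 09:15",
         "2017-04-04 09:15", "2017-04-04 16:30"]
    ),
)

# Each lambda receives one column of one day's rows as a Series; looking up the
# value at the smallest / largest timestamp label ignores the row order
df1 = res.groupby(res.index.date).agg(
    {"v1": lambda x: x.loc[x.index.min()],
     "v2": lambda x: x.loc[x.index.max()]}
)
print(df1)
```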
An alternative way to resample the dataframe to get the first and last value of each day:
res=df.reset_index().groupby(df.index.date).agg(['first','last']).stack().set_index('index')
Out[123]:
v1 v2
index
2017-04-03 09:15:00 35.7 35.4
2017-04-03 16:30:00 82.7 82.6
2017-04-04 09:15:00 24.3 24.2
2017-04-04 16:30:00 70.2 70.6
2017-04-28 09:15:00 31.7 31.4
2017-04-28 16:30:00 33.0 30.7
You can use reset_index and then GroupBy + apply with a custom function:
def first_second(x):
    # v1 from the day's first row, v2 from its last row
    return pd.Series({'v1': x['v1'].iat[0], 'v2': x['v2'].iat[-1]})
res2 = res.reset_index()
res2 = res2.groupby(res2['index'].dt.date).apply(first_second)
print(res2)
v1 v2
index
2017-04-03 35.7 82.6
2017-04-04 24.3 70.6
2017-04-28 31.7 30.7
There is a very useful function in pandas for working with a datetime index: resampling. In your case, try this:
def first_last(entry):
    # v1 from the first row of the day, v2 from the last row
    return entry['v1'].iloc[0], entry['v2'].iloc[-1]
yourdataframe.resample('D').apply(first_last)
The 'D' stands for daily resampling.
Result:
Dates
2017-04-03 35.7 82.6
2017-04-04 24.3 70.6
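A sketch of the same resampling idea using built-in aggregations instead of a custom function, with hypothetical data spanning a weekend. Daily bins for days with no rows (here Saturday and Sunday) come back as all-NaN rows when aggregating with "first"/"last", so the sketch drops them afterwards.

```python
import pandas as pd

# Hypothetical data: Friday 2017-04-07 and Monday 2017-04-10, so the daily bins
# for Saturday 04-08 and Sunday 04-09 contain no rows
df = pd.DataFrame(
    {"v1": [31.7, 31.5, 33.2, 24.3, 70.2],
     "v2": [31.4, 31.0, 30.7, 24.2, 70.6]},
    index=pd.to_datetime(
        ["2017-04-07 09:15", "2017-04-07 09:16", "2017-04-07 16:30",
         "2017-04-10 09:15", "2017-04-10 16:30"]
    ),
)

# 'D' = calendar-day bins; take v1 from the first row and v2 from the last row
daily = df.resample("D").agg({"v1": "first", "v2": "last"})
# Weekend bins are empty and come back as all-NaN rows, so drop them
daily = daily.dropna(how="all")
print(daily)
```

Using string aggregations keeps the per-column choice explicit and avoids indexing into empty bins.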