
pandas - how to organise a dataframe by date and assign new values to columns

I have a DataFrame covering one month of weekdays (Saturdays and Sundays excluded), logged once every minute:

                            v1         v2  
2017-04-03 09:15:00     35.7       35.4  
2017-04-03 09:16:00     28.7       28.5
      ...               ...        ...
2017-04-03 16:29:00     81.7       81.5
2017-04-03 16:30:00     82.7       82.6
      ...               ...        ...
2017-04-04 09:15:00     24.3       24.2  
2017-04-04 09:16:00     25.6       25.5
      ...               ...        ...
2017-04-04 16:29:00     67.0       67.2
2017-04-04 16:30:00     70.2       70.6
      ...               ...        ...
2017-04-28 09:15:00     31.7       31.4  
2017-04-28 09:16:00     31.5       31.0
      ...               ...        ...
2017-04-28 16:29:00     33.2       33.5
2017-04-28 16:30:00     33.0       30.7

I have grouped the DataFrame by day to get the first and last rows of each day:

# keep the first and last row of each calendar day
res = df.groupby(df.index.date).apply(lambda x: x.iloc[[0, -1]])
res.index = res.index.droplevel(0)  # drop the date level that groupby adds
print(res)

                      v1    v2
2017-04-03 09:15:00  35.7  35.4
2017-04-03 16:30:00  82.7  82.6
2017-04-04 09:15:00  24.3  24.2
2017-04-04 16:30:00  70.2  70.6
   ...                ..    ..
2017-04-28 09:15:00  31.7  31.4
2017-04-28 16:30:00  33.0  30.7
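For reference, here is a minimal sketch of the same first/last-row selection without a custom apply, using groupby `head(1)` and `tail(1)`. Both keep the original timestamps, so no `droplevel` is needed. The small frame is a handful of rows copied from the question's sample, not the full dataset.

```python
import pandas as pd

# A few rows copied from the question's minute-level sample
idx = pd.to_datetime([
    "2017-04-03 09:15", "2017-04-03 09:16", "2017-04-03 16:30",
    "2017-04-04 09:15", "2017-04-04 09:16", "2017-04-04 16:30",
])
df = pd.DataFrame({"v1": [35.7, 28.7, 82.7, 24.3, 25.6, 70.2],
                   "v2": [35.4, 28.5, 82.6, 24.2, 25.5, 70.6]}, index=idx)

days = df.index.date                    # calendar date of every row, used as the group key
first_rows = df.groupby(days).head(1)   # earliest row of each day, original index kept
last_rows = df.groupby(days).tail(1)    # latest row of each day
res = pd.concat([first_rows, last_rows]).sort_index()
print(res)
```

Like `iloc[[0, -1]]`, a day with a single row would appear twice, because `head(1)` and `tail(1)` both return that row.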

Now I want one row per date, where v1 is the value at that date's earliest timestamp and v2 is the value at its latest timestamp.

Desired output:

              v1    v2
2017-04-03  35.7  82.6
2017-04-04  24.3  70.6
   ...       ..    ..
2017-04-28  31.7  30.7
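The intermediate `res` frame is not strictly required. A minimal sketch that goes straight from the minute-level data to this layout uses the built-in `'first'` and `'last'` reducers per column. It assumes the index is sorted in time order, since these reducers are positional (they return the first/last non-null value in each group). The rows are copied from the question's sample.

```python
import pandas as pd

# A few rows copied from the question's minute-level sample
idx = pd.to_datetime([
    "2017-04-03 09:15", "2017-04-03 09:16", "2017-04-03 16:30",
    "2017-04-04 09:15", "2017-04-04 09:16", "2017-04-04 16:30",
])
df = pd.DataFrame({"v1": [35.7, 28.7, 82.7, 24.3, 25.6, 70.2],
                   "v2": [35.4, 28.5, 82.6, 24.2, 25.5, 70.6]}, index=idx)

# One row per calendar date: v1 from the earliest row, v2 from the latest row
out = df.groupby(df.index.date).agg({"v1": "first", "v2": "last"})
print(out)
```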

Try this:

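import pandas as pd

df_result = pd.DataFrame()
# group by calendar date; first()/last() take the values at the earliest/latest timestamp
df_result['v1'] = res.groupby(res.index.date)['v1'].first()
df_result['v2'] = res.groupby(res.index.date)['v2'].last()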
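Note that `min()`/`max()` return the smallest/largest value in each group, which is not necessarily the value at the earliest/latest timestamp. A small check using the question's 2017-04-28 rows shows the two can differ:

```python
import pandas as pd

# The first and last rows of 2017-04-28 from the question's res output
res = pd.DataFrame({"v1": [31.7, 33.0], "v2": [31.4, 30.7]},
                   index=pd.to_datetime(["2017-04-28 09:15", "2017-04-28 16:30"]))
g = res.groupby(res.index.date)

print(g["v2"].max())   # largest v2 of the day: 31.4 (the 09:15 reading)
print(g["v2"].last())  # v2 at the latest timestamp: 30.7 (the 16:30 reading)
```

The desired output lists 30.7 for that day, so the positional reducer is the one that matches the question.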

You can group by the date part of the index and use groupby.agg with a custom function for each column.

df1 = res.groupby(res.index.date).agg({'v1': lambda x: x[min(x.index)], 'v2':lambda x: x[max(x.index)]})

print (df1)

             v1      v2
2017-04-03  35.7    82.6
2017-04-04  24.3    70.6
2017-04-28  31.7    33.7
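The lambdas look up each column's value at the smallest and largest timestamp in the group. Since `res` is already in time order, a shorter sketch uses named aggregation (available since pandas 0.25) with the built-in `'first'`/`'last'` reducers; this assumes the index is sorted. The stand-in rows are copied from the question's `res` output.

```python
import pandas as pd

# Stand-in for res: first and last row of two days, copied from the question
res = pd.DataFrame(
    {"v1": [35.7, 82.7, 24.3, 70.2], "v2": [35.4, 82.6, 24.2, 70.6]},
    index=pd.to_datetime(["2017-04-03 09:15", "2017-04-03 16:30",
                          "2017-04-04 09:15", "2017-04-04 16:30"]),
)

# Named aggregation: output column = (input column, reducer)
df1 = res.groupby(res.index.date).agg(v1=("v1", "first"), v2=("v2", "last"))
print(df1)
```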

An alternative way to get the first and last rows of each day:

res=df.reset_index().groupby(df.index.date).agg(['first','last']).stack().set_index('index')

Out[123]:

                      v1     v2
index       
2017-04-03 09:15:00  35.7   35.4
2017-04-03 16:30:00  82.7   82.6
2017-04-04 09:15:00  24.3   24.2
2017-04-04 16:30:00  70.2   70.6
2017-04-28 09:15:00  31.7   31.4
2017-04-28 16:30:00  33.0   33.7

You can call reset_index and then use GroupBy + apply with a custom function:

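import pandas as pd

def first_second(x):
    # v1 from the first row of the day, v2 from the last row
    return pd.Series({'v1': x['v1'].iat[0], 'v2': x['v2'].iat[-1]})

res2 = res.reset_index()  # moves the timestamps into a column named 'index'
res2 = res2.groupby(res2['index'].dt.date).apply(first_second)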

print(res2)

              v1    v2
index                 
2017-04-03  35.7  82.6
2017-04-04  24.3  70.6
2017-04-28  31.7  33.7
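Because `res` is built with `iloc[[0, -1]]`, it always holds exactly two rows per day in time order (a single-row day is simply repeated). Under that assumption, a vectorized sketch can skip `apply` entirely by taking v1 from the even positions and v2 from the odd positions. The stand-in rows are copied from the question's `res` output.

```python
import pandas as pd

# Stand-in for res: first and last row of two days, copied from the question
res = pd.DataFrame(
    {"v1": [35.7, 82.7, 24.3, 70.2], "v2": [35.4, 82.6, 24.2, 70.6]},
    index=pd.to_datetime(["2017-04-03 09:15", "2017-04-03 16:30",
                          "2017-04-04 09:15", "2017-04-04 16:30"]),
)

# Assumes exactly two sorted rows per day: row 2k is the first, row 2k+1 the last
out = pd.DataFrame(
    {"v1": res["v1"].iloc[0::2].to_numpy(),   # value at each day's first timestamp
     "v2": res["v2"].iloc[1::2].to_numpy()},  # value at each day's last timestamp
    index=res.index[0::2].date,               # one calendar date per day
)
print(out)
```

This relies entirely on the two-rows-per-day layout; if `res` is built any other way, the groupby versions above are safer.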

pandas has a very handy function for working with a datetime index: resample. In your case, try this:

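import pandas as pd

# 'D' = calendar-day bins; v1 from the first row and v2 from the last row of each day
out = yourdataframe.resample('D').agg({'v1': 'first', 'v2': 'last'})
# days with no rows (e.g. weekends) come back as all-NaN bins; drop them
out = out.dropna(how='all')
print(out)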

The 'D' stands for daily resampling.
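`resample('D')` creates a bin for every calendar day between the first and last timestamp, so days without data (weekends, and in this small sample also the omitted weekdays) come back as NaN rows. In OHLC terms, the first value in a bin is its "open" and the last is its "close", so a sketch using `Resampler.ohlc()` and then dropping the empty bins looks like this. The rows are copied from the question's `res` output.

```python
import pandas as pd

# First and last rows of three days from the question's res output
res = pd.DataFrame(
    {"v1": [35.7, 82.7, 24.3, 70.2, 31.7, 33.0],
     "v2": [35.4, 82.6, 24.2, 70.6, 31.4, 30.7]},
    index=pd.to_datetime(["2017-04-03 09:15", "2017-04-03 16:30",
                          "2017-04-04 09:15", "2017-04-04 16:30",
                          "2017-04-28 09:15", "2017-04-28 16:30"]),
)

# ohlc() returns open/high/low/close per daily bin; open = first value, close = last value
daily = pd.DataFrame({
    "v1": res["v1"].resample("D").ohlc()["open"],
    "v2": res["v2"].resample("D").ohlc()["close"],
}).dropna()  # drop calendar days that have no rows
print(daily)
```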

Result:

Dates                 
2017-04-03  35.7  82.6
2017-04-04  24.3  70.6


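Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.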

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM