How to assign a fixed value to all hours of a day in pandas

Question

I have a half-hourly dataframe with two columns. I would like to take all the half-hours of a day, do some calculation that returns one number, and assign that number to all half-hours of that day. Below is example code:

import numpy as np
import pandas as pd

dates = pd.date_range("2003-01-01 08:30:00", "2003-01-05", freq="30min")
data = np.transpose(np.array([np.random.rand(dates.shape[0]), np.random.rand(dates.shape[0]) * 100]))
data[0:50, 0] = np.nan  # my actual dataframe includes NaN
df = pd.DataFrame(data=data, index=dates, columns=["DATA1", "DATA2"])
print(df)
                        DATA1      DATA2
2003-01-01 08:30:00       NaN  79.990866
2003-01-01 09:00:00       NaN   5.461791
2003-01-01 09:30:00       NaN  68.892447
2003-01-01 10:00:00       NaN  44.823338
2003-01-01 10:30:00       NaN  57.860309
...                       ...        ...
2003-01-04 22:00:00  0.394574  31.943657
2003-01-04 22:30:00  0.140950  78.275981

Then I would like to apply the following function, which returns one number:

def my_f(data1,data2):
    y = data1[data2>20]
    return np.median(y)

This function selects all data in DATA1 based on a condition (DATA2 > 20), then takes the median of the selected data. How can I create a third column (let's say result) and assign this fixed number (y) back to all half-hourly rows of that day?
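To make the function concrete, here is a minimal sketch on a tiny hand-made array. One caveat: np.median propagates NaN on plain NumPy arrays, and the example data contains NaN in DATA1, so a NaN-ignoring variant may be what is actually wanted. The name my_f_nan and the sample values are illustrative assumptions, not part of the original post.

```python
import numpy as np

def my_f(data1, data2):
    # original function from the question
    y = data1[data2 > 20]
    return np.median(y)

def my_f_nan(data1, data2):
    # hypothetical NaN-ignoring variant of my_f
    y = np.asarray(data1, dtype=float)[np.asarray(data2) > 20]
    return np.nanmedian(y)

data1 = np.array([1.0, np.nan, 3.0, 5.0])
data2 = np.array([30.0, 40.0, 10.0, 50.0])

plain = my_f(data1, data2)       # selected values are [1, nan, 5]
robust = my_f_nan(data1, data2)  # median of [1, 5], ignoring the NaN
print(plain, robust)
```

Here plain comes out as NaN because one selected value is NaN, while robust is 3.0.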

My guess is that I should use something like this:

daily_tmp = df.resample('D').apply(my_f)
df['results'] = daily_tmp.reindex(df.index, method='ffill')

If this approach is correct, how can I pass my_f with two arguments to resample.apply()? Or is there another way to do a similar task?

Answer 1

My solution assumes that you have a fairly small dataset. Please let me know if that is not the case.

I would decompose your goal as follows: (1) group the data by day; (2) for each day, compute some complicated function; (3) assign the resulting value to the half-hours.

# specify the day for each datapoint
df['day'] = df.index.map(lambda x: x.strftime('%Y-%m-%d'))
# compute a complicated function for each day and store the result
mapping = {}
for day, data_for_the_day in df.groupby(by='day'):
    # assign to mapping[day] the result of a complicated function
    # column names must match the dataframe exactly: 'DATA1' and 'DATA2'
    mapping[day] = np.mean(data_for_the_day[data_for_the_day['DATA2'] > 20]['DATA1'])

# assign the daily value to every half-hourly row of that day
df['result'] = df['day'].map(mapping)

That's not the neatest solution, but it is straightforward, easy to understand, and works well on small datasets.
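For the part of the question about passing a two-argument function, one possible sketch is to group by the calendar day of the index and call the function on both columns of each group, then look the daily value up for every row. The helper name assign_daily_value, its parameters, and the seeded random data are assumptions for illustration. my_f is rewritten here with Series.median(), which skips NaN.

```python
import numpy as np
import pandas as pd

def my_f(data1, data2):
    # same selection as in the question; Series.median() skips NaN
    return data1[data2 > 20].median()

def assign_daily_value(df, func, col1="DATA1", col2="DATA2"):
    # hypothetical helper: one value per calendar day, repeated on every row of that day
    days = df.index.normalize()  # midnight timestamp of each row's day
    daily = df.groupby(days).apply(lambda g: func(g[col1], g[col2]))
    # daily holds one entry per day; look it up for every row
    return pd.Series(daily.reindex(days).to_numpy(), index=df.index)

rng = np.random.default_rng(0)
dates = pd.date_range("2003-01-01 08:30:00", "2003-01-05", freq="30min")
df = pd.DataFrame(
    {"DATA1": rng.random(len(dates)), "DATA2": rng.random(len(dates)) * 100},
    index=dates,
)
df.iloc[0:50, 0] = np.nan  # mimic the NaN in the question's data
df["result"] = assign_daily_value(df, my_f)
print(df.tail())
```

A day whose selected DATA1 values are all NaN ends up with NaN in result, which seems the least surprising behavior here.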

Answer 2

Here is a fast way to do it.

First, import libraries:

import time
import pandas as pd
import numpy as np
import datetime as dt

Second, the code to achieve it:

%%time
dates = pd.date_range("2003-01-01 08:30:00","2003-01-05",freq="30min")
data = np.transpose(np.array([np.random.rand(dates.shape[0]),np.random.rand(dates.shape[0])*100]))
data[0:50,0]=np.nan # my actual dataframe includes nan
df = pd.DataFrame(data = data,index =dates,columns=["DATA1","DATA2"])

#### Create a unique marker per hour

df['Date'] = df.index
df['Date'] = df['Date'].dt.strftime(date_format='%Y-%m-%d %H')

#### Then stipulate some conditions

_condition_1 = df.Date == df.Date.shift(-1) # if full hour
_condition_2 = df.DATA2 > 20 # yours
_condition_3 = df.Date == df.Date.shift(1) # if half an hour

#### Now, report the median where conditions 1 and 2 are fulfilled

# parentheses fixed: the median of the two half-hour values is (a + b) / 2
df['result'] = np.where(_condition_1 & _condition_2,(df.DATA1+df.DATA1.shift(-1))/2,0)

#### Fill the hours with median

df['result'] = np.where(_condition_3,df.result.shift(1),df.result)

#### Drop useless column
df = df.drop(['Date'],axis=1)

df[df.DATA2>20].tail(20)

Third, the output:

[screenshot of the output; image not preserved]
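Answer 2 keys its marker on the hour (`%Y-%m-%d %H`). If one value per calendar day is wanted, as the question describes, a vectorized sketch could blank out DATA1 wherever the condition fails and broadcast each day's median back with groupby(...).transform. The helper name daily_masked_median, its parameters, and the toy data are assumptions for illustration; this is not the answerer's method.

```python
import numpy as np
import pandas as pd

def daily_masked_median(df, value_col="DATA1", cond_col="DATA2", threshold=20):
    # hypothetical helper: keep value_col only where cond_col > threshold
    masked = df[value_col].where(df[cond_col] > threshold)
    # transform broadcasts each day's median (NaN skipped) to every row of that day
    return masked.groupby(df.index.normalize()).transform("median")

idx = pd.date_range("2003-01-01 22:00", periods=8, freq="30min")
df = pd.DataFrame(
    {"DATA1": [np.nan, 1, 3, 100, 10, 20, 30, np.nan],
     "DATA2": [50, 50, 50, 5, 30, 30, 5, 30]},
    index=idx,
)
df["result"] = daily_masked_median(df)
print(df)
```

With this toy data the four rows on 2003-01-01 get 2.0 (the median of 1 and 3) and the four rows on 2003-01-02 get 15.0 (the median of 10 and 20). No Python-level loop is involved, which tends to matter on larger frames.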

Answer 1: score 1, accepted, 2021-01-29 00:42:55

Answer 2: score 1, 2021-01-29 00:56:21
