
How to use groupby() with between_time()?

I have a DataFrame and want to multiply all values in column a for a given day by the value of a at 06:00:00 on that day. If there is no 06:00:00 entry, that day should stay unchanged.

The code below unfortunately raises an error.

How can I correct this code, or replace it with a working solution?

import pandas as pd
import numpy as np

start = pd.Timestamp('2000-01-01')
end = pd.Timestamp('2000-01-03')
t = np.linspace(start.value, end.value, 9)
datetime1 = pd.to_datetime(t)
df = pd.DataFrame( {'a':[1,3,4,5,6,7,8,9,14]})
df['date']= datetime1
print(df)

def myF(x):
    y = x.set_index('date').between_time('05:59', '06:01').a
    return y


toMultiplyWith =  df.groupby(df.date.dt.floor('D')).transform(myF)

Output:

    a                date
0   1 2000-01-01 00:00:00
1   3 2000-01-01 06:00:00
2   4 2000-01-01 12:00:00
3   5 2000-01-01 18:00:00
4   6 2000-01-02 00:00:00
5   7 2000-01-02 06:00:00
6   8 2000-01-02 12:00:00
7   9 2000-01-02 18:00:00
8  14 2000-01-03 00:00:00
....
AttributeError: ("'Series' object has no attribute 'set_index'", 'occurred at index a')

You should change this line:

toMultiplyWith = df.groupby(df.date.dt.floor('D')).transform(myF)

to this:

toMultiplyWith = df.groupby(df.date.dt.floor('D')).apply(myF)

Using .apply instead of .transform resolves the AttributeError.

apply is the right choice here because it passes each group, with all of its columns, to the custom function as a DataFrame. transform, by contrast, requires the function to work column by column, so myF receives a single column as a Series, which has no set_index method.

To read more about the difference between the two methods, see this answer.
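The difference can also be checked directly. The following is a minimal sketch, not part of the original answer: the helper functions record_transform and record_apply and the two lists are names made up for illustration, and the sample frame is rebuilt with pd.date_range instead of np.linspace. Each helper only records the type of object it is handed.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 3, 4, 5, 6, 7, 8, 9, 14],
    'date': pd.date_range('2000-01-01', '2000-01-03', periods=9),
})
days = df['date'].dt.floor('D')

seen_by_transform = []  # hypothetical bookkeeping lists
seen_by_apply = []

def record_transform(x):
    # Remember what kind of object transform hands over, then pass it through.
    seen_by_transform.append(type(x).__name__)
    return x

def record_apply(x):
    # Remember what kind of object apply hands over, then pass it through.
    seen_by_apply.append(type(x).__name__)
    return x

df.groupby(days).transform(record_transform)
df.groupby(days).apply(record_apply)

print(set(seen_by_transform))  # includes 'Series': columns are passed one by one
print(set(seen_by_apply))      # only whole groups as DataFrames
```

Because transform must be able to call the function on individual columns, any function that relies on DataFrame-only methods such as set_index fails there, while the same function works under apply.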

If you want to stick with the between_time(...) function, this is one way to do it:

df = df.set_index('date') 
mask = df.between_time('05:59', '06:01').index
df.loc[mask, 'a'] = df.loc[mask, 'a'] ** 2 # the operation you want to perform
df.reset_index(inplace=True)

Output:

                 date   a
0 2000-01-01 00:00:00   1
1 2000-01-01 06:00:00   9
2 2000-01-01 12:00:00   4
3 2000-01-01 18:00:00   5
4 2000-01-02 00:00:00   6
5 2000-01-02 06:00:00  49
6 2000-01-02 12:00:00   8
7 2000-01-02 18:00:00   9
8 2000-01-03 00:00:00  14
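The same row selection is possible without moving date into the index and back. This is a sketch under the assumption that df has a unique row index; it uses DatetimeIndex.indexer_between_time, which returns integer positions, and the names pos and rows are introduced here for illustration only. The squaring is kept from the snippet above as a stand-in for whatever operation is needed.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 3, 4, 5, 6, 7, 8, 9, 14],
    'date': pd.date_range('2000-01-01', '2000-01-03', periods=9),
})

# Integer positions of the rows whose time of day lies in the window.
pos = pd.DatetimeIndex(df['date']).indexer_between_time('05:59', '06:01')

# Translate positions to index labels and update only those rows.
rows = df.index[pos]
df.loc[rows, 'a'] = df.loc[rows, 'a'] ** 2  # placeholder operation
print(df)
```

The column order and the original index are left untouched, so no reset_index call is needed afterwards.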

If I understood your goal correctly, you can use apply to return a DataFrame with the same number of rows as the original DataFrame (simulating a transform):

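def myF(grp):
    # '%H:%M:%S' is the portable spelling of '%T'
    time = grp.date.dt.strftime('%H:%M:%S')
    target_idx = time == '06:00:00'
    if target_idx.any():
        # multiply every other row of the day by the 06:00:00 value
        grp.loc[~target_idx, 'a_sum'] = grp.loc[~target_idx, 'a'].values * grp.loc[target_idx, 'a'].values
    else:
        grp.loc[~target_idx, 'a_sum'] = np.nan
    return grp

# group_keys=False keeps the original row index in the result
df.groupby(df.date.dt.floor('D'), group_keys=False).apply(myF)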

Output:

    a                date  a_sum
0   1 2000-01-01 00:00:00    3.0
1   3 2000-01-01 06:00:00    NaN
2   4 2000-01-01 12:00:00   12.0
3   5 2000-01-01 18:00:00   15.0
4   6 2000-01-02 00:00:00   42.0
5   7 2000-01-02 06:00:00    NaN
6   8 2000-01-02 12:00:00   56.0
7   9 2000-01-02 18:00:00   63.0
8  14 2000-01-03 00:00:00    NaN

Note that, for each day, every value at a time other than 06:00:00 is multiplied by the value at 06:00:00. The result is NaN for the 06:00:00 rows themselves, as well as for days that have no 06:00:00 entry.
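If the aim is exactly what the question states (every value of a day, including the 06:00:00 one, multiplied by that day's 06:00:00 value, and days without such an entry left unchanged), a vectorized sketch without apply might look like the following. It is an illustration rather than the method of any answer above: the names six, factor and a_scaled are made up here, the sample frame is rebuilt with pd.date_range, and the code assumes at most one row per day falls inside the 05:59 to 06:01 window.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 3, 4, 5, 6, 7, 8, 9, 14],
    'date': pd.date_range('2000-01-01', '2000-01-03', periods=9),
})

# Value of `a` around 06:00 for each day that has such a row.
six = df.set_index('date').between_time('05:59', '06:01')['a']
six.index = six.index.floor('D')  # re-key by calendar day (assumed unique)

# Look up each row's daily factor; days without a 06:00 row get factor 1,
# which leaves their values unchanged.
factor = df['date'].dt.floor('D').map(six).fillna(1)
df['a_scaled'] = df['a'] * factor
print(df)
```

For the sample data, a_scaled comes out as 3, 9, 12, 15 for the first day, 42, 49, 56, 63 for the second, and 14 for the third day, which has no 06:00:00 entry. If a day could have several rows inside the window, six would need to be reduced to one value per day first (for example by taking the first one) before the lookup.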
