How to use groupby() with between_time()?
I have a DataFrame and want to multiply all values in column a for a given day by the value of a at 06:00:00 of that day. If there is no 06:00:00 entry, that day should stay unchanged.
The code below unfortunately raises an error.
How do I correct this code, or what working solution could replace it?
import pandas as pd
import numpy as np
start = pd.Timestamp('2000-01-01')
end = pd.Timestamp('2000-01-03')
t = np.linspace(start.value, end.value, 9)
datetime1 = pd.to_datetime(t)
df = pd.DataFrame( {'a':[1,3,4,5,6,7,8,9,14]})
df['date']= datetime1
print(df)
def myF(x):
    y = x.set_index('date').between_time('05:59', '06:01').a
    return y
toMultiplyWith = df.groupby(df.date.dt.floor('D')).transform(myF)
. .
    a                date
0   1 2000-01-01 00:00:00
1   3 2000-01-01 06:00:00
2   4 2000-01-01 12:00:00
3   5 2000-01-01 18:00:00
4   6 2000-01-02 00:00:00
5   7 2000-01-02 06:00:00
6   8 2000-01-02 12:00:00
7   9 2000-01-02 18:00:00
8  14 2000-01-03 00:00:00
....
AttributeError: ("'Series' object has no attribute 'set_index'", 'occurred at index a')
You should change this line:
toMultiplyWith = df.groupby(df.date.dt.floor('D')).transform(myF)
to this:
toMultiplyWith = df.groupby(df.date.dt.floor('D')).apply(myF)
Using .apply instead of .transform will give you the desired result.
apply is the right choice here since it implicitly passes all the columns of each group as a DataFrame to the custom function. transform, by contrast, handed the function a single column as a Series, which is why the error reports that a Series has no set_index attribute.
To read more about the difference between the two methods, consider this answer.
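For completeness, transform can also be made to work if it is called on the single column a after date has been moved into the index, because each group then arrives as a Series with a DatetimeIndex, which supports between_time. The following is only a minimal sketch under that assumption. The helper name six_oclock_value, the use of pd.date_range to rebuild the sample data, and the fallback factor of 1 for days without a 06:00 entry are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data equivalent to the question (nine rows, six hours apart).
df = pd.DataFrame({'a': [1, 3, 4, 5, 6, 7, 8, 9, 14]})
df['date'] = pd.date_range('2000-01-01', '2000-01-03', periods=9)

# Work on one column with the timestamps as index.
s = df.set_index('date')['a']

def six_oclock_value(x):
    """Return the value around 06:00 for one day, or 1 if there is none."""
    hit = x.between_time('05:59', '06:01')
    # Assumption: if several rows fall in the window, the first one is used.
    return hit.iloc[0] if len(hit) else 1

# transform broadcasts the scalar returned for each day to all rows of that day.
factor = s.groupby(s.index.floor('D')).transform(six_oclock_value)
result = s * factor
print(result)
```

Returning 1 for a day without a 06:00 row is what leaves that day unchanged after the multiplication.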
If you want to stick with the between_time(...) function, this is one way to do it:
df = df.set_index('date')
mask = df.between_time('05:59', '06:01').index
df.loc[mask, 'a'] = df.loc[mask, 'a'] ** 2 # the operation you want to perform
df.reset_index(inplace=True)
Output:
                 date   a
0 2000-01-01 00:00:00   1
1 2000-01-01 06:00:00   9
2 2000-01-01 12:00:00   4
3 2000-01-01 18:00:00   5
4 2000-01-02 00:00:00   6
5 2000-01-02 06:00:00  49
6 2000-01-02 12:00:00   8
7 2000-01-02 18:00:00   9
8 2000-01-03 00:00:00  14
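The snippet above only changes the 06:00 rows themselves. If the goal is to scale every row of a day by that day's 06:00 value, one possible extension is to collect the between_time matches into a per-day lookup and map it back onto the rows. This is a sketch, not the answerer's code: the column name a_scaled and the choice to keep the first match per day are assumptions made for illustration.

```python
import pandas as pd

# Sample data equivalent to the question (nine rows, six hours apart).
df = pd.DataFrame({'a': [1, 3, 4, 5, 6, 7, 8, 9, 14]})
df['date'] = pd.date_range('2000-01-01', '2000-01-03', periods=9)

# Rows whose time of day lies between 05:59 and 06:01.
six = df.set_index('date').between_time('05:59', '06:01')['a']

# Reduce to one value per calendar day (first match, by assumption),
# indexed by midnight of that day.
six = six.groupby(six.index.floor('D')).first()

# Look up each row's day; days without a match get NaN, replaced by 1
# so that those days stay unchanged.
factor = df['date'].dt.floor('D').map(six).fillna(1)
df['a_scaled'] = df['a'] * factor
print(df)
```

Because the lookup is built once for the whole frame, no custom function has to be applied per group.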
If I understood your goal correctly, you can use apply to return a DataFrame with the same number of rows as the original DataFrame (simulating a transform):
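def myF(grp):
    grp = grp.copy()  # work on a copy instead of modifying the group in place
    time = grp.date.dt.strftime('%H:%M:%S')
    target_idx = time == '06:00:00'  # True for the row at exactly 06:00:00
    if target_idx.any():
        # multiply every other row of the day by the 06:00:00 value
        grp.loc[~target_idx, 'a_sum'] = grp.loc[~target_idx, 'a'].values * grp.loc[target_idx, 'a'].values
    else:
        # no 06:00:00 entry on this day
        grp.loc[~target_idx, 'a_sum'] = np.nan
    return grp
# group_keys=False keeps the original row index in the result
df.groupby(df.date.dt.floor('D'), group_keys=False).apply(myF)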
Output:
    a                date  a_sum
0   1 2000-01-01 00:00:00    3.0
1   3 2000-01-01 06:00:00    NaN
2   4 2000-01-01 12:00:00   12.0
3   5 2000-01-01 18:00:00   15.0
4   6 2000-01-02 00:00:00   42.0
5   7 2000-01-02 06:00:00    NaN
6   8 2000-01-02 12:00:00   56.0
7   9 2000-01-02 18:00:00   63.0
8  14 2000-01-03 00:00:00    NaN
Note that, for each day, each value with a time other than 06:00:00 is multiplied by the value whose time equals 06:00:00. It returns NaN for the 06:00:00 values themselves, as well as for the groups without this time.
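The same a_sum column can probably also be produced without a custom function. The sketch below is an alternative formulation, not the answerer's code: it keeps only the 06:00:00 values, broadcasts them over each day with the built-in 'first' aggregation (which skips missing values), and then blanks the 06:00:00 rows. The variable names day, is_six and six_value are illustrative, and it assumes at most one 06:00:00 row per day.

```python
import pandas as pd

# Sample data equivalent to the question (nine rows, six hours apart).
df = pd.DataFrame({'a': [1, 3, 4, 5, 6, 7, 8, 9, 14]})
df['date'] = pd.date_range('2000-01-01', '2000-01-03', periods=9)

day = df['date'].dt.floor('D')
# True for rows that lie exactly six hours after midnight.
is_six = (df['date'] - day) == pd.Timedelta(hours=6)

# Keep a only on the 06:00:00 rows, NaN everywhere else.
six_value = df['a'].where(is_six)
# 'first' returns the first non-missing value per day; days without a
# 06:00:00 row yield NaN for all of their rows.
factor = six_value.groupby(day).transform('first')

# Multiply, then set the 06:00:00 rows themselves to NaN.
df['a_sum'] = (df['a'] * factor).mask(is_six)
print(df)
```

For the sample data this gives the same a_sum values as the table above, including NaN for 2000-01-03, which has no 06:00:00 entry.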