Comparing column values in groupby in pandas
My dataframe looks like this:
Time Name price Profit
5:25 A 150 15
5:25 B 250 10
5:25 C 200 20
5:30 A 200 25
5:30 B 150 20
5:30 C 210 25
5:35 A 180 15
5:35 B 200 30
5:35 C 200 10
5:40 A 150 20
5:40 B 260 15
5:40 C 220 10
I want the output to look like this:
Time Name price profit diff_price diff_profit
5:25 A 150 15 0 0
5:25 B 250 10 0 0
5:25 C 200 20 0 0
5:30 A 200 25 50 10
5:30 B 150 20 -100 10
5:30 C 210 25 10 5
5:35 A 180 15 20 -10
5:35 B 200 30 50 10
5:35 C 200 10 -10 -15
5:40 A 150 20 -30 5
5:40 B 260 35 60 5
5:40 C 220 15 20 5
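I need to check, per group, whether the latest value is greater than the earlier ones,
i.e. whether the latest differences for A, B and C are greater than their previous differences. If the condition matches, the name should be displayed:
In the table above, at time 5:40, B's diff_price and diff_profit are greater than all the values in the earlier time rows,
so the output should print: B
My code looks like this: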
df.groupby(['Time','Price'])
df['diff_price']=df.groupby(['Time','Price']).price.diff().fillna(0)
df['diff_profit']=df.groupby(['Time','Price']).profit.diff().fillna(0)
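So how do I compare the values to get the desired output, which is: B
IIUC, compute diff_price and diff_profit per Name, then patch the rows of the last time group according to your condition: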
df[['diff_price', 'diff_profit']] = df.groupby('Name')[['price', 'profit']] \
.diff().fillna(0)
mask = df['Time'].eq(df['Time'].max())
df.loc[mask, 'diff_profit'] = df.loc[mask, 'diff_profit'].max()
Output:
>>> df
Time Name price profit diff_price diff_profit
0 5:25 A 150 15 0.0 0.0
1 5:25 B 250 10 0.0 0.0
2 5:25 C 200 20 0.0 0.0
3 5:30 A 200 25 50.0 10.0
4 5:30 B 150 20 -100.0 10.0
5 5:30 C 210 25 10.0 5.0
6 5:35 A 180 15 -20.0 -10.0
7 5:35 B 200 30 50.0 10.0
8 5:35 C 200 10 -10.0 -15.0
9 5:40 A 150 20 -30.0 5.0
10 5:40 B 260 15 60.0 5.0
11 5:40 C 220 10 20.0 5.0
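To get from the diff columns to a name, one option is to compare each name's latest diff with the maximum of its earlier diffs. The following is a minimal sketch, not the answerer's code: it rebuilds the sample input with a lowercase `profit` column (an assumption, matching the snippet above), and the names `latest`, `prev_max`, `exceeds`, `price_leaders` and `both_leaders` are illustrative.

```python
import pandas as pd

# Sample input from the question (column named "profit" here by assumption)
df = pd.DataFrame({
    "Time": ["5:25"] * 3 + ["5:30"] * 3 + ["5:35"] * 3 + ["5:40"] * 3,
    "Name": ["A", "B", "C"] * 4,
    "price": [150, 250, 200, 200, 150, 210, 180, 200, 200, 150, 260, 220],
    "profit": [15, 10, 20, 25, 20, 25, 15, 30, 10, 20, 15, 10],
})

# Row-to-row change within each Name (current minus previous)
diffs = df.groupby("Name")[["price", "profit"]].diff().fillna(0)
diffs.columns = ["diff_price", "diff_profit"]
out = df.join(diffs)

# Split the rows into the latest time and everything before it
latest = out["Time"].eq(out["Time"].max())
last = out[latest].set_index("Name")[["diff_price", "diff_profit"]]
prev_max = out[~latest].groupby("Name")[["diff_price", "diff_profit"]].max()

# True where the latest diff is strictly greater than every earlier diff
exceeds = last.gt(prev_max)
price_leaders = exceeds[exceeds["diff_price"]].index.tolist()
both_leaders = exceeds[exceeds.all(axis=1)].index.tolist()

print(price_leaders)
print(both_leaders)
```

With the sample input as posted, `price_leaders` is `['B', 'C']` and `both_leaders` is empty, because B's profit drops from 30 to 15 at 5:40.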
You can also solve this one group ("Name") at a time:
# Iterate over the dataframe one "Name" group at a time
# (a plain string key keeps `name` a scalar such as "A" rather than a tuple)
for name, group_df in df.groupby("Name"):
    # Make sure that the rows are sorted by time
    group_df = group_df.sort_values("Time")
    # Difference between consecutive rows, computed as previous - current
    # (the opposite sign of DataFrame.diff())
    group_df[["diff_price", "diff_profit"]] = group_df[["price", "Profit"]].shift(1) - group_df[["price", "Profit"]]
    # Fill the first value with 0 instead of NaN (as in your sample input)
    group_df = group_df.fillna(0)
    # Check whether the maximum diff_price is reached in the last row
    *previous_values, last_value = group_df["diff_price"]
    if last_value >= max(previous_values):
        print(f"Max price diff reached at '{name}'")
        print(group_df.tail(1))
    # Same check for diff_profit
    *previous_values, last_value = group_df["diff_profit"]
    if last_value >= max(previous_values):
        print(f"Max profit diff reached at '{name}'")
        print(group_df.tail(1))
This is the output I get for your sample input:
Max price diff reached at 'A'
Time Name price Profit diff_price diff_profit
9 5:40 A 150 20 30.0 -5.0
Max profit diff reached at 'B'
Time Name price Profit diff_price diff_profit
10 5:40 B 260 15 -60.0 15.0
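The sign of the differences in this output follows from how they are computed: `shift(1) - x` is previous minus current, while `diff()` is current minus previous, which is the convention used in the question's desired table. A small sketch using A's prices from the sample input (the variable names are illustrative):

```python
import pandas as pd

# A's prices at 5:25, 5:30, 5:35 and 5:40
prices = pd.Series([150, 200, 180, 150])

# Previous row minus current row
prev_minus_cur = (prices.shift(1) - prices).fillna(0)

# Current row minus previous row
cur_minus_prev = prices.diff().fillna(0)

print(prev_minus_cur.tolist())
print(cur_minus_prev.tolist())
```

The two results differ only in sign, so which names pass the "last value is the maximum" check depends on the convention chosen.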