
Comparing column values in groupby in pandas

My dataframe looks like this:

Time    Name    price   Profit   
5:25    A        150       15
5:25    B        250       10
5:25    C        200       20
5:30    A        200       25
5:30    B        150       20
5:30    C        210       25
5:35    A        180       15
5:35    B        200       30
5:35    C        200       10 
5:40    A        150       20 
5:40    B        260       15 
5:40    C        220       10   
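A minimal sketch that rebuilds the sample frame above in pandas so the calculations can be reproduced. It assumes Time is kept as plain strings, which is safe for ordering here because every value has the same H:MM form, so string order matches time order.

```python
import pandas as pd

# Sample data from the question; Time is kept as a string such as "5:25"
df = pd.DataFrame({
    "Time": ["5:25"] * 3 + ["5:30"] * 3 + ["5:35"] * 3 + ["5:40"] * 3,
    "Name": ["A", "B", "C"] * 4,
    "price": [150, 250, 200, 200, 150, 210, 180, 200, 200, 150, 260, 220],
    "Profit": [15, 10, 20, 25, 20, 25, 15, 30, 10, 20, 15, 10],
})
print(df)
```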

I want the output to look like this:

Time    Name    price  profit    diff_price   diff_profit      
5:25    A        150     15         0            0
5:25    B        250     10         0            0
5:25    C        200     20         0            0
5:30    A        200     25        50            10
5:30    B        150     20        -100          10
5:30    C        210     25         10            5
5:35    A        180     15        -20            -10
5:35    B        200     30         50            10
5:35    C        200     10         -10           -15
5:40    A        150     20         -30           5
5:40    B        260     35          60           5
5:40    C        220     15          20           5

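I need to compare the groupby values against the previous values, i.e. check whether the differences for A, B and C are greater than their previous values. If the condition matches, it must display the name.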

As you can see above, at time 5:40 B's diff_price and diff_profit are greater than all the column values at earlier times.

So the output should print: B
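One way to express the condition above, as a sketch under two assumptions: the diffs are current minus previous within each Name (first row filled with 0, as in the expected output), and "greater than all previous values" means strictly greater than every diff from earlier time slots, across all names. new_max_names is a hypothetical helper, not a pandas function. On the sample input it returns ['B'] for diff_price (60 against an earlier maximum of 50) and an empty list for diff_profit, since B's profit falls from 30 to 15 at 5:40.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": ["5:25"] * 3 + ["5:30"] * 3 + ["5:35"] * 3 + ["5:40"] * 3,
    "Name": ["A", "B", "C"] * 4,
    "price": [150, 250, 200, 200, 150, 210, 180, 200, 200, 150, 260, 220],
    "Profit": [15, 10, 20, 25, 20, 25, 15, 30, 10, 20, 15, 10],
})

# Change versus the same Name's previous row; the first row of each Name becomes 0
diffs = df.groupby("Name")[["price", "Profit"]].diff().fillna(0)
df["diff_price"] = diffs["price"]
df["diff_profit"] = diffs["Profit"]


def new_max_names(frame, col, time_col="Time", name_col="Name"):
    """Hypothetical helper: names in the last time slot whose `col` value is
    strictly greater than every value of `col` in earlier time slots."""
    is_last = frame[time_col].eq(frame[time_col].max())
    earlier_max = frame.loc[~is_last, col].max()
    last_rows = frame.loc[is_last]
    return last_rows.loc[last_rows[col] > earlier_max, name_col].tolist()


print(new_max_names(df, "diff_price"))   # ['B']
print(new_max_names(df, "diff_profit"))  # []
```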

My code looks like this:

df.groupby(['Time','Price'])
df['diff_price']=df.groupby(['Time','Price']).price.diff().fillna(0)
df['diff_profit']=df.groupby(['Time','Price']).profit.diff().fillna(0)
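A sketch of why the grouping key matters, using the question's sample data. Within a Time group the rows belong to different names, so diff() subtracts one name's value from the next name's at the same timestamp. Adding price to the key leaves almost every group with a single row, so diff() returns NaN there and fillna(0) turns it into 0. Grouping by Name instead lines up each name's rows across timestamps.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": ["5:25"] * 3 + ["5:30"] * 3 + ["5:35"] * 3 + ["5:40"] * 3,
    "Name": ["A", "B", "C"] * 4,
    "price": [150, 250, 200, 200, 150, 210, 180, 200, 200, 150, 260, 220],
    "Profit": [15, 10, 20, 25, 20, 25, 15, 30, 10, 20, 15, 10],
})

# Keyed by Time: diff() subtracts the previous name's price at the same timestamp
by_time = df.groupby("Time")["price"].diff()
# Keyed by Time and price: nearly every group has one row, so diff() is mostly NaN
by_time_price = df.groupby(["Time", "price"])["Profit"].diff()
# Keyed by Name: diff() subtracts the same name's price at the previous timestamp
by_name = df.groupby("Name")["price"].diff()

print(pd.DataFrame({"Time": df["Time"], "Name": df["Name"],
                    "by_time": by_time, "by_time_price": by_time_price,
                    "by_name": by_name}))
```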

So how do I compare the values to get the desired output, which should display: B?

IIUC, compute diff_price and diff_profit based on the Name column, then patch the last time group according to your condition:

df[['diff_price', 'diff_profit']] = df.groupby('Name')[['price', 'profit']] \
                                      .diff().fillna(0)

mask = df['Time'].eq(df['Time'].max())
df.loc[mask, 'diff_profit'] = df.loc[mask, 'diff_profit'].max()
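To see what the mask line does, a sketch that records the last time slot before and after it runs. It uses the lowercase profit column name this answer assumes. The computed diff_profit values at 5:40 are 5, -15 and 0 for A, B and C; after the patch all three rows hold 5, the largest value in that slot.

```python
import pandas as pd

# Question's sample data, with the lowercase "profit" column this answer uses
df = pd.DataFrame({
    "Time": ["5:25"] * 3 + ["5:30"] * 3 + ["5:35"] * 3 + ["5:40"] * 3,
    "Name": ["A", "B", "C"] * 4,
    "price": [150, 250, 200, 200, 150, 210, 180, 200, 200, 150, 260, 220],
    "profit": [15, 10, 20, 25, 20, 25, 15, 30, 10, 20, 15, 10],
})

# Per-Name change versus the previous row, first row of each Name filled with 0
diffs = df.groupby("Name")[["price", "profit"]].diff().fillna(0)
df["diff_price"] = diffs["price"]
df["diff_profit"] = diffs["profit"]

mask = df["Time"].eq(df["Time"].max())  # rows of the last time slot (5:40)
before = df.loc[mask, "diff_profit"].tolist()

# The answer's patch: every row in the last slot gets that slot's maximum
df.loc[mask, "diff_profit"] = df.loc[mask, "diff_profit"].max()
after = df.loc[mask, "diff_profit"].tolist()

print(before)  # [5.0, -15.0, 0.0]
print(after)   # [5.0, 5.0, 5.0]
```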

Output:

>>> df
    Time Name  price  profit  diff_price  diff_profit
0   5:25    A    150      15         0.0          0.0
1   5:25    B    250      10         0.0          0.0
2   5:25    C    200      20         0.0          0.0
3   5:30    A    200      25        50.0         10.0
4   5:30    B    150      20      -100.0         10.0
5   5:30    C    210      25        10.0          5.0
6   5:35    A    180      15       -20.0        -10.0
7   5:35    B    200      30        50.0         10.0
8   5:35    C    200      10       -10.0        -15.0
9   5:40    A    150      20       -30.0          5.0
10  5:40    B    260      15        60.0          5.0
11  5:40    C    220      10        20.0          5.0

You can solve this one group ("Name") at a time:

# Iterate over the dataframe one "Name" group at a time
for name, group_df in df.groupby("Name"):
    # Make sure that the rows are sorted by time
    group_df = group_df.sort_values("Time")
    # Difference between consecutive rows, computed as previous row minus current row
    # (the opposite sign of the diff_price/diff_profit columns in the question)
    group_df[["diff_price", "diff_profit"]] = group_df[["price", "Profit"]].shift(1) - group_df[["price", "Profit"]]
    # Fill the first value with 0 instead of NaN (as in your expected output)
    group_df = group_df.fillna(0)

    # Check whether the last diff_price is at least as large as every earlier one
    *previous_values, last_value = group_df["diff_price"]
    if last_value >= max(previous_values):
        print(f"Max price diff reached: {name}")
        print(group_df.tail(1))

    # Same check for diff_profit
    *previous_values, last_value = group_df["diff_profit"]
    if last_value >= max(previous_values):
        print(f"Max profit diff reached: {name}")
        print(group_df.tail(1))

This is the output I get for your sample input:

Max price diff reached: A
   Time Name  price  Profit  diff_price  diff_profit
9  5:40    A    150      20        30.0         -5.0
Max profit diff reached: B
    Time Name  price  Profit  diff_price  diff_profit
10  5:40    B    260      15       -60.0         15.0
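The same per-name check can be sketched without an explicit loop. Assumptions: diffs are current minus previous within each Name (the sign used in the question, opposite to the shift(1) minus current expression in the loop), the first row is filled with 0, and, as in the loop, the last diff counts if it is >= every earlier diff of the same name. names_at_new_high is a hypothetical helper. Because of the sign change the result differs from the output above: on the sample data it flags B and C for diff_price and no name for diff_profit.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": ["5:25"] * 3 + ["5:30"] * 3 + ["5:35"] * 3 + ["5:40"] * 3,
    "Name": ["A", "B", "C"] * 4,
    "price": [150, 250, 200, 200, 150, 210, 180, 200, 200, 150, 260, 220],
    "Profit": [15, 10, 20, 25, 20, 25, 15, 30, 10, 20, 15, 10],
})

# Keep each name's rows in time order (Time strings all have the H:MM form)
df = df.sort_values("Time", kind="stable")

# Current minus previous within each Name; the first row of each Name becomes 0
diffs = df.groupby("Name")[["price", "Profit"]].diff().fillna(0)
df["diff_price"] = diffs["price"]
df["diff_profit"] = diffs["Profit"]


def names_at_new_high(frame, col):
    """Hypothetical helper: names whose value of `col` in the last time slot
    is >= every earlier value of `col` for the same name."""
    running_max = frame.groupby("Name")[col].cummax()
    # Largest value over strictly earlier rows of the same name (NaN for the first row)
    prev_max = running_max.groupby(frame["Name"]).shift()
    is_last = frame["Time"].eq(frame["Time"].max())
    hit = is_last & frame[col].ge(prev_max)
    return frame.loc[hit, "Name"].tolist()


print(names_at_new_high(df, "diff_price"))   # ['B', 'C']
print(names_at_new_high(df, "diff_profit"))  # []
```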


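Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.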

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM