Creating a new column that increases or decreases based on an existing column

Question

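I am trying to build a model in which the baseline electricity price goes up when wind generation drops, and goes down when wind generation is high.

My dataframe looks like this: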

   df:

                          wind      Cost
2021-01-01 01:00:00   4281.000000  12.72250
2021-01-01 02:00:00   4384.083333  11.34000
2021-01-01 03:00:00   4405.666667  11.34000
2021-01-01 04:00:00   4514.666667   9.93300
2021-01-01 05:00:00   4692.416667   9.49200
...                           ...       ...
2021-12-31 20:00:00   9698.000000  32.87550
2021-12-31 21:00:00   9854.083333  34.38225
2021-12-31 22:00:00   9880.916667  29.61000
2021-12-31 23:00:00  10356.500000  11.76000
2022-01-01 00:00:00  10478.500000  15.75000

I used df.describe() to summarize the data and got this:

               wind         Cost
count   8760.000000  8760.000000
mean    5588.449878    22.131348
std     3774.164710     9.547735
min       56.333333     0.042000
25%     2297.604167    13.475437
50%     4792.375000    20.160000
75%     8710.187500    33.012000
max    14132.166667    34.996500

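How can I create a new column that sets the price for each hour according to the wind in the system? For example, when the value in the wind column falls between the min and the 25th percentile, the cost goes up by 5; between the 25th and 50th percentiles, it goes up by 2.5. When the wind value is between the 75th percentile and the max, the cost goes down by 5; between the 50th and 75th percentiles, it goes down by 2.5.

Any help, or a pointer in the right direction, would be appreciated. Many thanks.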

Answer 1

I think pd.cut will solve your problem.

For example,

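import numpy as np
import pandas as pd

# Bin wind by its quartile edges and label each bin with a cost adjustment.
df["cost_difference"] = pd.cut(
    df["wind"],
    bins=np.quantile(df["wind"], [0, 0.25, 0.5, 0.75, 1]),
    include_lowest=True,  # keep the minimum wind value in the first bin
    labels=[5, 2.5, -2.5, -5],
).astype(float)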

Edit: I forgot about pd.qcut, which essentially folds the np.quantile step into cut, saving you a step:

df["cost_difference"] = pd.qcut(
    df["wind"],
    q=[0, 0.25, 0.5, 0.75, 1],
    labels=[5, 2.5, -2.5, -5],
).astype(float)

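This adds a column "cost_difference" that holds, for each row, the value supplied as the "label" for the quantile range its wind value falls into.

So, in this case, the wind ranges map to cost differences as follows: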

   0-25%  ->  5
 25%-50%  ->  2.5
 50%-75%  -> -2.5
 75%-100% -> -5
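
To see which actual wind values those percentage ranges correspond to, one option is to ask pd.qcut for the bin edges with retbins=True. A minimal sketch, assuming a small made-up wind series (the values are illustrative only, not the data from the question):

```python
import pandas as pd

# Made-up wind values for illustration only (not the data from the question)
wind = pd.Series([100, 200, 300, 400, 500, 600, 700, 800], dtype=float)

# retbins=True also returns the quantile edges that were used as bin boundaries
binned, edges = pd.qcut(
    wind,
    q=[0, 0.25, 0.5, 0.75, 1],
    labels=[5, 2.5, -2.5, -5],
    retbins=True,
)

print(edges)            # the five edges: min, 25%, 50%, 75%, max
print(binned.tolist())  # the label assigned to each wind value
```

On the real dataframe, the edges should line up with the min, 25%, 50%, 75% and max rows that df.describe() reports for wind.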

Then you can use df["Cost"] + df["cost_difference"] to get your final cost.

EDIT2: Added a cast to float so that the addition works.
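
Putting the pieces together, here is a small end-to-end sketch. The dataframe is synthetic (eight made-up hourly rows with a flat Cost of 20), and the column name "final_cost" is an assumption chosen for illustration; only "wind", "Cost" and "cost_difference" come from the thread.

```python
import pandas as pd

# Synthetic hourly data for illustration only (not the data from the question)
index = pd.date_range("2021-01-01 01:00:00", periods=8, freq=pd.Timedelta(hours=1))
df = pd.DataFrame(
    {
        "wind": [100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 800.0],
        "Cost": [20.0] * 8,
    },
    index=index,
)

# Label each wind quartile with its cost adjustment, then cast the
# categorical result to float so it can be added to Cost.
df["cost_difference"] = pd.qcut(
    df["wind"],
    q=[0, 0.25, 0.5, 0.75, 1],
    labels=[5, 2.5, -2.5, -5],
).astype(float)

# "final_cost" is a hypothetical column name for the adjusted price
df["final_cost"] = df["Cost"] + df["cost_difference"]

print(df)
```

Without the astype(float) cast, cost_difference stays a categorical column, and the addition is not expected to work as plain arithmetic, which is why the cast was added in EDIT2.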
