Filter data based on a condition

Question

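I have data like this.

I want to segment/filter the data into buckets A, B, and C based on the cumulative percent column: rows with a cumulative percent from 0 to ~80% go in bucket A, ~80 to ~95% in bucket B, and ~95 to 100% in bucket C. The problem is that I don't want a material to be split across two different buckets when I create this filter; I want to cut at the closest percentage (whether it is above or below 80% doesn't matter).

For example,

if I add a filter like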


if cumulative_percent <= 79.4086332486936 :
    return 'A'

this puts material numbers 901573047 and 913119 in bucket A, but the remaining rows for material 913119 may have a cumulative percent of 80.00023232.

Answer 1

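I'd suggest posting some reproducible code rather than images, since images aren't very helpful (see here).

You can filter a DataFrame by condition using loc in pandas.

So, assuming your DataFrame is called df: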

# Assign through .loc directly; writing "df = df.loc[...] = 'A'" would rebind df to the string 'A'
df.loc[df['cumulative_percent'] < 80, 'Bucket'] = 'A'
df.loc[(df['cumulative_percent'] >= 80) & (df['cumulative_percent'] < 95), 'Bucket'] = 'B'
df.loc[df['cumulative_percent'] >= 95, 'Bucket'] = 'C'

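From there you have a new column that classifies each row. If you're saying that some rows have a cumulative_percent value that is incorrect or doesn't match the other rows for the same Material No, then you need a rule for how to handle those values.

If you just want to use the minimum cumulative_percent for each Material No, you can do: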

df['new_cumulative_percent'] = df.groupby('Planning_Material')['cumulative_percent'].transform('min')

Then filter on the new, "corrected" cumulative_percent values only.
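Putting the two steps together, here is a minimal sketch of how the per-material minimum keeps a material in a single bucket. The sample rows are made up for illustration (only material numbers 901573047 and 913119 come from the question; 777001 and 777002 are hypothetical), and the grouping column is assumed to be named Planning_Material as in the snippet above.

```python
import pandas as pd

# Hypothetical sample data: material 913119 straddles the 80% boundary.
df = pd.DataFrame({
    "Planning_Material": [901573047, 913119, 913119, 777001, 777002],
    "cumulative_percent": [60.0, 79.4, 80.0002, 94.0, 99.0],
})

# Give every row of a material that material's smallest cumulative percent.
df["new_cumulative_percent"] = (
    df.groupby("Planning_Material")["cumulative_percent"].transform("min")
)

# Bucket on the per-material value, so all rows of a material share a bucket.
pct = df["new_cumulative_percent"]
df.loc[pct < 80, "Bucket"] = "A"
df.loc[(pct >= 80) & (pct < 95), "Bucket"] = "B"
df.loc[pct >= 95, "Bucket"] = "C"

print(df)
```

With this rule, both rows of 913119 land in bucket A because the material's minimum is below 80. Using 'max' instead of 'min' would push the whole material into B; which one is right depends on how you want ties at the boundary resolved.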

Answer 2

You can use pd.cut to create the categories:

df['bucket'] = pd.cut(df['cumulative_percent'], [0, 80, 95, 100], labels=['A', 'B', 'C'])
print(df)

# Output
     cumulative_percent bucket
0              0.477769      A
1              1.019964      A
2              1.582226      A
3              1.808495      A
4              2.450631      A
..                  ...    ...
195          100.000000      C
196          100.000000      C
197          100.000000      C
198          100.000000      C
199          100.000000      C

[200 rows x 2 columns]
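One detail worth checking with pd.cut is how values on a bin edge are handled. By default the bins are closed on the right, so with the edges above the intervals are (0, 80], (80, 95], and (95, 100]: a value of exactly 80 goes to A (unlike the loc version in Answer 1, where 80 goes to B), and a value of exactly 0 falls outside every bin unless include_lowest=True is passed. A small sketch with made-up values:

```python
import pandas as pd

# Made-up values chosen to sit on or near the bin edges.
s = pd.Series([0.0, 79.4, 80.0, 80.0002, 95.0, 100.0], name="cumulative_percent")

edges = [0, 80, 95, 100]
labels = ["A", "B", "C"]

# Default: right-closed bins, the lowest edge (0) is excluded.
default_bins = pd.cut(s, edges, labels=labels)

# include_lowest=True also closes the first bin on the left, so 0 lands in A.
closed_low = pd.cut(s, edges, labels=labels, include_lowest=True)

print(pd.DataFrame({"value": s, "default": default_bins, "include_lowest": closed_low}))
```

In the default result the first row is NaN and 80.0 is labelled A, while 80.0002 is labelled B.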
