Filtering data based on conditions

Question

I have data like this:

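I want to segment/filter the data into buckets A, B and C based on the cumulative percentage column, using these conditions: rows from 0 to ~80% cumulative percentage go in bucket A, ~80 to ~95% in bucket B, and ~95 to 100% in bucket C. The problem I'm facing is that I don't want a single material to be split across two different buckets by this filter. Instead, I want to cut at the closest percentage (it doesn't matter whether that is above or below 80%).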

For example, if I add this filter:


def bucket(cumulative_percent):
    if cumulative_percent <= 79.4086332486936:
        return 'A'

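this puts material numbers 901573047 and 913119 in bucket A, but the remaining rows of material 913119 may have a cumulative percentage of 80.00023232, so that material ends up split between two buckets.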
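To make the problem concrete, here is a minimal sketch with made-up rows. The column name Planning_Material is borrowed from the code in Answer 1 and is an assumption about the real data; the values are illustrative only. It applies a row-wise threshold like the one above and then lists the materials whose rows landed in more than one bucket.

```python
import pandas as pd

# Hypothetical sample: material 913119 has rows on both sides of the cutoff
df = pd.DataFrame({
    "Planning_Material": [901573047, 913119, 913119],
    "cumulative_percent": [60.0, 79.4086332486936, 80.00023232],
})

# Row-wise threshold, mirroring the filter in the question (A or B only)
df["bucket"] = df["cumulative_percent"].apply(
    lambda p: "A" if p <= 79.4086332486936 else "B"
)

# Count distinct buckets per material; more than one means the material was split
buckets_per_material = df.groupby("Planning_Material")["bucket"].nunique()
split_materials = buckets_per_material[buckets_per_material > 1].index.tolist()
print(split_materials)  # [913119]
```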

Answer 1

I'd suggest posting reproducible code rather than images, since images aren't very helpful (see here).

You can filter a DataFrame on conditions using loc in pandas.

So, assuming your DataFrame is called df:

# Assign in place; writing df = df.loc[...] = 'A' would rebind df to the string 'A'
df.loc[df['cumulative_percent'] < 80, 'Bucket'] = 'A'
df.loc[(df['cumulative_percent'] >= 80) & (df['cumulative_percent'] < 95), 'Bucket'] = 'B'
df.loc[df['cumulative_percent'] >= 95, 'Bucket'] = 'C'
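A self-contained version of the loc approach, run on a hypothetical column of boundary values, shows exactly where the cutoffs fall: because bucket A uses a strict `< 80`, a value of exactly 80 lands in B, and exactly 95 lands in C.

```python
import pandas as pd

# Hypothetical values chosen to sit on and around the 80% and 95% boundaries
df = pd.DataFrame({"cumulative_percent": [10.0, 79.9, 80.0, 94.9, 95.0, 100.0]})

# Each statement writes one label into the rows matching its mask;
# the first one creates the Bucket column (other rows start as NaN)
df.loc[df["cumulative_percent"] < 80, "Bucket"] = "A"
df.loc[(df["cumulative_percent"] >= 80) & (df["cumulative_percent"] < 95), "Bucket"] = "B"
df.loc[df["cumulative_percent"] >= 95, "Bucket"] = "C"

print(df["Bucket"].tolist())  # ['A', 'A', 'B', 'B', 'C', 'C']
```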

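From there you have a new column holding the bucket for each row. If some rows have cumulative_percent values that are incorrect, or that don't match other rows of the same Material No, you need to define a rule for how those values should be handled.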

If you simply want every row of a Material No to use that material's minimum cumulative_percent, you can do this:

df['new_cumulative_percent'] = df.groupby('Planning_Material')['cumulative_percent'].transform('min')

Then filter on the new, "corrected" cumulative_percent values only.
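Putting the two steps of this answer together, a minimal sketch on hypothetical rows: the groupby transform gives each row its material's smallest cumulative_percent, and the loc filters then bucket on that corrected column, so all rows of a material share one bucket.

```python
import pandas as pd

# Hypothetical rows: 913119 straddles 80%, 777 straddles 95%
df = pd.DataFrame({
    "Planning_Material": [901573047, 913119, 913119, 777, 777],
    "cumulative_percent": [60.0, 79.4086332486936, 80.00023232, 94.5, 95.2],
})

# Every row of a material gets that material's minimum cumulative_percent
df["new_cumulative_percent"] = (
    df.groupby("Planning_Material")["cumulative_percent"].transform("min")
)

# Bucket on the corrected value, so a material cannot be split across buckets
pct = df["new_cumulative_percent"]
df.loc[pct < 80, "Bucket"] = "A"
df.loc[(pct >= 80) & (pct < 95), "Bucket"] = "B"
df.loc[pct >= 95, "Bucket"] = "C"

print(df["Bucket"].tolist())  # ['A', 'A', 'A', 'B', 'B']
```

Using "min" always pushes a straddling material into the lower bucket; "max" would push it into the higher one instead. Picking whichever side is "closest" would need an explicit rule that this answer does not define.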

Answer 2

You can use pd.cut to create the categories:

import pandas as pd

df['bucket'] = pd.cut(df['cumulative_percent'], [0, 80, 95, 100], labels=['A', 'B', 'C'])
print(df)

# Output
     cumulative_percent bucket
0              0.477769      A
1              1.019964      A
2              1.582226      A
3              1.808495      A
4              2.450631      A
..                  ...    ...
195          100.000000      C
196          100.000000      C
197          100.000000      C
198          100.000000      C
199          100.000000      C

[200 rows x 2 columns]
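One detail worth checking with pd.cut is how the bin edges are treated. By default (right=True) the bins [0, 80, 95, 100] become the intervals (0, 80], (80, 95] and (95, 100], so exactly 80 goes to A, 80.00023232 goes to B, and a value of exactly 0 falls outside every bin. A small sketch on hypothetical values, using the documented include_lowest parameter to close the first interval on the left:

```python
import pandas as pd

# Hypothetical values on and around the bin edges
s = pd.Series([0.0, 80.0, 80.00023232, 95.0, 100.0])
bins = [0, 80, 95, 100]
labels = ["A", "B", "C"]

# Default right=True: intervals are (0, 80], (80, 95], (95, 100]; 0.0 becomes NaN
default = pd.cut(s, bins, labels=labels)

# include_lowest=True makes the first interval include its left edge, so 0.0 -> 'A'
inclusive = pd.cut(s, bins, labels=labels, include_lowest=True)

print(default.tolist())
print(inclusive.tolist())  # ['A', 'A', 'B', 'B', 'C']
```

Note that this boundary convention differs from the loc version in Answer 1, where exactly 80 is assigned to B.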

2 answers

Answer 1: score 1, 2022-07-26 20:41:16

Answer 2: score 1, 2022-07-26 20:47:24
