Lowercasing is not recognized when passing conditions through a dictionary

Question

In this sample data:

data = [{'source': ' Off-grid energy'},
 {'source': 'off-grid generation'},
 {'source': 'Off grid energy '},
 {'source': 'OFFGRID energy'},
 {'source': 'apple sauce'},
 {'source': 'green energy'},
 {'source': 'Green electricity '},
 {'source': 'tomato  sauce'},
 {'source': 'BIOMASS as an energy source'},
 {'source': 'produced heat (biogas).'}]

I want to create a new column based on conditions:

my_conditions = {
    "green": df["source"].str.contains("green"),
    "bio-gen": df["source"].str.contains("bio"),
    "off-grid": df["source"].str.contains("off-grid")
}

I preprocess by lowercasing df["source"]:

df['source'] = df["source"].str.lower()

Then I use NumPy's select:

df['category-lower'] = np.select(my_conditions.values(),\
                           my_conditions.keys(),\
                           default="other")

I can't figure out why the lowercasing is not recognized (see rows 0, 6, and 8).

Answer 1

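You probably applied .str.lower() after my_conditions was constructed. Each .str.contains() call is evaluated as soon as the dictionary is built, so the boolean masks were computed on the original mixed-case strings and lowercasing the column afterwards does not change them. Try instead: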

import re

import numpy as np

# apply .str.lower() here, or use flags=re.I (ignorecase in .str.contains)
# df['source'] = df["source"].str.lower() 

my_conditions = {
    "green": df["source"].str.contains("green", flags=re.I),
    "bio-gen": df["source"].str.contains("bio", flags=re.I),
    "off-grid": df["source"].str.contains("off-grid", flags=re.I),
}

df["category-lower"] = np.select(
    my_conditions.values(), my_conditions.keys(), default="other"
)

print(df)

Prints:

                        source category-lower
0              Off-grid energy       off-grid
1          off-grid generation       off-grid
2             Off grid energy           other
3               OFFGRID energy          other
4                  apple sauce          other
5                 green energy          green
6           Green electricity           green
7                tomato  sauce          other
8  BIOMASS as an energy source        bio-gen
9      produced heat (biogas).        bio-gen
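
To make the ordering issue concrete, here is a minimal, self-contained sketch. It uses a shortened four-row version of the sample data, and the names early_mask and late_mask are made up for this illustration. It only shows that a mask built before lowercasing stays stale, and that building the dictionary after lowercasing gives the expected categories.

```python
import numpy as np
import pandas as pd

# Shortened version of the sample data (an assumption for brevity).
df = pd.DataFrame({"source": [" Off-grid energy", "Green electricity ",
                              "BIOMASS as an energy source", "apple sauce"]})

# str.contains runs immediately and returns a boolean Series, so this mask
# is computed on the mixed-case text.
early_mask = df["source"].str.contains("green")

# Lowercasing afterwards assigns a new column; early_mask is not recomputed.
df["source"] = df["source"].str.lower()
late_mask = df["source"].str.contains("green")

print(early_mask.tolist())  # [False, False, False, False]
print(late_mask.tolist())   # [False, True, False, False]

# Build the conditions only after the lowercasing step.
my_conditions = {
    "green": df["source"].str.contains("green"),
    "bio-gen": df["source"].str.contains("bio"),
    "off-grid": df["source"].str.contains("off-grid"),
}

# Wrapping the dict views in list() hands np.select plain sequences.
df["category-lower"] = np.select(
    list(my_conditions.values()), list(my_conditions.keys()), default="other"
)
print(df["category-lower"].tolist())  # ['off-grid', 'green', 'bio-gen', 'other']
```

If the original casing of the column should be kept, passing case=False to .str.contains() is another way to get a case-insensitive match, similar in effect to flags=re.I above.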

Answer 1 (0, accepted, 2022-10-09 20:00:56)
