Passing conditions through a dictionary: lowercasing not recognized
In this sample data:
data = [{'source': ' Off-grid energy'},
{'source': 'off-grid generation'},
{'source': 'Off grid energy '},
{'source': 'OFFGRID energy'},
{'source': 'apple sauce'},
{'source': 'green energy'},
{'source': 'Green electricity '},
{'source': 'tomato sauce'},
{'source': 'BIOMASS as an energy source'},
{'source': 'produced heat (biogas).'}]
I want to create a new column based on conditions:
my_conditions = {
"green": df["source"].str.contains("green"),
"bio-gen": df["source"].str.contains("bio"),
"off-grid": df["source"].str.contains("off-grid")
}
I preprocess by lowercasing df["source"]:
df['source'] = df["source"].str.lower()
Then I use NumPy's select:
df['category-lower'] = np.select(my_conditions.values(),\
my_conditions.keys(),\
default="other")
I can't figure out why the lowercasing is not recognized (see rows 0, 6, and 8).
You've probably applied .str.lower() after my_conditions was constructed. Each .str.contains(...) call is evaluated as soon as the dictionary is built, so the boolean masks reflect the column as it was at that moment and do not change when the column is lowercased later. Try instead:
import re

import numpy as np

# Either lowercase the column here, before building the conditions,
# or pass flags=re.I (ignore case) to .str.contains as done below.
# df['source'] = df["source"].str.lower()
my_conditions = {
"green": df["source"].str.contains("green", flags=re.I),
"bio-gen": df["source"].str.contains("bio", flags=re.I),
"off-grid": df["source"].str.contains("off-grid", flags=re.I),
}
df["category-lower"] = np.select(
my_conditions.values(), my_conditions.keys(), default="other"
)
print(df)
Prints:
                        source  category-lower
0              Off-grid energy        off-grid
1          off-grid generation        off-grid
2              Off grid energy           other
3               OFFGRID energy           other
4                  apple sauce           other
5                 green energy           green
6            Green electricity           green
7                 tomato sauce           other
8  BIOMASS as an energy source         bio-gen
9      produced heat (biogas).         bio-gen
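To see why the order matters, here is a minimal sketch that builds the same kind of conditions dictionary twice, once before and once after lowercasing. The three-row frame, the helper categorize, and the column names category-early and category-late are made up for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (a subset of the sample data).
df = pd.DataFrame(
    {"source": ["Green electricity ", "BIOMASS as an energy source", "apple sauce"]}
)

# Masks built BEFORE lowercasing: .str.contains runs right away,
# so these boolean Series are computed on the mixed-case text.
early = {
    "green": df["source"].str.contains("green"),
    "bio-gen": df["source"].str.contains("bio"),
}

# Lowercasing now replaces the column, but `early` is already computed.
df["source"] = df["source"].str.lower()

# Masks built AFTER lowercasing see the lowercased text.
late = {
    "green": df["source"].str.contains("green"),
    "bio-gen": df["source"].str.contains("bio"),
}


def categorize(conditions):
    """Hypothetical helper: map a dict of boolean masks to labels."""
    # list(...) turns the dict views into plain lists for np.select.
    return np.select(
        list(conditions.values()), list(conditions.keys()), default="other"
    )


df["category-early"] = categorize(early)
df["category-late"] = categorize(late)
print(df)
```

In this sketch the early masks never match, because "Green" and "BIOMASS" do not contain the lowercase patterns, while the late masks do. Passing flags=re.I as in the answer avoids the ordering issue altogether and leaves the original casing of the column intact.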