
Filtering out keywords (case insensitive) from a column of a dataframe - Pandas

I want to create a new column based on a word in an existing column. The new column should contain either Lobby, UPS, Electric, or a blank.

Name                                SubUnitName
Lobby Area                          Lobby
Sensor - Bank lobby                 Lobby
Temperature - UPS Room              UPS
Sensor - Electric Room              Electric
Sensor - electrical Room            Electric
Temperature - electric Room         Electric
Sensor

As seen above, the search should be case insensitive, and if 'Electrical' or 'Electric' is found, the result should be 'Electric'.

This establishes the list of words to look for in the "Name" column, then applies the function "find_match" to create the new "SubUnitName" column. Because the check is a substring test, the keyword "Electric" also matches "Electrical".

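from typing import Optional

# Canonical names to look for; "Electric" also matches "Electrical".
search_list = ["Lobby", "UPS", "Electric"]


def find_match(name_str: str) -> Optional[str]:
    # Compare lowercased strings so the search is case insensitive.
    name_lc = name_str.lower()
    for item in search_list:
        if item.lower() in name_lc:
            return item
    # No keyword found; None is replaced with an empty string afterwards.
    return None


df.loc[:, "SubUnitName"] = df["Name"].apply(find_match)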

Replace None with an empty string for the last row (assigning the result back avoids relying on an in-place fillna on a column selection):

df["SubUnitName"] = df["SubUnitName"].fillna("")
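The same idea can also be written without a Python-level loop. The following is only a sketch, assuming the keyword list and the sample rows from the question; the names `pattern`, `canonical`, and `matches` are introduced here for illustration. It uses `Series.str.extract` with `re.IGNORECASE` and then maps each match back to the spelling used in `search_list`. One difference to be aware of: the regex returns the leftmost keyword in each name, whereas `find_match` returns the first keyword in list order.

```python
import re

import pandas as pd

# Sample rows copied from the question's table.
df = pd.DataFrame({
    "Name": [
        "Lobby Area",
        "Sensor - Bank lobby",
        "Temperature - UPS Room",
        "Sensor - Electric Room",
        "Sensor - electrical Room",
        "Temperature - electric Room",
        "Sensor",
    ]
})

search_list = ["Lobby", "UPS", "Electric"]

# One capturing group with all keywords: "(Lobby|UPS|Electric)".
pattern = "(" + "|".join(re.escape(word) for word in search_list) + ")"

# Lookup from lowercased match to the canonical spelling (hypothetical helper).
canonical = {word.lower(): word for word in search_list}

# Case-insensitive extraction; rows without a match come back as NaN.
matches = df["Name"].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Normalise the case, map to canonical names, and blank out the misses.
df["SubUnitName"] = matches.str.lower().map(canonical).fillna("")

print(df)
```

For a handful of keywords either approach is fine; the regex version mainly saves the separate fillna step on None values.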

Here is a solution. It checks each name for a match against the keywords and, if one is found, adds it to a list that becomes your new column.

import pandas as pd 


d = { "Name" : ["Sensor - Bank lobby ", "Sensor - Bank Lobby ", "Temperature - UPS Room", "Sensor - Electric Room ", "Sensor - electrical Room", "Sensor"]}


df = pd.DataFrame(data=d)

list_sub_units = []

list_matches = ["Lobby", "UPS", "Electric"]

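for entry in df["Name"]:
    matched = False
    entry_lc = entry.lower()

    for match in list_matches:
        # Use "in" rather than find() > 0, which would miss a match at index 0.
        if match.lower() in entry_lc:
            list_sub_units.append(match)
            matched = True
            # Stop at the first hit so each row gets exactly one value.
            break

    if not matched:
        list_sub_units.append("")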

df["SubUnitName"] = list_sub_units


print(df)
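One detail worth checking in any loop built on `str.find`: it returns the index of the first occurrence, which is 0 when the keyword sits at the very start of the string, and -1 when there is no match. A comparison with `> 0` therefore skips names such as "Lobby Area" from the question. A small sketch (the variable names are only illustrative):

```python
name = "Lobby Area"

# str.find returns the index of the first occurrence, or -1 if absent.
index_at_start = name.lower().find("lobby")
index_missing = name.lower().find("ups")
print(index_at_start, index_missing)

# A membership test avoids the index comparison altogether.
has_lobby = "lobby" in name.lower()
has_ups = "ups" in name.lower()
print(has_lobby, has_ups)
```

Testing with `in`, or comparing the result of `find` against -1, treats a match at the start of the name like any other match.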

