Check if there is a substring that matches a string from a list
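A bit of a beginner question here; I currently have a pandas df in which one column contains a variety of different strings. I also have a few columns that are currently empty. Example of the first few rows below;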
Risk,Cost,Productivity,Security
"Unforeseen cost due to CCTV failures",,,
"Unexpected drop in Productivity",,,
I have also created a set of lists, as follows;
Cost = ['Cost']
Productivity = ['Productivity']
Security = ['Security','CCTV','Camera']
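Essentially, what I want to do is go through each column and check whether the string in the "Risk" column on the same row contains a substring that matches one of the strings in the corresponding list. The ideal output would be as follows: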
Risk,Cost,Productivity,Security
"Unforeseen cost due to security issues",TRUE,FALSE,TRUE
"Unexpected drop in Productivity",FALSE,TRUE,FALSE
So far I have tried a few different methods, such as
any(Cost in Risk for Cost in Costs)
However, I'm not sure whether there is a way to stop the any() check from being case sensitive, and I'm not sure how to apply it to an entire column. I did try
df['Cost'] = any(Cost in df['Risk'] for Cost in Costs)
But this returned a column full of "FALSE". Any nudge in the right direction would be greatly appreciated! Thanks
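A likely reason for the all-FALSE column: the in operator applied to a pandas Series tests the index labels rather than the values, and any() collapses everything into a single boolean that is then broadcast to every row. Below is a minimal sketch of a row-wise, case-insensitive check. The helper name contains_any is hypothetical, and the DataFrame is rebuilt from the sample rows in the question.

```python
import pandas as pd

df = pd.DataFrame({'Risk': ['Unforeseen cost due to CCTV failures',
                            'Unexpected drop in Productivity']})
Cost = ['Cost']

# `in` on a Series looks at the index labels (0 and 1 here), not the strings
print('Cost' in df['Risk'])  # False
print(0 in df['Risk'])       # True

def contains_any(text, keywords):
    """Hypothetical helper: True if any keyword occurs in text, ignoring case."""
    text = text.lower()
    return any(k.lower() in text for k in keywords)

# apply the check to each row of the Risk column
df['Cost'] = df['Risk'].apply(contains_any, args=(Cost,))
print(df['Cost'].tolist())   # [True, False]
```

Calling apply runs plain Python on every row, which is usually fine for small frames; the vectorized string methods in the answers tend to scale better.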
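We can create a regex pattern corresponding to each of the lists Cost, Security and Productivity, then use str.contains to test for occurrences of each regex pattern in the strings of the column Risk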
for c in ('Cost', 'Productivity', 'Security'):
    # (?i) ignores case, \b matches whole words only; locals()[c] looks up the list named c
    df[c] = df['Risk'].str.contains(fr"(?i)\b(?:{'|'.join(locals()[c])})\b")
Risk Cost Productivity Security
0 Unforeseen cost due to CCTV failures True False True
1 Unexpected drop in Productivity False True False
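The locals() lookup above only works when the lists are variables in the current scope. As a hedged variant of the same regex idea, the sketch below keeps the keyword lists in a dictionary (the name keywords is an assumption, not part of the answer) and escapes each keyword with re.escape in case it contains regex metacharacters.

```python
import re
import pandas as pd

df = pd.DataFrame({'Risk': ['Unforeseen cost due to CCTV failures',
                            'Unexpected drop in Productivity']})

# assumed layout: column name -> list of keywords for that column
keywords = {
    'Cost': ['Cost'],
    'Productivity': ['Productivity'],
    'Security': ['Security', 'CCTV', 'Camera'],
}

for col, words in keywords.items():
    # whole-word alternation, e.g. \b(?:Security|CCTV|Camera)\b
    pattern = r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b'
    df[col] = df['Risk'].str.contains(pattern, case=False, regex=True)

print(df)
```

Passing case=False has the same effect as the inline (?i) flag used in the answer.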
First, create/define a function:
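def check():
    # Search is assumed to be a list of the keyword lists, e.g. [Cost, Productivity, Security]
    res = []
    for x in Search:
        # split each Risk string into words and flag rows where any word is in x (case sensitive)
        res.append(df['Risk'].str.split(' ', expand=True).isin(x).any(axis=1))
    return pd.DataFrame(res).T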
Finally:
df[['Cost','Productivity','Security']]=check()
Output of df:
Risk Cost Productivity Security
0 Unforeseen cost due to CCTV failures False False True
1 Unexpected drop in Productivity False True False
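Note that Cost is False for the first row in this output, because isin compares words exactly and "cost" differs from "Cost" in case. One possible way to make the same word-splitting approach case-insensitive is to lowercase both sides before comparing. This is a sketch under the assumption that Search holds the three keyword lists in column order; the names words and flags are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Risk': ['Unforeseen cost due to CCTV failures',
                            'Unexpected drop in Productivity']})

# assumed: one keyword list per target column, in the same order as names
names = ['Cost', 'Productivity', 'Security']
Search = [['Cost'], ['Productivity'], ['Security', 'CCTV', 'Camera']]

# one lowercase word per cell; shorter sentences are padded with missing values
words = df['Risk'].str.lower().str.split(expand=True)

flags = {}
for name, keys in zip(names, Search):
    lowered = [k.lower() for k in keys]
    # True where at least one word in the row equals a keyword
    flags[name] = words.isin(lowered).any(axis=1)

result = pd.DataFrame(flags)
print(result)
```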
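I would make everything lowercase so that all matches are found regardless of case, then convert both the sentence being checked and the keywords into sets, and then check whether the two sets share any element: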
from io import StringIO
import pandas as pd

txt = '''Risk,Cost,Productivity,Security
"Unforeseen cost due to CCTV failures",,,
"Unexpected drop in Productivity",,,'''
df = pd.read_csv(
    StringIO(txt),
    sep=',',
    index_col=None,
    header=0
)
# lowercase the text and the column names so that matching ignores case
df['Risk'] = df['Risk'].str.lower()
df.columns = [item.lower() for item in df.columns]
print(df)
key_dict = {
    'cost': set([item.lower() for item in ['Cost']]),
    'productivity': set([item.lower() for item in ['Productivity']]),
    'security': set([item.lower() for item in ['Security','CCTV','Camera']])
}
# the empty columns are read as float NaN; cast them to object so booleans can be stored
df = df.astype({col: object for col in key_dict})
for idx in df.index:
    # set of words in this row's sentence
    word_set = set(df.loc[idx, 'risk'].split())
    print(word_set)
    for col in key_dict:
        # a non-empty intersection means at least one keyword matched
        if len(word_set & key_dict[col]) > 0:
            df.loc[idx, col] = True
        else:
            df.loc[idx, col] = False
risk cost productivity security
0 unforeseen cost due to cctv failures True False True
1 unexpected drop in productivity False True False
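One thing to keep in mind with the set approach: it matches whole words only, so a plural such as "costs" will not match the keyword "cost", whereas a plain substring test will. A small sketch with made-up sample text illustrates the difference:

```python
# made-up sentence and keywords, for illustration only
risk = "Unforeseen costs due to cameras"
keywords = {"cost", "camera"}

# whole-word matching: the sentence's word set shares no element with the keywords
word_match = bool(set(risk.lower().split()) & keywords)

# substring matching: "cost" occurs inside "costs"
substring_match = any(k in risk.lower() for k in keywords)

print(word_match)       # False
print(substring_match)  # True
```

Which behaviour is preferable depends on the data: substring matching catches plurals but can also produce false positives on longer words that happen to contain a keyword.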