Identifying whether a list item is in a string
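I am trying to write a nested loop that goes through a list of stopwords and a list of strings and determines whether each stopword occurs in each list item. Ideally, I would like to add the words found in each string to a new column and remove all of them from the string.
Does anyone have any tips? Are my loops in the wrong order?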
def remove_stops(text, customStops):
    """
    Removes custom stopwords.

    Parameters
    ----------
    text : the variable storing strings from which
        stopwords should be removed. This can be a string
        or a pandas DataFrame.
    customStops : the list of stopwords which should be removed.

    Returns
    -------
    Cleansed lists.
    """
    for item in text:
        print("Text:", item)
        for word in customStops:
            print("Custom Stops: ", word)
            if word in item:
                print("Word: ", word)
                # Add word to list of words in item
                # Remove word from item
                pass  # placeholder so the block is valid Python
Here is what you can do:
def remove_stops(text, customStops):
    found = {k: [] for k in text}  # Dict for all found stopwords in text
    for i, item in enumerate(text):
        for word in customStops:
            # Remove all stopwords from each string; if the stopword
            # is not in it, replace() just leaves the string as it is
            text[i] = text[i].replace(word, '')
            if word in item:
                found[item].append(word)
    return text, found

text = ['Today is my lucky day!',
        'Tomorrow is rainy',
        'Please help!',
        'I want to fly']
customStops = ['help', 'fly']
clean, found = remove_stops(text, customStops)
print(clean)
print(found)
Output:
['Today is my lucky day!',
'Tomorrow is rainy',
'Please !',
'I want to ']
{'Today is my lucky day!': [],
'Tomorrow is rainy': [],
'Please help!': ['help'],
'I want to fly': ['fly']}
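One thing to watch for: str.replace and the in operator match substrings, so 'fly' would also be found in, and removed from, a word like 'butterfly'. The function above also modifies the list that is passed in. If either matters for your data, a whole-word variant might look something like the sketch below. It is only an illustration: the name remove_stops_whole_words and the sample strings are made up, and it assumes the stopwords consist of ordinary word characters so that \b behaves as expected.

```python
import re

def remove_stops_whole_words(texts, custom_stops):
    """Return (cleaned_texts, found_words) without modifying `texts`."""
    cleaned, found = [], []
    for item in texts:
        hits = []
        for word in custom_stops:
            # \b anchors the match to word boundaries; re.escape guards
            # against stopwords that contain regex metacharacters.
            pattern = r"\b" + re.escape(word) + r"\b"
            if re.search(pattern, item):
                hits.append(word)
                item = re.sub(pattern, "", item)
        cleaned.append(item)
        found.append(hits)
    return cleaned, found

texts = ["Please help!", "A butterfly can fly"]
cleaned, found = remove_stops_whole_words(texts, ["help", "fly"])
print(cleaned)  # ['Please !', 'A butterfly can ']
print(found)    # [['help'], ['fly']]
```

Because it returns two lists in the same order as the input, each could be assigned to its own DataFrame column if the strings come from pandas, which is roughly the "new column" the question asks about.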