How to delete certain words from a variable or a list in Python
common_words = set(['je', 'tek', 'u', 'još', 'a', 'i', 'bi',
's', 'sa', 'za', 'o', 'kojeg', 'koju', 'kojom', 'kojoj',
'kojega', 'kojemu', 'će', 'što', 'li', 'da', 'od', 'do',
'su', 'ali', 'nego', 'već', 'no', 'pri', 'se', 'li',
'ili', 'ako', 'iako', 'bismo', 'koji', 'što', 'da', 'nije',
'te', 'ovo', 'samo', 'ga', 'kako', 'će', 'dobro',
'to', 'sam', 'sve', 'smo', 'kao'])
all = []
for (item_content, item_title, item_url, fetch_date) in cursor:
    #text = "{}".format(item_content)
    text = item_content
    text = re.sub('[,.?";:\-!@#$%^&*()]', '', text)
    text = text.lower()
    #text = [w for w in text if not w in common_words]
    all.append(text)
I want to delete certain words (stopwords) either from the variable "text" or, later, from the list "all", which collects the "text" value from every iteration.
I tried it like this, but that doesn't delete only whole words: it also removes those letters when they occur inside other words, and the output comes out as single characters like 'd', 'f' for every word. I want the format to stay the same; I just need the words in the common_words list removed from the variable (or from the list). How would I achieve that?
A Pythonic way to remove the punctuation from a text is the str.translate method (shown here in its Python 2 form):
>>> from string import punctuation
>>> "this is224$# a ths".translate(None, punctuation)
'this is224 a ths'
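The two-argument call translate(None, punctuation) only works on Python 2. On Python 3, a rough equivalent might look like the sketch below. It assumes punctuation means string.punctuation; the helper name strip_punctuation is made up for illustration.

```python
import string

# Translation table that deletes every ASCII punctuation character.
# str.maketrans('', '', chars) maps each character in chars to None.
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def strip_punctuation(text):
    """Return text with all ASCII punctuation characters removed."""
    return text.translate(_PUNCT_TABLE)

print(strip_punctuation("this is224$# a ths"))  # this is224 a ths
```

Building the table once and reusing it avoids rebuilding it for every row.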
To remove the words, use re.sub. First build the regular expression by joining the words with the pipe character (|), wrapping each word in \b word boundaries so that only whole words match:
reg = '|'.join(r'\b{}\b'.format(w) for w in common_words)
new_text = re.sub(reg, '', text)
Example:
>>> import re
>>> from string import punctuation
>>> s = "this is224$# a ths"
>>> w = ['this', 'a']
>>> boundary_words = [r'\b{}\b'.format(i) for i in w]
>>> reg = '|'.join(boundary_words)
>>> new_text = re.sub(reg, '', s).translate(None, punctuation)
>>> new_text
' is224  ths'
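Applied to the loop from the question, the same idea might look like the following Python 3 sketch. The names STOPWORDS_RE, PUNCT_TABLE, clean and the sample row are illustrative assumptions, not part of the original post: a one-element list stands in for the database cursor, and a shortened set stands in for the full common_words. The commented-out list comprehension in the question iterates over the characters of the string, which is why it yielded single letters; the regex here works on whole words and returns a string, so the format is preserved.

```python
import re
import string

# Shortened stand-in for the question's common_words set.
common_words = {'je', 'u', 'još', 'a', 'i', 'se', 'da', 'će'}

# One alternation wrapped in \b...\b so that only whole words match.
# re.escape guards against words that contain regex metacharacters.
STOPWORDS_RE = re.compile(
    r'\b(?:' + '|'.join(map(re.escape, sorted(common_words))) + r')\b'
)
PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def clean(text):
    """Lower-case, strip punctuation, drop stopwords, tidy whitespace."""
    text = text.lower().translate(PUNCT_TABLE)
    text = STOPWORDS_RE.sub('', text)
    return ' '.join(text.split())  # collapse the gaps left by removed words

# Hypothetical row standing in for the database cursor.
rows = [("Ana je u gradu, a Ivan još spava.", "title", "url", "date")]
cleaned = [clean(item_content) for (item_content, _title, _url, _date) in rows]
print(cleaned)  # ['ana gradu ivan spava']
```

On Python 3, \b is Unicode-aware for str patterns, so words such as 'još' and 'će' are matched as whole words too.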