Find similar words in strings in a for loop with Python
I'm working with tweets, and after text processing the code returns something like:
So the SQLite database identifies these records as unique. My question is: how can I check whether two strings contain 5 similar words and, if they do, skip the record? Should I change my regex code or add an if statement?
My code:
import re
# Remove mentions, hashtags and URLs, then remaining punctuation, then a leading "RT".
clean1 = re.sub(r"(?:@\S*|#\S*|http(?=.*://)\S*)", "", tweet.text)
clean2 = re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t:])|(\w+:\/\/\S+)", " ", clean1)
final = re.sub(r'^RT[\s]+', '', clean2)
Thanks!
I don't think regex will help in this situation.
You could do this to check whether two lines have 5 words in common:
str1 = "Lorem ipsum dolor sit amaet vi"
str2 = "Lorem ipsum dolor sit amaet"
count = 0
str1_split = str1.split(" ")
# Count every word of str2 that also occurs somewhere in str1.
for word in str2.split(" "):
    if word in str1_split:
        count += 1
print(count)
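To tie the count back to the original question (skipping a record once it shares 5 words with an earlier one), the idea above could be wrapped up roughly as follows. This is only a sketch: the helper names `shared_word_count` and `drop_near_duplicates`, the case-insensitive comparison, and the sample list are assumptions made for illustration, not part of the asker's code.

```python
def shared_word_count(a, b):
    """Count distinct words that appear in both strings, ignoring case."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def drop_near_duplicates(texts, threshold=5):
    """Keep a text only if it shares fewer than `threshold` words
    with every text that has already been kept."""
    kept = []
    for text in texts:
        if any(shared_word_count(text, seen) >= threshold for seen in kept):
            continue  # too similar to an earlier text, so skip it
        kept.append(text)
    return kept


# Hypothetical cleaned tweets, reusing the strings from the answer above.
tweets = [
    "Lorem ipsum dolor sit amaet vi",
    "Lorem ipsum dolor sit amaet",
    "Something else entirely",
]
unique_tweets = drop_near_duplicates(tweets)
print(unique_tweets)
```

Using sets means repeated words are counted once and word order does not matter. Each new text is compared against every kept text, which should be fine for small batches but may get slow for a large number of tweets.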
Here is a way to count matching words in two strings (it compares the words position by position):
a = "Lorem ipsum dolor sit amaet vi"
b = "Lorem ipsum dolor sit amaet"
count = 0
# zip pairs the first word of a with the first word of b, and so on.
for i, j in zip(a.split(), b.split()):
    if i == j:
        count += 1
print(count)
Output:
5
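One caveat worth checking before relying on a zip-based count: because it compares words by position, a single extra word at the front of one string shifts everything and the count drops to zero. The short comparison below uses hypothetical strings (a leading "RT" left in place) to show the difference from a position-independent count; it is an illustration, not a replacement for either answer.

```python
a = "RT Lorem ipsum dolor sit amaet"
b = "Lorem ipsum dolor sit amaet"

# Positional comparison: word k of a against word k of b.
positional = sum(1 for x, y in zip(a.split(), b.split()) if x == y)

# Position-independent comparison: distinct words present in both strings.
shared = len(set(a.split()) & set(b.split()))

print(positional, shared)
```

If the tweets can differ by inserted or reordered words, the set-based count is probably the safer basis for the "5 similar words" check.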