Find all duplicated words in a text

Question

I'm trying to find all duplicated words in a text, put each duplicate pair in a tuple, and save all the tuples in a list. It needs to include cases with punctuation between the words, like "so, so".

I tried to use this pattern:

/(\b\S+\b)\s+\b\1\b/

but it doesn't return what I'm looking for, and I'm having trouble saving the results in the form I need.
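To see what the pattern actually does, here is a quick check using Python's `re` module on the example sentence from this question (an assumption: the question doesn't name a language, so the `/.../` delimiters are dropped). `\s+` allows only whitespace between the two words, so "party, party" cannot match, and `re.findall` with a single capture group returns plain strings rather than tuples.

```python
import re

text = "i went to to a party, party at my uncle's house"
# The question's pattern with the /.../ delimiters removed
matches = re.findall(r"(\b\S+\b)\s+\b\1\b", text)
print(matches)  # ['to']
```

Only the first pair is found, and it comes back as the string 'to' instead of the tuple ('to', 'to').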

Example of what I'm looking for:

text = "i went to to a party, party at my uncle's house"

Output at the end of the function:

[('to', 'to'), ('party', 'party')]
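A minimal regex-based sketch, assuming Python's `re` module, that a "word" is a run of `\w` characters, and that only back-to-back repeats should count. Replacing `\s+` with `\W+` lets spaces, commas, and other punctuation sit between the two copies, and building a tuple from each match gives the requested list-of-tuples shape. `find_adjacent_duplicates` is a hypothetical helper name.

```python
import re


def find_adjacent_duplicates(text):
    # (\w+) captures a whole word (the \b on each side stops partial matches),
    # \W+ allows spaces and/or punctuation such as ", " between the copies,
    # and \1\b requires the exact same word to follow.
    pattern = re.compile(r"\b(\w+)\b\W+\1\b")
    # One (word, word) tuple per match, in the order they appear
    return [(m.group(1), m.group(1)) for m in pattern.finditer(text)]


text = "i went to to a party, party at my uncle's house"
print(find_adjacent_duplicates(text))  # [('to', 'to'), ('party', 'party')]
```

This reports only repeats that sit next to each other, which matches the examples in the question ("to to", "party, party", "so, so").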

Answer 1

Regex is meant for matching specific patterns rather than words, so you can either do what @thshea suggested or use this code:

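# Only commas are removed here; other punctuation stays attached to the words
the_text = "i went to to a party, party at my uncle's house"
the_text = the_text.replace(",", "")
words = the_text.split()
# Remove one occurrence of every distinct word; whatever remains occurred more than once
for word in set(words):
    words.remove(word)
# Turn each leftover word into a (word, word) tuple
answer = [(word, word) for word in words]
print(answer)  # [('to', 'to'), ('party', 'party')]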
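The approach above flags any word that occurs more than once anywhere in the text, not only back-to-back repeats, and it strips only commas. Here is a sketch of the same idea using `collections.Counter` and `string.punctuation` (the helper name `repeated_words` is hypothetical). It strips leading and trailing punctuation from each word and returns one tuple per distinct repeated word in order of first appearance, whereas the loop above emits one tuple per extra occurrence.

```python
from collections import Counter
import string


def repeated_words(text):
    # Strip leading/trailing punctuation so "party," and "party" compare equal;
    # inner apostrophes, as in "uncle's", are kept.
    words = [w.strip(string.punctuation) for w in text.split()]
    # Counter is a dict subclass, so it keeps first-insertion order (Python 3.7+)
    counts = Counter(w for w in words if w)
    # One (word, word) tuple for each word seen more than once
    return [(w, w) for w, n in counts.items() if n > 1]


print(repeated_words("i went to to a party, party at my uncle's house"))
# [('to', 'to'), ('party', 'party')]
```

Note that `repeated_words("a b a")` returns `[('a', 'a')]` even though the two `a`s are not next to each other.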

Answer 1
0 2021-06-07 15:20:52
