How to find anagrams of a text using 3 words from a list of words?
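I have a text and a list of 60,000 words. I need to find the anagrams of the text that use 3 words from the list (the last ones in alphabetical order), and the function should return a tuple of the 3 words that build an anagram of the given text. Note: I have to ignore capital letters and spaces in the text.
I have developed a function that finds all the words of the list that are contained in the text, but I don't know how to finish finding the anagrams.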
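def es3(words_list, text):
    # ignore spaces and capital letters
    text = text.replace(" ", "").lower()
    result = []
    for x in words_list:
        if len(x) >= 2:
            # count the letters of the word that occur in the text
            # (letter multiplicity is not checked here)
            cont = 0
            for c in x:
                if c in text:
                    cont += 1
            if cont == len(x):
                result.append(x)
    return result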
Examples:
text = "Andrea Sterbini" -> anagram= ('treni', 'sia', 'brande')
sorted("andreasterbini") == sorted("treni" + "sia" + "brande")
"Angelo Monti" -> ('toni', 'nego', 'mal')
"Angelo Spognardi" -> ('sragion', 'pend', 'lago')
"Ha da veni Baffone" -> ('video', 'beh', 'affanna')
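The naive algorithm is: for every triplet of words, add an entry
(sorted letters of the three words, triplet)
to a multimap (a map that accepts several values per key; in Python, a regular map key -> [values]), then sort the letters of the text and look them up in the multimap. The problem is that building the multimap has O(N^3) time and space complexity. With N = 60,000, that is 2.16 * 10^14 (216 trillion) operations and values. That's a lot!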
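The naive approach described above can be sketched on a tiny word list. This is only an illustration: the helper name build_triplet_index and the five-word list are made up here, and the sketch assumes unordered triplets of distinct words, which still grows as O(N^3).

```python
from collections import defaultdict
from itertools import combinations


def build_triplet_index(words):
    """Map the sorted letters of every 3-word combination to its triplets.

    Illustration only: the number of triplets grows as O(N^3), so this is
    not usable with a 60,000-word list.
    """
    index = defaultdict(list)
    for triplet in combinations(words, 3):
        key = "".join(sorted("".join(triplet)))
        index[key].append(triplet)
    return index


# hypothetical tiny word list, taken from the examples in the question
small_words = ["toni", "nego", "mal", "sia", "beh"]
index = build_triplet_index(small_words)

# look up the sorted letters of the normalized text
key = "".join(sorted("Angelo Monti".replace(" ", "").lower()))
print(index.get(key, []))
```

Five words already produce 10 triplets; the count grows cubically from there, which is why the rest of the answer looks for another route.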
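Let's try to reduce this. Let me restate the problem: given a sequence, find three subsequences that 1. do not overlap and together cover the sequence; 2. belong to a given set. See the first example: "Angelo Monti" -> ("toni", "nego", "mal")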
sequence  a e g i l m n n o o t
subseq1         i     n   o   t
subseq2     e g         n   o
subseq3   a       l m
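Finding three non-overlapping subsequences that cover the sequence is the same problem as partitioning a set of n elements into k groups. The number of such partitions is the Stirling number of the second kind S(n, k), which is bounded by 1/2 * C(n, k) * k^(n-k), where C(n, k) is the binomial coefficient. Finding all partitions of n elements into k groups therefore has O(n^k * k^(n-k)) complexity.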
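To get a feeling for these numbers, here is a small sketch that computes S(n, k) with the standard recurrence S(n, k) = S(n-1, k-1) + k * S(n-1, k) and compares it with the bound quoted above. The function names stirling2 and upper_bound are chosen for this illustration only.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to partition n labelled elements into k non-empty groups."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # the new element is either a group on its own, or joins one of the k groups
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


def upper_bound(n, k):
    """The bound 1/2 * C(n, k) * k^(n-k) quoted in the text."""
    return comb(n, k) * k ** (n - k) / 2


# 11, 14 and 15 are the letter counts of the example texts
for n in (4, 11, 14, 15):
    print(n, stirling2(n, 3), upper_bound(n, 3))
```

For the 11 letters of "angelomonti" and k = 3 this gives 28,501 partitions, a number that depends only on the length of the text and not on the size of the word list.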
Let's try to implement this in Python:
def partitions(S, k):
    if len(S) < k: # can't partition if there are not enough elements
        raise ValueError()
    elif k == 1:
        yield tuple([S]) # one group: return the set
    elif len(S) == k:
        yield tuple(map(list, S)) # ([e1], ..., [e[n]])
    else:
        e, *M = S # extract the first element
        for p in partitions(M, k-1): # we need k-1 groups because...
            yield ([e], *p) # the first element is a group on itself
        for p in partitions(M, k):
            for i in range(len(p)): # add the first element to every group
                yield tuple(list(p[:i]) + [[e] + p[i]] + list(p[i+1:]))
A simple test:
>>> list(partitions("abcd", 3))
[(['a'], ['b'], ['c', 'd']), (['a'], ['b', 'c'], ['d']), (['a'], ['c'], ['b', 'd']), (['a', 'b'], ['c'], ['d']), (['b'], ['a', 'c'], ['d']), (['b'], ['c'], ['a', 'd'])]
Now, I will use some of the words you used in your question as the word list:
words = "i have a text and a list of words i need to find anagrams of the text from the list of words using words lasts in alphabetic order and the function should return a tuple of the words that build an anagram of the given text note i have to ignore capital letters and spaces that are in the text i have developed the function that finds all the words of the list of words that are contained in the text but i dont know how to end finding the anagrams and some examples treni sia brande toni nego mal sragion pend lago video beh affanna".split(" ")
and build a dict sorted(letters) -> list of words to check the groups against:
word_by_sorted = {}
for w in words:
    word_by_sorted.setdefault("".join(sorted(w)), set()).add(w)
The result is:
>>> word_by_sorted
{'i': {'i'}, 'aehv': {'have'}, 'a': {'a'}, 'ettx': {'text'}, 'adn': {'and'}, 'ilst': {'list'}, 'fo': {'of'}, 'dorsw': {'words'}, 'deen': {'need'}, 'ot': {'to'}, 'dfin': {'find'}, 'aaagmnrs': {'anagrams'}, 'eht': {'the'}, 'fmor': {'from'}, 'ginsu': {'using'}, 'alsst': {'lasts'}, 'in': {'in'}, 'aabcehilpt': {'alphabetic'}, 'deorr': {'order'}, 'cfinnotu': {'function'}, 'dhlosu': {'should'}, 'enrrtu': {'return'}, 'elptu': {'tuple'}, 'ahtt': {'that'}, 'bdilu': {'build'}, 'an': {'an'}, 'aaagmnr': {'anagram'}, 'eginv': {'given'}, 'enot': {'note'}, 'eginor': {'ignore'}, 'aacilpt': {'capital'}, 'eelrstt': {'letters'}, 'acepss': {'spaces'}, 'aer': {'are'}, 'ddeeelopv': {'developed'}, 'dfins': {'finds'}, 'all': {'all'}, 'acdeinnot': {'contained'}, 'btu': {'but'}, 'dnot': {'dont'}, 'know': {'know'}, 'how': {'how'}, 'den': {'end'}, 'dfgiinn': {'finding'}, 'emos': {'some'}, 'aeelmpsx': {'examples'}, 'einrt': {'treni'}, 'ais': {'sia'}, 'abdenr': {'brande'}, 'inot': {'toni'}, 'egno': {'nego'}, 'alm': {'mal'}, 'aginors': {'sragion'}, 'denp': {'pend'}, 'aglo': {'lago'}, 'deiov': {'video'}, 'beh': {'beh'}, 'aaaffnn': {'affanna'}}
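Now, put the pieces together: for each partition of the text into three groups, check whether the three groups are anagrams of words in the list, and output the words: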
import itertools

for p in partitions("angelomonti", 3):
    # one set of candidate words per group (empty if nothing matches)
    L = [word_by_sorted.get("".join(sorted(xs)), set()) for xs in p]
    for anagrams in itertools.product(*L):
        print(anagrams)
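Remarks:
word_by_sorted.get("".join(sorted(xs)), set()) looks up the sorted letters of a group, joined into a string, in the dict and returns the matching set of words, or an empty set if there is no match.
itertools.product(*L) builds the Cartesian product of the sets found. If one of them is empty (a group with no match), the product is empty by definition.
Output (there are duplicates for a reason; try to find out why!):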
('nego', 'mal', 'toni')
('mal', 'nego', 'toni')
('mal', 'nego', 'toni')
('mal', 'nego', 'toni')
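What matters here is that the number of words is no longer an issue (a dict lookup is amortized O(1)), but the length of the text to search may become one.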
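As a rough sketch, the pieces above could be wrapped into a single function. The name find_anagrams, the normalization step (lowercase, spaces removed, as the question requires) and the deduplication through sorted tuples are assumptions made for this illustration, not part of the code above; partitions is repeated so that the block runs on its own.

```python
import itertools


def partitions(S, k):
    """Yield every way to split the elements of S into k non-empty groups."""
    if len(S) < k:
        raise ValueError()
    elif k == 1:
        yield tuple([S])
    elif len(S) == k:
        yield tuple(map(list, S))
    else:
        e, *M = S
        for p in partitions(M, k - 1):
            yield ([e], *p)
        for p in partitions(M, k):
            for i in range(len(p)):
                yield tuple(list(p[:i]) + [[e] + p[i]] + list(p[i + 1:]))


def find_anagrams(words, text, k=3):
    """Return the distinct k-word anagrams of text as a sorted list of tuples."""
    # assumption: ignore capital letters and spaces, as the question asks
    letters = text.replace(" ", "").lower()
    word_by_sorted = {}
    for w in words:
        word_by_sorted.setdefault("".join(sorted(w)), set()).add(w)
    found = set()
    for p in partitions(letters, k):
        groups = [word_by_sorted.get("".join(sorted(xs)), set()) for xs in p]
        for anagram in itertools.product(*groups):
            # sorting the words makes reordered duplicates collapse into one
            found.add(tuple(sorted(anagram)))
    return sorted(found)


# hypothetical small word list; "into" shares its letters with "toni"
words = ["toni", "nego", "mal", "sia", "beh", "into"]
print(find_anagrams(words, "Angelo Monti"))
```

If a single answer is wanted, for instance the last tuple in alphabetical order as the question seems to ask, the final element of the returned list can be taken. The running time still grows quickly with the length of the text, as noted above.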