Create tuples consisting of pairs of words

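I have a string (or a list of words). I want to create a tuple for every possible combination of word pairs so that I can pass them to Counter to build a dictionary and compute frequencies. The frequency is calculated as follows: if the pair occurs in the string (regardless of order, or of whether other words appear between them), then frequency = 1 (even if word1 occurs 7 times and word2 occurs 3 times, the frequency of the pair word1 and word2 is still 1).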

I'm using loops to create tuples of all pairs, but I got stuck:

from collections import Counter

tweetList = ('I went to work but got delayed at other work and got stuck in a traffic and I went to drink some coffee but got no money and asked for money from work', 'We went to get our car but the car was not ready. We tried to expedite our car but were told it is not ready')

words = set(tweetList.split())
n = 10
for tweet in tweetList:

    for word1 in words:
        for word2 in words:
            pairW = [(word1, word2)]

            c1 = Counter(pairW for pairW in tweet)

c1.most_common(n)

However, the output is very strange:

[('k', 1)]

It seems to iterate over letters rather than words.

How can I fix this? Should I use split() to convert the string into a list of words?
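A minimal sketch of the difference the question runs into: iterating over a str yields single characters, and split() has to be called on each string rather than on the tuple that holds them (a tuple has no split() method). The two short sample strings are made up for illustration.

```python
# Two short, made-up sample strings standing in for the tweets.
tweet_list = (
    "I went to work",
    "We went to get our car",
)

# Iterating over a str yields characters, which is what the original loop did.
print(list(tweet_list[0])[:4])  # ['I', ' ', 'w', 'e']

# split() is called on each string; with no argument it splits on whitespace.
words = {word for tweet in tweet_list for word in tweet.split()}
print(sorted(words))  # ['I', 'We', 'car', 'get', 'our', 'to', 'went', 'work']
```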

Another question: how can I avoid creating duplicate tuples such as (word1, word2) and (word2, word1)? With enumerate?
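One way to avoid mirrored duplicates, sketched with the standard library: itertools.combinations yields each unordered pair once, whereas itertools.permutations yields both orders, so enumerate is not needed. Sorting the words first is an assumption made here so that every tuple comes out in the same (alphabetical) order; the three-word set is illustrative.

```python
from itertools import combinations, permutations

words = {"work", "got", "money"}

# permutations yields both ('got', 'work') and ('work', 'got').
perms = list(permutations(sorted(words), 2))

# combinations yields each unordered pair exactly once; because the input is
# sorted, every tuple is in alphabetical order.
combos = list(combinations(sorted(words), 2))

print(len(perms))  # 6
print(combos)      # [('got', 'money'), ('got', 'work'), ('money', 'work')]
```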

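As output, I expect a dictionary whose keys are all word pairs (but see the note on duplicates) and whose values are the frequencies of each word pair in the list.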

Thanks!
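A sketch of the counting rule described in the question, assuming that words are compared case-insensitively and that punctuation stays attached to words (so "ready." and "ready" are different tokens). Each tweet is reduced to its set of unique words, so a pair contributes at most 1 per tweet. The function name pair_counts is hypothetical. Unlike the answer below, which enumerates every pair in the whole vocabulary, this only creates pairs that actually co-occur in some tweet; a pair that never co-occurs simply reads as 0 from the Counter.

```python
from collections import Counter
from itertools import combinations

tweets = [
    "I went to work but got delayed at other work and got stuck in a traffic "
    "and I went to drink some coffee but got no money and asked for money from work",
    "We went to get our car but the car was not ready. "
    "We tried to expedite our car but were told it is not ready",
]

def pair_counts(texts):
    """Count, for each unordered word pair, how many texts contain both words."""
    counts = Counter()
    for text in texts:
        # A set keeps each word once per text, so a pair adds at most 1 per
        # text no matter how often either word repeats.
        unique_words = sorted(set(text.lower().split()))
        # combinations() yields each unordered pair once, in sorted order.
        counts.update(combinations(unique_words, 2))
    return counts

counts = pair_counts(tweets)
print(counts[("went", "work")])  # 1: "work" occurs only in the first tweet
print(counts[("but", "went")])   # 2: both words occur in both tweets
```

counts.most_common(10) then returns the ten most frequent pairs as (pair, count) tuples, the same shape as the result shown in the answer.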

I wonder if this is what you want:

import itertools, collections

tweets = ['I went to work but got delayed at other work and got stuck in a traffic and I went to drink some coffee but got no money and asked for money from work',
          'We went to get our car but the car was not ready. We tried to expedite our car but were told it is not ready']

words = set(word.lower() for tweet in tweets for word in tweet.split())
_pairs = list(itertools.permutations(words, 2))
# Remove mirrored duplicates: sort the words in each pair and convert each
# pair back to a tuple so the whole list can be turned into a set.
pairs = set(map(tuple, map(sorted, _pairs)))

c = collections.Counter()

for tweet in tweets:
    for pair in pairs:
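        # Note: `in` on a string is a substring test, so a short word such as
        # 'a' also matches a tweet that merely contains the letter a.
        if pair[0] in tweet and pair[1] in tweet:
            c[pair] += 1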

print(c.most_common(10))

The result is: [(('a', 'went'), 2), (('a', 'the'), 2), (('but', 'i'), 2), (('i', 'the'), 2), (('but', 'the'), 2), (('a', 'i'), 2), (('a', 'we'), 2), (('but', 'we'), 2), (('no', 'went'), 2), (('but', 'went'), 2)]
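The result above includes ('a', 'went') with a count of 2, although "a" occurs as a standalone word only in the first tweet. That happens because `pair[0] in tweet` tests for a substring of the whole string, and "car" in the second tweet contains the letter a. A minimal illustration comparing the substring test with a test against the set of lowercased words:

```python
tweet = "We went to get our car but the car was not ready."

# `in` on a str is a substring test: 'a' is found inside 'car' and 'was'.
print("a" in tweet)  # True

# Testing against the set of words checks whole words instead.
tweet_words = set(tweet.lower().split())
print("a" in tweet_words)     # False
print("went" in tweet_words)  # True
```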

tweet is a string, so Counter(pairW for pairW in tweet) counts the frequency of each letter in tweet, which is probably not what you want.
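To illustrate the point above: Counter counts whatever elements it is given, so a str produces per-character counts while a list of tuples produces per-pair counts. The sample data here is made up.

```python
from collections import Counter

tweet = "got no money"

# Counter over a str counts characters: 'o' appears in "got", "no" and "money".
print(Counter(tweet)["o"])  # 3

# Counter over a list of tuples counts the tuples themselves.
pairs = [("got", "money"), ("got", "no"), ("got", "money")]
print(Counter(pairs)[("got", "money")])  # 2
```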


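Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to republish, please credit this site's URL or the original address. For any questions, please contact: yoyou2525@163.com.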

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM