Best way to solve this problem using Python

Question

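I'm new to Python and have been working through practice problems. I haven't been able to optimize my solution to the following one.

Problem statement: encode the words in a sentence according to their frequency, returning each word's rank as its encoded value.

Example: input string --> 'aaa bb ccc aaa bbb bb cc ccc ccc bb ccc bbb'

Expected output --> 3|2|1|3|4|2|5|1|1|2|1|4

Explanation: 'aaa' appears 2 times in the original string, 'ccc' appears 4 times, and 'bb' appears 3 times, so the words are ranked by frequency. That gives 'ccc' rank 1, 'bb' rank 2, and 'aaa' rank 3, which produces the result above.

Below is my Python code, which I haven't been able to optimize. Can anyone help?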

def testing(s):
    ht = {}
    new_strs = strs.split()
    print(new_strs)
    for i in new_strs:
        if i in ht:
            ht[i] += 1
        else:
            ht[i] = 1
    print(ht)
    
    temp = list(map(list, sorted(ht.items(), key=lambda v: v[1], reverse=True)))
    print(temp)

    for k,v in enumerate(temp):
        temp[k].append(k+1)
    print(temp)
    
    final = []
    for j in new_strs:
        for t in temp:
            if t[0] == j:
                final.append(str(t[2]))
    return '|'.join(final)

strs = 'aaa bb ccc aaa bbb bb cc ccc ccc bb ccc bbb'
result = testing(str)
print(result)

Below is the output I get from this code.

['aaa', 'bb', 'ccc', 'aaa', 'bbb', 'bb', 'cc', 'ccc', 'ccc', 'bb', 'ccc', 'bbb']

{'aaa': 2, 'bb': 3, 'ccc': 4, 'bbb': 2, 'cc': 1}

[['ccc', 4], ['bb', 3], ['aaa', 2], ['bbb', 2], ['cc', 1]]

[['ccc', 4, 1], ['bb', 3, 2], ['aaa', 2, 3], ['bbb', 2, 4], ['cc', 1, 5]]

3|2|1|3|4|2|5|1|1|2|1|4

Thanks in advance for your help.

Answer 1

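Your code is fine up through the counting. From your for j loop onward, I'm not at all sure how you expect it to work.

You need to iterate over the words of the input string in a single loop, not a nested one. For each word in the input, put its frequency into the result.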

for word in new_strs:
    final.append(str(ht[word]))
print(final)

With that replacement, your output is:

['2', '3', '4', '2', '2', '3', '1', '4', '4', '3', '4', '2']
2|3|4|2|2|3|1|4|4|3|4|2

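As Robert has already pointed out, there are other errors in your code. In particular, you pass a type to the function. If you intend str to be a variable, don't do that. When you use a name that Python already defines (here, the string type) as a variable, you shadow the built-in in that namespace and strange things happen.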
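To make the point about reusing str concrete, here is a minimal sketch. The function names safe_join and broken_join are made up for illustration and do not come from the question; the second one rebinds str on purpose to show the failure.

```python
def safe_join(counts):
    # `str` here is the built-in type, so str(c) converts each count to text.
    return '|'.join(str(c) for c in counts)


def broken_join(counts):
    # Hypothetical mistake: rebinding `str` hides the built-in inside this function.
    str = 'aaa bb ccc'
    # str(c) now tries to call a string object, which raises TypeError.
    return '|'.join(str(c) for c in counts)


print(safe_join([2, 3, 4]))  # 2|3|4

try:
    broken_join([2, 3, 4])
except TypeError as exc:
    print('TypeError:', exc)
```

Picking a different variable name, such as text or sentence, avoids the problem entirely.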

Answer 2

This is a bit convoluted, but it does the job.

I think the best way to go is to separate the ranking logic into a class.

from collections import Counter


class Ranker:
    def __init__(self, items):
        self._item_counts = Counter(items)
        # Distinct counts, highest first; sort explicitly because set order is not guaranteed.
        self._ranks = sorted(set(self._item_counts.values()), reverse=True)

    def __getitem__(self, item):
        return self._ranks.index(self._item_counts[item]) + 1


if __name__ == '__main__':
    strs = 'aaa bb ccc aaa bbb bb cc ccc ccc bb ccc bbb aaa'.split()
    r = Ranker(strs)
    print('|'.join([str(r[s]) for s in strs]))
    # 2|2|1|2|3|2|4|1|1|2|1|3|2
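The class above looks up each rank with list.index, which scans the list on every call. As a sketch of an alternative (the function name dense_ranks is my own, not part of the answer), the same tied ranking can be precomputed once into a dictionary:

```python
from collections import Counter


def dense_ranks(words):
    """Map each word to its rank; words with equal counts share a rank."""
    counts = Counter(words)
    # Distinct counts from highest to lowest, e.g. [4, 3, 2, 1].
    distinct = sorted(set(counts.values()), reverse=True)
    # Rank 1 goes to the highest count, rank 2 to the next, and so on.
    rank_of_count = {count: rank for rank, count in enumerate(distinct, start=1)}
    return {word: rank_of_count[count] for word, count in counts.items()}


words = 'aaa bb ccc aaa bbb bb cc ccc ccc bb ccc bbb'.split()
ranks = dense_ranks(words)
print('|'.join(str(ranks[w]) for w in words))
# 3|2|1|3|3|2|4|1|1|2|1|3
```

Note that ties share a rank here, so this does not reproduce the question's expected output, where 'aaa' and 'bbb' get ranks 3 and 4.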

Answer 3

As pointed out in the comments, instead of

strs = '...'  # This is a global variable

def testing(s):
    ... # Body of testing function that never references the local `s` variable

you should have

def testing(strs):
    ... # Body of testing uses `strs` as before

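There is no reason to sort ht.values(), so the assignment to temp can be dropped entirely.

As you iterate over new_strs, all you need to do is build a list containing the count of each element of new_strs. That is exactly what you stored in the ht dictionary. So

for s in new_strs:
    final.append(str(ht[s]))  # str() so that '|'.join(final) accepts the items

Now final is a list containing the number of times each word appears in the original string. You can return it as you do now.

I suggest making these small changes and checking that it works. Then, once the function behaves as you expect, there is a lot that can be cleaned up.

You can use a defaultdict instead of a regular dictionary. You can use a list comprehension to build the final list.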

from collections import defaultdict

def testing(strs):
    ht = defaultdict(int)
    new_strs = strs.split()
    for s in new_strs:
        ht[s] += 1  # if `s` is not in ht, the default 0 is used.
    final = [str(ht[s]) for s in new_strs]
    return '|'.join(final)

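The string join method accepts a generator, so there is no need to create the intermediate final variable. The last two lines can be written as one:

return '|'.join(str(ht[s]) for s in new_strs)

The collections module has a Counter class that does exactly this kind of counting of the items in a list. You could write the function as: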

from collections import Counter

def testing(strs):
    new_strs = strs.split()
    ht = Counter(new_strs)
    return '|'.join(str(ht[s]) for s in new_strs)
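As a quick sanity check, the sketch below builds the counts three ways (plain dict, defaultdict, Counter) and confirms they agree. The helper names count_with_dict and count_with_defaultdict are made up for illustration.

```python
from collections import Counter, defaultdict


def count_with_dict(words):
    # Plain dictionary with an explicit membership test, as in the question.
    ht = {}
    for w in words:
        if w in ht:
            ht[w] += 1
        else:
            ht[w] = 1
    return ht


def count_with_defaultdict(words):
    # defaultdict(int) supplies 0 for keys that have not been seen yet.
    ht = defaultdict(int)
    for w in words:
        ht[w] += 1
    return ht


words = 'aaa bb ccc aaa bbb bb cc ccc ccc bb ccc bbb'.split()
by_dict = count_with_dict(words)
by_default = count_with_defaultdict(words)
by_counter = Counter(words)

# All three hold the same word -> count mapping.
print(by_dict == dict(by_default) == dict(by_counter))  # True
print('|'.join(str(by_counter[w]) for w in words))
# 2|3|4|2|2|3|1|4|4|3|4|2
```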

Answer 4

The question has changed since it was first asked, so here is a new answer.

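from collections import Counter

def testing(strs):
    new_strs = strs.split()
    ht = Counter(new_strs)
    # Sort (word, count) pairs by count, highest first, then assign ranks.
    ranks = rank(sorted(ht.items(), key=lambda t: t[1], reverse=True))
    ranks_dict = dict(ranks)
    return '|'.join(str(ranks_dict[s]) for s in new_strs)

All you need is the rank function, which takes a sorted list of (value, score) tuples and returns a list of (value, rank) tuples.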

def rank(tuples):
    current_score = tuples[0][1]
    current_rank = 1
    ties = 0
    ranks = []
    for tup in tuples:
        if tup[1] == current_score:
            ties += 1
        else:
            current_rank = current_rank + ties
            ties = 1
        ranks.append((tup[0], current_rank))
        current_score = tup[1]
    return ranks

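Note that I treat two words that occur the same number of times as having the same rank. In your example you give them different ranks, but you don't provide a way to decide which one gets which. I hope this is enough to get you on the right track.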
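If tied words should get distinct ranks, one possible rule is to break ties by first appearance in the input. That rule is an assumption on my part: it happens to reproduce the expected output in the question, but the question never states it. A minimal sketch, with encode_by_frequency as a made-up name:

```python
from collections import Counter


def encode_by_frequency(sentence):
    words = sentence.split()
    counts = Counter(words)
    # Counter keeps keys in first-seen order (Python 3.7+), and sorted() is
    # stable, so words with equal counts stay in order of first appearance.
    ordered = sorted(counts, key=lambda w: -counts[w])
    # Assumed tie-break: every word gets its own rank, 1 for the most frequent.
    rank = {word: position for position, word in enumerate(ordered, start=1)}
    return '|'.join(str(rank[w]) for w in words)


print(encode_by_frequency('aaa bb ccc aaa bbb bb cc ccc ccc bb ccc bbb'))
# 3|2|1|3|4|2|5|1|1|2|1|4
```

Each word is looked up in a dictionary, so the encoding step is a single pass over the input rather than a nested loop.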

4 solutions

Solution 1
1 2021-04-20 00:16:19

Solution 2
0 2021-04-20 00:23:30

Solution 3
0 2021-04-20 00:35:15

Solution 4
0 2021-04-20 01:41:53
