Best approach to this question using Python

I am new to Python and am practicing some problems. I am unable to optimize my solution for the problem below.

Problem statement: encode the words in a sentence based on word frequency, and return the rank (the encoded value) of each word.

Example: input string --> 'aaa bb ccc aaa bbb bb cc ccc ccc bb ccc bbb'

Expected output --> 3|2|1|3|4|2|5|1|1|2|1|4

Explanation: 'aaa' appears 2 times in the original string, 'ccc' 4 times, and 'bb' 3 times, so they are ranked by frequency. In that manner 'ccc' has rank 1, 'bb' has rank 2, and 'aaa' has rank 3, which gives the result shown above.

Below is my Python code, which I have been unable to optimize. Can someone please help?

def testing(s):
    ht = {}
    new_strs = strs.split()
    print(new_strs)
    for i in new_strs:
        if i in ht:
            ht[i] += 1
        else:
            ht[i] = 1
    print(ht)
    
    temp = list(map(list, sorted(ht.items(), key=lambda v: v[1], reverse=True)))
    print(temp)

    for k,v in enumerate(temp):
        temp[k].append(k+1)
    print(temp)
    
    final = []
    for j in new_strs:
        for t in temp:
            if t[0] == j:
                final.append(str(t[2]))
    return '|'.join(final)

strs = 'aaa bb ccc aaa bbb bb cc ccc ccc bb ccc bbb'
result = testing(str)
print(result)

Below is the result I am getting from this code.

['aaa', 'bb', 'ccc', 'aaa', 'bbb', 'bb', 'cc', 'ccc', 'ccc', 'bb', 'ccc', 'bbb']

{'aaa': 2, 'bb': 3, 'ccc': 4, 'bbb': 2, 'cc': 1}

[['ccc', 4], ['bb', 3], ['aaa', 2], ['bbb', 2], ['cc', 1]]

[['ccc', 4, 1], ['bb', 3, 2], ['aaa', 2, 3], ['bbb', 2, 4], ['cc', 1, 5]]

3|2|1|3|4|2|5|1|1|2|1|4

Thank you in advance for your help.

Your code is fine through the counting. Starting with your for j loop, I'm not at all sure how you think this is supposed to work.

You need to iterate through the given words in the string: one loop, not nested loops. For each word in the input, place its frequency into the result.

for word in new_strs:
    final.append(str(ht[word]))
print(final)

With that replacement, your output is:

['2', '3', '4', '2', '2', '3', '1', '4', '4', '3', '4', '2']
2|3|4|2|2|3|1|4|4|3|4|2

As Robert already pointed out, you have other errors in your code. In particular, you passed a type (str) into your function rather than your variable strs. If you intended str to be a variable, don't do that. When you use a name that Python already defines (here, the string type) as a variable, you shadow the built-in in your namespace, and strange things happen.
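To illustrate the kind of "strange thing" that shadowing can cause, here is a minimal sketch. The function name `demo_shadowing` is hypothetical and only exists for this illustration; it rebinds `str` locally and then tries to use it as the built-in type.

```python
def demo_shadowing():
    # Rebinding `str` inside this function hides the built-in type here.
    str = 'aaa bb ccc'
    try:
        # This now tries to call a string object, not the built-in str type.
        return str(42)
    except TypeError as exc:
        # A string instance is not callable, so a TypeError is raised.
        return type(exc).__name__


print(demo_shadowing())
```

Choosing a different name for the variable (such as `strs` or `sentence`) avoids the problem entirely.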

This is a little convoluted but would do it.

I think this is the best way to go, i.e. separate the ranking logic into a class.

from collections import Counter


class Ranker:
    def __init__(self, items):
        self._item_counts = Counter(items)
        # Distinct counts, highest first; sorting avoids relying on set iteration order.
        self._ranks = sorted(set(self._item_counts.values()), reverse=True)

    def __getitem__(self, item):
        return self._ranks.index(self._item_counts[item]) + 1


if __name__ == '__main__':
    strs = 'aaa bb ccc aaa bbb bb cc ccc ccc bb ccc bbb aaa'.split()
    r = Ranker(strs)
    print('|'.join([str(r[s]) for s in strs]))
    # 2|2|1|2|3|2|4|1|1|2|1|3|2
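
The class above gives words with equal counts the same rank and does not skip rank numbers after a tie (often called dense ranking). As a rough sketch of the same idea without a class, the ranks can be precomputed into a dictionary so each lookup avoids a list search. The name `dense_ranks` is hypothetical and not part of the answer above.

```python
from collections import Counter


def dense_ranks(words):
    """Map each word to its dense rank by frequency (1 = most frequent)."""
    counts = Counter(words)
    # Distinct counts, highest first; position + 1 is the rank for that count.
    distinct = sorted(set(counts.values()), reverse=True)
    rank_of_count = {count: rank for rank, count in enumerate(distinct, start=1)}
    return {word: rank_of_count[count] for word, count in counts.items()}


words = 'aaa bb ccc aaa bbb bb cc ccc ccc bb ccc bbb aaa'.split()
ranks = dense_ranks(words)
print('|'.join(str(ranks[w]) for w in words))
# 2|2|1|2|3|2|4|1|1|2|1|3|2
```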


As pointed out in a comment, instead of

strs = '...'  # This is a global variable

def testing(s):
    ... # Body of testing function that never references the local `s` variable

you should have

def testing(strs):
    ... # Body of testing uses `strs` as before

There's no reason to sort ht.values(), so the assignment to temp can be taken out altogether.

As you loop through new_strs, all you want to do is build a list that contains the count of each element in new_strs. This is what you stored in the ht dictionary. So

for s in new_strs:
    final.append(str(ht[s]))  # convert to str so '|'.join(final) works

Now final is a list that contains, for each word, the count of how many times it appeared in the original string. You can return it the same way you currently do.

I recommend making those little changes and seeing that it works. Then, once the function works as you intended, there is a lot that can be cleaned up.

You can use a defaultdict instead of a regular dictionary. And you can use a list comprehension to construct the final list.

from collections import defaultdict

def testing(strs):
    ht = defaultdict(int)
    new_strs = strs.split()
    for s in new_strs:
         ht[s] += 1  # if `s` is not in ht, the default 0 is used.
    final = [str(ht[s]) for s in new_strs]
    return '|'.join(final)

The string join method can take a generator, so it's not necessary to create the intermediate final variable. The last two lines can be written as one:

return '|'.join(str(ht[s]) for s in new_strs)

The collections module has a Counter class that does exactly this: it counts the items in a list. You can write this function as:

from collections import Counter

def testing(strs):
    new_strs = strs.split()
    ht = Counter(new_strs)
    return '|'.join(str(ht[s]) for s in new_strs)

This question has changed since it was originally asked. So here is a new answer.

def testing(strs):
    new_strs = strs.split()
    ht = Counter(new_strs)
    # Sort (word, count) pairs by count, highest first, then convert counts to ranks.
    ranks = rank(sorted(ht.items(), key=lambda t: t[1], reverse=True))
    ranks_dict = dict(ranks)
    return '|'.join(str(ranks_dict[s]) for s in new_strs)

You just need the rank function, which takes a list of (value, score) tuples sorted by score and returns a list of (value, rank) tuples:

def rank(tuples):
    current_score = tuples[0][1]
    current_rank = 1
    ties = 0
    ranks = []
    for tup in tuples:
        if tup[1] == current_score:
            ties += 1
        else:
            current_rank = current_rank + ties
            ties = 1
        ranks.append((tup[0], current_rank))
        current_score = tup[1]
    return ranks

Note that I am treating two words that appear the same number of times as having the same rank. In your example you gave them different ranks, but didn't provide a way to determine which was which. I hope this is enough to get you on track.
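
If tied words should instead get distinct ranks, one possible reading of the expected output in the question is that ties are broken by order of first appearance. The following is only a sketch under that assumption; the name `encode_by_rank` is hypothetical. It relies on `Counter` preserving insertion order (Python 3.7+) and on `sorted` being stable.

```python
from collections import Counter


def encode_by_rank(sentence):
    """Encode each word as its frequency rank; ties go to the word seen first."""
    words = sentence.split()
    counts = Counter(words)  # keys are kept in order of first appearance
    # Stable sort by descending count keeps first-seen order among equal counts.
    ordered = sorted(counts, key=lambda word: -counts[word])
    rank_of = {word: rank for rank, word in enumerate(ordered, start=1)}
    return '|'.join(str(rank_of[word]) for word in words)


print(encode_by_rank('aaa bb ccc aaa bbb bb cc ccc ccc bb ccc bbb'))
# 3|2|1|3|4|2|5|1|1|2|1|4
```

Each word is looked up once in a dictionary, so the nested loop from the original code is not needed.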
