
How to append only unique values to a key in a dictionary?

Sorry, this is likely a complete noob question, but I'm new to Python and haven't been able to get any of the suggestions I found online to actually work. I need to decrease the run time of the code for larger files, so I need to reduce the number of iterations I'm doing.

How do I modify the append_value function below so that it appends only UNIQUE values to dict_obj, removing the need for another series of iterations to do this later on?

EDIT: Sorry, here is an example input/output:

Sample Input:

6
5 6
0 1
1 4
5 4
1 2
4 0

Sample Output:

1
4

I'm attempting to solve: http://orac.amt.edu.au/cgi-bin/train/problem.pl?problemid=416


input_file = open("listin.txt", "r")
output_file = open("listout.txt", "w")

ls = []
n = int(input_file.readline())
for i in range(n): 
    a, b = input_file.readline().split()
    ls.append(int(a))
    ls.append(int(b))

def append_value(dict_obj, key, value):          # How to append only UNIQUE values to
    if key in dict_obj:                          # dict_obj?
        if not isinstance(dict_obj[key], list):
            dict_obj[key] = [dict_obj[key]]
        dict_obj[key].append(value)
    else:
        dict_obj[key] = value

mx = []
ls.sort()
Dict = {}
for i in range(len(ls)):
    c = ls.count(ls[i])
    append_value(Dict, int(c), ls[i])
    mx.append(c)

x = max(mx)
lss = []

list_set = set(Dict[x])                     #To remove the need for this
unique_list = (list(list_set))
for x in unique_list:
    lss.append(x)

lsss = sorted(lss)
for i in lsss:
    output_file.write(str(i) + "\n")
    
output_file.close()
input_file.close()

Thank you.

The answer to your question, 'how to only append unique values to this container', is fairly simple: change it from a list to a set (as @ShadowRanger suggested in the comments). This isn't really a question about dictionaries, though; you're not appending values to 'dict_obj', only to a list stored in the dictionary.
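As a minimal sketch of that suggestion (the name append_unique and the sample pairs are made up for illustration, not taken from the question), storing a set under each key makes duplicates disappear on insertion:

```python
def append_unique(dict_obj, key, value):
    """Add value to the set stored under key, creating the set on first use."""
    # setdefault returns the existing set, or inserts and returns a new empty one.
    dict_obj.setdefault(key, set()).add(value)

groups = {}
# Hypothetical (count, player) pairs; the repeats are silently ignored by the set.
for count, player in [(2, 4), (2, 4), (3, 1), (2, 5), (3, 1)]:
    append_unique(groups, count, player)

print({k: sorted(v) for k, v in groups.items()})  # {2: [4, 5], 3: [1]}
```

Because every value is a set from the start, there is no need for the isinstance check or for a separate de-duplication pass afterwards.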

Since the source you linked to shows this is a training problem for people newer to coding, you should know that changing the lists to sets might be a good idea, but the lists are not the cause of the performance issues.

The problem boils down to: given a file containing a list of integers, print the most common integer(s). Your current code iterates over the list and, for each index i, iterates over the entire list again to count matches with ls[i] (this is the line c = ls.count(ls[i])).

Some operations are more expensive than others: calling count() is one of the more expensive operations on a Python list. It reads through the entire list every time it's called. This is an O(n) function, and it sits inside a loop of length n, so the loop takes O(n^2) time. By contrast, all of the set() filtering for non-unique elements takes O(n) time in total (and is quite fast in practice). Spotting linear-time functions hidden in loops like this is a frequent theme in optimization, and profiling your code would also have pointed to it.
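To see this for yourself, here is a rough sketch using the standard library's cProfile and pstats modules. The helper names count_in_loop and profile_report and the random test data are assumptions made for illustration; count_in_loop only imitates the shape of the loop in the question.

```python
import cProfile
import io
import pstats
import random

def count_in_loop(values):
    """Imitates the question's loop: one full-list count() per element."""
    return [values.count(v) for v in values]

def profile_report(func, *args, top=5):
    """Run func under cProfile; return its result and the top entries by total time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("tottime").print_stats(top)
    return result, buffer.getvalue()

if __name__ == "__main__":
    random.seed(0)
    data = [random.randrange(1000) for _ in range(2000)]  # made-up input
    _, report = profile_report(count_in_loop, data)
    print(report)
```

In the printed report, the row for the list count method should account for nearly all of the time, which is the hint that the nested scan is the bottleneck.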

In general, you'll want to use something like the Counter class in Python's standard library for frequency counting. That kind of defeats the whole point of this training problem, though, which is to encourage you to improve on the brute-force algorithm for finding the most frequent element(s) in a list. One possible way to solve this problem is to read the description of Counter and try to mimic its behavior yourself with a plain Python dictionary.
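One way that plain-dictionary approach might look is sketched below. The function name most_frequent is hypothetical, and the sample list is simply the numbers from the question's sample input, flattened:

```python
def most_frequent(values):
    """Return the most frequent values, sorted, counting in a single pass."""
    counts = {}
    for v in values:
        # dict.get supplies 0 the first time a value is seen.
        counts[v] = counts.get(v, 0) + 1
    if not counts:
        return []
    best = max(counts.values())
    return sorted(v for v, c in counts.items() if c == best)

# The six pairs from the sample input, flattened into one list.
sample = [5, 6, 0, 1, 1, 4, 5, 4, 1, 2, 4, 0]
print(most_frequent(sample))  # [1, 4]
```

Each element is touched once while counting and each distinct value once while filtering, so the work grows linearly with the input instead of quadratically.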

Answering the question you haven't asked: your whole approach is overkill.

  1. You don't need to worry about uniqueness; the question prompt guarantees that if you see 2 5, you'll never see 5 2, nor a repeat of 2 5.
  2. You don't even care who is friends with whom; you just care how many friends an individual has.

So don't even bother making the pairs. Just count how many times each player ID appears at all. If you see 2 5, that means 2 has one more friend and 5 has one more friend; it doesn't matter who they are friends with.

The entire problem simplifies to an exercise in separating the player IDs and counting them all up (because each appearance means one more unique friend), then keeping only the ones with the highest counts.

A fairly idiomatic solution (reading from stdin and writing to stdout; tweaking it to open files is left as an exercise) would be something like:

import sys

from collections import Counter
from itertools import chain, islice

def main():
    numlines = int(next(sys.stdin))
    friend_pairs = map(str.split, islice(sys.stdin, numlines)) # Convert lines to friendship pairs
    counts = Counter(chain.from_iterable(friend_pairs))        # Flatten to friend mentions and count mentions to get friend count
    max_count = max(counts.values())                           # Identify maximum friend count
    winners = [pid for pid, cnt in counts.items() if cnt == max_count]
    winners.sort(key=int)                                      # Sort winners numerically
    print(*winners, sep="\n")

if __name__ == '__main__':
    main()


Technically, it doesn't even require islice or storing numlines (the line count at the beginning might be useful in low-level languages for preallocating an array for results, but in Python you can just read line by line until you run out), so the first two lines of main could simplify to:

next(sys.stdin)
friend_pairs = map(str.split, sys.stdin)

But either way, you don't need to uniquify friendships, nor preserve any knowledge of who is friends with whom, to figure out who has the most friends, so save yourself some trouble and skip the unnecessary work.
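For readers who want the file-based variant, a sketch along the same lines might look like this. The function name solve is made up; the default file names listin.txt and listout.txt are the ones used in the question's code, and the sketch assumes the input holds at least one pair (otherwise max would raise ValueError).

```python
from collections import Counter

def solve(in_path="listin.txt", out_path="listout.txt"):
    """Count every player ID in in_path and write the most frequent IDs to out_path."""
    with open(in_path) as infile:
        next(infile)  # skip the leading line count; it is not needed here
        counts = Counter(int(tok) for line in infile for tok in line.split())
    max_count = max(counts.values())  # assumes at least one pair was read
    winners = sorted(pid for pid, cnt in counts.items() if cnt == max_count)
    with open(out_path, "w") as outfile:
        outfile.writelines(f"{pid}\n" for pid in winners)
    return winners
```

Converting the tokens to int up front lets a plain sorted() put the winners in numeric order, and the with blocks close both files even if an error occurs partway through.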

If your intention is to have a list as each value of the dictionary, why not iterate over that list and check for the value before appending, the same way you check for each key?

if key in dict_obj:
    # Assumes the value stored under key is already a list.
    if value not in dict_obj[key]:       # scans the list for an existing match
        dict_obj[key].append(value)      # append only when the value is new
else:
    dict_obj[key] = [value]              # first value for this key: start a list
