简体   繁体   中英

How to append only unique values to a key in a dictionary?

sorry this is likely a complete noob question, although I'm new to python and am unable to implement any online suggestions such that they actually work. I need decrease the run-time of the code for larger files, so need to reduce the number of iterations i'm doing.

How do I modify the append_value function below to append only UNIQUE values to dict_obj, and remove the need for another series of iterations to do this later on.

EDIT: Sorry, here is an example input/output

Sample Input:

6
5 6
0 1
1 4
5 4
1 2
4 0

Sample Output:

1
4

I'm attempting to solve to solve: http://orac.amt.edu.au/cgi-bin/train/problem.pl?problemid=416

Output Result

input_file = open("listin.txt", "r")
output_file = open("listout.txt", "w")

ls = []
n = int(input_file.readline())
for i in range(n): 
    a, b = input_file.readline().split()
    ls.append(int(a))
    ls.append(int(b))

def append_value(dict_obj, key, value):          # How to append only UNIQUE values to
    if key in dict_obj:                          # dict_obj?
        if not isinstance(dict_obj[key], list):
            dict_obj[key] = [dict_obj[key]]
        dict_obj[key].append(value)
    else:
        dict_obj[key] = value

mx = []
ls.sort()
Dict = {}
for i in range(len(ls)):
    c = ls.count(ls[i])
    append_value(Dict, int(c), ls[i])
    mx.append(c)

x = max(mx)
lss = []

list_set = set(Dict[x])                     #To remove the need for this
unique_list = (list(list_set))
for x in unique_list:
    lss.append(x)

lsss = sorted(lss)
for i in lsss:
    output_file.write(str(i) + "\n")
    
output_file.close()
input_file.close()

Thank you

The answer to your question, 'how to only append unique values to this container' is fairly simple: change it from a list to a set (as @ShadowRanger suggested in the comments). This isn't really a question about dictionaries, though; you're not appending values to 'dict_obj', only to a list stored in the dictionary.

Since the source you linked to shows this is a training problem for people newer to coding, you should know that changing the lists to sets might be a good idea, but it's not the cause of the performance issues.

The problem boils down to: given a file containing a list of integers, print the most common integer(s). Your current code iterates over the list, and for each index i , iterates over the entire list to count matches with ls[i] (this is the line c = ls.count(ls[i]) ).

Some operations are more expensive than others : calling count() is one of the more expensive operations on a Python list. It reads through the entire list every time it's called. This is an O(n) function, which is inside a length n loop, taking O(n^2) time. All of the set() filtering for non-unique elements takes O(n) time total (and is even quite fast in practice). Identifying linear-time functions hidden in loops like this is a frequent theme in optimization, but profiling your code would have identified this.

In general, you'll want to use something like the Counter class in Python's standard library for frequency counting. That kind of defeats the whole point of this training problem, though, which is to encourage you to improve on the brute-force algorithm for finding the most frequent element(s) in a list. One possible way to solve this problem is to read the description of Counter , and try to mimic its behavior yourself with a plain Python dictionary.

Answering the question you haven't asked: Your whole approach is overkill.

  1. You don't need to worry about uniqueness; the question prompt guarantees that if you see 2 5 , you'll never see 5 2 , nor a repeat of 2 5
  2. You don't even care who is friends with who, you just care how many friends an individual has

So don't even bother making the pairs. Just count how many times each player ID appears at all . If you see 2 5 , that means 2 has one more friend, and 5 has one more friend, it doesn't matter who they are friends with .

The entire problem can simplify down to a simple exercise in separating the player IDs and counting them all up (because each appearance means one more unique friend), then keeping only the ones with the highest counts.

A fairly idiomatic solution (reading from stdin and writing to stdout; tweaking it to open files is left as an exercise) would be something like:

import sys

from collections import Counter
from itertools import chain, islice

def main():
    numlines = int(next(sys.stdin))
    friend_pairs = map(str.split, islice(sys.stdin, numlines)) # Convert lines to friendship pairs
    counts = Counter(chain.from_iterable(friend_pairs))        # Flatten to friend mentions and count mentions to get friend count
    max_count = max(counts.values())                           # Identify maximum friend count
    winners = [pid for pid, cnt in counts.items() if cnt == max_count]
    winners.sort(key=int)                                      # Sort winners numerically
    print(*winners, sep="\n")

if __name__ == '__main__':
    main()

Try it online!

Technically, it doesn't even require the use of islice nor storing to numlines (the line count at the beginning might be useful to low level languages to preallocate an array for results, but for Python, you can just read line by line until you run out), so the first two lines of main could simplify to:

next(sys.stdin)
friend_pairs = map(str.split, sys.stdin)

But either way, you don't need to uniquify friendships, nor preserve any knowledge of who is friends with whom to figure out who has the most friends, so save yourself some trouble and skip the unnecessary work.

If you intention is to have a list in each value of the dictionary why not iterate the same way you iterated on each key.

if key in dict_obj.keys():
    for elem in dict_obje[key]:  # dict_obje[key] asusming the value is a list
        if (elem == value):
        else:
    # append the value to the desired list
else:
    dic_obj[key] = value

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM