
How to insert dictionaries as values into a dictionary using a loop in Python

I am having trouble converting my CSV data into a dictionary.

I have 3 columns that I'd like to use in the file:

userID, placeID, rating
U1000,  12222,   3
U1000,  13333,   2
U1001,  13333,   4

I would like to make the result look like this:

{'U1000': {'12222': 3, '13333': 2}, 
'U1001': {'13333': 4}}

That is to say, I would like to make my data structure look like:

sample = {}
sample["U1000"] = {}
sample["U1001"] = {}
sample["U1000"]["12222"] = 3
sample["U1000"]["13333"] = 2
sample["U1001"]["13333"] = 4

However, I have a lot of data to process. I'd like to build the result with a loop, but I have tried for 2 hours without success.

---the following code may be confusing---

My result currently looks like this:

{'U1000': ['12222', 3],  
'U1001': ['13333', 4]}
  1. the values of the dict are lists rather than dictionaries
  2. the user "U1000" appears multiple times in the data, but my result keeps only one entry for it

I think my code has many mistakes. If you don't mind, please take a look:

import numpy as np
import pandas as pd

reader = np.array(pd.read_csv("rating_final.csv"))
included_cols = [0, 1, 2]

sample = {}
target = []
target1 = []
for row in reader:
    content = list(row[i] for i in included_cols)
    target.append(content[0])
    target1.append(content[1:3])

sample = dict(zip(target, target1))

How can I improve the code? I have searched Stack Overflow but could not work it out myself. Could anyone please help me with this?

Many thanks!!

This should do what you want:

import collections

reader = ...
sample = collections.defaultdict(dict)

for user_id, place_id, rating in reader:
    rating = int(rating)
    sample[user_id][place_id] = rating

print(sample)
# -> defaultdict(<class 'dict'>, {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}})

defaultdict is a convenience utility that provides a default value whenever you try to access a key that is not in the dictionary. If you don't like it (for example, because you want sample['non-existent-user-id'] to fail with KeyError), use this:

reader = ...
sample = {}

for user_id, place_id, rating in reader:
    rating = int(rating)
    if user_id not in sample:
        sample[user_id] = {}
    sample[user_id][place_id] = rating
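
Since reader is left elided above, here is a minimal end-to-end sketch of the same loop. It assumes the file is read with the standard csv module rather than pandas, and it uses an in-memory string (the hypothetical CSV_TEXT) as a stand-in for rating_final.csv; the helper name build_ratings is also made up for illustration.

```python
import csv
import io

# Stand-in for rating_final.csv, using the three rows from the question.
CSV_TEXT = """userID,placeID,rating
U1000,12222,3
U1000,13333,2
U1001,13333,4
"""

def build_ratings(lines):
    """Return {userID: {placeID: rating}} from an iterable of CSV lines."""
    rows = csv.reader(lines)
    next(rows)  # skip the header row
    sample = {}
    for user_id, place_id, rating in rows:
        if user_id not in sample:
            sample[user_id] = {}
        sample[user_id][place_id] = int(rating)
    return sample

sample = build_ratings(io.StringIO(CSV_TEXT))
print(sample)
# -> {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}}
```

With a real file, passing an open file object (opened with newline="") to build_ratings in place of the StringIO should behave the same way.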

The expected output in the example is impossible, since {'13333': 2} would not be associated with a key. You could get {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}} though, with a dict of dicts:

sample = {}
for row in reader:
    userID, placeID, rating = row[:3]
    sample.setdefault(userID, {})[placeID] = rating  # Possibly int(rating)?

Alternatively, use collections.defaultdict(dict) to avoid the need for setdefault (or for alternate approaches involving try/except KeyError or if userID in sample:, which sacrifice the atomicity of setdefault in exchange for not creating empty dicts unnecessarily):

import collections

sample = collections.defaultdict(dict)
for row in reader:
    userID, placeID, rating = row[:3]
    sample[userID][placeID] = rating

# Optional conversion back to plain dict
sample = dict(sample)

The conversion back to plain dict ensures future lookups don't auto-vivify keys, raising KeyError as normal, and it looks like a normal dict if you print it.
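
To make the auto-vivification point concrete, here is a small sketch. The user IDs U9999 and U0000 are hypothetical and exist only to show a missing-key lookup before and after the conversion.

```python
from collections import defaultdict

sample = defaultdict(dict)
sample["U1000"]["12222"] = 3

# Merely reading a missing key on a defaultdict creates it.
_ = sample["U9999"]
print("U9999" in sample)  # -> True

# After conversion to a plain dict, a missing key raises KeyError again.
plain = dict(sample)
try:
    plain["U0000"]
except KeyError:
    print("KeyError for U0000")

print(plain)  # -> {'U1000': {'12222': 3}, 'U9999': {}}
```

Note that keys created by accident before the conversion (U9999 here) are carried over into the plain dict, so it is best to convert as soon as the loop finishes.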

If included_cols matters (because names or column indices might change), you can use operator.itemgetter to speed up and simplify extracting all the desired columns at once:

from collections import defaultdict
from operator import itemgetter

included_cols = (0, 1, 2)
# If columns in data were actually:
# rating, foo, bar, userID, placeID
# we'd do this instead, itemgetter will handle all the rest:
# included_cols = (3, 4, 0)
get_cols = itemgetter(*included_cols)  # Create function to get needed indices at once

sample = defaultdict(dict)
# map(get_cols, ...) efficiently converts each row to a tuple of just 
# the three desired values as it goes, which also lets us unpack directly
# in the for loop, simplifying code even more by naming all variables directly
for userID, placeID, rating in map(get_cols, reader):
    sample[userID][placeID] = rating  # Possibly int(rating)?
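
As a runnable illustration of the reordered-column case mentioned in the comments above, the sketch below uses a hypothetical in-memory reader whose rows are laid out as rating, foo, bar, userID, placeID. The filler values "x" and "y" are made up.

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical rows laid out as: rating, foo, bar, userID, placeID
reader = [
    ("3", "x", "y", "U1000", "12222"),
    ("2", "x", "y", "U1000", "13333"),
    ("4", "x", "y", "U1001", "13333"),
]

# Pick userID, placeID, rating (in that order) out of each row.
included_cols = (3, 4, 0)
get_cols = itemgetter(*included_cols)

sample = defaultdict(dict)
for userID, placeID, rating in map(get_cols, reader):
    sample[userID][placeID] = int(rating)

sample = dict(sample)  # back to a plain dict
print(sample)
# -> {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}}
```

Only included_cols changes when the column layout changes; the loop body stays the same.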

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.

 