I am currently having trouble turning my CSV data into a dictionary.
The file has 3 columns that I'd like to use:
userID, placeID, rating
U1000, 12222, 3
U1000, 13333, 2
U1001, 13333, 4
I would like to make the result look like this:
{'U1000': {'12222': 3, '13333': 2},
'U1001': {'13333': 4}}
That is to say, I would like to make my data structure look like:
sample = {}
sample["U1000"] = {}
sample["U1001"] = {}
sample["U1000"]["12222"] = 3
sample["U1000"]["13333"] = 2
sample["U1001"]["13333"] = 4
But I have a lot of data to process, so I'd like to build the result with a loop. I have tried for 2 hours and failed.
--- the following code may confuse you ---
My result currently looks like this:
{'U1000': ['12222', 3],
'U1001': ['13333', 4]}
I think my code has many mistakes. If you don't mind, please take a look:
import numpy as np
import pandas as pd

reader = np.array(pd.read_csv("rating_final.csv"))
included_cols = [0, 1, 2]
sample = {}
target = []
target1 = []
for row in reader:
    content = list(row[i] for i in included_cols)
    target.append(content[0])     # userID
    target1.append(content[1:3])  # [placeID, rating]
sample = dict(zip(target, target1))
How can I improve this code? I have looked through Stack Overflow, but I haven't managed to adapt what I found. Could anyone please help me with this?
Many thanks!!
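For reference, the reason the code in the question loses the nesting can be reproduced with plain lists. This is a minimal sketch that assumes the lists hold the three sample rows (placeIDs kept as strings). zip pairs each userID with one flat [placeID, rating] list, and dict() keeps only the last value it sees for a repeated key, so each user ends up with a single pair instead of a nested dict:

```python
# Stand-ins for the lists the question's loop builds from the sample rows
# (assumption: placeIDs are strings here).
target = ["U1000", "U1000", "U1001"]                  # userID per row
target1 = [["12222", 3], ["13333", 2], ["13333", 4]]  # [placeID, rating] per row

# zip pairs each userID with its row's flat list; dict() then overwrites
# the value of a repeated key, so U1000 keeps only its last row.
sample = dict(zip(target, target1))
print(sample)
# -> {'U1000': ['13333', 2], 'U1001': ['13333', 4]}
```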
This should do what you want:
import collections

reader = ...
sample = collections.defaultdict(dict)
for user_id, place_id, rating in reader:
    rating = int(rating)
    sample[user_id][place_id] = rating
print(dict(sample))
# -> {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}}
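The answer leaves reader open. One way to fill it in, sketched under the assumption that rating_final.csv looks exactly like the sample in the question (a header row and ", " separators), is the standard-library csv module. Here an io.StringIO stands in for the file so the snippet runs on its own:

```python
import collections
import csv
import io

# Stand-in for open("rating_final.csv", newline=""); assumes the file has
# the header and ", " separators shown in the question.
data = io.StringIO(
    "userID, placeID, rating\n"
    "U1000, 12222, 3\n"
    "U1000, 13333, 2\n"
    "U1001, 13333, 4\n"
)

# skipinitialspace=True drops the blank after each comma,
# so the fields come out as 'U1000', '12222', '3'.
reader = csv.reader(data, skipinitialspace=True)
next(reader)  # skip the header row

sample = collections.defaultdict(dict)
for user_id, place_id, rating in reader:
    sample[user_id][place_id] = int(rating)  # csv yields strings

print(dict(sample))
# -> {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}}
```

With a real file, the io.StringIO(...) would be replaced by open("rating_final.csv", newline="") inside a with block.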
defaultdict is a convenience utility that provides default values whenever you try to access a key that is not in the dictionary. If you don't want that (for example, because you want sample['non-existent-user-id'] to fail with KeyError), use this:
reader = ...
sample = {}
for user_id, place_id, rating in reader:
    rating = int(rating)
    if user_id not in sample:
        sample[user_id] = {}
    sample[user_id][place_id] = rating
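The difference between the two versions is easy to see directly: merely reading a missing key from a defaultdict(dict) inserts an empty dict for it, while the plain dict raises KeyError. 'U9999' below is a made-up ID used only for illustration:

```python
import collections

auto = collections.defaultdict(dict)
auto["U1000"]["12222"] = 3

# A plain lookup of an unknown user silently creates an entry for it.
missing = auto["U9999"]
print(missing)       # -> {}
print(sorted(auto))  # -> ['U1000', 'U9999']

plain = {"U1000": {"12222": 3}}
raised = False
try:
    plain["U9999"]
except KeyError:
    raised = True    # the plain dict refuses unknown keys
print(raised)        # -> True
```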
The expected output in the example is impossible, since {'1333': 2} would not be associated with a key. You could get {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}}, though, with a dict of dicts:
sample = {}
for row in reader:
    userID, placeID, rating = row[:3]
    sample.setdefault(userID, {})[placeID] = rating  # Possibly int(rating)?
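The one-liner works because dict.setdefault(key, default) stores default only when key is absent and, in either case, returns the object actually stored under key; the subscript assignment then writes into that inner dict. A small check, assuming reader yields string tuples like the sample rows:

```python
# Assumed shape of the rows: (userID, placeID, rating) as strings.
rows = [("U1000", "12222", "3"), ("U1000", "13333", "2"), ("U1001", "13333", "4")]

sample = {}
for userID, placeID, rating in rows:
    inner = sample.setdefault(userID, {})  # existing inner dict, or a new one stored now
    inner[placeID] = int(rating)

# For an existing key, setdefault hands back the stored dict, not the new {}.
print(sample.setdefault("U1000", {}) is sample["U1000"])  # -> True
print(sample)
# -> {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}}
```

Note that the {} argument is still evaluated on every call, even when the key already exists.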
Alternatively, use collections.defaultdict(dict) to avoid the need for setdefault (or for alternative approaches involving try/except KeyError or if userID in sample:, which give up the atomicity of setdefault in exchange for not creating empty dicts unnecessarily):
import collections

sample = collections.defaultdict(dict)
for row in reader:
    userID, placeID, rating = row[:3]
    sample[userID][placeID] = rating
# Optional conversion back to plain dict
sample = dict(sample)
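For comparison, the try/except KeyError alternative could look like the following sketch (same assumed string rows). It only builds an inner dict the first time a user appears:

```python
# Assumed shape of the rows: (userID, placeID, rating) as strings.
rows = [("U1000", "12222", "3"), ("U1000", "13333", "2"), ("U1001", "13333", "4")]

sample = {}
for userID, placeID, rating in rows:
    try:
        # Fails with KeyError only if this user has no inner dict yet.
        sample[userID][placeID] = int(rating)
    except KeyError:
        # First row for this user: create the inner dict now.
        sample[userID] = {placeID: int(rating)}

print(sample)
# -> {'U1000': {'12222': 3, '13333': 2}, 'U1001': {'13333': 4}}
```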
Converting back to a plain dict ensures that future lookups don't auto-vivify keys but raise KeyError as usual, and the result looks like a normal dict when you print it.
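Both effects of the conversion can be checked in a few lines; 'U9999' is again a made-up ID. dict(sample) builds a new top-level plain dict whose values are the same inner dicts:

```python
import collections

sample = collections.defaultdict(dict)
sample["U1000"]["12222"] = 3

plain = dict(sample)  # new plain dict; the inner dicts are shared, not copied
print(sample)  # -> defaultdict(<class 'dict'>, {'U1000': {'12222': 3}})
print(plain)   # -> {'U1000': {'12222': 3}}

try:
    plain["U9999"]
    vivified = True
except KeyError:
    vivified = False  # lookups on the plain dict no longer create keys
print(vivified)  # -> False
```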
If included_cols matters (because column names or indices might change), you can use operator.itemgetter to simplify and speed up extracting all the desired columns at once:
from collections import defaultdict
from operator import itemgetter

included_cols = (0, 1, 2)
# If columns in data were actually:
# rating, foo, bar, userID, placeID
# we'd do this instead, itemgetter will handle all the rest:
# included_cols = (3, 4, 0)
get_cols = itemgetter(*included_cols)  # Create function to get needed indices at once

sample = defaultdict(dict)
# map(get_cols, ...) efficiently converts each row to a tuple of just
# the three desired values as it goes, which also lets us unpack directly
# in the for loop, simplifying code even more by naming all variables directly
for userID, placeID, rating in map(get_cols, reader):
    sample[userID][placeID] = rating  # Possibly int(rating)?
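To illustrate the reordering case from the comments, here is a sketch with two of the sample rows rearranged into the hypothetical layout rating, foo, bar, userID, placeID (the foo/bar values are placeholders). itemgetter(3, 4, 0) returns the wanted fields as a tuple in userID, placeID, rating order:

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical rows laid out as: rating, foo, bar, userID, placeID
reader = [
    ("3", "x", "y", "U1000", "12222"),
    ("4", "x", "y", "U1001", "13333"),
]

get_cols = itemgetter(3, 4, 0)
print(get_cols(reader[0]))  # -> ('U1000', '12222', '3')

sample = defaultdict(dict)
for userID, placeID, rating in map(get_cols, reader):
    sample[userID][placeID] = int(rating)

print(dict(sample))
# -> {'U1000': {'12222': 3}, 'U1001': {'13333': 4}}
```

One caveat: with a single index, itemgetter(i) returns the bare value rather than a 1-tuple, so this unpacking pattern relies on requesting several columns.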