I have a spreadsheet with data like this:
Group,Region,Market
G7,EMEA,Germany
G7,NA,Canada
G7,APAC,Japan
What is the most efficient way to capture this information? I use a dictionary to store this information as {Group: {Region : Market} }
and the code I have is
try:
    with open(fileName) as sourceFile:
        for line in sourceFile:
            if not headerRow:
                for group, region, market in [line.rstrip().split(",")]:
                    if group in self.REGIONAL_MARKETS:
                        self.REGIONAL_MARKETS[group].update({int(region):market})
                    else:
                        self.REGIONAL_MARKETS.update({group:{int(region):market}})
            headerRow=False
    return self.REGIONAL_MARKETS
except IOError as e:
    print("Invalid File Name. Message = %s" % e)
Thanks for your inputs
Two things: your try block is too big (the shorter the better, as it means more specific error handling); and you can use collections.defaultdict to simplify the creation of your output data structure. Try something like:
from collections import defaultdict

data = defaultdict(dict)

try:
    with open(fileName) as sourceFile:
        header = sourceFile.readline()  # skip header
        lines = sourceFile.readlines()  # get the rest of the data
except IOError as e:
    print("Invalid File Name. Message = %s" % e)
else:
    for line in lines:
        # don't iterate over a single-element list
        group, region, market = line.rstrip().split(",")
        # how is e.g. 'EMEA' an integer?
        data[group].update({region: market})
On your test data, this gives me:
>>> data
defaultdict(<type 'dict'>, {'G7': {'NA': 'Canada',
'EMEA': 'Germany',
'APAC': 'Japan'}})
Additionally, look into csv.DictReader, which will do some of the file processing work for you.
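For illustration, here is a minimal sketch of how the csv.DictReader approach might look. The function name load_regional_markets is hypothetical, and the sketch assumes Python 3 and that the header row is exactly Group,Region,Market as in your sample; an in-memory io.StringIO stands in for the real file so the snippet runs on its own.

```python
import csv
import io
from collections import defaultdict


def load_regional_markets(source):
    """Build {group: {region: market}} from an open CSV file-like object.

    Hypothetical helper; assumes the columns are named Group, Region, Market.
    """
    data = defaultdict(dict)
    # DictReader consumes the header row itself and yields one dict per
    # data row, keyed by the column names.
    for row in csv.DictReader(source):
        data[row["Group"]][row["Region"]] = row["Market"]
    return dict(data)


# In-memory stand-in for the spreadsheet export, using the sample rows
# from the question.
sample = io.StringIO(
    "Group,Region,Market\n"
    "G7,EMEA,Germany\n"
    "G7,NA,Canada\n"
    "G7,APAC,Japan\n"
)
markets = load_regional_markets(sample)
print(markets)
```

With a real file you would pass the object from open(fileName, newline="") instead of the StringIO, and keep only that open call inside the short try/except IOError block.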