
Best and most efficient way to capture spreadsheet data

I have a spreadsheet with data like this:

    Group,Region,Market
    G7,EMEA,Germany
    G7,NA,Canada
    G7,APAC,Japan

What is the most efficient way to capture this information? I currently store it in a dictionary of the form {Group: {Region: Market}}.
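For the three sample rows, that nested layout would be the literal below (the name regional_markets is only for illustration). The last lines show one consequence of keying the inner dict by region: a second market in the same region replaces the first, which is worth checking against the real sheet.

```python
# The three sample rows stored as {Group: {Region: Market}}
regional_markets = {
    "G7": {
        "EMEA": "Germany",
        "NA": "Canada",
        "APAC": "Japan",
    },
}

# A lookup takes two keys: the group first, then the region
print(regional_markets["G7"]["APAC"])  # prints Japan

# Each region holds exactly one market, so assigning to an existing
# (group, region) pair replaces the previous market
regional_markets["G7"]["EMEA"] = "France"
print(regional_markets["G7"]["EMEA"])  # prints France
```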

My code is:

    headerRow = True
    try:
        with open(fileName) as sourceFile:
            for line in sourceFile:
                if not headerRow:
                    for group, region, market in [line.rstrip().split(",")]:
                        if group in self.REGIONAL_MARKETS:
                            self.REGIONAL_MARKETS[group].update({int(region):market})
                        else:
                            self.REGIONAL_MARKETS.update({group:{int(region):market}})
                headerRow = False
            return self.REGIONAL_MARKETS
    except IOError as e:
        print("Invalid File Name. Message = %s" % e)

Thanks for your input.

Two things:

  1. Your try block is too big: keep it as short as possible, so that the except clause only catches errors from the statement you expect to fail (here, opening the file); and
  2. You could use collections.defaultdict to simplify the creation of your output data structure.

Try something like:

    from collections import defaultdict

    data = defaultdict(dict)

    try:
        with open(fileName) as sourceFile:
            sourceFile.readline()  # skip the header row
            lines = sourceFile.readlines()  # read the rest of the data
    except IOError as e:
        print("Invalid File Name. Message = %s" % e)
    else:
        for line in lines:
            # unpack the three fields directly instead of
            # iterating over a single-element list
            group, region, market = line.rstrip().split(",")
            # keep region as a string: 'EMEA' is not an integer,
            # so int(region) would raise ValueError
            data[group][region] = market

On your test data, this gives me:

    >>> data
    defaultdict(<type 'dict'>, {'G7': {'NA': 'Canada',
                                       'EMEA': 'Germany',
                                       'APAC': 'Japan'}})
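The defaultdict(dict) is what makes the if/else from the question's loop unnecessary: the first time a missing key is read, it creates and stores an empty dict for it. A minimal illustration using the sample rows held in a list (the rows variable is only for this demo), including one side effect of that behaviour:

```python
from collections import defaultdict

rows = [
    ("G7", "EMEA", "Germany"),
    ("G7", "NA", "Canada"),
    ("G7", "APAC", "Japan"),
]

data = defaultdict(dict)
for group, region, market in rows:
    # The first access to data[group] creates and stores an empty inner dict,
    # so no "if group in data" branch is needed
    data[group][region] = market

print(dict(data))  # prints {'G7': {'EMEA': 'Germany', 'NA': 'Canada', 'APAC': 'Japan'}}

# Side effect to be aware of: merely reading an unknown group also inserts it
print(data["G20"])  # prints {}
print("G20" in data)  # prints True
```

If that side effect matters when querying, data.get(group) does not insert anything, or the result can be converted to a plain dict with dict(data) once loading is finished.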

Additionally, look into csv.DictReader, which reads the header row for you and splits each line into fields (correctly handling quoted values that contain commas).
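A sketch of the same loader built on csv.DictReader, assuming the header row is exactly Group,Region,Market. The function name load_regional_markets is hypothetical, and it takes an already-open file object so that the open() call can stay in its own short try block. DictReader yields one dict per data row keyed by the header names, so neither the manual header skip nor split(",") is needed.

```python
import csv
import io
from collections import defaultdict


def load_regional_markets(source_file):
    """Build {Group: {Region: Market}} from an open CSV file object.

    Hypothetical helper; assumes the header row is Group,Region,Market.
    """
    data = defaultdict(dict)
    # DictReader consumes the header row itself and yields one dict per data row
    for row in csv.DictReader(source_file):
        data[row["Group"]][row["Region"]] = row["Market"]
    return dict(data)  # plain outer dict, so unknown groups raise KeyError again


# Demo with the sample rows held in memory instead of a file on disk
sample = io.StringIO(
    "Group,Region,Market\n"
    "G7,EMEA,Germany\n"
    "G7,NA,Canada\n"
    "G7,APAC,Japan\n"
)
markets = load_regional_markets(sample)
print(markets)  # prints {'G7': {'EMEA': 'Germany', 'NA': 'Canada', 'APAC': 'Japan'}}
```

For a real file, pass in the object returned by open(fileName, newline=""), which is how the csv module documentation recommends opening files for csv readers.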
