The task I have at hand is to parse a large text file (several hundred thousand rows) and accumulate statistics that will then be visualized in plots. Each row contains the results of some prior analysis.
I wrote a custom class to define the objects to be accumulated. The class contains 2 string fields, 3 sets and 2 integer counters. Accordingly, there is an __init__(self, name)
which initializes a new object with a name and empty fields, and a method called addRow()
which adds information to the object. The sets accumulate data to be associated with the object, and the counters keep track of a couple of conditions.
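A minimal sketch of such a class, assuming hypothetical field and column names (the real field semantics and row layout are not given in the question):

```python
class MyObj:
    """Accumulator for one named object: 2 strings, 3 sets, 2 counters."""

    def __init__(self, name):
        self.name = name          # first string field
        self.label = ""           # second string field (placeholder)
        self.set_a = set()        # the three accumulating sets
        self.set_b = set()
        self.set_c = set()
        self.count_x = 0          # the two condition counters
        self.count_y = 0

    def addRow(self, row):
        # Accumulate data from one parsed row; the column indices and
        # the "hit" condition are assumptions for illustration only.
        self.set_a.add(row[1])
        self.set_b.add(row[2])
        self.set_c.add(row[3])
        if row[4] == "hit":
            self.count_x += 1
        else:
            self.count_y += 1
```

The sets automatically deduplicate repeated values across rows, which is presumably why they were chosen over lists.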
My original idea was to iterate over the rows of the file in main and call a method like parseRow() for each one:

    import csv

    with open(path, newline="") as f:
        reader = csv.reader(f)
        acc = {}  # or set()
        for row in reader:
            parseRow(row, acc)
which would look something like:

    def parseRow(row, acc):
        if row[id] not in acc:  # row[id] is the column where the object names/ids are
            a = MyObj(row[id])
            acc[row[id]] = a    # remember the new object for later rows
        else:
            a = acc[row[id]]    # or acc.get(row[id]), or equivalent
        a.addRow(...)
The issue here is that the accumulating collection acc
cannot be a set,
since sets are not indexable in Python. Edit: for clarification, by indexable I didn't mean getting the nth element, but rather being able to retrieve a specific stored element by its name.
One workaround would be to have a dict
with a {obj_name : obj}
mapping, but it feels like an ugly solution. Considering the elegance of the language otherwise, I would guess there is a better solution. It's surely not a particularly rare situation...
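For what it's worth, the name-to-object dict is the idiomatic Python pattern here, and `dict.setdefault` collapses the lookup-or-create branch into one line. A sketch, reusing a stripped-down stand-in for the MyObj class described above:

```python
class MyObj:                      # minimal stand-in for the real class
    def __init__(self, name):
        self.name = name
        self.rows = 0

    def addRow(self, row):
        self.rows += 1            # real accumulation logic goes here

ID = 0                            # assumed index of the name/id column

def parseRow(row, acc):
    # setdefault returns the existing object for this key,
    # or inserts the new MyObj and returns it in one step
    a = acc.setdefault(row[ID], MyObj(row[ID]))
    a.addRow(row)

acc = {}
for row in [["x", 1], ["y", 2], ["x", 3]]:
    parseRow(row, acc)
print(len(acc))                   # two distinct objects accumulated
```

Note that `setdefault` constructs a throwaway MyObj even when the key already exists; if construction were expensive, an explicit `if row[ID] not in acc:` check (or a `collections.defaultdict` with a factory) avoids that.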
Any suggestions?
You could also try an ordered-set, which is a set AND ordered.
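The `ordered-set` package is a third-party install; a stdlib approximation, since Python 3.7 guarantees dict insertion order, is a dict whose values are ignored:

```python
# An insertion-ordered "set" built from dict keys:
# duplicates are dropped, first-seen order is preserved.
ordered = dict.fromkeys(["b", "a", "b", "c"])
print(list(ordered))   # ['b', 'a', 'c']
```

Note, though, that an ordered set alone does not solve the original problem of retrieving a stored object by name; for that, the name-to-object dict mapping is still needed.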