trees=[
['species_1', observednumber_1, calculatedvalue, calculatedvalue],
['species_2', observednumber_2, calculatedvalue, calculatedvalue],
['species_1', observednumber_3, calculatedvalue, calculatedvalue],
[etc.]
]
This is data from a sample site. Each row is an observation. The number of observations, the number of species, and the number of individuals per species all vary, i.e. there may be several individuals of each species. (I've used species_1 etc. as a stand-in for the alphanumeric species code. There are several hundred species overall but only a few at each site, and I'd like to be able to enter the code directly.) A site might have about 20-30 rows (observations) and 4-8 species.
I need to be able to sum the calculated values for EACH of the species.
The only way I can see to do this is to split the list into one list per species. How can I do that? Once I've done that I can take column totals.
You can use a defaultdict to 'group' rows by a key:
from collections import defaultdict
grouped = defaultdict(list)
for row in trees:
    grouped[row[0]].append(row)
Now grouped is a dictionary keyed on the first column; each value is a list of the rows that share that first column.
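To connect this back to the column totals asked for in the question, here is a minimal sketch that groups the rows and then sums the two calculated columns per species. The species codes ('ACRU', 'QUAL') and the numbers are made-up sample data, and it assumes the calculated values sit at indices 2 and 3 of each row.

```python
from collections import defaultdict

# Hypothetical sample rows: [species code, observed number, calc value A, calc value B]
trees = [
    ['ACRU', 3, 1.5, 10.0],
    ['QUAL', 2, 2.0, 20.0],
    ['ACRU', 1, 0.5, 5.0],
]

# Group the rows by species code (first column).
grouped = defaultdict(list)
for row in trees:
    grouped[row[0]].append(row)

# For each species, total the two calculated columns (indices 2 and 3).
totals = {}
for species, rows in grouped.items():
    totals[species] = (sum(r[2] for r in rows), sum(r[3] for r in rows))

for species in sorted(totals):
    print(species, totals[species])
```

Keeping the grouped rows around is useful if you also need the individual observations later; if you only need the totals, a running sum avoids storing them.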
Alternatively, you could do the summing directly in the loop, without storing the rows:
from collections import defaultdict
grouped = defaultdict(int)
for row in trees:
    grouped[row[0]] += row[1] * row[2]
where row[1] * row[2] can be any expression. Now grouped maps each species named in the first column to the sum calculated for that species.
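If you need a running total for more than one column, one possible variation is to let each species map to a small list of totals. This is only a sketch: the sample data and species codes are invented, and it assumes every row has exactly four fields with the calculated values in the last two.

```python
from collections import defaultdict

# Hypothetical sample rows: [species code, observed number, calc value A, calc value B]
trees = [
    ['ACRU', 3, 1.5, 10.0],
    ['QUAL', 2, 2.0, 20.0],
    ['ACRU', 1, 0.5, 5.0],
]

# Each species maps to [total of calc value A, total of calc value B].
totals = defaultdict(lambda: [0.0, 0.0])
for species, observed, calc_a, calc_b in trees:
    totals[species][0] += calc_a
    totals[species][1] += calc_b

for species in sorted(totals):
    print(species, totals[species])
```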
You can use itertools.groupby (http://docs.python.org/2/library/itertools.html#itertools.groupby):
import itertools as it, operator as op
# some dummy data so the example runs
observednumber_1 = 1
observednumber_2 = 2
observednumber_3 = 3
calculatedvalue = 0  # numeric placeholder so the calculated columns can be summed
trees = [
    ['species_1', observednumber_1, calculatedvalue, calculatedvalue],
    ['species_2', observednumber_2, calculatedvalue, calculatedvalue],
    ['species_1', observednumber_3, calculatedvalue, calculatedvalue],
]
for k, g in it.groupby(sorted(trees, key=op.itemgetter(0)), key=op.itemgetter(0)):
    print(k, sum(i[1] for i in g))
Result:
species_1 4
species_2 2
Notes:
The input to itertools.groupby must be sorted by the column(s) you are grouping on. k and g stand for "key" and "group", respectively. g is an iterator that can only be consumed once, so if you wish to re-use it you need to store its contents in a list or other data structure first.
Edit: I have added an example of how to store the group in a list so it can be used for further calculations.
for k, g in it.groupby(sorted(trees, key=op.itemgetter(0)), key=op.itemgetter(0)):
    tempg = list(g)  # materialize the group so it can be iterated more than once
    print(k, sum(i[1] for i in tempg), sum(i[2] for i in tempg))
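If you want totals for every numeric column at once, one option is to transpose each group with zip. The following is a sketch under the assumption that every column after the species code is numeric; the species codes and values are invented sample data.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: [species code, observed number, calc value A, calc value B]
trees = [
    ['ACRU', 3, 1.5, 10.0],
    ['QUAL', 2, 2.0, 20.0],
    ['ACRU', 1, 0.5, 5.0],
]

by_species = itemgetter(0)
totals = {}
for species, group in groupby(sorted(trees, key=by_species), key=by_species):
    # zip(*group) turns the group's rows into columns; drop the species column.
    columns = list(zip(*group))[1:]
    totals[species] = [sum(col) for col in columns]

for species in sorted(totals):
    print(species, totals[species])
```

Because zip(*group) consumes the group iterator exactly once, no temporary list of rows is needed here.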