How to subdivide a nested list (used as a matrix) into lists based on the string in the 0th column

Question

trees=[
['species_1', observednumber_1, calculatedvalue, calculatedvalue],
['species_2', observednumber_2, calculatedvalue, calculatedvalue],
['species_1', observednumber_3, calculatedvalue, calculatedvalue],
[etc.]
]

This is data from a sample site. Each row is an observation. The number of observations, the number of species involved, and the number of individuals of each species all vary, i.e. there may be several individuals of each species. (I've used species_1 etc. as a stand-in for the alphanumeric code for a species. There are several hundred species involved, but only a few at each site, and I'd like to be able to enter the code directly.) There might be about 20-30 rows (observations) and 4-8 species.

I need to be able to sum the calculated values for EACH of the species.

The only way I see to do this is to subdivide the list into lists for each species. How can I do that? Once I've done that I can take column totals.

Answer 1

You can use a defaultdict to 'group' rows by a key:

from collections import defaultdict

grouped = defaultdict(list)

for row in trees:
    grouped[row[0]].append(row)

Now grouped is a dictionary with the first column as key, and the values are lists of rows that all have the same first column.
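Once the rows are grouped, the column totals the question asks for can be taken per species. The following is only a sketch: it assumes that every column after the species code is numeric, and the sample numbers are made up for illustration.

```python
from collections import defaultdict

# Made-up sample data: [species code, observed number, calculated value, calculated value]
trees = [
    ['species_1', 1, 0.5, 2.0],
    ['species_2', 2, 1.5, 3.0],
    ['species_1', 3, 2.5, 4.0],
]

grouped = defaultdict(list)
for row in trees:
    grouped[row[0]].append(row)

# zip(*rows) transposes a group's rows into columns;
# [1:] skips column 0, which holds the species code
totals = {
    species: [sum(col) for col in list(zip(*rows))[1:]]
    for species, rows in grouped.items()
}

for species, sums in sorted(totals.items()):
    print(species, sums)
```

Here totals maps each species code to a list of column sums, in the same order as the numeric columns of the original rows.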

You could also do the summing directly while looping, without storing the rows:

from collections import defaultdict

grouped = defaultdict(int)

for row in trees:
    grouped[row[0]] += row[1] * row[2]

where row[1] * row[2] can be any expression. Because defaultdict(int) starts every missing key at 0, grouped now maps each species named in the first column to the sum calculated for that species.
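If more than one column needs a running total, the same pattern can be extended to keep a small list of sums per species. This is a sketch under the assumption that columns 2 and 3 hold the calculated values; the names calc_a and calc_b and the sample numbers are made up.

```python
from collections import defaultdict

# Made-up sample data: [species code, observed number, calculated value, calculated value]
trees = [
    ['species_1', 1, 0.5, 2.0],
    ['species_2', 2, 1.5, 3.0],
    ['species_1', 3, 2.5, 4.0],
]

# Each new species starts with one zero per column being summed
totals = defaultdict(lambda: [0.0, 0.0])

for species, _observed, calc_a, calc_b in trees:
    totals[species][0] += calc_a
    totals[species][1] += calc_b

for species in sorted(totals):
    print(species, totals[species])
```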

Answer 2

You can use itertools.groupby (http://docs.python.org/2/library/itertools.html#itertools.groupby):

import itertools as it
import operator as op

# some dummy data so the example runs
observednumber_1 = 1
observednumber_2 = 2
observednumber_3 = 3
calculatedvalue = 10  # numeric placeholder so the calculated columns can be summed too

trees = [
  ['species_1', observednumber_1, calculatedvalue, calculatedvalue],
  ['species_2', observednumber_2, calculatedvalue, calculatedvalue],
  ['species_1', observednumber_3, calculatedvalue, calculatedvalue],
]

# groupby only groups adjacent rows, so sort by the species column first
for k, g in it.groupby(sorted(trees, key=op.itemgetter(0)), key=op.itemgetter(0)):
  print(k, sum(i[1] for i in g))

Result:

species_1 4
species_2 2

Notes:

Input to itertools.groupby must be sorted by the column(s) upon which you will be grouping.
Variables k and g stand in for "key" and "group", respectively.
Note that g is an iterator that can be consumed only once; if you wish to re-use it, store its contents in a list or other data-structure first.
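The first and third notes can be checked with a short experiment. This is a sketch with made-up two-column rows; first_col is just a helper name chosen for the illustration.

```python
import itertools as it
import operator as op

# Made-up sample data: [species code, observed number]
trees = [
    ['species_1', 1],
    ['species_2', 2],
    ['species_1', 3],
]

first_col = op.itemgetter(0)

# Without sorting, groupby starts a new group every time the key changes,
# so species_1 shows up twice
unsorted_keys = [k for k, g in it.groupby(trees, key=first_col)]

# With sorting, each key appears exactly once
sorted_keys = [k for k, g in it.groupby(sorted(trees, key=first_col), key=first_col)]

print(unsorted_keys)  # ['species_1', 'species_2', 'species_1']
print(sorted_keys)    # ['species_1', 'species_2']

# A group can be read only once: the second pass finds nothing left
for k, g in it.groupby(sorted(trees, key=first_col), key=first_col):
    first_pass = list(g)
    second_pass = list(g)
    break

print(first_pass)   # [['species_1', 1], ['species_1', 3]]
print(second_pass)  # []
```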

Edit: I have added an example of how to use another data-structure to store the results of the generator for further calculations.

for k, g in it.groupby(sorted(trees, key=op.itemgetter(0)), key=op.itemgetter(0)):
  tempg = list(g)  # store the group in a list so it can be iterated more than once
  print(k, sum(i[1] for i in tempg), sum(i[2] for i in tempg))
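If the totals are needed later rather than just printed, the same loop can fill a dictionary. This is a sketch, assuming columns 2 and 3 hold numeric calculated values; site_totals and by_species are hypothetical names and the sample numbers are made up.

```python
import itertools as it
import operator as op

# Made-up sample data: [species code, observed number, calculated value, calculated value]
trees = [
    ['species_1', 1, 0.5, 2.0],
    ['species_2', 2, 1.5, 3.0],
    ['species_1', 3, 2.5, 4.0],
]

by_species = op.itemgetter(0)

site_totals = {}
for k, g in it.groupby(sorted(trees, key=by_species), key=by_species):
    rows = list(g)  # materialize the group before summing two columns
    site_totals[k] = (sum(r[2] for r in rows), sum(r[3] for r in rows))

print(site_totals['species_1'])  # (3.0, 6.0)
print(site_totals['species_2'])  # (1.5, 3.0)
```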

2 answers

solution1
2 2013-05-01 15:24:07

solution2
0 ACCEPTED 2013-05-01 15:27:26
