Checking if a value is already in a list

Question

I am computing an average from the differences between up to 3 values and want to store the result in a list.

A sample of the list I want to average looks like this:

...
[6.0, 270.0, -55.845848680633168],
[6.0, 315.0, -47.572000492889323],
[6.5, 0.0, -47.806802767243724],
[6.5, 45.0, -48.511643275159528],
[6.5, 90.0, -45.002053150122123],
[6.5, 135.0, -51.034656702050455],
[6.5, 180.0, -53.266356523649002],
[6.5, 225.0, -47.872632929518339],
[6.5, 270.0, -52.09662072002746],
[6.5, 315.0, -48.563996448937075]]

Up to 3 rows can share the same first 2 columns (these 2 columns are polar coordinates). When that happens, I want to take the differences between the 3rd elements, average them, and append the polar coordinates of the point together with the averaged result to a new list:

for a in avg_data:
    comparison = []
    for b in avg_data:
        if a[0] == b[0] and a[1] == b[1]:
            comparison.append(b[2])

    print comparison    
    z = 0   # reset z to 0, z does not need set now in if len(comp) == 1

    if len(comparison) == 2: # if there are only 2 elements, compare them
        z += -(comparison[0]) + comparison[1]
    if len(comparison) == 3: # if all 3 elements are there, compare all 3
        z += -(comparison[0]) + comparison[1]
        z += -(comparison[0]) + comparison[2]
        z += -(comparison[1]) + comparison[2]
        z = z/3 #average the variation

    avg_variation.append([a[0], a[1], z]) #append the polar coordinates and the averaged variation to a list

This code writes the correct data to the list, except that it does so every time it comes across matching polar coordinates, so I end up with duplicate rows.

To stop this, I tried adding an if statement that looks for matching polar coordinates in the avg_variation list before performing the averaging again:

if a[0] not in avg_variation and a[1] not in avg_variation:

This does not work and I get the error

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I don't think any or all is what I am looking for, as I only want to check the first two columns, not the third, against the values already appended. Does anyone have an idea how I can improve my if statement?

To clear up a bit more what my actual question is:

My code searches through nested lists for lists whose first 2 elements match, performs a calculation on the 3rd elements and then appends the result to a new list. My problem is that if 2 or 3 rows have matching first 2 elements, it appends the result to the new list 2 or 3 times. I want it to do so only once.

Edit: Sorry, my original question was misleading as to the purpose of my code.

Answer 1

IIUC, I think a simpler approach would be something like

import numpy as np
from itertools import combinations
from collections import defaultdict

def average_difference(seq):
    return np.mean([j-i for i,j in combinations(seq, 2)]) if len(seq) > 1 else 0

def average_over_xy(seq, fn_to_apply):
    d = defaultdict(list)
    for x,y,z in seq:
        d[x,y].append(z)

    outlist = [[x,y,fn_to_apply(z)] for (x,y),z in sorted(d.items())]
    return outlist

which loops over all the rows, builds a dictionary whose keys are the (x, y) coordinates and whose values are the lists of z elements seen at those coordinates, and then turns that dictionary into a sorted list of lists, applying the specified function to the elements in z. For example, we could use the average signed and ordered difference, like in your code, which produces

>>> seq = [[1, 2, 30], [1, 2, 40], [1, 2, 50], [1, 3, 4], [1, 3, 6], [2, 10, 5]] 
>>> average_over_xy(seq, average_difference)
[[1, 2, 13.333333333333334], [1, 3, 2.0], [2, 10, 0]]

Note that the way you've defined it, which I've matched above, the answer depends upon the order in which the elements are given, i.e.

>>> average_over_xy([[1,2,3],[1,2,4]], average_difference)
[[1, 2, 1.0]]
>>> average_over_xy([[1,2,4],[1,2,3]], average_difference)
[[1, 2, -1.0]]

If you wanted to, you could use

def average_difference_sorted(seq):
    return average_difference(sorted(seq))

instead, or use a standard deviation or whatever you like. (You didn't mention your use case, so I'll assume that you've got the list in the order you want, you're aware of the pitfalls, and you really need average_difference.)
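As a rough illustration of swapping in a different function, here is a minimal sketch that passes a standard deviation to average_over_xy. The helper name spread is made up for this example, and np.std with its default settings (population standard deviation) is just one possible choice, not something the question asks for.

```python
import numpy as np
from collections import defaultdict

def average_over_xy(seq, fn_to_apply):
    # Group the z values by their (x, y) coordinates.
    d = defaultdict(list)
    for x, y, z in seq:
        d[x, y].append(z)
    # One output row per distinct coordinate pair, in sorted key order.
    return [[x, y, fn_to_apply(z)] for (x, y), z in sorted(d.items())]

def spread(values):
    # Hypothetical helper: population standard deviation of the group.
    # Unlike the signed difference, it does not depend on input order.
    return float(np.std(values))

seq = [[1, 2, 30], [1, 2, 40], [1, 2, 50], [1, 3, 4], [1, 3, 6], [2, 10, 5]]
result = average_over_xy(seq, spread)
```

Each distinct (x, y) pair still appears exactly once in result, and a group with a single value gets a spread of 0.0.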

There are some faster numpy-based tricks we could do, and ways to generalize it, but using a defaultdict to accumulate values is a handy pattern, and it's often fast enough.

Answer 2

Here is a possible solution:

l=[[6.0, 270.0, -55.845848680633168],
[6.0, 315.0, -47.572000492889323],
[6.5, 0.0, -47.806802767243724],
[6.0, 180.0, -53.266356523649002],
[6.0, 225.0, -47.872632929518339],
[6.0, 270.0, -52.09662072002746],
[6.0, 315.0, -48.563996448937075]]

# First, we change the structure so that the pair of coordinates
# becomes a tuple which can be used as dictionary key
l=[[(c1, c2), val] for c1, c2, val in l]

# We build a dictionary coord:[...list of values...]
d={}
for coord, val in l:
    d.setdefault(coord,[]).append(val)

# Here, I compute the mean of each list of values.
# Apply your own function !

means = [[coord[0], coord[1], sum(vals)/len(vals)] for coord, vals in d.items()]

print means
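To make the "apply your own function" step concrete, here is a sketch that plugs the pairwise signed difference from the question into the same setdefault pattern, without numpy. The rows are made-up round numbers for illustration, and the names average_variation, groups and variation are assumptions of this sketch. The keys are sorted so the output order is predictable.

```python
from itertools import combinations

# Illustrative rows only: [r, theta, value]
rows = [[6.0, 270.0, -55.0], [6.0, 315.0, -47.0], [6.5, 0.0, -48.0],
        [6.0, 270.0, -52.0], [6.0, 315.0, -49.0], [6.0, 270.0, -50.0]]

def average_variation(vals):
    # Mean of (later - earlier) over every pair, taken in input order.
    # A single value has no pairs, so its variation is reported as 0.
    diffs = [later - earlier for earlier, later in combinations(vals, 2)]
    return sum(diffs) / float(len(diffs)) if diffs else 0

# Build a dictionary (r, theta) -> [...list of values...]
groups = {}
for r, theta, val in rows:
    groups.setdefault((r, theta), []).append(val)

# One row per distinct coordinate pair.
variation = [[r, theta, average_variation(vals)]
             for (r, theta), vals in sorted(groups.items())]
```

For 2 values this gives the single difference, and for 3 values the sum of the three differences divided by 3, which matches the arithmetic in the question's loop.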

Answer 3

You haven't given all of the information necessary to be sure of this, but I believe your error is caused by performing logical operations on numpy arrays. See this answer to a question with a similar error.

Without more information, it's difficult to duplicate the context of your question to try it, but perhaps being more specific in the boolean operations in the if statement will help.
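One way to be more specific, sketched here under the assumption that the rows hold plain numbers or NumPy scalars, is to stop testing a[0] and a[1] against the whole avg_variation list and instead remember the coordinate pairs already handled in a set of tuples. The name seen and the sample data are made up for this sketch; the arithmetic is copied from the question's loop.

```python
# Illustrative data only: [r, theta, value]
avg_data = [[1, 2, 30], [1, 2, 40], [1, 2, 50], [1, 3, 4], [1, 3, 6], [2, 10, 5]]

avg_variation = []
seen = set()  # (r, theta) pairs that already have a row in avg_variation

for a in avg_data:
    # Plain floats are hashable and compare to a single True/False,
    # which avoids the ambiguous truth value of an array comparison.
    key = (float(a[0]), float(a[1]))
    if key in seen:
        continue  # this coordinate pair was already averaged
    seen.add(key)

    comparison = [b[2] for b in avg_data
                  if (float(b[0]), float(b[1])) == key]

    z = 0
    if len(comparison) == 2:    # only 2 elements: one difference
        z = comparison[1] - comparison[0]
    elif len(comparison) == 3:  # all 3 elements: average the 3 differences
        z = ((comparison[1] - comparison[0])
             + (comparison[2] - comparison[0])
             + (comparison[2] - comparison[1])) / 3.0

    avg_variation.append([a[0], a[1], z])
```

With this check in place, each pair of polar coordinates is appended once, however many rows share it.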

3 answers

solution1
3 ACCEPTED 2013-04-20 17:46:25

solution2
1 2013-04-20 18:25:15

solution3
0 2013-04-20 17:39:57
