Get a subsample of class objects when attributes are lists

I have a question regarding Python classes and couldn't seem to find an easy answer anywhere. So let's say I define a class:

import numpy as np

class point(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def calc_mag(self):
        self.mag = np.sqrt(self.x*self.x + self.y*self.y + self.z*self.z)

Now I can easily create a list of objects by doing:

xs = [1,2,3,4,5]
ys = [2,3,4,5,6]
zs = [3,4,5,6,7]
points = []
for i in range(len(xs)):
    pt = point(xs[i], ys[i], zs[i])
    points.append(pt)

and I can get a subsample of these point objects by doing

sub_points = [pt for pt in points if pt.x > 1.0]

This works but the creation part is not very efficient since we are using a loop and not vectorizing. A faster way to do so is simply

points = point(xs, ys, zs)

and when I reference the attribute x, I get a list of values:

in : points.x
out: [1, 2, 3, 4, 5]

My question is, for this class object (which is essentially an object of lists instead of a list of objects), is there a quick way of getting a subsample like the first approach above? I tried a few things like

points[points.x > 1]  # Wrong way of doing it

but since points is not a list, it cannot be indexed, and this raises an error.

Of course, I could also apply the comparison test and then re-create the object by filtering every other attribute with the result, but that again is very inefficient and produces redundant code.

So does anyone have an idea of how this can be solved?

===================(additional info)==========================

Thanks to everyone who has responded so far. I think I need to clarify things a little here. The class posted above is not the actual class used in my program. I am posting a simplified version so that the real question is easier to discuss. The actual class I am using is far larger and more complicated, with more than 40 attributes and methods. That being said, I HAVE TO keep things in a class to take advantage of its nice features, so using numpy arrays, pandas data frames, or list comprehensions is simply not an option.

Also, performance is somewhat important, which is why I am creating the class in a vectorized form instead of using a list comprehension or a loop. I could write it in C/C++ solely for performance, but there are other nice things about Python which make it beneficial to stick with Python for the moment. I could also write a C wrapper for the slowest part to boost performance and bypass this problem, but somehow I just feel that there's got to be a solution for this in Python!

This depends heavily on the application, but something like a numpy array fits the given example well.

import numpy as np

xs = [1,2,3,4,5]
ys = [2,3,4,5,6]
zs = [3,4,5,6,7]
points = np.array([xs, ys, zs]).T  # transpose so rows are points

print(points[points[:, 0] > 1])
# [[2 3 4]
#  [3 4 5]
#  [4 5 6]
#  [5 6 7]]

You can even use structured arrays to keep labels.

points = np.array(
    list(zip(xs, ys, zs)),  # one (x, y, z) tuple per point
    dtype={'names': ['x', 'y', 'z'], 'formats': ['i4']*3}  # i4 for ints
)

print(points[points['x'] > 1])
# [(2, 3, 4) (3, 4, 5) (4, 5, 6) (5, 6, 7)]

If you want to keep the same class accessing syntax points.x, you could wrap a numpy array in a class and add attributes that access various columns of the array. See the documentation on subclassing ndarray.
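As a rough sketch of the wrapping idea (not a full ndarray subclass), the class below keeps one (n, 3) array and exposes its columns as attributes. The class name Points, the data attribute, and the mag property are made up for this illustration. Indexing is forwarded to the rows of the array, so a boolean mask returns a new Points object.

```python
import numpy as np


class Points(object):
    """Hypothetical wrapper: one (n, 3) array, columns exposed as attributes."""

    def __init__(self, xs, ys, zs):
        # Stack the inputs as columns so that each row is one point.
        self.data = np.column_stack([xs, ys, zs])

    @property
    def x(self):
        return self.data[:, 0]

    @property
    def y(self):
        return self.data[:, 1]

    @property
    def z(self):
        return self.data[:, 2]

    @property
    def mag(self):
        # Vectorized magnitude of every point at once.
        return np.sqrt((self.data ** 2).sum(axis=1))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # Forward boolean masks, slices, or integers to the row axis and
        # wrap the selected rows in a new Points object.
        rows = np.atleast_2d(self.data[index])
        return Points(rows[:, 0], rows[:, 1], rows[:, 2])


points = Points([1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7])
sub_points = points[points.x > 1]
print(sub_points.x)  # [2 3 4 5]
```

Storing everything in a single array assumes all columns share one dtype. If the columns have different types, a structured array or one array per attribute is probably a better fit.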

What you are trying to do is called boolean indexing. NumPy arrays support this inherently. You could also consider using the pandas library if you need your arrays to be labeled (think Excel tabular data: arrays with row and column labels).

The problem with what you're trying to do is that you'd need your custom object to support boolean indexing, and plain Python objects don't support this out of the box. If you absolutely need custom behavior, you can subclass a numpy array and overload the magic methods that control indexing (such as __getitem__). Edit: you can also try record arrays, as the other solution pointed out.

http://docs.scipy.org/doc/numpy/user/basics.subclassing.html
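Short of subclassing ndarray, a class that stores its attributes as NumPy arrays can support boolean indexing by defining __getitem__ itself. The sketch below is one possible way, not the only one. The PointSet name is made up, and it assumes that every instance attribute is an array of the same length, which may not hold for a larger real class.

```python
import numpy as np


class PointSet(object):
    """Hypothetical 'object of arrays': each attribute is an equal-length array."""

    def __init__(self, x, y, z):
        self.x = np.asarray(x)
        self.y = np.asarray(y)
        self.z = np.asarray(z)

    def calc_mag(self):
        self.mag = np.sqrt(self.x ** 2 + self.y ** 2 + self.z ** 2)

    def __getitem__(self, index):
        # Create an empty instance without calling __init__, then copy over
        # every attribute with the same index (mask, slice, ...) applied.
        # Assumption: all instance attributes are arrays of equal length.
        subset = self.__class__.__new__(self.__class__)
        for name, values in vars(self).items():
            setattr(subset, name, values[index])
        return subset


points = PointSet([1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7])
points.calc_mag()
sub_points = points[points.x > 1]
print(sub_points.x)  # [2 3 4 5]
```

Because the loop goes over vars(self), attributes computed later (such as mag here) are filtered too, with no per-attribute code. Attributes that are not per-point arrays would need to be skipped or copied unchanged.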

Here's a solution in pandas. Unlike a plain numpy array, a DataFrame supports indexing by column name (and attribute-style access such as df.xs).

from pandas import DataFrame
df = DataFrame([[1,2,3], [2,3,4], [3,4,5]], columns=['xs', 'ys', 'zs'])
df

   xs  ys  zs
0   1   2   3
1   2   3   4
2   3   4   5

You can then index on xs:

df['xs'] > 1

0    False
1     True
2     True
Name: xs, dtype: bool

df[df['xs'] > 1]

   xs  ys  zs
1   2   3   4
2   3   4   5

You raised a few problems. The first was creation with a comprehension:

class point(object):
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
    def __str__(self):
        return 'P({s.x}, {s.y}, {s.z})'.format(s=self)
    def __repr__(self):
        return str(self)

# Same sample data as in the question.
xs = [1,2,3,4,5]
ys = [2,3,4,5,6]
zs = [3,4,5,6,7]

# The built-in zip works on both Python 2 and 3 (itertools.izip is Python 2 only).
vectors = zip(xs, ys, zs)
points = [point(*vector) for vector in vectors]
print(points)

If you don't want to use numpy or pandas containers, you can play around with comprehensions or filtering:

print([p for p in points if p.x < 3])
# On Python 3, filter returns a lazy iterator, so wrap it in list() to print it.
print(list(filter(lambda p: p.x < 3, points)))
filt = lambda p: p.x < 3
print(list(filter(filt, points)))

Additionally, with modules operator and functools you can make factories for these filters.
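One way such a factory might look is sketched below. The helper names attr_filter and x_less_than are made up for this example, and a namedtuple stands in for the point class so the block runs on its own.

```python
import operator
from collections import namedtuple
from functools import partial

# Stand-in for the point class, just to keep the example self-contained.
Point = namedtuple('Point', ['x', 'y', 'z'])


def attr_filter(name, compare, threshold):
    """Return a predicate that tests compare(obj.<name>, threshold)."""
    getter = operator.attrgetter(name)
    return lambda obj: compare(getter(obj), threshold)


# partial pre-binds the attribute and the comparison, leaving only the
# threshold to be supplied later.
x_less_than = partial(attr_filter, 'x', operator.lt)

points = [Point(*v) for v in zip([1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7])]
print(list(filter(x_less_than(3), points)))
# [Point(x=1, y=2, z=3), Point(x=2, y=3, z=4)]
```

This still visits the points one by one, so it tidies up the filtering code rather than making it faster.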
