
Implementing 3D vectors in Python: numpy vs x,y,z fields

I am implementing a 3D Vector class in Python. My vector has coordinates x, y and z (all floats), and I need to decide how to store this information. I can see at least three options here:

1) Make three separate float fields: self.x, self.y, self.z

class Vector:

  def __init__(self, x, y, z):
    self.x = x
    self.y = y
    self.z = z

2) Make a list, say self.data, with three elements. I may also use a tuple if the objects can be constant.

class Vector:

  def __init__(self, x, y, z):
    self.data = [x,y,z]

3) Make a numpy array, say self.data, with three elements.

import numpy as np    

class Vector:

  def __init__(self, x, y, z):
    self.data = np.array([x,y,z])

For options (2) and (3), I could then implement properties and setters to access the single coordinates:

@property
def x(self):
  return self.data[0]

4) Why not have some redundancy? I could have both a list (or tuple, or numpy array) and separate fields x, y and z.
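A minimal sketch of option (3) with the properties and setters mentioned above, assuming float coordinates. Because x, y and z only read from and write to self.data, you get field-style access without storing the values twice, so the two representations from option (4) cannot drift apart.

```python
import numpy as np


class Vector:
    """Option (3) sketch: one numpy array, exposed through x/y/z properties."""

    def __init__(self, x, y, z):
        # single source of truth; float dtype so assigned floats aren't truncated
        self.data = np.array([x, y, z], dtype=float)

    @property
    def x(self):
        return self.data[0]

    @x.setter
    def x(self, value):
        self.data[0] = value  # writes straight into the shared array

    @property
    def y(self):
        return self.data[1]

    @y.setter
    def y(self, value):
        self.data[1] = value

    @property
    def z(self):
        return self.data[2]

    @z.setter
    def z(self, value):
        self.data[2] = value
```

For example, after `v = Vector(1, 2, 3)` and `v.x = 10`, both `v.x` and `v.data[0]` see the new value.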

The class is meant to be used to perform common operations such as vector addition, inner product, cross product, rotation, etc. Performance of these operations needs to be taken into account.

Is there a solution that I should prefer, and why?

There are different aspects to this question and I can give you some hints on how these could be resolved. Note that these are meant as suggestions; you definitely need to see which one you like most.

Supporting linear algebra

You mentioned that you want to support linear algebra, such as vector addition (element-wise addition), cross product and inner product. These are available for numpy.ndarrays, so you could choose different approaches to supporting them:

  1. Simply use a numpy.ndarray and don't bother about your own class:

     import numpy as np

     vector1, vector2 = np.array([1, 2, 3]), np.array([3, 2, 1])
     np.add(vector1, vector2)    # vector addition
     np.cross(vector1, vector2)  # cross product
     np.inner(vector1, vector2)  # inner product

    There's no built-in vector rotation defined in numpy, but there are several sources available, for example "Rotation of 3D vector". So you would need to implement it yourself.

  2. You can create a class, independent of how you store your attributes, and provide an __array__ method. That way you can support (all) numpy functions as if your instances were numpy.ndarrays themselves:

     class VectorArrayInterface(object):
         def __init__(self, x, y, z):
             self.x, self.y, self.z = x, y, z

         def __array__(self, dtype=None):
             # numpy calls this whenever it needs an ndarray from the object
             if dtype:
                 return np.array([self.x, self.y, self.z], dtype=dtype)
             else:
                 return np.array([self.x, self.y, self.z])

     vector1, vector2 = VectorArrayInterface(1, 2, 3), VectorArrayInterface(3, 2, 1)
     np.add(vector1, vector2)    # vector addition
     np.cross(vector1, vector2)  # cross product
     np.inner(vector1, vector2)  # inner product

    This will return the same results as in the first case, so you can provide an interface for numpy functions without having a numpy array. If you have a numpy array stored in your class, the __array__ method can simply return it, so this could be an argument for storing your x, y and z as a numpy.ndarray internally (because that's basically "for free").

  3. You can subclass np.ndarray. I won't go into the details here because that's an advanced topic that could easily justify a whole answer by itself. If you really consider this, then you should have a look at the official documentation for "Subclassing ndarray". I don't recommend it: I worked on several classes that subclass np.ndarray and there are several "rough edges" down that path.

  4. You can implement the operations you need yourself. That's reinventing the wheel, but it's educational and fun if there's only a handful of them. I wouldn't recommend this for serious production code, because here, too, there are several "rough edges" that have already been addressed in the numpy functions, for example overflow or underflow issues, correctness of the functions, ...

    A possible implementation (not including rotation) could look like this (this time with an internally stored list):

     class VectorList(object):
         def __init__(self, x, y, z):
             self.vec = [x, y, z]

         def __repr__(self):
             return '{self.__class__.__name__}(x={self.vec[0]}, y={self.vec[1]}, z={self.vec[2]})'.format(self=self)

         def __add__(self, other):
             x1, y1, z1 = self.vec
             x2, y2, z2 = other.vec
             return VectorList(x1+x2, y1+y2, z1+z2)

         def crossproduct(self, other):
             # a x b = (y1*z2 - z1*y2, z1*x2 - x1*z2, x1*y2 - y1*x2)
             x1, y1, z1 = self.vec
             x2, y2, z2 = other.vec
             return VectorList(y1*z2 - z1*y2, z1*x2 - x1*z2, x1*y2 - y1*x2)

         def scalarproduct(self, other):
             x1, y1, z1 = self.vec
             x2, y2, z2 = other.vec
             return x1*x2 + y1*y2 + z1*z2

    Note: You can implement these hand-coded methods and also implement the __array__ method I mentioned earlier. That way you can support any function expecting a numpy.ndarray and also have your homegrown methods. These approaches are not exclusive, but you will get different results: the methods above return a scalar or a VectorList, but if you go through __array__ you'll get numpy.ndarrays back.

  5. Use a library containing a 3D vector. In some sense this is the easiest way; in other aspects it could be very complicated. On the plus side, an existing class will probably work out of the box and it's probably optimized for performance. On the other hand, you need to find an implementation that supports your use case, you need to read the documentation (or figure out how it works by other means), and you could hit bugs or limitations that turn out to be disastrous for your project. Also, you get an additional dependency and you need to check whether the license is compatible with your project. Additionally, if you copy the implementation (check whether the license allows that!), you need to maintain (even if it's just syncing) foreign code.
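A minimal sketch combining suggestions 2 and 4, assuming a float ndarray as the internal storage. numpy functions see the object through __array__ and return plain ndarrays, while the hand-coded methods return the class itself, which illustrates the "different results" mentioned in the note above. Since numpy has no built-in rotation, rotated() implements Rodrigues' rotation formula. The class name Vector3 and the method names are hypothetical, not part of any library.

```python
import numpy as np


class Vector3:
    """Sketch: ndarray storage + __array__ + a few homegrown methods."""

    def __init__(self, x, y, z):
        self.data = np.array([x, y, z], dtype=float)

    def __array__(self, dtype=None, copy=None):
        # numpy calls this to get an ndarray (np.add, np.cross, np.inner, ...).
        # Sketch: hand out the stored array itself unless a dtype or a copy is
        # requested (accepting `copy` keeps NumPy 2.x from complaining).
        if dtype is None and not copy:
            return self.data
        return np.array(self.data, dtype=dtype)

    def __repr__(self):
        return "Vector3({}, {}, {})".format(*self.data.tolist())

    def __add__(self, other):
        return Vector3(*(self.data + other.data))

    def dot(self, other):
        return float(np.dot(self.data, other.data))

    def cross(self, other):
        return Vector3(*np.cross(self.data, other.data))

    def rotated(self, axis, angle):
        # Rodrigues' rotation formula, angle in radians (right-hand rule):
        # v' = v cos(a) + (k x v) sin(a) + k (k . v)(1 - cos(a)), k = unit axis
        k = np.asarray(axis, dtype=float)
        k = k / np.linalg.norm(k)
        v = self.data
        c, s = np.cos(angle), np.sin(angle)
        return Vector3(*(v * c + np.cross(k, v) * s + k * np.dot(k, v) * (1 - c)))
```

With `v, w = Vector3(1, 2, 3), Vector3(3, 2, 1)`, `v.cross(w)` gives a Vector3, whereas `np.cross(v, w)` gives a plain ndarray with the same values.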

Performance

Performance is tricky in this case: the mentioned use cases are quite simple and each task should be on the order of microseconds, so you should already be able to perform several thousand to a million operations per second, assuming you don't introduce an unnecessary bottleneck! However, you can micro-optimize the operations.
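To check that operations-per-second estimate on your own machine outside IPython, a sketch using the standard-library timeit module could look like this. The list-based add_lists() is a stripped-down stand-in for a pure-Python vector, and the helper names are hypothetical; the printed numbers will vary with hardware and numpy version.

```python
import timeit

import numpy as np


def add_lists(a, b):
    # pure-Python element-wise addition of two 3-element lists
    return [a[0] + b[0], a[1] + b[1], a[2] + b[2]]


def ops_per_second(stmt, number=100_000):
    # run `stmt` `number` times, repeat 5 times, keep the fastest repetition
    best = min(timeit.repeat(stmt, number=number, repeat=5))
    return number / best


if __name__ == "__main__":
    la, lb = [1.0, 2.0, 3.0], [3.0, 2.0, 1.0]
    na, nb = np.array(la), np.array(lb)
    print(f"list add:    {ops_per_second(lambda: add_lists(la, lb)):,.0f} ops/s")
    print(f"ndarray add: {ops_per_second(lambda: na + nb):,.0f} ops/s")
    print(f"np.cross:    {ops_per_second(lambda: np.cross(na, nb), number=10_000):,.0f} ops/s")
```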

Let me start with some general tips:

  • Avoid numpy.ndarray <-> list/float conversions. These are costly! If most of the operations use numpy.ndarrays, you don't want to store your values in a list or as separate attributes. Likewise, if you want to access the individual values of the Vector, iterate over these values, or perform operations on them as a list, then store them as a list or as separate attributes.

  • Using numpy to operate on three values is relatively inefficient. numpy.ndarray is great for big arrays because it can store the values more efficiently (space) and scales much better than pure-Python operations. However, these advantages come with some overhead that is significant for small arrays (say length << 100; that's an educated guess, not a fixed number!). A pure-Python solution (I use the one I already presented above) can be much faster than a numpy solution for such small arrays:

     # IPython session (%timeit is an IPython magic); VectorList is the class from above
     import numpy as np

     class VectorArray:
         def __init__(self, x, y, z):
             self.data = np.array([x, y, z])

     # addition: python solution 3 times faster
     %timeit VectorList(1, 2, 3) + VectorList(3, 2, 1)
     # 100000 loops, best of 3: 9.48 µs per loop
     %timeit VectorArray(1, 2, 3).data + VectorArray(3, 2, 1).data
     # 10000 loops, best of 3: 35.6 µs per loop

     # cross product: python solution 16 times faster
     v = VectorList(1, 2, 3)
     a = np.array([1, 2, 3])  # using a plain array to avoid the class overhead
     %timeit v.crossproduct(v)
     # 100000 loops, best of 3: 5.27 µs per loop
     %timeit np.cross(a, a)
     # 10000 loops, best of 3: 84.9 µs per loop

     # inner product: python solution 4 times faster
     %timeit v.scalarproduct(v)
     # 1000000 loops, best of 3: 1.3 µs per loop
     %timeit np.inner(a, a)
     # 100000 loops, best of 3: 5.11 µs per loop

    However, like I said, these timings are on the order of microseconds, so this is literally micro-optimizing. Still, if your focus is on optimal performance of your class, then you can be faster with pure Python and self-implemented functions.

    As soon as you try to do a lot of linear algebra operations, you should leverage numpy's vectorized operations. Most of these are incompatible with a class such as the one you describe, and a completely different approach might be appropriate: for example, a class that stores an array of array-vectors (a multidimensional array) in a way that interfaces correctly with numpy's functions! But I think that's out of scope for this answer and wouldn't really answer your question, which was limited to a class storing only 3 values.

  • I did some benchmarks using the same method with different approaches, but that's a bit of cheating. In general you shouldn't time one function call; you should measure the execution time of a program. In programs, a tiny speed difference in a function that is called millions of times can make a much bigger overall difference than a big speed difference in a method that is only called a few times... or not! I can only provide timings for functions because you haven't shared your program or use cases, so you need to find out which approach works best (correctness and performance) for you.
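The "array of vectors" idea from the bullet on vectorized operations could be sketched like this, assuming all vectors are available up front and stored as one (N, 3) float array; the class name VectorBatch and its methods are hypothetical. Each operation is a single numpy call over all N rows, so numpy's per-call overhead is paid once per batch instead of once per vector.

```python
import numpy as np


class VectorBatch:
    """Hypothetical sketch: N 3D vectors stored as one (N, 3) float array."""

    def __init__(self, xyz):
        self.xyz = np.asarray(xyz, dtype=float).reshape(-1, 3)

    def __len__(self):
        return len(self.xyz)

    def __array__(self, dtype=None, copy=None):
        # lets numpy functions operate on the whole (N, 3) block at once
        if dtype is None and not copy:
            return self.xyz
        return np.array(self.xyz, dtype=dtype)

    def __add__(self, other):
        return VectorBatch(self.xyz + other.xyz)

    def cross(self, other):
        # np.cross works along the last axis, i.e. row by row
        return VectorBatch(np.cross(self.xyz, other.xyz))

    def dot(self, other):
        # row-wise inner products, result has shape (N,)
        return np.einsum("ij,ij->i", self.xyz, other.xyz)

    def norms(self):
        # Euclidean length of every row
        return np.linalg.norm(self.xyz, axis=1)
```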

Conclusion

There are several other factors to consider when deciding which approach would be best, but these are more "meta" reasons, not directly related to your program.

  • Reinventing the wheel (implementing the functions yourself) is an opportunity to learn. You need to make sure it works correctly, you can time it, and if it's too slow you can try different ways to optimize it. You start thinking about algorithmic complexities, constant factors, correctness, ... instead of thinking about "which function will solve my issue" or "how do I make that numpy function correctly solve my issue".

  • Using NumPy for length-3 arrays is probably like "shooting at flies with cannons", but it is a great opportunity to become more familiar with numpy functions, and in the future you'll know more about how NumPy works (vectorization, indexing, broadcasting, ...) even if NumPy isn't a good fit for this question and answer.

  • Try different approaches and see how far you get. I learned a lot while answering this question and it was fun to try the approaches: comparing the results for discrepancies, timing the method calls and evaluating their limitations!

Taking into consideration how the Vector class will be used, I'd prefer option 3. Since it yields numpy arrays, the vector operations are relatively easy, intuitive, and fast by using numpy.

In [81]: v1 = Vector(1.0, 2.0, 3.0)

In [82]: v2 = Vector(0.0, 1.0, 2.0)

In [83]: v1.data + v2.data
Out[83]: array([1., 3., 5.])

In [85]: np.inner(v1.data, v2.data)
Out[85]: 8.0

These operations are already well optimized for performance in numpy.
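To keep a convenient v1 + v2 syntax instead of reaching into .data every time, option 3 could forward its operators to numpy. A minimal sketch, where the method names dot and cross are assumptions rather than anything numpy prescribes:

```python
import numpy as np


class Vector:
    """Option (3): coordinates stored in one numpy array (self.data)."""

    def __init__(self, x, y, z):
        self.data = np.array([x, y, z], dtype=float)

    def __repr__(self):
        return "Vector({}, {}, {})".format(*self.data.tolist())

    def __add__(self, other):
        # element-wise addition is done by numpy, result wrapped again
        return Vector(*(self.data + other.data))

    def __sub__(self, other):
        return Vector(*(self.data - other.data))

    def dot(self, other):
        # convert numpy's scalar result to a plain Python float
        return float(np.inner(self.data, other.data))

    def cross(self, other):
        return Vector(*np.cross(self.data, other.data))
```

With the values from the session above, `v1 + v2` gives `Vector(1.0, 3.0, 5.0)` and `v1.dot(v2)` gives `8.0`.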

If simple vector-type behavior is your aim, definitely stick with the pure numpy solution. There are many reasons for this:

  • numpy already has out-of-the-box solutions for all of the basic behavior you describe (cross products and more)
  • it will be faster by leaps and bounds for arrays of appreciable size (i.e., where it matters)
  • the vectorized/array syntax tends to be a lot more compact and expressive, once you get used to it
  • and most importantly: the entire numpy/scipy ecosystem is built around the interface provided by the ndarray; all libraries speak the common language of the ndarray; interfacing with them through your custom vector type is entering a world of pain.

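Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.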

 