Speed up calculation in Astropy

I am trying to calculate the sum of distances between a list of points using astropy. However, my implementation is too slow to be used with my data. This is one example of my code:

import pandas as pd
import numpy as np
import astropy.units as u
from astropy.coordinates import SkyCoord

# synthetic data
lst2 = list(range(50))
lst2 = np.array(lst2)/50
lst3 = np.array(lst2)/51

df = pd.DataFrame(list(zip(lst2, lst3)),
               columns =['A', 'B'])

# Sum of the distance between different points
def Sum(df_frame):
    length = len(df_frame) #Size of "for" loops
    Sum = 0 
    for i in range(length - 1): 
        for j in range(i+1,length):
            c1 = SkyCoord(df_frame['A'].iloc[i]*u.deg, df_frame['A'].iloc[i]*u.deg, frame='icrs')
            c2 = SkyCoord(df_frame['B'].iloc[j]*u.deg, df_frame['B'].iloc[j]*u.deg, frame='icrs') 
            angle = c1.separation(c2).deg
            Sum += angle
    return  Sum

Sum(df)

Does anyone know how to increase the computational speed of this code?

My real data has millions of points.

You should know that sometimes using ready-made products is faster, since all the tools are available. However, in some cases, such as yours, using a ready-made product makes your code slower in execution time.

In your code you're creating

  1. a unit object, which holds your angles.
  2. a SkyCoord object, which holds your celestial body's coordinates.

Then you just calculate the distance between them using separation. These objects are more powerful than what you're using them for, and that's why they're slower.

Now, we know that the angular separation can be calculated using:

arccos(sin(delta1) * sin(delta2) + cos(delta1) * cos(delta2) * cos(alpha1 - alpha2))

See: https://en.wikipedia.org/wiki/Angular_distance

Now you can implement it. Just don't forget that your angles are in degrees, while trigonometric functions accept angles in radians.

def my_sum(df_frame):
    length = len(df_frame)  # Size of "for" loops
    Sum = 0
    df_frame_rad = np.deg2rad(df_frame)
    for i in range(length - 1):
        for j in range(i + 1, length):
            # print(a2, d2)
            dist = np.rad2deg(
                np.arccos(
                    np.sin(df_frame_rad['A'].iloc[i]) * np.sin(df_frame_rad['B'].iloc[j]) + \
                    np.cos(df_frame_rad['A'].iloc[i]) * np.cos(df_frame_rad['B'].iloc[j]) * \
                    np.cos(df_frame_rad['A'].iloc[i] - df_frame_rad['B'].iloc[j])
                )
            )
            Sum += dist
    return Sum
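
The formula used in the loop above can also be isolated as a small standalone function with separate RA and Dec arguments. This is only a sketch: the name angular_distance_deg is made up for illustration, and the np.clip call is an added safeguard (not part of the answer's code) against rounding errors that push the arccos argument slightly outside [-1, 1].

```python
import numpy as np

def angular_distance_deg(ra1, dec1, ra2, dec2):
    """Angular separation in degrees via the spherical law of cosines.

    All inputs are in degrees; scalars and NumPy arrays both work.
    (Hypothetical helper, not taken from Astropy.)
    """
    ra1, dec1, ra2, dec2 = (np.deg2rad(np.asarray(x, dtype=float))
                            for x in (ra1, dec1, ra2, dec2))
    cos_sep = (np.sin(dec1) * np.sin(dec2)
               + np.cos(dec1) * np.cos(dec2) * np.cos(ra1 - ra2))
    # Rounding can leave the value a hair outside [-1, 1], where arccos gives NaN.
    cos_sep = np.clip(cos_sep, -1.0, 1.0)
    return np.rad2deg(np.arccos(cos_sep))

# Two points on the equator, a quarter turn apart in RA.
print(angular_distance_deg(0.0, 0.0, 90.0, 0.0))
# A point on the equator and the north pole.
print(angular_distance_deg(10.0, 0.0, 10.0, 90.0))
```

Both example pairs are a quarter of a great circle apart, which makes them an easy sanity check before plugging the function into a loop.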

For the same data set, the results are:

Astropy function: 533.3069727968582

Pure math function: 533.3069727982754

Not bad.

Astropy function took 2.932075262069702 sec to finish

Pure math function took 0.07899618148803711 sec to finish
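
Timings like the ones above could be collected with a small helper along these lines. This is a sketch, not the code that produced those numbers: time_call is a hypothetical name, and the built-in sum is only a stand-in workload so that the snippet runs without Astropy or pandas installed.

```python
import time

def time_call(func, *args, repeat=3):
    """Call func(*args) `repeat` times; return (last result, best wall-clock seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return result, best

# Stand-in workload; in practice this would be e.g. time_call(my_sum, df).
value, seconds = time_call(sum, range(100_000))
print(f"result={value}, best of 3: {seconds:.6f} s")
```

Taking the best of a few repeats reduces the influence of one-off delays such as import caching or other processes on the machine.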

This answer is still going to be incredibly slow, especially on a large dataframe, since you have a double loop indexing the dataframe like df['A'].iloc[i] for each of the O(n^2) pairs of elements.

I tried this with a dataframe containing just 1000 elements in each column and it took ages. For larger numbers I just gave up waiting. It speeds up considerably if you instead pass the columns to the function as normal NumPy arrays, and then, before performing the distance calculation, also assign A_i = A[i]; B_j = B[j], i.e. like:

Using pure NumPy

def my_sum2(A, B):
    length = len(A)  # Size of "for" loops
    assert length == len(B)
    Sum = 0
    A = np.deg2rad(np.asarray(A))
    B = np.deg2rad(np.asarray(B))
    for i in range(length - 1):
        for j in range(i + 1, length):
            # print(a2, d2)
            A_i = A[i]
            B_j = B[j]
            dist = np.rad2deg(
                np.arccos(
                    np.sin(A_i) * np.sin(B_j) + \
                    np.cos(A_i) * np.cos(B_j) * \
                    np.cos(A_i - B_j)
                )
            )
            Sum += dist
    return Sum

For 100 elements I got:

>>> %timeit my_sum(df)
229 ms ± 3.06 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit my_sum2(df['A'], df['B'])
41.1 ms ± 2.88 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

But you can do much better still by pre-computing the sines and cosines using vectorized operations. This trades greater memory usage for speed (we could also build a matrix cos_A_B = np.cos(A[:, np.newaxis] - B) for the cos(A[i] - B[j]) factor, but this would be prohibitively memory-intensive if A and B are large):

def my_sum3(A, B):
    length = len(A)  # Size of "for" loops
    assert length == len(B)
    Sum = 0
    A = np.deg2rad(np.asarray(A))
    B = np.deg2rad(np.asarray(B))
    cos_A = np.cos(A)
    sin_A = np.sin(A)
    cos_B = np.cos(B)
    sin_B = np.sin(B)

    for i in range(length - 1):
        for j in range(i + 1, length):
            # print(a2, d2)
            dist = np.rad2deg(
                np.arccos(
                    sin_A[i] * sin_B[j] + \
                    cos_A[i] * cos_B[j] * \
                    np.cos(A[i] - B[j])
                )
            )
            Sum += dist
    return Sum
>>> %timeit my_sum3(df['A'], df['B'])
20.2 ms ± 715 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

But for pairwise calculations with NumPy arrays we can take further advantage of NumPy's element-wise broadcasting in order to eliminate the inner for-loop completely:

def my_sum4(A, B):
    length = len(A)  # Size of "for" loops
    assert length == len(B)
    Sum = 0
    A = np.deg2rad(np.asarray(A))
    B = np.deg2rad(np.asarray(B))
    cos_A = np.cos(A)
    sin_A = np.sin(A)
    cos_B = np.cos(B)
    sin_B = np.sin(B)
    
    for i in range(length - 1):
        Sum += np.sum(np.rad2deg(np.arccos(
            sin_A[i] * sin_B[i + 1:] +
            cos_A[i] * cos_B[i + 1:] *
            np.cos(A[i] - B[i + 1:]))))

    return Sum
>>> %timeit my_sum4(df['A'], df['B'])
1.31 ms ± 71.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
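
For comparison, the full-matrix idea mentioned earlier (broadcasting A[:, np.newaxis] against B) removes both loops. The sketch below assumes the question's convention that each point has RA equal to Dec; pairwise_sum_matrix is a hypothetical name, and the np.clip guard is an addition. It allocates several n-by-n arrays, so it is only practical for modest n, not for millions of points.

```python
import numpy as np

def pairwise_sum_matrix(A, B):
    """Sum of separations (degrees) over all pairs i < j, via one n-by-n matrix.

    Point i of the first set has RA = Dec = A[i] and point j of the second
    set has RA = Dec = B[j], as in the question. Memory grows as O(n^2).
    """
    A = np.deg2rad(np.asarray(A, dtype=float))
    B = np.deg2rad(np.asarray(B, dtype=float))
    assert len(A) == len(B)
    # A[:, np.newaxis] has shape (n, 1), so each product broadcasts to (n, n).
    cos_sep = (np.sin(A)[:, np.newaxis] * np.sin(B)
               + np.cos(A)[:, np.newaxis] * np.cos(B)
               * np.cos(A[:, np.newaxis] - B))
    sep = np.rad2deg(np.arccos(np.clip(cos_sep, -1.0, 1.0)))
    # Keep only entries strictly above the diagonal, i.e. the pairs with i < j.
    return np.triu(sep, k=1).sum()

# Same synthetic data as in the question.
A = np.arange(50) / 50
B = A / 51
print(pairwise_sum_matrix(A, B))
```

The row-by-row version in my_sum4 is a reasonable middle ground: it keeps memory at O(n) while still vectorizing the inner loop.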

There are many other ways this could be micro-optimized, using Cython, SciPy, etc., but I won't spend any more time on it here.

The other problem with this approach is that it's specifically geared to the detail of the OP's question, where each coordinate has identical RA and Dec for some reason, and is not generalized.
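
A generalized variant of the row-by-row approach, taking separate RA and Dec arrays for both sets of points, might look like the following sketch. The function name pairwise_sum_general and its argument names are assumptions for illustration, and the np.clip guard is an addition that the functions above do not have.

```python
import numpy as np

def pairwise_sum_general(ra1, dec1, ra2, dec2):
    """Sum over i < j of the separation between (ra1[i], dec1[i]) and (ra2[j], dec2[j]).

    Inputs and result are in degrees. Uses the spherical law of cosines,
    vectorized over j for each i.
    """
    ra1, dec1, ra2, dec2 = (np.deg2rad(np.asarray(x, dtype=float))
                            for x in (ra1, dec1, ra2, dec2))
    n = len(ra1)
    assert len(dec1) == n and len(ra2) == n and len(dec2) == n
    # Pre-compute the trigonometric terms once per point.
    sin_d1, cos_d1 = np.sin(dec1), np.cos(dec1)
    sin_d2, cos_d2 = np.sin(dec2), np.cos(dec2)
    total = 0.0
    for i in range(n - 1):
        cos_sep = (sin_d1[i] * sin_d2[i + 1:]
                   + cos_d1[i] * cos_d2[i + 1:] * np.cos(ra1[i] - ra2[i + 1:]))
        total += np.arccos(np.clip(cos_sep, -1.0, 1.0)).sum()
    return np.rad2deg(total)

# With RA = Dec this reduces to the question's setup.
A = np.arange(50) / 50
B = A / 51
print(pairwise_sum_general(A, A, B, B))
```

Passing the same array for RA and Dec, as in the usage lines, should reproduce the my_sum4 result for the question's data up to rounding.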

Using SkyCoord

Something Astropy beginners often miss about the SkyCoord class (and many other classes in Astropy) is that a single SkyCoord can be a container for an array of coordinates, not just a single coordinate.

In the OP's question they are creating millions of SkyCoord objects, one for each coordinate. In fact, you could simply do this:

>>> c1 = SkyCoord(df['A']*u.deg, df['A']*u.deg, frame='icrs')
>>> c2 = SkyCoord(df['B']*u.deg, df['B']*u.deg, frame='icrs')

Methods like SkyCoord.separation also work element-wise, just like other functions on NumPy arrays:

>>> c1.separation(c2)
<Angle [0.0130013 , 1.18683992, 0.82050812, ...] deg>

So for every pair-wise separation you could use a technique similar to the one in my my_sum4 solution, freeing you from having to write the calculation yourself:

def my_sum5(c1, c2):
    angle_sum = 0
    for idx in range(len(c1)):
        angle_sum += c1[idx].separation(c2[idx + 1:]).sum()
    return angle_sum
>>> my_sum5(c1, c2)
<Angle 2368.14558945 deg>

This is admittedly considerably slower than the last pure-NumPy solution:

>>> %timeit my_sum5(c1, c2)
166 ms ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

This overhead is the cost of some of Astropy's high-level interfaces, and I agree with MSH, who wrote in their answer:

You should know that sometimes using ready-made products is faster, since all the tools are available. However, in some cases, such as yours, using a ready-made product makes your code slower in execution time.

That is, if you really have high-performance needs for large datasets, it still might be better to use a hand-optimized solution.

However, we can still do a little better within just Astropy. If you look at the source code for SkyCoord.separation, you see that it's little more than a higher-level interface to a function called angular_separation, which computes the separation using the somewhat more computationally expensive Vincenty formula, using the lat/lon components of the coordinates' spherical representation.
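
To make the difference between the two formulas concrete, here is a plain-NumPy sketch of the Vincenty formula for a sphere next to the law of cosines used earlier. It is written from the textbook formula, not copied from Astropy's source, and the function names vincenty_separation and cosine_separation are made up for this example. Inputs and outputs are in radians.

```python
import numpy as np

def vincenty_separation(lon1, lat1, lon2, lat2):
    """Angular separation (radians) using the Vincenty formula on a sphere."""
    sdlon = np.sin(lon2 - lon1)
    cdlon = np.cos(lon2 - lon1)
    slat1, clat1 = np.sin(lat1), np.cos(lat1)
    slat2, clat2 = np.sin(lat2), np.cos(lat2)
    num1 = clat2 * sdlon
    num2 = clat1 * slat2 - slat1 * clat2 * cdlon
    denominator = slat1 * slat2 + clat1 * clat2 * cdlon
    # arctan2 stays well-conditioned for separations near 0 and near pi.
    return np.arctan2(np.hypot(num1, num2), denominator)

def cosine_separation(lon1, lat1, lon2, lat2):
    """Angular separation (radians) using the spherical law of cosines."""
    cos_sep = (np.sin(lat1) * np.sin(lat2)
               + np.cos(lat1) * np.cos(lat2) * np.cos(lon1 - lon2))
    return np.arccos(np.clip(cos_sep, -1.0, 1.0))

# Two points on the equator separated by a very small angle.
tiny = 1e-9
print(vincenty_separation(0.0, 0.0, tiny, 0.0))
print(cosine_separation(0.0, 0.0, tiny, 0.0))
```

For very small separations the cosine of the angle is indistinguishable from 1 in double precision, so the law of cosines loses the result, while the arctan2 form keeps it. That extra robustness is what the additional arithmetic buys.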

For a computation like this you can eliminate a lot of overhead (like Astropy's automatic coordinate conversion) by using this function directly, like:

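# In older Astropy versions this lives in astropy.coordinates.angle_utilities
from astropy.coordinates import angular_separation

def my_sum6(c1, c2):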
    angle_sum = 0
    lon1 = c1.spherical.lon.to(u.rad).value
    lat1 = c1.spherical.lat.to(u.rad).value
    lon2 = c2.spherical.lon.to(u.rad).value
    lat2 = c2.spherical.lat.to(u.rad).value
    
    for idx in range(len(c1)):
        angle_sum += angular_separation(lon1[idx], lat1[idx], lon2[idx+1:], lat2[idx+1:]).sum()
    return np.rad2deg(angle_sum)

This is basically doing what SkyCoord.separation is doing, but it's pre-computing the lat/lon arrays for both coordinates and converting them to radians first, then calling angular_separation on them. It also skips the overhead of asserting that both coordinates are in the same frame (they are both ICRS in this case, so we are assuming they are). This performs almost as well as my_sum4:

>>> %timeit my_sum6(c1, c2)
2.26 ms ± 123 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In fact, in this case the main thing making it slower than my_sum4 is just the increased complexity of the Vincenty formula used, and the fact that it's more generalized (not assuming that RA == Dec for each coordinate).
