Binning/grouping a dataset of geographic coordinates

I have a large dataset with two columns: timestamp and lat/lon. I want to group the coordinates in some way to determine the number of different places that are recorded, treating everything within a certain distance of each other as one location. Essentially, I want to figure out how many different "places" are in this dataset. A good visual example is this. I'd like to wind up here, but I do not know where the clusters are in my dataset.

Detailing more on behzad.nouri's reference:

from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# X = your geo array, shape (n_samples, 2)

# Standardize features by removing the mean and scaling to unit variance
X = StandardScaler().fit_transform(X)

# Compute DBSCAN
db = DBSCAN(eps=0.3, min_samples=3).fit(X)

# eps -- the maximum distance between two samples
#  for them to be considered as in the same neighborhood.
#  After standardization it is in scaled units, not degrees or kilometres.
# min_samples -- the number of samples in a neighborhood for a point
#  to be considered as a core point.

core_samples = db.core_sample_indices_
labels = db.labels_

# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
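
If "a certain distance" is meant in kilometres rather than in standardized units, one option is to skip the scaling and run DBSCAN with the haversine metric on coordinates converted to radians. The following is only a sketch under assumptions: the `count_places` helper, the 0.5 km radius, and the sample coordinates are made up for illustration and are not part of the original answer.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def count_places(coords_deg, eps_km=0.5, min_samples=1):
    """Cluster rows of [lat, lon] in degrees; return (n_clusters, labels)."""
    db = DBSCAN(
        eps=eps_km / EARTH_RADIUS_KM,  # haversine distances are in radians
        min_samples=min_samples,
        algorithm="ball_tree",
        metric="haversine",
    ).fit(np.radians(coords_deg))      # metric expects [lat, lon] in radians
    labels = db.labels_
    # Number of clusters, ignoring noise (label -1) if present
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    return n_clusters, labels


# Hypothetical sample: three fixes near one spot, two near another
coords = np.array([
    [40.0000, -75.0000],
    [40.0010, -75.0010],
    [40.0005, -74.9995],
    [51.5000, -0.1000],
    [51.5008, -0.1005],
])
n_places, labels = count_places(coords)
print(n_places)  # 2
```

With min_samples=1 every point belongs to some cluster, so an isolated fix counts as its own place; raising it makes DBSCAN label sparse points as noise (-1) instead.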

This pseudo-code demonstrates how to reduce a set of points to a single point per grid partition while tallying the number of points in each partition. This can be useful if you have a set of points where some areas are sparse and others are dense, but you want an even distribution of displayed points (such as on a map).

To use the function, one passes the set of points and the number of partitions along one of the axes (e.g., X). The same number of partitions is used on the other axis (e.g., Y), so if one specifies 3, then 9 (3*3) equal-sized partitions are made. The function first goes through the set of points to find the outermost X and Y (min and max) coordinates that bound the entire set. The extent of that bounding box along each axis (max minus min) is then divided by the number of partitions to determine the grid size.
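
As a worked example of that arithmetic, here is a minimal Python sketch. The `grid_geometry` name and the sample points are illustrative assumptions, and points are taken to be plain (x, y) tuples.

```python
def grid_geometry(points, npartitions):
    """Return the bounding-box origin and the size of one grid cell."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    # Each axis extent is split into the same number of partitions
    cell_w = (max_x - min_x) / npartitions
    cell_h = (max_y - min_y) / npartitions
    return (min_x, min_y), (cell_w, cell_h)


points = [(0.0, 0.0), (9.0, 6.0), (4.0, 5.0)]
origin, cell = grid_geometry(points, 3)
print(origin, cell)  # (0.0, 0.0) (3.0, 2.0)
```

With 3 partitions per axis, a bounding box of 9 by 6 units yields 9 cells of 3 by 2 units each.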

The function then steps through each grid partition and checks, for each point in the set, whether it lies within that partition. If it does, the function checks whether this is the first point encountered in the partition. If so, a flag is set to indicate that the first point has been found. Otherwise, since it is not the first point in the partition, the point is removed from the set of points.

For each point that is found in the partition, the function increments a tally count. Finally, when the reduction/tallying is completed for a grid partition, one can visualize the tallied point (e.g., show a marker on the map at the single point with a tally indicator):

function TallyPoints( array points, int npartitions )
{
    array partition = new Array();

    // start from extreme values so negative coordinates are handled too
    float max_x = -MAX_FLOAT, max_y = -MAX_FLOAT;
    float min_x =  MAX_FLOAT, min_y =  MAX_FLOAT;

    // Find the bounding box of the points
    foreach point in points
    {
        if ( point.X > max_x )
            max_x = point.X;
        if ( point.X < min_x )
            min_x = point.X;
        if ( point.Y > max_y )
            max_y = point.Y;
        if ( point.Y < min_y )
            min_y = point.Y;
    }

    // Get the X and Y axis lengths of the partitions
    float partition_length_x = ( max_x - min_x ) / npartitions;
    float partition_length_y = ( max_y - min_y ) / npartitions;

    // Reduce the points to one point in each grid partition,
    // visiting all npartitions * npartitions partitions
    for ( int nx = 0; nx < npartitions; nx++ )
    for ( int ny = 0; ny < npartitions; ny++ )
    {
        // Get the boundary of this grid partition
        float cell_min_x = min_x + ( nx * partition_length_x );
        float cell_min_y = min_y + ( ny * partition_length_y );
        float cell_max_x = min_x + ( ( nx + 1 ) * partition_length_x );
        float cell_max_y = min_y + ( ( ny + 1 ) * partition_length_y );

        // the last column/row also takes points lying on the outer edge
        boolean last_x = ( nx == npartitions - 1 );
        boolean last_y = ( ny == npartitions - 1 );

        // reduce and tally points
        int     n      = ( ny * npartitions ) + nx; // index of this partition
        int     tally  = 0;
        boolean reduce = false; // set to true after finding the first point in the partition
        foreach point in copy of points // iterate over a copy so removal is safe
        {
            // the point is in the grid partition
            if ( point.X >= cell_min_x && ( point.X < cell_max_x || last_x ) &&
                 point.Y >= cell_min_y && ( point.Y < cell_max_y || last_y ) )
            {
                // first point found
                if ( false == reduce )
                {
                    reduce = true;
                    partition[ n ].point = point;   // keep this as the single point for the grid
                }
                else
                    points.Remove( point ); // remove the point from the list

                // increment the tally count
                tally++;
            }
        }

        // store the tally for the grid
        partition[ n ].tally = tally;

        // visualize the tallied point here (e.g., marker on Google Map)
    }
}
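
For readers who want something runnable, the same idea can be sketched in Python. This is not the answer's exact procedure: instead of scanning every point once per partition, it computes each point's cell index directly in a single pass, and it returns a dictionary rather than mutating the input. The `tally_points` name, the tuple-based points, and the sample data are assumptions made for illustration.

```python
def tally_points(points, npartitions):
    """Return {(col, row): (representative_point, tally)} for non-empty cells."""
    if not points:
        return {}
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # Fall back to 1.0 if all points share an x or y value (zero-size box)
    cell_w = (max(xs) - min_x) / npartitions or 1.0
    cell_h = (max(ys) - min_y) / npartitions or 1.0

    cells = {}
    for x, y in points:
        # Clamp so points on the outer max edge land in the last cell
        col = min(int((x - min_x) / cell_w), npartitions - 1)
        row = min(int((y - min_y) / cell_h), npartitions - 1)
        # The first point seen in a cell is kept as its representative
        rep, count = cells.get((col, row), ((x, y), 0))
        cells[(col, row)] = (rep, count + 1)
    return cells


sample = [(0, 0), (1, 1), (9, 9), (8, 8.5), (5, 5)]
for cell, (rep, count) in sorted(tally_points(sample, 3).items()):
    print(cell, rep, count)
```

Each entry gives one marker position and the number of original points it stands for, which is what a map layer would need to draw a tally badge.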
