Establish location-affinity between two users from their geotags?

The Idea. I would like to build a function like:

location_affinity(user_a, user_b)

which establishes a location affinity between two users. In particular, this function will return a float between 0 (no affinity) and 1 (max affinity) indicating how closely the places user_a has been to correspond to the places user_b has been to. E.g.: if user_a ALWAYS stays with user_b and follows him to every place he goes, I'm expecting a "1" as the result. If user_a lives far away from user_b and they have never even come close to each other, I'm expecting a "0" as the result.

The Data. Each user has a list of points (latitude, longitude) where he has been; those points were already extracted from the user's Facebook geotags. To visualize this: [image]

  • Red "X"s are points (lat, lng) where user_a has been.
  • Green "X"s are points (lat, lng) where user_b has been.
  • The blue area represents the overlap.

The Question. Are there any known algorithms which, based on two users' lists of map points, can establish the affinity (which, I gather, depends on the overlap area)? If not, which keywords should I search for?

Additional. I'm trying to build Python functions with Spark. Are there any integrations?

Thank you.

How about something like this:

First we use scipy.spatial.distance.cdist to compute the distance from each point of user_a to each point of user_b, and take the closest point for each. We then use the exponential function to exponentially suppress larger distances. The constant c determines how strong this suppression is: a smaller c means large distances are suppressed more strongly (you will need to scale it to make sense in your actual units). Then we just look at the mean of that metric.

import numpy as np
from scipy.spatial.distance import cdist

def affinity(user_a, user_b, c=0.1):
    dists = cdist(user_a, user_b)
    return (np.exp(-dists.min(axis=0)/c)).mean()
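To make the nearest-point step concrete, here is a small sketch on made-up toy points (the coordinates and c = 1.0 are assumptions chosen for readability, not data from the question). It shows what cdist returns and what taking the minimum along axis 0 selects.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical toy data: two points for user_a, three for user_b.
user_a = np.array([[0.0, 0.0], [1.0, 0.0]])
user_b = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]])

# Rows index user_a points, columns index user_b points.
dists = cdist(user_a, user_b)

# For each user_b point, the distance to its closest user_a point.
nearest = dists.min(axis=0)

# Exponential fall-off with c = 1.0 (an arbitrary choice for this toy case).
scores = np.exp(-nearest / 1.0)

print(dists.shape)    # (2, 3)
print(nearest)        # [0. 0. 2.]
print(scores.mean())  # (1 + 1 + exp(-2)) / 3, roughly 0.712
```

Two of the three user_b points coincide with a user_a point and score 1; the third is 2 units from its nearest neighbour and pulls the mean down.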

The affinity function has the nice property that if the two sets of points are exactly equal, it returns 1.

user_a = np.random.rand(1000, 2)
user_b1 = np.random.rand(1000, 2)
user_b2 = user_a.copy()

print(affinity(user_a, user_b1))
# 0.85169834916
print(affinity(user_b1, user_a))
# 0.856871315902
print(affinity(user_a, user_b2))
# 1.0

It has a small problem, though, as the first two outputs above show: the function is not symmetric. However, we can make it symmetric by considering both directions equally:

def affinity(user_a, user_b, c=0.1):
    dists = cdist(user_a, user_b)
    min_dists = dists.min(axis=0), dists.min(axis=1)
    return np.concatenate([np.exp(-x/c) for x in min_dists]).mean()

print(affinity(user_a, user_b1, 0.01))
# 0.271448093071
print(affinity(user_b1, user_a, 0.01))
# 0.271448093071
print(affinity(user_a, user_b2, 0.01))
# 1.0
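Regarding the remark about scaling c to your actual units: cdist with its default metric treats (lat, lng) as flat Euclidean coordinates in degrees. One possible way to get distances in kilometres is the haversine formula. The sketch below is an assumption on my part, not part of the original answer; haversine_matrix, geo_affinity, c_km and the sample coordinates are all hypothetical names and values.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_matrix(points_a, points_b):
    """Pairwise great-circle distances in km between (lat, lng) points in degrees."""
    a = np.radians(np.asarray(points_a, dtype=float))[:, None, :]  # shape (n, 1, 2)
    b = np.radians(np.asarray(points_b, dtype=float))[None, :, :]  # shape (1, m, 2)
    dlat = b[..., 0] - a[..., 0]
    dlng = b[..., 1] - a[..., 1]
    h = (np.sin(dlat / 2) ** 2
         + np.cos(a[..., 0]) * np.cos(b[..., 0]) * np.sin(dlng / 2) ** 2)
    # Clip guards against rounding pushing h slightly outside [0, 1].
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))

def geo_affinity(points_a, points_b, c_km=1.0):
    """Symmetric affinity as in the answer, with distances measured in km."""
    dists = haversine_matrix(points_a, points_b)
    min_dists = dists.min(axis=0), dists.min(axis=1)
    return np.concatenate([np.exp(-x / c_km) for x in min_dists]).mean()

# Made-up sample coordinates (lat, lng) in degrees.
paris = [(48.8566, 2.3522), (48.8606, 2.3376)]
sydney = [(-33.8688, 151.2093)]

print(geo_affinity(paris, paris))   # 1.0
print(geo_affinity(paris, sydney))  # effectively 0
```

With distances in km, c_km reads as a length scale: points about c_km apart contribute roughly exp(-1) ≈ 0.37 each.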

Of course you can use many different functions to determine the fall-off at larger distances. Here I chose exp(-x), but you could also use 1 - tanh(x) or tanh(1/(x+epsilon)) (the epsilon is needed to avoid a division by zero in case two points are exactly identical). This results in different behaviour: [image]

Actually, you could use 1 - any function defined in this post.
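As a rough sketch of how the three fall-off functions named above compare, the snippet below evaluates each on a few sample distances and plugs them into a variant of the symmetric function. The names falloff_exp, falloff_tanh, falloff_inv_tanh and affinity_with, and the epsilon value, are hypothetical choices for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

def falloff_exp(x):
    return np.exp(-x)

def falloff_tanh(x):
    return 1.0 - np.tanh(x)

def falloff_inv_tanh(x, epsilon=1e-9):
    # epsilon avoids a division by zero when two points coincide
    return np.tanh(1.0 / (x + epsilon))

def affinity_with(user_a, user_b, falloff, c=0.1):
    """Symmetric affinity with a pluggable fall-off applied to distance / c."""
    dists = cdist(user_a, user_b)
    min_dists = dists.min(axis=0), dists.min(axis=1)
    return np.concatenate([falloff(x / c) for x in min_dists]).mean()

# Compare the fall-offs on a few scaled distances.
x = np.array([0.0, 0.5, 1.0, 2.0, 5.0])
for f in (falloff_exp, falloff_tanh, falloff_inv_tanh):
    print(f.__name__, np.round(f(x), 3))

# Identical point sets give 1 regardless of the fall-off chosen.
pts = np.array([[0.0, 0.0], [1.0, 1.0]])
print(affinity_with(pts, pts, falloff_tanh))
```

All three equal 1 at distance 0 and decrease from there, but tanh(1/(x+epsilon)) decays only like 1/x for large x, so distant points keep more weight than with the other two.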
