Is there a way to get relationships from an unsupervised dataset?

Question

I have some data. The dataset includes features such as device id (int), phone (int), name (string), and so on, but none of the data is labeled. My task is to get the probability of a person using multiple ids or multiple devices. I have no idea how to do it. Does anyone have an idea?

For clarity, here is an example. The dataset looks like this:

  name    id     phone    device_id
  Jason   123    12345    12341231     ......
  James   1345   312312   312312312    ......
  Jason   123    53523    23115124     ......

So we can see that Jason has 2 phone numbers.
How do I get the probability using a machine learning or deep learning method?

Answer 1

One possible way to do this is to compute the similarity between users.

As I understand it, your end goal is the device similarity for a user.

For starters, combine the name and id fields, which together uniquely identify a user. Then generate a feature vector from all the remaining fields, stored as an array.
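As a rough illustration of this step, here is a minimal sketch in plain Python. It assumes the rows from the question are available as tuples, and it stores each user's remaining fields as sets rather than a numeric array. The helper names `build_user_features` and `share_with_multiple` are hypothetical, and the "share of users with more than one device" is only a simple empirical estimate, not necessarily the probability the asker has in mind.

```python
from collections import defaultdict

# Rows copied from the example in the question: (name, id, phone, device_id).
records = [
    ("Jason", 123, 12345, 12341231),
    ("James", 1345, 312312, 312312312),
    ("Jason", 123, 53523, 23115124),
]


def build_user_features(rows):
    """Group rows by (name, id) and collect each user's phones and devices."""
    features = defaultdict(lambda: {"phones": set(), "devices": set()})
    for name, user_id, phone, device_id in rows:
        key = (name, user_id)  # assumption: name + id identifies one user
        features[key]["phones"].add(phone)
        features[key]["devices"].add(device_id)
    return dict(features)


def share_with_multiple(features, field):
    """Fraction of users with more than one distinct value in `field`."""
    if not features:
        return 0.0
    multi = sum(1 for f in features.values() if len(f[field]) > 1)
    return multi / len(features)


user_features = build_user_features(records)
print(user_features[("Jason", 123)]["phones"])
print(share_with_multiple(user_features, "devices"))
```

On a real dataset the same grouping could be done with a dataframe library; the point here is only that the (name, id) pair becomes the key and everything else becomes that user's features.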

Afterwards, you can run a nested for loop that compares every user against every other user. This gives you the closest match for each user; you can then set a threshold, or use kNN instead.
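The pairwise comparison could look something like the sketch below. It is only one possible reading of the answer: Jaccard similarity is swapped in as the similarity measure (the answer does not name one), the threshold of 0.1 is arbitrary, and the user `("Jay", 999)` is invented here purely so that one pair shares a phone number.

```python
from itertools import combinations

# Per-user feature sets. Jason and James come from the question's example;
# ("Jay", 999) is a made-up user that reuses one of Jason's phone numbers.
user_features = {
    ("Jason", 123): {("phone", 12345), ("phone", 53523),
                     ("device", 12341231), ("device", 23115124)},
    ("James", 1345): {("phone", 312312), ("device", 312312312)},
    ("Jay", 999): {("phone", 12345), ("device", 99999999)},
}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similar_pairs(features, threshold):
    """Compare every user with every other user and keep pairs whose
    similarity reaches the threshold (equivalent to a nested for loop)."""
    matches = []
    for (user_a, feats_a), (user_b, feats_b) in combinations(features.items(), 2):
        score = jaccard(feats_a, feats_b)
        if score >= threshold:
            matches.append((user_a, user_b, score))
    return matches


matches = similar_pairs(user_features, threshold=0.1)
for user_a, user_b, score in matches:
    print(user_a, user_b, score)
```

Comparing all pairs is quadratic in the number of users, so for a large dataset a nearest-neighbour index (the kNN option mentioned above) is likely the more practical choice.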

Take a look at this: Convert Nested dictionary to Pyspark Dataframe
