
Is there any way to get the relationship from an unsupervised dataset?

I have some data; the dataset includes features such as device id (int), phone (int), name (string), etc., but none of the data is labeled. My task is to get the probability that a person uses multiple ids or multiple devices. I have no idea how to do it. Does anyone have an idea?

For clarity, here is an example. The dataset looks like this:

  name    id     phone    device_id
  Jason   123    12345    12341231     ......
  James   1345   312312   312312312    ......
  Jason   123    53523    23115124     ......

So we can see that Jason has 2 phone numbers. How do I get the probability using a machine learning or deep learning method?

One possible way to do this is to compute the similarity between users.

As I understand it, device similarity for a user is your end goal.

For starters, combine the name and id fields, which together uniquely identify a user. Then generate a feature vector, as an array, from all the remaining fields.
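A minimal sketch of that grouping step, using only the Python standard library. The rows are the three from the example table in the question; the helper name `build_user_features` and the choice of sets instead of a fixed-length array are assumptions made for illustration, not part of the original answer.

```python
from collections import defaultdict

# Rows copied from the example table in the question: (name, id, phone, device_id).
rows = [
    ("Jason", 123, 12345, 12341231),
    ("James", 1345, 312312, 312312312),
    ("Jason", 123, 53523, 23115124),
]

def build_user_features(rows):
    """Group rows by the (name, id) key and collect the remaining fields as sets.

    Hypothetical helper: sets are used here so that repeated values collapse.
    """
    features = defaultdict(lambda: {"phones": set(), "devices": set()})
    for name, user_id, phone, device_id in rows:
        key = (name, user_id)  # name + id together identify one user
        features[key]["phones"].add(phone)
        features[key]["devices"].add(device_id)
    return dict(features)

user_features = build_user_features(rows)

# Empirical share of users seen with more than one distinct device.
multi_device_share = sum(
    len(f["devices"]) > 1 for f in user_features.values()
) / len(user_features)
```

On data like this, the share of users with more than one distinct device is only a rough empirical baseline for the probability asked about, and it is worth checking before reaching for a learned model.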

Afterwards, you can run a nested for loop that compares every user against every other user. This gives you the closest match for each user; you can then set a threshold on the similarity, or use kNN to pick the nearest matches.
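The pairwise comparison could look roughly like the sketch below. The answer does not name a similarity measure, so Jaccard similarity over each user's set of phone and device values is an assumption here, as are the threshold of 0.5 and the third, invented user record. `itertools.combinations` stands in for the nested loop and visits each unordered pair once.

```python
from itertools import combinations

# Per-user feature sets, with values tagged by field so a phone never collides
# with a device id. The ("J. Smith", 999) record is invented for illustration.
users = {
    ("Jason", 123): {"phone:12345", "phone:53523", "device:12341231"},
    ("James", 1345): {"phone:312312", "device:312312312"},
    ("J. Smith", 999): {"phone:12345", "device:12341231"},
}

def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_pairs(users, threshold):
    """Return (user_a, user_b, score) for every pair scoring at least threshold."""
    pairs = []
    for (u, fu), (v, fv) in combinations(users.items(), 2):
        score = jaccard(fu, fv)
        if score >= threshold:
            pairs.append((u, v, score))
    return pairs

matches = similar_pairs(users, threshold=0.5)
```

The loop is quadratic in the number of users, so on a large dataset a nearest-neighbour index or a blocking step (comparing only users that share at least one value) would likely be needed.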

Take a look at this: Convert Nested dictionary to Pyspark Dataframe

