Grouping and labelling values in a dataset in Python

I am trying to group the values in my dataset under unique labels. Assume I have the data below, where each row holds a point followed by its neighbor points (columns A, B, C, D).

Dataset

Array:

[[1 2]
 [2 1 4 5 7]
 [3 2]
 [4 2 10]
 [5 2 8]
 [6]
 [7 2 13]
 [8 5]
 [9]
 [10 4 1]
 [11 12]
 [12 11]
 [13 7]]

I am trying to summarize the data, and the desired result is as follows:

Label 1 = 1 2 4 5 7 3 10 8 13
Label 2 = 6
Label 3 = 9
Label 4 = 11 12 

The point is that when a value already appears in a labeled list, it should get the existing label; when it does not appear in any list, it should get a new label. I am a little confused about what to call this problem, so I have not yet found a similar question. I would be very thankful if somebody could give the Python code or pseudocode. Thank you.

Here is working code. The result is a dictionary whose keys are your data values and whose values are their labels, like this: {key: value}, i.e. {data: label}

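label = 0
listOfLabels = dict()
for row in array:
    # labels already assigned to values that appear in this row
    existing = [listOfLabels[x] for x in row if x in listOfLabels]
    if existing:
        current = existing[0]  # reuse the label that is already there
    else:
        label += 1  # no value in this row was seen before: new label
        current = label
    for i in row:
        if i not in listOfLabels:
            listOfLabels[i] = current
print(listOfLabels)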
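
The single-pass loop above assigns labels row by row, so a row that links two groups which already carry different labels would not merge them. That does not happen with the data in the question, but in general this is the "connected components" problem. Below is a minimal sketch using a union-find (disjoint-set) structure. The names `label_groups`, `find`, `union` and `rows` are made up for illustration and are not from the original code, and the rows are assumed to be plain Python lists.

```python
def label_groups(rows):
    """Return {value: label}, merging all rows that share any value."""
    parent = {}

    def find(x):
        # Follow parent links up to the root, shortening the path on the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Attach the root of b's group to the root of a's group.
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for row in rows:
        if not row:
            continue
        find(row[0])  # registers points that have no neighbors
        for other in row[1:]:
            union(row[0], other)

    # Number the groups in order of first appearance.
    labels = {}
    root_label = {}
    for row in rows:
        for value in row:
            root = find(value)
            if root not in root_label:
                root_label[root] = len(root_label) + 1
            labels[value] = root_label[root]
    return labels


rows = [[1, 2], [2, 1, 4, 5, 7], [3, 2], [4, 2, 10], [5, 2, 8], [6],
        [7, 2, 13], [8, 5], [9], [10, 4, 1], [11, 12], [12, 11], [13, 7]]
labels = label_groups(rows)

# Regroup as label -> values to print in the format used in the question.
groups = {}
for value, label in labels.items():
    groups.setdefault(label, []).append(value)
for label, values in groups.items():
    print(f"Label {label} =", *values)
```

With the rows from the question this prints the four labels listed as the desired result. A row such as [2, 11] added at the end would also merge Label 1 and Label 4 into one group, which the row-by-row loop cannot do.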

Please let me know if it needs any clarification.

