Create adjacency matrix in Python from CSV dataset

I have data in the following format:

eventid    mnbr
20         1
26         1
12         2
14         2
15         3
14         3
10         3

eventid is an event that the member attended. The data is represented as a panel, so each member attends multiple events and multiple members can attend the same event. My goal is to create an adjacency matrix that shows:

 mnbr  1    2    3
 1     1    0    0
 2     0    1    1
 3     0    1    1

where there is a 1 whenever two members attend the same event. I was able to read the columns of the CSV file into two separate 1D numpy arrays, but I am unsure how to proceed from here. How best do I create a matrix using column 2, and how do I subsequently use column 1 to fill in the values? I understand I haven't posted any code and don't expect a full solution, but I would greatly appreciate an idea of how to approach the problem efficiently. I have roughly 3 million observations, so creating too many external variables would be problematic. Thanks in advance.

I received a notification that my question is a potential duplicate; however, my problem was with parsing the data rather than creating the adjacency matrix.

Here is a solution. It does not directly give you the requested adjacency matrix, but it gives you what you need to create it yourself.

# Assume every line of the input is stored as a tuple (eventid, mnbr).
observations = [(20, 1), (26, 1), (12, 2), (14, 2), (15, 3), (14, 3), (10, 3)]

# Create an event link dictionary, i.e. something that links every event to all its mnbrs.
eventLinks = {}

for (eventid, mnbr) in observations:
    # If this event has never been encountered, create a new entry in eventLinks.
    if eventid not in eventLinks:
        eventLinks[eventid] = []

    eventLinks[eventid].append(mnbr)

# Collect the mnbrs.
mnbrs = {mnbr for (eventid, mnbr) in observations}

# Create a member link dictionary. It links each mnbr to the mnbrs linked to it.
mnbrLinks = {mnbr: set() for mnbr in mnbrs}

for mnbrList in eventLinks.values():
    # For each mnbr, add all the mnbrs involved in the same event (itself included).
    for mnbr in mnbrList:
        mnbrLinks[mnbr].update(mnbrList)

print(mnbrLinks)

Executing this code gives the following result:

{1: {1}, 2: {2, 3}, 3: {2, 3}}

This is a dictionary where each mnbr has an associated set of adjacent mnbrs. It is in fact an adjacency list, that is, a compact representation of the adjacency matrix. You can expand it and build the matrix you requested by using the dictionary keys and values as row and column indexes.

Hope it helps. Arthur.
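Since the question mentions trouble with parsing, here is a minimal sketch of how the `observations` list used above could be produced with the standard `csv` module. It assumes a comma-separated file with a header row naming the columns `eventid` and `mnbr`; the function name `read_observations` and the in-memory sample are only illustrative, and the delimiter may need adjusting for the real file.

```python
import csv
import io

def read_observations(csv_file):
    """Return a list of (eventid, mnbr) integer tuples from an open CSV file.

    Assumption: the file is comma-separated and has a header row
    with the column names 'eventid' and 'mnbr'.
    """
    reader = csv.DictReader(csv_file)
    return [(int(row["eventid"]), int(row["mnbr"])) for row in reader]

# In-memory stand-in for the real file, using the data from the question.
sample = io.StringIO("eventid,mnbr\n20,1\n26,1\n12,2\n14,2\n15,3\n14,3\n10,3\n")
observations = read_observations(sample)
print(observations)
```

With a real file, the same function could be called inside `with open(path, newline="") as f:`, where `path` stands for wherever the data actually lives.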

EDIT: I provided an approach based on an adjacency list so that you can implement your own adjacency matrix construction. However, if your data is sparse, you should consider using the adjacency list itself as your data structure. See http://en.wikipedia.org/wiki/Adjacency_list
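To illustrate using the adjacency list directly, here is a rough sketch that builds it with `collections.defaultdict` and answers "are these two members adjacent?" without ever expanding a matrix. The helper names `build_adjacency_list` and `are_adjacent` are made up for this example, and whether this fits in memory for 3 million observations depends on how many members share each event.

```python
from collections import defaultdict

def build_adjacency_list(observations):
    """Map each member to the set of members sharing at least one event with it."""
    # Group members by the event they attended.
    members_by_event = defaultdict(set)
    for eventid, mnbr in observations:
        members_by_event[eventid].add(mnbr)

    # Every member of an event is adjacent to all members of that event
    # (itself included, which gives the 1s on the diagonal).
    adjacency = defaultdict(set)
    for members in members_by_event.values():
        for mnbr in members:
            adjacency[mnbr].update(members)
    return dict(adjacency)

def are_adjacent(adjacency, a, b):
    """True if members a and b attended at least one common event."""
    return b in adjacency.get(a, set())

observations = [(20, 1), (26, 1), (12, 2), (14, 2), (15, 3), (14, 3), (10, 3)]
adjacency = build_adjacency_list(observations)
print(are_adjacent(adjacency, 2, 3))  # True: both attended event 14
print(are_adjacent(adjacency, 1, 2))  # False: no event in common
```

Note that the inner loop costs on the order of k squared set insertions for an event with k members, so very large events dominate the running time.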

EDIT 2: Added code to convert adjacencyList into a small AdjacencyMatrix class.

adjacencyList = {1: {1}, 2: {2, 3}, 3: {2, 3}}

class AdjacencyMatrix():

    def __init__(self, adjacencyList, label = ""):
        """ 
        Initialize the instance.
        Create an adjacency matrix from an adjacency list.
        Graph vertices are assumed to be labeled with numbers from 1 to n.
        """

        self.matrix = []
        self.label = label

        # Create an n x n matrix filled with zeros.
        size = len(adjacencyList)
        for i in range(size):
            self.matrix.append([0] * size)

        # Set a 1 for every pair of adjacent vertices (labels are 1-based).
        for key in adjacencyList:
            for value in adjacencyList[key]:
                self[key-1][value-1] = 1

    def __str__(self):
        # Returning self.__repr__() is another possibility; it just prints the list of lists.
        # See the Python docs on the difference between __str__ and __repr__.

        # Header line: the label followed by the column numbers.
        string = self.label + "\t"
        for i in range(len(self.matrix)):
            string += str(i+1) + "\t"
        string += "\n"

        # One line per matrix row: the row number followed by its values.
        for row in range(len(self.matrix)):
            string += str(row+1) + "\t"
            for column in range(len(self.matrix)):
                string += str(self[row][column]) + "\t"
            string += "\n"


        return string

    def __repr__(self):
        return str(self.matrix)

    def __getitem__(self, index):
        """ Return row `index`, which allows the matrix[row][column] syntax. """
        return self.matrix.__getitem__(index)

    def __setitem__(self, index, item):
        """ Replace row `index` with `item` (matrix[row] = [...] syntax). """
        return self.matrix.__setitem__(index, item)

    def areAdjacent(self, i, j):
        return self[i-1][j-1] == 1

m = AdjacencyMatrix(adjacencyList, label="mbr")
print(m)
print("m.areAdjacent(1,2) :",m.areAdjacent(1,2))
print("m.areAdjacent(2,3) :",m.areAdjacent(2,3))

This code gives the following result:

mbr 1   2   3   
1   1   0   0   
2   0   1   1   
3   0   1   1   

m.areAdjacent(1,2) : False
m.areAdjacent(2,3) : True
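The class above assumes members are labeled 1 to n. If the real member numbers are not contiguous, one possible variation is to map each label to a row/column index first. The following is only a sketch under that assumption: the function name `adjacency_matrix` and the sample labels are made up, and it relies on every neighbour also appearing as a key, which holds for the adjacency list built earlier.

```python
def adjacency_matrix(adjacency_list):
    """Expand an adjacency list with arbitrary labels into a dense 0/1 matrix.

    Returns (labels, matrix), where labels[i] is the member shown in
    row i and column i of the matrix.
    """
    labels = sorted(adjacency_list)
    # Map each label to its position in the matrix.
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for label, neighbours in adjacency_list.items():
        for neighbour in neighbours:
            matrix[index[label]][index[neighbour]] = 1
    return labels, matrix

# Hypothetical member numbers that do not run from 1 to n.
labels, matrix = adjacency_matrix({10: {10}, 25: {25, 40}, 40: {25, 40}})
print(labels)
for label, row in zip(labels, matrix):
    print(label, row)
```

Keep in mind that a dense matrix needs n squared entries for n members, so for a large number of members the adjacency list is likely the more practical structure.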
