
Adjacency Matrix of Non-numeric Tuples

I have a massive dictionary of items in a co-occurrence format; basically, conditional word vectors. The simplified dictionary looks something like this:

reservoir = {
 ('a', 'b'): 2,
 ('a', 'c'): 3,
 ('b', 'a'): 1,
 ('b', 'c'): 3,
 ('c', 'a'): 1,
 ('c', 'b'): 2,
 ('c', 'd'): 5,
}

To save storage, I decided not to store anything at all for pairs that never co-occur. For example, a and b never occur with d, so the dictionary holds no entry for either of those pairs.

The result I'm trying to get is a matrix in which, for every tuple, the first key selects the row and the second key selects the column, with 0 wherever no entry is stored. It should look like this:

  a b c d
a 0 2 3 0
b 1 0 3 0
c 1 2 0 5
d 0 0 0 0


I found some information in this post: Adjacency List and Adjacency Matrix in Python, but it's not quite what I'm looking to do. All my attempts so far have been unsuccessful. Any help would be amazing.

Thanks again,

You really just need to get the labels for the rows and columns. From there, it's just a couple of for loops:

from __future__ import print_function

import itertools

reservoir = {
    ('a', 'b'): 2,
    ('a', 'c'): 3,
    ('b', 'a'): 1,
    ('b', 'c'): 3,
    ('c', 'a'): 1,
    ('c', 'b'): 2,
    ('c', 'd'): 5
}

# Collect every distinct label that appears in any key tuple, in sorted order
fields = sorted(set(itertools.chain.from_iterable(reservoir)))

print(' ', *fields)

for row in fields:
    print(row, end=' ')

    for column in fields:
        print(reservoir.get((row, column), 0), end=' ')

    print()
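
If you want the matrix as a data structure rather than printed output, the same lookup works inside a nested list comprehension. This is only a minimal sketch under the same assumptions as the loops above (labels sort cleanly, missing pairs count as 0); the name `matrix` is mine, not something from the question:

```python
import itertools

reservoir = {
    ('a', 'b'): 2,
    ('a', 'c'): 3,
    ('b', 'a'): 1,
    ('b', 'c'): 3,
    ('c', 'a'): 1,
    ('c', 'b'): 2,
    ('c', 'd'): 5,
}

# Row/column labels: every distinct item found in any key tuple.
fields = sorted(set(itertools.chain.from_iterable(reservoir)))

# matrix[i][j] is the count stored for (fields[i], fields[j]);
# pairs that are not in the dictionary default to 0.
matrix = [
    [reservoir.get((row, column), 0) for column in fields]
    for row in fields
]

for label, values in zip(fields, matrix):
    print(label, values)
```

Keep in mind that the dense matrix has len(fields) squared cells, so for a really massive dictionary it may give back the storage you saved by skipping the zero entries.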

The columns will stop lining up once a cell has more than one digit, so I'll leave that to you to figure out. You just need to find the maximum cell width for each column before printing.
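
For what it's worth, here is one way the width calculation could go. Treat it as a rough sketch: `format_table` is a hypothetical helper name, and the sample values (including the 100) are made up purely to show a multi-digit cell. It builds every cell as a string first, measures the widest cell per column, then right-justifies each cell to that width:

```python
import itertools


def format_table(reservoir):
    """Return the co-occurrence matrix as an aligned, multi-line string."""
    fields = sorted(set(itertools.chain.from_iterable(reservoir)))

    # Header row (empty corner cell + column labels), then one row per label.
    rows = [[''] + [str(label) for label in fields]]
    for row in fields:
        cells = [str(reservoir.get((row, column), 0)) for column in fields]
        rows.append([str(row)] + cells)

    # zip(*rows) walks the table column by column.
    widths = [max(len(cell) for cell in column) for column in zip(*rows)]

    return '\n'.join(
        ' '.join(cell.rjust(width) for cell, width in zip(line, widths))
        for line in rows
    )


# Illustrative data only: one cell is deliberately three digits wide.
sample = {('a', 'b'): 2, ('b', 'a'): 100}
print(format_table(sample))
```

Right-justifying is just a choice that suits numeric cells; `ljust` or `center` would work the same way.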
