
Link prediction using HinSAGE/GraphSAGE in StellarGraph returns NaNs

I am trying to run link prediction using HinSAGE in the stellargraph Python package.

I have a network of people and products, with edges from person to person (KNOWS) and from person to product (BOUGHT). Both people and products have a feature vector attached, although its length differs by type (1024 for persons, 200 for products). I am trying to build a link prediction model from person to product based on all the information in the network. My reason for using HinSAGE is that it supports inductive learning.

I have the code below, and I thought I was following the examples

https://stellargraph.readthedocs.io/en/stable/demos/link-prediction/hinsage-link-prediction.html
https://stellargraph.readthedocs.io/en/stable/demos/link-prediction/graphsage-link-prediction.html

but I keep getting "nan" as my output predictions. Does anyone have a suggestion for what I can try?

import networkx as nx
import pandas as pd
import numpy as np
from tensorflow.keras import Model, optimizers, losses, metrics
import stellargraph as sg
from stellargraph.data import EdgeSplitter
from stellargraph.mapper import HinSAGELinkGenerator
from stellargraph.layer import HinSAGE, link_classification, link_regression
from sklearn.model_selection import train_test_split


graph.info()
#StellarGraph: Undirected multigraph
# Nodes: 54226, Edges: 259120
#
# Node types:
#  products: [45027]
#    Features: float32 vector, length 200
#    Edge types: products-BOUGHT->person
#  person: [9199]
#    Features: float32 vector, length 1024
#    Edge types: person-KNOWS->person, person-BOUGHT->product
#
# Edge types:
#    person-KNOWS->person: [246131]
#        Weights: all 1 (default)
#        Features: none
#    person-BOUGHT->product: [12989]
#        Weights: all 1 (default)
#        Features: none






edge_splitter_test = EdgeSplitter(graph)
graph_test, edges_test, labels_test = edge_splitter_test.train_test_split(
    p=0.1, method="global", edge_label="BOUGHT"
)
edge_splitter_train = EdgeSplitter(graph_test, graph)

graph_train, edges_train, labels_train = edge_splitter_train.train_test_split(
    p=0.1, method="global", edge_label="BOUGHT"
)


num_samples = [8, 4]

G = graph

batch_size = 20
epochs = 20


generator = HinSAGELinkGenerator(
    G, batch_size, num_samples, head_node_types=["person", "product"]
)
train_gen = generator.flow(edges_train, labels_train, shuffle=True)
test_gen = generator.flow(edges_test, labels_test)


hinsage_layer_sizes = [32, 32]
assert len(hinsage_layer_sizes) == len(num_samples)

hinsage = HinSAGE(
    layer_sizes=hinsage_layer_sizes, generator=generator, bias=True, dropout=0.0
)


# Expose input and output sockets of hinsage:
x_inp, x_out = hinsage.in_out_tensors()



    
# Final estimator layer
prediction = link_classification(
    output_dim=1, output_act="sigmoid", edge_embedding_method="concat"
)(x_out)

model = Model(inputs=x_inp, outputs=prediction)

model.compile(
    optimizer=optimizers.Adam(),
    loss=losses.binary_crossentropy,
    metrics=["acc"],
)

history = model.fit(train_gen, epochs=epochs, validation_data=test_gen, verbose=2)

So I found the problem, which might be useful for others: if any node contains missing data, the model just produces NaNs. This is especially dangerous if you create your graph by joining pandas dataframes; I had a typo in one of the files I joined, and that led to the problem.
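A minimal sketch of how such a gap could be caught before the graph is built, assuming the node features live in pandas dataframes. The table names, column names, and the helper `rows_with_bad_values` are hypothetical and only illustrate the idea; they are not part of stellargraph or of the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical tables: a list of node ids and a separate feature file.
# The key "p_3" is a deliberate typo standing in for the kind of
# mismatch described above.
person_ids = pd.DataFrame({"person_id": ["p1", "p2", "p3"]})
person_features = pd.DataFrame(
    {"person_id": ["p1", "p2", "p_3"], "f0": [0.1, 0.2, 0.3], "f1": [1.0, 2.0, 3.0]}
)

# A left join keeps every node, so the node whose key did not match
# ends up with NaN in every feature column.
joined = person_ids.merge(person_features, on="person_id", how="left")
joined = joined.set_index("person_id")


def rows_with_bad_values(features: pd.DataFrame) -> list:
    """Return the index labels of rows that contain NaN or infinite values."""
    values = features.to_numpy(dtype="float64")
    bad = ~np.isfinite(values).all(axis=1)
    return features.index[bad].tolist()


bad_nodes = rows_with_bad_values(joined)
print(bad_nodes)
```

Running a check like this on each node type's feature table, and refusing to build the graph while the returned list is non-empty, should surface a bad join before any training starts.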

I'm now working on a similar issue with HinSAGE. I'm interested in your method: if you want to predict whether a link exists between people and products, do you need to assign a label of 0 to the person-product pairs that have no connection?

Thank you so much!


 