
Test/Train data set for Graph Network

I have a graph that I create with NetworkX as follows:

import networkx as nx
g = nx.read_edgelist(data, create_using=nx.Graph())

I want to split the data into training and test sets. I tried the following command:

train, test = train_test_split(g, test_size=0.2) 

but this did not work. How am I supposed to create training and test sets when my data is a graph?

Depending on your task, you can use StellarGraph's EdgeSplitter class (docs) or scikit-learn's train_test_split function (docs) to do this.

Node classification

If your task is node classification, the StellarGraph demo "Node classification with Graph Convolutional Network (GCN)" is a good example of how to load data and perform a train/test split. It uses the Cora dataset as an example. The most important steps are the following:

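import stellargraph as sg
from IPython.display import display, HTML
from sklearn import model_selection, preprocessing
from stellargraph.mapper import FullBatchNodeGenerator

dataset = sg.datasets.Cora()
display(HTML(dataset.description))
G, node_subjects = dataset.load()

# Hold out 140 labelled nodes for training, then 500 of the rest for validation.
train_subjects, test_subjects = model_selection.train_test_split(
    node_subjects, train_size=140, test_size=None, stratify=node_subjects
)
val_subjects, test_subjects = model_selection.train_test_split(
    test_subjects, train_size=500, test_size=None, stratify=test_subjects
)

# One-hot encode the labels; fit the encoder on the training labels only.
target_encoding = preprocessing.LabelBinarizer()
train_targets = target_encoding.fit_transform(train_subjects)
val_targets = target_encoding.transform(val_subjects)
test_targets = target_encoding.transform(test_subjects)

# The generator sees the whole graph; only the node IDs and targets are split.
generator = FullBatchNodeGenerator(G, method="gcn")
train_gen = generator.flow(train_subjects.index, train_targets)
val_gen = generator.flow(val_subjects.index, val_targets)
test_gen = generator.flow(test_subjects.index, test_targets)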

This is essentially the same as a train/test split for an ordinary classification task, except that the items being split are the nodes of the graph.
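For the plain NetworkX graph from the question, the same idea can be sketched without StellarGraph: pass the list of node identifiers to train_test_split rather than the Graph object itself. This is only a minimal sketch; nx.karate_club_graph() is a stand-in for the graph read from the edge list, and random_state=42 is an arbitrary choice made for reproducibility.

```python
import networkx as nx
from sklearn.model_selection import train_test_split

# A small built-in graph stands in for the graph read from the edge list.
g = nx.karate_club_graph()

# Split the node identifiers, not the Graph object itself.
nodes = list(g.nodes())
train_nodes, test_nodes = train_test_split(nodes, test_size=0.2, random_state=42)

print(len(train_nodes), len(test_nodes))
```

If node labels are available, they can be passed as a second array together with stratify=..., as in the Cora example. The graph itself stays whole here; only the set of nodes whose labels are used for training changes.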

Edge classification

If your task is edge classification, have a look at the StellarGraph demo "Link prediction example: GCN on the Cora citation dataset". The most relevant code for the train/test split is:

from stellargraph.data import EdgeSplitter
from stellargraph.mapper import FullBatchLinkGenerator

# Define an edge splitter on the original graph G:
edge_splitter_test = EdgeSplitter(G)

# Randomly sample a fraction p=0.1 of all positive links, and same number of negative links, from G, and obtain the
# reduced graph G_test with the sampled links removed:
G_test, edge_ids_test, edge_labels_test = edge_splitter_test.train_test_split(
    p=0.1, method="global", keep_connected=True
)

# Define an edge splitter on the reduced graph G_test:
edge_splitter_train = EdgeSplitter(G_test)

# Randomly sample a fraction p=0.1 of all positive links, and same number of negative links, from G_test, and obtain the
# reduced graph G_train with the sampled links removed:
G_train, edge_ids_train, edge_labels_train = edge_splitter_train.train_test_split(
    p=0.1, method="global", keep_connected=True
)

# For training we create a generator on the G_train graph, and make an
# iterator over the training links using the generator's flow() method.
# The test links get their own generator, built on G_test:
train_gen = FullBatchLinkGenerator(G_train, method="gcn")
train_flow = train_gen.flow(edge_ids_train, edge_labels_train)
test_gen = FullBatchLinkGenerator(G_test, method="gcn")
test_flow = test_gen.flow(edge_ids_test, edge_labels_test)

Here the splitting algorithm behind the EdgeSplitter class (docs) is more complex: it has to preserve the graph structure while doing the split, for example by keeping the graph connected. For more details, see the source code for EdgeSplitter.
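To make the connectivity constraint concrete, here is a rough sketch of one simple way to hold out edges from a NetworkX graph without disconnecting it. It is not EdgeSplitter's implementation. The helper name split_edges_keep_connected is hypothetical, nx.karate_club_graph() is a stand-in for the real data, and the sketch assumes the graph is sparse enough to contain at least as many non-edges as held-out edges.

```python
import random
import networkx as nx


def split_edges_keep_connected(graph, p=0.1, seed=0):
    """Hypothetical helper: hold out a fraction p of edges without disconnecting the graph."""
    rng = random.Random(seed)

    # Edges of a spanning tree (forest) are protected from removal, so every
    # connected component of the input stays connected afterwards.
    protected = {frozenset(e) for e in nx.minimum_spanning_edges(graph, data=False)}
    removable = [e for e in graph.edges() if frozenset(e) not in protected]

    # Positive examples: existing edges that are removed from the graph.
    n_held_out = min(int(p * graph.number_of_edges()), len(removable))
    positives = rng.sample(removable, n_held_out)

    # Negative examples: the same number of node pairs with no edge between them.
    negatives = rng.sample(list(nx.non_edges(graph)), n_held_out)

    reduced = graph.copy()
    reduced.remove_edges_from(positives)

    edge_ids = positives + negatives
    edge_labels = [1] * len(positives) + [0] * len(negatives)
    return reduced, edge_ids, edge_labels


g = nx.karate_club_graph()  # stand-in for the graph read from the edge list
g_train, edge_ids, edge_labels = split_edges_keep_connected(g, p=0.1)
print(g.number_of_edges(), g_train.number_of_edges(), nx.is_connected(g_train))
```

Calling the helper a second time on the reduced graph would give a separate training split, mirroring the two-stage use of EdgeSplitter in the example.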
