
Any number of links from a single node to other nodes in a linked list?

In C I can write a node with one link (singly linked), two links (doubly linked), or any fixed number such as 4 or 5. But can I make a structure where the number of links from a node is not specified in advance, i.e. where new links can be added dynamically? If this is not possible in C, where is it possible? For example, node "a" has links to nodes "b", "c", "d", "e" and "f", while "b" has links only to "c", "d" and "a", and so on. So the number of links is not fixed. I hope you understand my problem.

Yes, by using a dynamically allocated array for the links. Typically, you'll want to store both the number of links in use (count) and the number of link pointers currently allocated (maxcount):

struct node {
    size_t        maxcount;
    size_t        count;
    struct node **child;
    /* Plus node data or other properties */
};
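As a minimal sketch of how such an array can grow on demand: the helper names node_add_link() and node_free_links() and the growth policy (start at 4 slots, then double) are assumptions made for illustration, not part of the answer above.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    size_t        maxcount;   /* Number of child pointers allocated */
    size_t        count;      /* Number of child pointers in use */
    struct node **child;
};

/* Hypothetical helper: append a link from 'from' to 'to'.
   Returns 0 on success, -1 if memory allocation fails. */
static int node_add_link(struct node *from, struct node *to)
{
    assert(from != NULL && to != NULL);

    if (from->count >= from->maxcount) {
        /* Growth policy (an assumption): start at 4 slots, then double. */
        size_t        newmax = from->maxcount ? 2 * from->maxcount : 4;
        struct node **temp = realloc(from->child, newmax * sizeof *temp);
        if (!temp)
            return -1;   /* The old array is still valid and unchanged */
        from->child = temp;
        from->maxcount = newmax;
    }

    from->child[from->count++] = to;
    return 0;
}

/* Hypothetical helper: release the link array (not the linked nodes). */
static void node_free_links(struct node *n)
{
    assert(n != NULL);
    free(n->child);
    n->child = NULL;
    n->count = 0;
    n->maxcount = 0;
}
```

A node must start out with maxcount and count set to 0 and child set to NULL. After that, the example from the question is just a series of calls such as node_add_link(&a, &b), node_add_link(&a, &c), and node_add_link(&b, &a); each node ends up with as many links as were added to it.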

If each link (or edge) has an associated weight, you can use

struct node;

struct edge {
    struct node *target;
    double       weight;
};

struct node {
    size_t       maxcount;
    size_t       count;
    struct edge *edges;

    /* Visited flag, for full graph traversal */
    int          visited;

    /* Node data; here, name, as a C99 flexible array member */
    char         name[];
};
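Because name is a flexible array member, a node has to be allocated with extra room for the string. A minimal sketch, assuming hypothetical helpers node_new() and node_free() that are not part of the answer above:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct node;

struct edge {
    struct node *target;
    double       weight;
};

struct node {
    size_t       maxcount;
    size_t       count;
    struct edge *edges;
    int          visited;
    char         name[];
};

/* Hypothetical helper: allocate a node holding a copy of 'name'.
   Returns NULL if memory allocation fails. */
static struct node *node_new(const char *name)
{
    size_t       len;
    struct node *n;

    assert(name != NULL);

    len = strlen(name);
    /* sizeof *n does not include the flexible array member,
       so add room for the name and its terminating '\0'. */
    n = malloc(sizeof *n + len + 1);
    if (!n)
        return NULL;

    n->maxcount = 0;
    n->count = 0;
    n->edges = NULL;
    n->visited = 0;
    memcpy(n->name, name, len + 1);
    return n;
}

/* Hypothetical helper: free a node and its edge array
   (but not the nodes its edges point to). */
static void node_free(struct node *n)
{
    if (n) {
        free(n->edges);
        free(n);
    }
}
```

The edge array can then be grown with realloc in the same way as the child pointer array in the first structure.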

Note that Graphviz is an excellent tool for visualizing such graphs. I regularly write my tests to produce DOT language graph definitions as output, so that I can use e.g. dot from Graphviz to draw them as nice graphs.

If you have a directed graph using struct edge and struct node above, with all visited fields cleared to zero, you can safely -- that is, without getting stuck at cycles -- create the DOT output for such a graph using a recursive helper function, say

static void write_dot_node(struct node *graph, FILE *out)
{
    size_t  i;

    if (graph->visited)
        return;

    graph->visited = 1;

    /* %p requires a void pointer argument, hence the casts */
    fprintf(out, "\tnode%p [ label=\"%s\" ];\n", (void *)graph, graph->name);

    for (i = 0; i < graph->count; i++) {
        write_dot_node(graph->edges[i].target, out);
        fprintf(out, "\tnode%p -> node%p [ taillabel=\"%.3f\" ];\n",
                     (void *)graph, (void *)graph->edges[i].target,
                     graph->edges[i].weight);
    }
}

void write_dot(struct node *graph, FILE *out)
{
    if (!graph || !out)
        return;

    fprintf(out, "digraph {\n");
    write_dot_node(graph, out);
    fprintf(out, "}\n");
}

If you have truly huge graphs, the above may recurse too deep in some cases. It then needs to be converted to non-recursive loop that uses an explicit stack of nodes yet to be visited, with the write_dot function initializing and discarding the stack:

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>

#define  VISITED_STACKED (1<<0)
#define  VISITED_VISITED (1<<1)

int write_dot(struct node *graph, FILE *out)
{
    struct node **unseen = NULL, **temp;
    size_t        unseens = 1;
    size_t        unseens_max = 1024; /* Initial stack size */

    if (!graph || !out) {
        errno = EINVAL;
        return -1;
    }

    unseen = malloc(unseens_max * sizeof *unseen);
    if (!unseen) {
        errno = ENOMEM;
        return -1;
    }
    unseen[0] = graph;

    fprintf(out, "digraph {\n");

    while (unseens > 0) {
        struct node *curr = unseen[--unseens];
        size_t       i, n;

        /* Already visited (since pushed on stack)? */
        if (curr->visited & VISITED_VISITED)
            continue;
        else
            curr->visited |= VISITED_VISITED;

        fprintf(out, "\tnode%p [ label=\"%s\" ];\n", (void *)curr, curr->name);

        for (i = 0, n = 0; i < curr->count; i++) {
            /* Count child nodes not yet pushed on the stack */
            n += !(curr->edges[i].target->visited & VISITED_STACKED);

            fprintf(out, "\tnode%p -> node%p [ taillabel=\"%.3f\" ];\n",
                    (void *)curr, (void *)curr->edges[i].target,
                    curr->edges[i].weight);
        }

        if (n + unseens > unseens_max) {
            if (n + unseens > 1048576)
                unseens_max = ((n + unseens) | 1048575) + 1048573;
            else
            if (n + unseens < 2 * unseens_max)
                unseens_max = 2 * unseens_max;
            else
                unseens_max = 2 * (n + unseens);

            temp = realloc(unseen, unseens_max * sizeof *unseen);
            if (!temp) {
                free(unseen);
                errno = ENOMEM;
                return -1;
            } else
                unseen = temp;
        }

        /* Add unvisited child nodes to stack. */
        for (i = 0; i < curr->count; i++)
            if (!(curr->edges[i].target->visited & VISITED_STACKED)) {
                curr->edges[i].target->visited |= VISITED_STACKED;
                unseen[unseens++] = curr->edges[i].target;
            }
    }

    free(unseen);

    fprintf(out, "}\n");

    return 0;
}

In this case, the VISITED_STACKED bit mask indicates the node has already been added to the stack for later processing, and VISITED_VISITED bit mask indicates the node has been processed.


As Ovanes pointed out in a comment to this answer, for a very dense graph, you could use a map or hashtable, especially if you often need to find out whether some pair of nodes share an edge or not. In this case, you can augment the above structure with an optional hash table of the target pointers.

Just for fun, I tested this in practice for directed weighted graphs with per-graph user-specified reallocation size functions and hashing functions, using the interface digraph.h :

#ifndef   DIGRAPH_H
#define   DIGRAPH_H
#include <stdlib.h>
#include <stdio.h>

struct digraph_node;

struct digraph_edge {
    struct digraph_edge  *hash_next;        /* Hash table slot chain */
    size_t                hash_code;        /* Hash value */
    struct digraph_node  *target;           /* Target node of this edge */
    double                weight;           /* Weight of this edge */
};

struct digraph_node {
    struct digraph_node  *hash_next;        /* Hash table slot chain */
    size_t                hash_code;        /* Hash value */
    struct digraph_edge  *edge;             /* Array of edges */
    size_t                edges;            /* Number of edges in this node */
    size_t                edges_max;        /* Number of edges allocated for */
    struct digraph_edge **hash_slot;        /* Optional edge hash table */
    size_t                hash_size;        /* Size of optional edge hash table */
    char                  name[];           /* Name of this node */
};

typedef struct {
    struct digraph_node **node;             /* Array of pointers to graph nodes */
    size_t                nodes;            /* Number of nodes in this graph */
    size_t                nodes_max;        /* Number of nodes allocated for */
    struct digraph_node **hash_slot;        /* Optional node hash table */
    size_t                hash_size;        /* Size of optional node hash table */
    /* Graph resize policy and hash function */
    size_t (*graph_nodes_max)(size_t nodes);
    size_t (*graph_hash_size)(size_t nodes);
    size_t (*graph_hash_func)(const char *name);
    /* Node resize policy and hash function */
    size_t (*node_edges_max)(size_t edges);
    size_t (*node_hash_size)(size_t edges);
    size_t (*node_hash_func)(struct digraph_node *target);
} digraph;

void digraph_init(digraph *graph);
void digraph_free(digraph *graph);

struct digraph_node *digraph_find_node(digraph *graph, const char *name);
struct digraph_edge *digraph_find_edge(digraph *graph, struct digraph_node *source, struct digraph_node *target);

struct digraph_node *digraph_add_node(digraph *graph, const char *name);
struct digraph_edge *digraph_add_edge(digraph *graph, struct digraph_node *source, struct digraph_node *target, double weight);

int digraph_dot(digraph *graph, FILE *out);

size_t digraph_default_graph_nodes_max(size_t nodes);
size_t digraph_default_graph_hash_size(size_t nodes);
size_t digraph_default_graph_hash_func(const char *name);
size_t digraph_default_node_edges_max(size_t edges);
size_t digraph_default_node_hash_size(size_t edges);
size_t digraph_default_node_hash_func(struct digraph_node *target);

#define  DIGRAPH_INIT  { NULL, 0, 0,                        \
                         NULL, 0,                           \
                         digraph_default_graph_nodes_max,   \
                         digraph_default_graph_hash_size,   \
                         digraph_default_graph_hash_func,   \
                         digraph_default_node_edges_max,    \
                         digraph_default_node_hash_size,    \
                         digraph_default_node_hash_func     \
                       }

#endif /* DIGRAPH_H */

For simplicity, the digraph_add_node() and digraph_add_edge() functions always set errno: to 0 on success, to EEXIST if such a node or edge already exists, or to another error code otherwise. If the node or edge already exists, the functions still return the existing one (with errno set to EEXIST instead of 0). This makes adding new edges very easy.

On a laptop running 64-bit Linux on an Intel Core i5-6200U processor, it took about 18 seconds to generate 5,000 random nodes and 12,500,000 random edges (2,500 edges per node), and to describe the entire graph using the Graphviz DOT language. This suffices speed-wise for me, as I don't have any tools to visualize such graphs; even Graphviz completely chokes on these.

Note that because the graph structure contains an array of pointers to each node in the graph, no recursion is needed using the above structures:

int digraph_dot(digraph *graph, FILE *out)
{
    size_t  i, k;

    if (!graph)
        return errno = EINVAL;

    if (!out || ferror(out))
        return errno = EIO;

    fprintf(out, "digraph {\n");

    /* Describe all nodes. */
    for (i = 0; i < graph->nodes; i++)
        if (graph->node[i])
            fprintf(out, "\tnode%p [label=\"%s\"];\n",
                         (void *)graph->node[i], graph->node[i]->name);

    /* Describe all edges from all nodes. */
    for (i = 0; i < graph->nodes; i++)
        if (graph->node[i]) {
            if (graph->node[i]->edges) {
                for (k = 0; k < graph->node[i]->edges; k++)
                    fprintf(out, "\tnode%p -> node%p [taillabel=\"%.3f\"];\n",
                                 (void *)(graph->node[i]),
                                 (void *)(graph->node[i]->edge[k].target),
                                 graph->node[i]->edge[k].weight);
            } else {
                fprintf(out, "\tnode%p;\n", (void *)(graph->node[i]));
            }
        }

    fprintf(out, "}\n");

    if (fflush(out))
        return errno;
    if (ferror(out))
        return errno = EIO;

    return errno = 0;
}

Of course, if your graphs are that dense, with each node on average having an edge to half the other nodes in the graph, a weight matrix (with zero weight reserved for no edge, or a separate boolean matrix describing which edges do exist) would make a lot more sense, and would have much better cache locality, too.

When the matrix is stored in row-major order (for example, if you define a two-dimensional array in C), it makes sense to have each row correspond to one source node, and each column correspond to one target node, so that the vector describing the edges and/or weights from one node is consecutive in memory. It is much more common to examine the edges directed outwards from a specific source node, than examining the edges directed to a specific target node; thus, the more common case having better cache locality should help with overall performance.
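A minimal sketch of such a row-major weight matrix, with zero weight reserved for "no edge". The struct dense_graph type and the dense_*() helper names are assumptions made for illustration:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical dense digraph: the weight of the edge from 'source'
   to 'target' is weight[source * nodes + target]; 0.0 means no edge. */
struct dense_graph {
    size_t  nodes;    /* Number of nodes */
    double *weight;   /* nodes * nodes weights, row-major */
};

/* Returns 0 on success, -1 if the matrix cannot be allocated. */
static int dense_init(struct dense_graph *g, size_t nodes)
{
    size_t i;

    assert(g != NULL && nodes > 0);

    /* Refuse sizes whose byte count would overflow size_t. */
    if (nodes > (size_t)-1 / sizeof *g->weight / nodes)
        return -1;

    g->weight = malloc(nodes * nodes * sizeof *g->weight);
    if (!g->weight)
        return -1;

    for (i = 0; i < nodes * nodes; i++)
        g->weight[i] = 0.0;

    g->nodes = nodes;
    return 0;
}

static void dense_set(struct dense_graph *g, size_t source, size_t target,
                      double weight)
{
    assert(g != NULL && source < g->nodes && target < g->nodes);
    g->weight[source * g->nodes + target] = weight;
}

/* Count the edges leaving 'source'. The loop walks one row,
   which is a single consecutive run of doubles in memory. */
static size_t dense_out_degree(const struct dense_graph *g, size_t source)
{
    const double *row;
    size_t        target, count = 0;

    assert(g != NULL && source < g->nodes);

    row = g->weight + source * g->nodes;
    for (target = 0; target < g->nodes; target++)
        if (row[target] != 0.0)
            count++;

    return count;
}

static void dense_free(struct dense_graph *g)
{
    assert(g != NULL);
    free(g->weight);
    g->weight = NULL;
    g->nodes = 0;
}
```

Counting the edges directed to a given target would instead step through memory with a stride of nodes elements, which is the less cache-friendly access pattern described above.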

If we are talking about graphs you should decide which graph you are trying to implement:

I assume from your question that you need to implement a directed graph, where a vertex (= node) A can have a directed edge (= link) to vertex B, but not necessarily vice versa. In general, you are looking to implement a multigraph. You have two alternatives for representing such a graph:

  • Matrix, i.e. a 2D array
  • Hashtable, i.e. each key represents a source vertex, which is mapped to a list or set of destination vertices. Every such mapping (i.e. a pair of vertices) represents a directed edge.

Let's look at an example of how each approach works. I will give you the general direction, so that you can practice and implement it yourself. Otherwise, I might end up doing your homework :)

Given the following Graph G:

A -> B -> C    
^--D-^

E

Note here! E has no incoming or outgoing edges. This is still a valid graph.

Using a matrix, we can represent it as:

    A B C D E
   +-+-+-+-+-+
 A |0|1|0|0|0|
   +-+-+-+-+-+
 B |0|0|1|0|0|
   +-+-+-+-+-+
 C |0|0|0|0|0|
   +-+-+-+-+-+
 D |1|1|0|0|0|
   +-+-+-+-+-+
 E |0|0|0|0|0|
   +-+-+-+-+-+

Wherever the matrix holds 0, there is no edge from vertex X (the row) to vertex Y (the column). A number greater than zero gives, in your case, the number of edges from the source vertex to the destination vertex.

An example: does vertex C point to vertex B (i.e. is there a directed edge from C to B)?

Answer: find row C and move to column B. If the number there is greater than 0, then yes; otherwise, no. In our case the number is 0, so there is no connection.

If you need to represent weighted connections, you will probably end up with an array/list in each cell of the matrix. If the array is empty (hint: one of the ways to implement a 0-sized array), there is no connection between the two vertices; otherwise each element of the array represents the weight of one connection, and the number of elements gives the number of connections.

Also, be aware that in a graph a vertex might have an edge to itself. I don't know whether this applies in your case.

Keep in mind that a matrix is a very memory-expensive representation. It allows very fast lookup, but you always need O(|V|^2) elements (|V| is the total number of vertices in the graph) to represent a graph with a matrix.

2D arrays are a well-supported feature in C. Here is a small example of how to statically initialise the 5x5 matrix from the above case in C. For dynamic initialisation, dig into the C language a little bit (take a look at malloc and free):

#define ROWS 5
#define COLUMNS 5


int matrix[ROWS][COLUMNS] =
//           A  B  C  D  E
/* A */  { { 0, 1, 0, 0, 0 }
/* B */  , { 0, 0, 1, 0, 0 }
/* C */  , { 0, 0, 0, 0, 0 }
/* D */  , { 1, 1, 0, 0, 0 }
/* E */  , { 0, 0, 0, 0, 0 }
         }
;
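The lookup described earlier ("find the row, move to the column") can be sketched against that same matrix. The enum of vertex indices and the has_edge() and out_degree() helpers are hypothetical names added for illustration:

```c
#include <assert.h>

#define ROWS 5
#define COLUMNS 5

/* Hypothetical names for the row and column indices */
enum { A, B, C, D, E };

static const int matrix[ROWS][COLUMNS] = {
    /*          A  B  C  D  E */
    /* A */  { 0, 1, 0, 0, 0 },
    /* B */  { 0, 0, 1, 0, 0 },
    /* C */  { 0, 0, 0, 0, 0 },
    /* D */  { 1, 1, 0, 0, 0 },
    /* E */  { 0, 0, 0, 0, 0 }
};

/* Nonzero if there is at least one edge from 'source' to 'target'. */
static int has_edge(int source, int target)
{
    assert(source >= 0 && source < ROWS);
    assert(target >= 0 && target < COLUMNS);
    return matrix[source][target] > 0;
}

/* Total number of edges leaving 'source': the sum of its row. */
static int out_degree(int source)
{
    int target, total = 0;

    assert(source >= 0 && source < ROWS);

    for (target = 0; target < COLUMNS; target++)
        total += matrix[source][target];

    return total;
}
```

With the graph G above, has_edge(C, B) is 0 (C does not point to B), has_edge(D, A) is 1, and out_degree(D) is 2.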

Using the second approach (hashtable or dictionary) is a bit different.

You need a hashtable. AFAIK, the C standard library (including C99) does not provide one, so you will need either to use an existing open source implementation or to implement a hashtable yourself. Your question does not ask how to do this in C++, where map and unordered_map (C++11 and later) are part of the standard library, but you do ask what can be used instead of C; the closest language in that respect would be C++.

Here is an idea of how to do that: let every vertex be a key in the hashtable, mapped either to a set of vertices to which a connection exists, or to a second hashtable that records the weight of each particular mapping.

   A -> (B,)
   B -> (C,)
   C -> ()
   D -> (A, B)
   E -> ()

In the case of multiple connections, you need to introduce something like:

   A -> {B -> 1,}
   B -> {C -> 1,}
   C -> {}
   D -> {A -> 1, B -> 1}
   E -> {}

Let's say that A has 2 edges to B; then the structure is instantiated like:

   A -> {B -> 2,}
   B -> {C -> 1,}
   C -> {}
   D -> {A -> 1, B -> 1}
   E -> {}

Here, if the graph is big and sparse, you can save a lot of memory.
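A minimal C sketch of the "target -> count" mapping for one source vertex. Note the swap: to stay short, it uses a growable array with a linear search instead of a real hashtable, so lookups cost time proportional to the number of distinct targets. All type and function names here are assumptions made for illustration:

```c
#include <assert.h>
#include <stdlib.h>

/* One entry of a vertex's mapping: target vertex -> number of edges */
struct target_count {
    int      target;
    unsigned count;
};

/* All outgoing edges of one source vertex (hypothetical layout) */
struct adjacency {
    size_t               used;   /* Entries in use */
    size_t               size;   /* Entries allocated */
    struct target_count *item;
};

/* Record one more edge to 'target'. Returns 0 on success, -1 if
   memory allocation fails. Linear search, not a hash lookup. */
static int adjacency_add(struct adjacency *adj, int target)
{
    size_t i;

    assert(adj != NULL);

    for (i = 0; i < adj->used; i++)
        if (adj->item[i].target == target) {
            adj->item[i].count++;
            return 0;
        }

    if (adj->used >= adj->size) {
        size_t               newsize = adj->size ? 2 * adj->size : 4;
        struct target_count *temp = realloc(adj->item, newsize * sizeof *temp);
        if (!temp)
            return -1;
        adj->item = temp;
        adj->size = newsize;
    }

    adj->item[adj->used].target = target;
    adj->item[adj->used].count = 1;
    adj->used++;
    return 0;
}

/* Number of edges to 'target', or 0 if there is none. */
static unsigned adjacency_count(const struct adjacency *adj, int target)
{
    size_t i;

    assert(adj != NULL);

    for (i = 0; i < adj->used; i++)
        if (adj->item[i].target == target)
            return adj->item[i].count;

    return 0;
}
```

With one zero-initialised struct adjacency per vertex, adding A -> B twice gives a count of 2 for (A, B), while vertices without outgoing edges, such as C and E, allocate nothing at all. Memory use grows with the number of distinct edges rather than with |V|^2.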

The easiest implementation, IMO, would be in Python, as you ask what can be used instead of C. But you can use C, C++, Java, Python, Ruby or any other language to solve that kind of problem.
