
The most efficient implementation of adjacency list?

I want to create an adjacency list in Java and since I will get a huge set of nodes later as input, it needs to be really efficient.

What sort of implementation is best for this scenario?

A list of lists or maybe a map? I also need to save the edge weights somewhere. I could not figure out how to do this, since the adjacency list itself apparently just keeps track of the connected nodes, but not the edge weight.

Warning: this route is the most masochistic and hardest-to-maintain one possible, and is only recommended when the highest possible performance is required.

Adjacency lists are one of the most awkward classes of data structures to optimize, mainly because they vary in size from one vertex to the next. Conceptually, if you include the adjacency data as part of the definition of a Vertex or Node, then the size of a Vertex/Node becomes variable. Variable-sized data and the kind of memory contiguity needed to be cache-friendly tend to fight one another in most programming languages.

Most object-oriented languages weren't designed to deal with objects that can actually vary in size. They solve that by making objects point to/reference memory elsewhere, but that leads to many more cache misses.

If you want cutting-edge efficiency and you traverse adjacent vertices/nodes a lot, then you want a vertex and its variable number of references/indices to adjacent neighbors (and their weights in your case) to fit in a single cache line, ideally with a good likelihood that some of those neighboring vertices also fit in the same cache line. Solving that last part optimally, i.e. reorganizing the data to map a 2D graph onto a 1-dimensional memory space, is an NP-hard problem, but existing heuristics help a lot.

So it ceases to be a question of which data structures to use so much as which memory layouts to use. Arrays are your friend here, but not arrays of nodes. You want an array of bytes packing node data contiguously. Something like this:

[node1_data num_adj adj1 adj2 adj3 (possibly some padding for alignment and to avoid straddling) node2_data num_adj adj1 adj2 adj3 ...]
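A minimal Java sketch of that layout, under some simplifying assumptions that are not in the original answer: nodes are numbered 0..n-1, weights are ints, there is no per-node payload or alignment padding, and the graph is built once and then only read. It uses an int[] rather than raw bytes (a java.nio.ByteBuffer would be closer to the byte-level version), and the names PackedGraph, offsets, and totalOutWeight are made up for illustration.

```java
import java.util.List;

/** Hypothetical read-only graph packed into one int[] (illustrative, not the answer's code). */
final class PackedGraph {
    // Record per node: [num_adj, target1, weight1, target2, weight2, ...]
    private final int[] data;
    // offsets[v] is the index in data where node v's record starts.
    private final int[] offsets;

    /** edges.get(v) holds one {target, weight} pair per outgoing edge of node v. */
    PackedGraph(List<int[][]> edges) {
        int n = edges.size();
        offsets = new int[n];
        int size = 0;
        for (int v = 0; v < n; v++) {
            offsets[v] = size;
            size += 1 + 2 * edges.get(v).length;
        }
        data = new int[size];
        for (int v = 0; v < n; v++) {
            int p = offsets[v];
            int[][] out = edges.get(v);
            data[p++] = out.length;
            for (int[] e : out) {
                data[p++] = e[0]; // neighbour index
                data[p++] = e[1]; // edge weight
            }
        }
    }

    int degree(int v) { return data[offsets[v]]; }
    int target(int v, int i) { return data[offsets[v] + 1 + 2 * i]; }
    int weight(int v, int i) { return data[offsets[v] + 2 + 2 * i]; }

    /** Sums the weights of all edges leaving v by scanning its contiguous record. */
    long totalOutWeight(int v) {
        int p = offsets[v];
        int deg = data[p++];
        long sum = 0;
        for (int i = 0; i < deg; i++, p += 2) {
            sum += data[p + 1];
        }
        return sum;
    }
}
```

Traversing a node's neighbours here is a linear scan over adjacent array slots, which is the access pattern the layout is meant to favour. Adding an edge would mean rebuilding or shifting the array.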

Node insertion and removal here start to resemble the kind of algorithms you find in memory allocator implementations. When you connect a new edge, that actually changes the node's size and potentially its position in these giant, contiguous memory blocks. Unlike memory allocators, you're potentially allowed to reshuffle, compact, and defrag the data, provided that you can update your references/indices to it.

Now this is only if you want the fastest possible solution, and provided your use cases are heavily weighted towards read operations (evaluation, traversal) rather than writes (connecting edges, inserting nodes, removing nodes). It's completely overkill otherwise, and a complete PITA since you'll lose all that nice object-oriented structure that helps keep the code easy to maintain, reuse, etc. This has you obliterating all that structure in favor of dealing with things at the bits and bytes level, and it's only worth doing if your software is in a realm where its quality is somehow very proportional to the efficiency of that graph.

One solution is to create a class Node that contains the data and a weight (wt). This weight is the weight of the edge that connects it to the node whose list it is in.

Suppose node I is connected to nodes A, B, and C with edge weights a, b, and c, and node J is connected to A, B, and C with edge weights x, y, and z. Then the adjacency list of I will contain the Node objects as

 I -> <A, a>, <B, b>, <C, c>

The list of J will contain the Node objects as

 J -> <A, x>, <B, y>, <C, z>
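The idea above could be sketched in Java roughly as follows. This is only one way to read it: the entry class is called Edge here instead of Node to make clear that it pairs a neighbour with a weight, nodes are identified by String labels, and the names WeightedGraph, addEdge, and neighbours are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical map-of-lists weighted adjacency list (illustrative names throughout). */
final class WeightedGraph {
    /** One adjacency-list entry: the neighbour and the weight of the edge leading to it. */
    static final class Edge {
        final String target;
        final double weight;

        Edge(String target, double weight) {
            this.target = target;
            this.weight = weight;
        }
    }

    // Maps each node label to the list of its outgoing edges.
    private final Map<String, List<Edge>> adjacency = new HashMap<>();

    /** Adds a directed edge from -> to; call twice with swapped arguments for an undirected one. */
    void addEdge(String from, String to, double weight) {
        adjacency.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(to, weight));
    }

    /** Returns the outgoing edges of node, or an empty list if it has none. */
    List<Edge> neighbours(String node) {
        return adjacency.getOrDefault(node, Collections.emptyList());
    }
}
```

With this, `addEdge("I", "A", a)` followed by `addEdge("I", "B", b)` and `addEdge("I", "C", c)` produces the list shown for I. It is far easier to maintain than a packed array, at the cost of one object per edge and pointer-chasing during traversal.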

