Elegant way to implement a navigable graph?

This is a design problem. I'm struggling to create a conceptual model for a problem I'm facing.

I have a graph of a number of objects (<1000). These objects are connected together in a myriad of ways. Each of these objects has some attributes.

I need to be able to access these objects via both their connections and their attributes.

For example, let us assume the following objects -

{name: A, attributes:{black, thin, invalid}, connections: {B,C}} 
{name: B, attributes:{white, thin, valid}, connections: {A}} 
{name: C, attributes:{black, thick, invalid}, connections: {A,B}}

Now I should be able to query this graph in the following ways -

Using attributes -

black - yields [A,C]
black.thick - yields C

Using connections -

A.connections[0].connections[0] - yields A

Using a combination thereof -

black[0].connections[0] - yields B

My primary language is Java, but I don't think Java is capable of handling these kinds of beasts. Thus I'm trying to implement this in a dynamic language like Python. I have also thought about using expression language evaluation like OGNL, or a graph database. But I'm confused. I'm not interested in coding solutions. What is the correct way to model such a problem?

It sounds like you have some object model which you want to query in different ways. One solution would be to use Java to create your model and then use a scripting language to support querying against this model in different ways. For example, Java + Groovy would be my recommendation.

You could use the following Java class for the model.

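import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Node {

    private String name;
    // Attribute names such as "black" or "thin"
    private final Set<String> attributes = new HashSet<String>();
    // Outgoing connections, in insertion order so that index access is stable
    private final List<Node> connections = new ArrayList<Node>();

    // getter / setter for all
}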

You should then build a list of such objects, with the 'connections' property of each one properly populated.

To support different kinds of scripting, what you need to do is create a context for the scripts and then populate this context. The context is basically a map. The keys of the map become variables available to the script. The trick is to populate this context to support your querying requirements.

For example, in Groovy the binding is the context (see http://groovy.codehaus.org/Embedding+Groovy ). So if you populate it in the following way, your querying needs will be taken care of:

Context/Binding Map

1. <Node name(String), Node object instance(Node)>
2. <Attribute name(String), list of nodes having this attribute(List<Node>)>

When you evaluate a script saying 'A.connections[0]', the object stored against key 'A' is looked up in the binding. Then the returned object's 'connections' property is accessed. Since that is a list, the '[0]' syntax is permitted on it in Groovy, and this returns the object at index 0. Likewise, to support your other querying requirements you need to populate the context accordingly.
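As a rough illustration of the two kinds of entries listed above, here is a minimal sketch in plain Java that builds such a context map. The names GraphNode and ScriptContext are hypothetical stand-ins (GraphNode mirrors the Node model class with direct field access for brevity), and the sketch assumes that no node name collides with an attribute name. Handing the resulting map to an actual script engine is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the Node model class, with fields exposed directly.
class GraphNode {
    final String name;
    final Set<String> attributes = new HashSet<>();
    final List<GraphNode> connections = new ArrayList<>();

    GraphNode(String name, String... attrs) {
        this.name = name;
        this.attributes.addAll(Arrays.asList(attrs));
    }
}

// Hypothetical helper that builds the map to be used as the script binding.
class ScriptContext {
    static Map<String, Object> build(List<GraphNode> nodes) {
        // Entry kind 2: attribute name -> list of nodes having that attribute.
        Map<String, List<GraphNode>> byAttribute = new LinkedHashMap<>();
        for (GraphNode node : nodes) {
            for (String attribute : node.attributes) {
                byAttribute.computeIfAbsent(attribute, k -> new ArrayList<>()).add(node);
            }
        }

        Map<String, Object> context = new LinkedHashMap<>();
        context.putAll(byAttribute);

        // Entry kind 1: node name -> node instance.
        // Assumption: names and attributes never collide; a name would overwrite an attribute key.
        for (GraphNode node : nodes) {
            context.put(node.name, node);
        }
        return context;
    }
}
```

With the three example nodes from the question, the key "black" would map to a list holding A and C, so a script expression such as black[0].connections[0] would resolve through plain list and property access.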

It depends on where you want your performance to be.

If you want fast queries, and don't mind a bit of extra time/memory when adding an object, keeping an array/list of pointers to the objects that have each specific attribute might be a good idea (particularly if you know at design time what the possible attributes are). Then, when adding a new object, say:

{name: A, attributes:{black, thin, invalid}, connections: {B,C}} 

add a new pointer to the black list, the thin list, and the invalid list. Quick queries on connections will probably require keeping a list/array of pointers as a member of the object class. Then, when you create an object, add pointers to the objects it is connected to.
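One way this per-attribute bookkeeping could look is sketched below. AttributeIndex is a hypothetical class name, and the choice to answer multi-attribute queries by intersecting the per-attribute sets (so that black plus thick yields only C) is an assumption about what a query like black.thick should mean.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical index: one set of object references per attribute name.
class AttributeIndex<T> {
    private final Map<String, Set<T>> byAttribute = new LinkedHashMap<>();

    // Paid once at insertion time: one extra reference per attribute.
    void add(T object, String... attributes) {
        for (String attribute : attributes) {
            byAttribute.computeIfAbsent(attribute, k -> new LinkedHashSet<>()).add(object);
        }
    }

    // Returns the objects carrying every given attribute, in insertion order.
    List<T> query(String... attributes) {
        if (attributes.length == 0) {
            return new ArrayList<>();
        }
        Set<T> result = new LinkedHashSet<>(
                byAttribute.getOrDefault(attributes[0], Collections.emptySet()));
        for (int i = 1; i < attributes.length; i++) {
            result.retainAll(byAttribute.getOrDefault(attributes[i], Collections.emptySet()));
        }
        return new ArrayList<>(result);
    }
}
```

A single-attribute query is then a map lookup plus a copy, and each additional attribute costs one set intersection instead of a pass over every object.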

If you don't mind slower queries and want to optimize performance when adding objects, a single linked list of all objects might be a better approach. You can just loop through all of the objects, checking for each one whether it satisfies the condition of the query.

In this case, it would still be a good idea to keep member pointers for the connections if (as your question seems to indicate) you're looking to do multiple-level queries, i.e. A.connections[0].connections[0]. Resolving such queries by searching through nested loops would result in extremely poor performance.
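The scan-based variant, with connections still held as member references, might be sketched as follows. ScanNode and LinearScan are hypothetical names, and expressing the query condition as a java.util.function.Predicate is just one possible choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical node type: attributes plus direct references to connected nodes.
class ScanNode {
    final String name;
    final Set<String> attributes;
    final List<ScanNode> connections = new ArrayList<>();

    ScanNode(String name, String... attributes) {
        this.name = name;
        this.attributes = new HashSet<>(Arrays.asList(attributes));
    }
}

// Hypothetical helper: no index is maintained, every query visits every node.
class LinearScan {
    static List<ScanNode> select(Iterable<ScanNode> all, Predicate<ScanNode> condition) {
        List<ScanNode> matches = new ArrayList<>();
        for (ScanNode node : all) {
            if (condition.test(node)) {
                matches.add(node);
            }
        }
        return matches;
    }
}
```

An attribute query is then something like LinearScan.select(all, n -> n.attributes.contains("black")), while a.connections.get(0).connections.get(0) follows the stored references directly without any scan.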

Hopefully that helps. It really depends on what kind of queries you expect to run most frequently.

There is no problem expressing this in Java. Just define classes representing nodes and sets of nodes. Assuming that there is a fixed set of attributes, it could look like:

enum Attribute {
    BLACK, WHITE, THIN, VALID /* etc. */ ;
}
class Node {
    public final String name;
    public final EnumSet<Attribute> attrs
        = EnumSet.noneOf(Attribute.class);
    public final NodeSet connections
        = new NodeSet();

    public Node(String name)
    {
        this.name = name;
    }

    // ... methods for adding attributes and connections
}

and then a class that represents a set of nodes:

class NodeSet extends LinkedHashSet<Node> {
    /**
     * Returns the nodes that have at least one of the given attributes.
     */
    public NodeSet with(Attribute... as) {
        NodeSet out = new NodeSet();
        for(Node n : this) {
            for (Attribute a : as)
                if (n.attrs.contains(a)) {
                    out.add(n);
                    break;
                }
        }
        return out;
    }

    /**
     * Returns all nodes connected to this set.
     */
    public NodeSet connections() {
        NodeSet out = new NodeSet();
        for(Node n : this)
            out.addAll(n.connections);
        return out;
    }

    /**
     * Returns the first node in the set.
     */
    public Node first() {
        return iterator().next();
    }
}

(I haven't checked that the code compiles; it's just a sketch.) Then, assuming you have a NodeSet named all that contains all the nodes, you can do things like

all.with(Attribute.BLACK).first().connections

I think that solving this problem with a graph makes sense. You mention the possibility of using a graph database, which I think will allow you to better focus on your problem as opposed to coding infrastructure. A simple in-memory graph like TinkerGraph from the TinkerPop project would be a good place to start.

By using TinkerGraph you then get access to a query language called Gremlin (also see GremlinDocs), which can help answer the questions you posed in your post. Here's a Gremlin session in the REPL which shows how to construct the graph you presented and some sample graph traversals that yield the answers you wanted. This first part simply constructs the graph given your example:

gremlin> g = new TinkerGraph()
==>tinkergraph[vertices:0 edges:0]
gremlin> a = g.addVertex("A",['color':'black','width':'thin','status':'invalid']) 
==>v[A]
gremlin> b = g.addVertex("B",['color':'white','width':'thin','status':'valid'])  
==>v[B]
gremlin> c = g.addVertex("C",['color':'black','width':'thick','status':'invalid'])
==>v[C]
gremlin> a.addEdge('connection',b)                                                
==>e[0][A-connection->B]
gremlin> a.addEdge('connection',c)                                                
==>e[1][A-connection->C]
gremlin> b.addEdge('connection',a)                                                
==>e[2][B-connection->A]
gremlin> c.addEdge('connection',a)                                                
==>e[3][C-connection->A]
gremlin> c.addEdge('connection',b)                                                
==>e[4][C-connection->B]

Now the queries:

// black - yields [A,C]
gremlin> g.V.has('color','black')
==>v[A]
==>v[C]

// black.thick - yields C
gremlin> g.V.has('color','black').has('width','thick')
==>v[C]

// A.connections[0].connections[0] - yields A
gremlin> a.out.out[0]
==>v[A]

// black[0].connections[0] - yields B
gremlin> g.V.has('color','black')[0].out[0]
==>v[B]

While this approach does introduce a learning curve if you are unfamiliar with the stack, I think you'll find that graphs fit as solutions to many problems, and having some experience with the TinkerPop stack will generally be helpful for other scenarios you encounter.
