Making my Titan DB graph with Cassandra and Elasticsearch backend

My problem is that I want to store product, customer, and seller data in a Titan graph database that uses Cassandra as the storage backend and Elasticsearch as the indexing backend. I will then query that data to make recommendations to both customers and sellers. I have not been able to get to the point where I can store my own data. Since the data is going to be huge, I will be using Cassandra and Elasticsearch.

What I have done so far is set up Cassandra and Elasticsearch. I can now run bin/titan.sh start to start Cassandra, Elasticsearch, and Gremlin Server. I can also play with the Graph of the Gods data:

gremlin> graph = TitanFactory.open('conf/titan-cassandra-es.properties')
==>standardtitangraph[cassandrathrift:[127.0.0.1]]
gremlin> GraphOfTheGodsFactory.load(graph)
==>null

Now I am trying to find a way to store my product, customer, and seller graph data such that it is stored in Cassandra and the indices are in Elasticsearch.

What steps should I take to do that? My main language for the project is Node.js, and Java is out of the question due to project constraints.

My questions in short:

  1. How do I store my own data for Titan to process?
  2. Once the data is available for processing, I will be exposing some HTTP APIs for making recommendations. Writing them in Java is out of the question due to some constraints. How should I go ahead with it? (I think I have only Gremlin as the alternative.)

I would be grateful if you could point out my mistakes and drop some breadcrumbs in the right direction.

If you can't use Java, then you are limited to using Groovy. As for:

how to store my own data for titan db to process

Side Note

With a graph DB there are a multitude of ways to store this data. If you really want to formalise the structure of your data, I would recommend looking into ontologies, OWL, and Topic Maps; these can serve as great inspiration for how to formalise and structure the data in a graph DB. These reads are only worthwhile if you are looking for ways of very formally structuring data in graphs.

Structure Example

For now let's assume you just want to track customers and the products they have bought. One simple structure is that both customers and products are vertices, with an edge from a customer to a product representing the fact that the customer has bought that product. We can even put additional data on that edge, such as the time of purchase and the quantity. Here is an example of how to do that in Groovy:

gremlin> g = TitanFactory.open("titan-cassandra-es.properties")
gremlin> customerBob = g.addVertex("Bob");
==>v[12]
gremlin> customerAlice = g.addVertex("Alice");
==>v[13]
gremlin> productFish = g.addVertex("Fish");
==>v[14]
gremlin> productMeat = g.addVertex("Meat");
==>v[15]
gremlin> edge = customerBob.addEdge("purchased", productMeat, "Day", "Friday", "Quantity", 2);
==>e[16][12-purchased->15]
gremlin> edge = customerBob.addEdge("purchased", productFish, "Day", "Friday", "Quantity", 1);
==>e[17][12-purchased->14]
gremlin> edge = customerAlice.addEdge("purchased", productMeat, "Day", "Monday", "Quantity", 3);
==>e[18][13-purchased->15]

The above basically says that Bob bought some Meat and Fish on Friday, while Alice bought some Meat on Monday. If we wanted to find out what Bob bought on Friday, we could make the following traversal:

gremlin> g.traversal().V().hasLabel("Bob").outE("purchased").has("Day", "Friday").otherV().label();
==>Meat
==>Fish
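
Since the project is in Node.js, the same kind of purchase edge could also be expressed as a Gremlin-Groovy script string with bound parameters instead of being typed into the console. The following is only a sketch of that idea: buildPurchase is a hypothetical helper, it assumes the server binds g to a traversal source (unlike the console session above, where g is the graph itself), and it assumes customers and products are labelled by name as in the example above. It only builds the script and its bindings; sending them to a server is left out.

```javascript
// Hypothetical helper: builds a Gremlin-Groovy script and its bindings
// for recording one purchase. Nothing is sent to a server here.
// Assumption: on the server, `g` is a traversal source, and each customer
// and product vertex carries its name as the vertex label.
function buildPurchase(customer, product, day, quantity) {
  // The values never appear in the script text; they are referenced
  // by binding name and supplied separately.
  const script =
    "g.V().hasLabel(customer).next()" +
    ".addEdge('purchased', g.V().hasLabel(product).next(), " +
    "'Day', day, 'Quantity', quantity)";
  return { script, bindings: { customer, product, day, quantity } };
}

const purchase = buildPurchase('Bob', 'Meat', 'Friday', 2);
console.log(purchase.script);
console.log(JSON.stringify(purchase.bindings));
```

Keeping the values in bindings rather than concatenating them into the script avoids quoting problems and injection through user-supplied names.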

Indexing

Before really diving into indexing, play around with the structure until you understand it. The following is a VERY skeletal explanation of indexing with Elasticsearch and Titan:

With regard to indexing, know that Titan has different types of indices: composite, vertex-centric, and mixed. They all serve their purpose, and you should read the Titan documentation on indexing for more info.

Indexing is used to speed up traversals and lookups, so you need to decide what to index. For our example we want to quickly find all purchases made on a given day. This means that we can put a mixed index on edges to help us (a composite index would serve just as well, but you are asking about Elasticsearch, so we are going to use a mixed index).

To define a mixed index we start by defining a simple schema (see the Titan schema documentation for more info):

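// g is the TitanGraph opened with TitanFactory.open(...) above
mgmt = g.openManagement();
// Edge label and property key that the index will refer to
purchased = mgmt.makeEdgeLabel("purchased").multiplicity(MULTI).make();
day = mgmt.makePropertyKey("Day").dataType(String.class).make();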

You don't need to explicitly define the schema for everything, but it is essential for anything you want to index. Now you can create your index:

// "search" is the index backend name defined in your Titan properties file
mgmt.buildIndex("productsPurchased", Edge.class).addKey(day).buildMixedIndex("search")
mgmt.commit()

With this index, queries such as:

g.traversal().E().has("Day", "Friday")

will be much faster.

Note: You should create your indices and schema before loading data. It just makes things simpler in the long run.

Because your main language is JavaScript/Node.js, you can use https://www.npmjs.com/package/gremlin, which is a WebSocket client for TinkerPop3 Gremlin Server (disclaimer: library author here). You use the client to send strings of Gremlin-Groovy queries to a remote Gremlin Server.

The most basic way of interacting with the graph is:

import { createClient } from 'gremlin';

// Connect to a Gremlin Server listening on localhost:8182
const client = createClient(8182, 'localhost');

client.execute('g.V()', (err, results) => {
    // handle err or results
});

There are more advanced modes detailed in the documentation. The client also supports bound parameters for better security and performance.
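
As a rough illustration of bound parameters, the sketch below wraps a callback-style execute call in a Promise and passes the day as a binding. It assumes the client accepts execute(script, bindings, callback), which is how I understand this library's API but should be checked against its documentation; purchasesOnDay is a hypothetical helper, and the stub client exists only so the snippet runs without a server. The query assumes g is bound to a traversal source on the server.

```javascript
// Hypothetical helper: look up purchase edges for a given day.
// Assumption: `client.execute(script, bindings, callback)` is available,
// and the server binds `g` to a traversal source.
function purchasesOnDay(client, day) {
  return new Promise((resolve, reject) => {
    // `day` is sent as a binding, not concatenated into the script.
    client.execute("g.E().has('Day', day)", { day }, (err, results) => {
      if (err) {
        reject(err);
      } else {
        resolve(results);
      }
    });
  });
}

// Stub standing in for a real client so the sketch runs offline.
// It records each call and answers with an empty result list.
const stubCalls = [];
const stubClient = {
  execute(script, bindings, callback) {
    stubCalls.push({ script, bindings });
    callback(null, []);
  },
};

purchasesOnDay(stubClient, 'Friday').then((edges) => {
  console.log('edges returned:', edges.length);
});
```

With a real client, the object returned by createClient would be passed in place of stubClient, and the resolved results could be returned from an HTTP handler.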

It may be too early to comment on your domain and data modeling, so I'll just stick with the environment part of your question in order to get you started.
