
Building my Titan graph with Cassandra and Elasticsearch backends

I want to store product, customer and seller data in a Titan graph database that uses Cassandra as the storage backend and Elasticsearch as the indexing backend. I will then query that data to make recommendations to both customers and sellers. So far I have not been able to get to the point where I can store my own data. Since the data is going to be huge, I will be using Cassandra and Elasticsearch.

What I have done so far: Cassandra and Elasticsearch are set up. I can run bin/titan.sh start to start Cassandra, Elasticsearch and Gremlin Server, and I can play with the Graph of the Gods data:

gremlin> graph = TitanFactory.open('conf/titan-cassandra-es.properties')
==>standardtitangraph[cassandrathrift:[127.0.0.1]]
gremlin> GraphOfTheGodsFactory.load(graph)
==>null

Now I am trying to find a way to store my product, customer and seller graph data so that the data is stored in Cassandra and the indices are in Elasticsearch.

What steps should I take to do that? My main language for the project is Node.js, and Java is out of the question due to project constraints.

My questions in short

  1. How do I store my own data for Titan to process?
  2. Once the data is available for processing, I will be exposing some HTTP APIs for making recommendations. Writing them in Java is out of the question due to some constraints. How should I go ahead with it? (I think Gremlin is my only alternative.)

I would be grateful if you could point out my mistakes and drop some breadcrumbs in the right direction.

If you can't use Java then you are limited to using Groovy. As for

how to store my own data for titan db to process

Side Note

With a graph DB there are a multitude of ways of storing this data. If you really want to formalise the structure of your data, I would recommend looking into Ontologies, OWL, and Topic Maps; these can serve as great inspiration for how to formalise and structure data in a graph DB. These reads are only worthwhile if you are looking for ways of structuring data in graphs very formally.

Structure Example

For now let's assume you just want to track customers and the products they have bought. One simple structure is that both customers and products are vertices, with an edge from a customer to a product recording the fact that the customer bought that product. We can even put additional data on that edge, such as the time of purchase and the quantity. Here is an example of how to do that in Groovy:

gremlin> g = TitanFactory.open("titan-cassandra-es.properties")
gremlin> customerBob = g.addVertex("Bob"); 
==>v[12]
gremlin> customerAlice = g.addVertex("Alice");
==>v[13]
gremlin> productFish = g.addVertex("Fish");
==>v[14]
gremlin> productMeat = g.addVertex("Meat");
==>v[15]
gremlin> edge = customerBob.addEdge("purchased", productMeat, "Day", "Friday", "Quantity", 2);
==>e[16][12-purchased->15]
gremlin> edge = customerBob.addEdge("purchased", productFish, "Day", "Friday", "Quantity", 1);
==>e[17][12-purchased->14]
gremlin> edge = customerAlice.addEdge("purchased", productMeat, "Day", "Monday", "Quantity", 3);
==>e[18][13-purchased->15]

The above basically says that Bob bought some Meat and Fish on Friday while Alice bought some Meat on Monday. If we wanted to find out what Bob bought on Friday, we could make the following traversal

gremlin> g.traversal().V().hasLabel("Bob").outE("purchased").has("Day", "Friday").otherV().label();
==>Meat
==>Fish
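
Since the project is in Node.js, the console session above could also be driven by sending Gremlin-Groovy scripts to Gremlin Server. The following is only a minimal sketch: purchaseToGremlin is a hypothetical helper name, and it assumes the server exposes graph (the Titan graph) and g (graph.traversal()) as globals, which depends on your server's init script. The get-or-create lookup by label mirrors the modelling above and will scan all vertices unless an index backs it.

```javascript
// Hypothetical helper: turns one purchase record into a Gremlin-Groovy script
// plus bound parameters. Assumes Gremlin Server binds `graph` (the Titan graph)
// and `g` (graph.traversal()); check your server's init script.
function purchaseToGremlin(purchase) {
  const script = [
    // Reuse the vertex if one with this label already exists, otherwise create it.
    'customer = g.V().hasLabel(customerLabel).tryNext().orElseGet { graph.addVertex(customerLabel) }',
    'product = g.V().hasLabel(productLabel).tryNext().orElseGet { graph.addVertex(productLabel) }',
    // Same edge shape as the console example: label "purchased" with Day and Quantity.
    "customer.addEdge('purchased', product, 'Day', day, 'Quantity', quantity)",
    'graph.tx().commit()',
  ].join('\n');

  // Values travel as bindings, never concatenated into the script text.
  const bindings = {
    customerLabel: purchase.customer,
    productLabel: purchase.product,
    day: purchase.day,
    quantity: purchase.quantity,
  };

  return { script, bindings };
}
```

The returned script and bindings can then be handed to whatever Gremlin Server client you use. Keeping the values in bindings means the script text is identical for every purchase.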

Indexing

Before really diving into indexing, play around with the structure until you understand it. The following is a VERY skeletal explanation of indexing with Elasticsearch and Titan:

With regard to indexing, know that Titan has different types of indices: Composite, Vertex-Centric, and Mixed. Each serves its own purpose, and you should read the indexing chapter of the Titan documentation for more info.

Indexing is used to speed up traversals and lookups, so you need to decide what to index. For our example we want to quickly find all purchases made on a given day. This means that we can put a mixed index on edges to help us (a composite index would serve just as well, but you are asking about Elasticsearch, so we are going to use a mixed index).

To define a mixed index we start by defining a simple schema (see the schema chapter of the Titan documentation for more info):

mgmt = g.openManagement(); // g is the TitanGraph opened earlier with TitanFactory.open
purchased = mgmt.makeEdgeLabel("purchased").multiplicity(MULTI).make();
day = mgmt.makePropertyKey("Day").dataType(String.class).make();

You don't need to explicitly define the schema for everything but it is essential for anything you want to index. Now you can create your index:

// "search" is the indexing backend name configured in your Titan properties file (the index.search.* settings)
mgmt.buildIndex("productsPurchased", Edge.class).addKey(day).buildMixedIndex("search")
mgmt.commit()

With this index queries such as:

g.traversal().E().has("Day", "Friday")

will be much faster.

Note : You should make your indices and schema before loading data. It just makes things simpler in the long run.

Because your main language is JavaScript/Node.js, you can use https://www.npmjs.com/package/gremlin which is a WebSocket client for TinkerPop3 Gremlin Server (disclaimer: library author here). You use the client to send strings of Gremlin-Groovy queries to a remote Gremlin Server.

The most basic way of interacting with the graph is:

import { createClient } from 'gremlin';

const client = createClient(8182, 'localhost');

client.execute('g.V()', (err, results) => {
    // handle err or results
});

There are more advanced modes detailed in the documentation. The client also supports bound parameters for better security and performance.
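
As a rough illustration of bound parameters, the "what did Bob buy on Friday" traversal from the other answer could be wrapped as below. This is a sketch, not the library's documented usage: purchasesOnDay is a hypothetical helper, and it assumes the client's execute accepts (script, bindings, callback) and that the server's g is a traversal source, as in the 'g.V()' example above.

```javascript
// Hypothetical helper: asks "what did this customer buy on this day?".
// `execute` is injected so the function works with any client exposing
// execute(script, bindings, callback); that signature is an assumption here.
function purchasesOnDay(execute, customer, day, callback) {
  // The script text is constant; only the bound values change per call.
  const script =
    "g.V().hasLabel(customer).outE('purchased').has('Day', day).otherV().label()";
  execute(script, { customer, day }, callback);
}

// Usage with the client created above (not run here):
// purchasesOnDay(client.execute.bind(client), 'Bob', 'Friday', (err, labels) => {
//   // handle err or labels
// });
```

Because user input never becomes part of the script string, an HTTP handler can pass request values straight into the bindings.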

It may be too early to comment on your domain and data modeling so I'll just stick with the environment part of your question in order to get you started.
