
What is stored in HDFS, and why are Titan queries faster than MapReduce jobs?

I am learning Titan, and I am using it with HBase in remote mode.

Three things confuse me at the moment:

  1. The TinkerPop3 documentation says, "The results of any OLAP operation are stored in HDFS accessible via hdfs." I am using Titan for OLTP, though. What is stored in HDFS in that case?

  2. After Titan connects to HBase successfully (from a Java IDE), I can see the table it created in the HBase shell and scan its contents. What do the 'column' entries in that table mean? Do they represent vertex IDs in the graph?

  3. When I tested Titan's performance, the queries ran faster than a normal MapReduce job. How does Titan achieve that? The Titan documentation says the "Titan-Hadoop" engine uses a parallel MapReduce model. Where can I find a more detailed introduction to it?

  1. The Titan architecture diagram helps show the difference between OLTP and OLAP usage. See the right side of the architecture diagram: TinkerPop API - Gremlin. OLTP is the most common Titan usage, no matter which backend storage you select (Cassandra, HBase, BerkeleyDB). When you do an OLTP query with Titan-HBase, nothing is stored in HDFS. In fact, HDFS/Hadoop is not required at all for OLTP with Titan-HBase.

  2. When scanning the contents of the Titan table in HBase, you will find the serialized representation of the graph. Titan uses data compression techniques on keys/columns/values, so you will find that the data isn't human readable. You can read more about the specifics of the storage layout in the Titan docs.

  3. See the answer to #1: you have probably been running OLTP queries, which do not go through Hadoop at all. Hadoop-style OLAP graph processing is done via a graph computer. It uses the TitanHBaseInputFormat to read data in from the backend storage, then uses a TinkerPop Graph Computer (Spark or Giraph) to run the OLAP job. See the left side of the architecture diagram above: GremlinGraphComputer. There is also some documentation of this in the Titan docs.
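To make answer #2 a bit more concrete, here is a minimal Python sketch of why compactly encoded values look like gibberish in an HBase scan. It uses a generic LEB128-style variable-length integer encoding as a stand-in; this is an assumption for illustration only and is not Titan's actual serializer. The names `encode_varint`, `decode_varint`, and the sample id are hypothetical.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer using 7 payload bits per byte.

    Generic LEB128-style scheme, shown only as an illustration;
    it is NOT the encoding Titan itself uses.
    """
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Inverse of encode_varint."""
    result = 0
    shift = 0
    for b in data:
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result
        shift += 7
    raise ValueError("truncated varint")


vertex_id = 4104                       # hypothetical id, for illustration
encoded = encode_varint(vertex_id)     # 2 bytes instead of 4 ASCII digits
as_text = str(vertex_id).encode("ascii")

print(encoded.hex())                   # 8820
print(len(encoded), len(as_text))      # 2 4
print(decode_varint(encoded))          # 4104
```

The encoded form is shorter than the decimal text, but its bytes no longer correspond to readable characters, which is why the HBase shell prints such cells largely as `\x..` escapes.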
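The difference behind answer #3 can be sketched with a toy model in Python. This is not Titan or TinkerPop code: a plain dictionary stands in for the backend table (one row per vertex), and the function names and sample edges are hypothetical. An OLTP-style query reads the row of one known vertex, while an OLAP-style job scans every row to compute a global result.

```python
from collections import defaultdict

# Toy adjacency list standing in for the backend table:
# one "row" per vertex that has outgoing edges. Data is made up.
edges = [("a", "b"), ("a", "c"), ("b", "d"),
         ("e", "f"), ("f", "g"), ("g", "h")]
adjacency = defaultdict(list)
for src, dst in edges:
    adjacency[src].append(dst)


def oltp_neighbors(start):
    """OLTP-style: fetch a single row by key and return its neighbors."""
    rows_read = 1
    return sorted(adjacency.get(start, [])), rows_read


def olap_out_degrees():
    """OLAP-style: scan every row to compute a graph-wide statistic."""
    rows_read = 0
    degrees = {}
    for vertex, neighbors in adjacency.items():
        rows_read += 1
        degrees[vertex] = len(neighbors)
    return degrees, rows_read


neighbors, oltp_rows = oltp_neighbors("a")
degrees, olap_rows = olap_out_degrees()
print(neighbors, oltp_rows)   # ['b', 'c'] 1
print(degrees["a"], olap_rows)  # 2 5
```

In this sketch the OLTP-style lookup reads one row regardless of graph size, while the scan grows with the number of vertices. A real Hadoop-style job would also pay job startup and data transfer costs, which the toy model does not show.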
