
What is stored in HDFS, and why does map-reduce compute so fast when Titan executes queries?

I am learning Titan now. I use Titan with HBase in remote mode.

There are three questions that confuse me now, described below:

  1. The TinkerPop3 documentation says, "The results of any OLAP operation are stored in HDFS accessible via hdfs." But I am using Titan for OLTP, so what is stored in HDFS in this situation?

  2. When we successfully connect Titan to HBase (via a Java IDE), we can see the table created in the HBase shell and scan its contents. What is the meaning of the 'column' content in the table? Does it represent the vertex IDs in the graph?

  3. When I tested Titan's performance, I observed that queries ran faster than a normal map-reduce job. How does Titan achieve this? The Titan documentation says the "Titan-Hadoop" engine uses a parallel map-reduce model. Can I get a more detailed introduction to it?

  1. The Titan architecture diagram helps show the difference between OLTP and OLAP usage. See the right side of the architecture diagram: TinkerPop API - Gremlin. OLTP is the most common Titan usage, no matter which backend storage you select (Cassandra, HBase, BerkeleyDB). When you do an OLTP query with Titan-HBase, nothing is stored in HDFS. In fact, HDFS/Hadoop is not required at all for OLTP with Titan-HBase.
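For context, this is what a minimal OLTP-only Titan-over-HBase configuration looks like: nothing in it points at HDFS. The host and table names below are placeholders, not values from the original question:

```properties
# conf/titan-hbase.properties -- minimal remote OLTP setup (no Hadoop/HDFS)
storage.backend=hbase
# ZooKeeper quorum used to locate the HBase cluster (placeholder host)
storage.hostname=zk1.example.com
# HBase table that Titan creates and writes the graph into
storage.hbase.table=titan
```

A graph opened with `TitanFactory.open("conf/titan-hbase.properties")` then serves OLTP traversals directly against HBase.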

  2. When scanning the contents of the Titan table in HBase, you will find the serialized representation of the graph. Titan uses data compression techniques on keys/columns/values, so you will find that the data isn't human-readable. You can read more about the specifics of the storage layout in the Titan docs.
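For example, a quick look from the HBase shell shows this (the table name matches the `storage.hbase.table` setting, `titan` by default):

```
hbase(main):001:0> scan 'titan', {LIMIT => 5}
```

The row keys are binary-encoded vertex IDs, and the edgestore column family (abbreviated to `e` by default) holds each vertex's properties and incident edges as compressed binary values, which is why the scan output is not human-readable.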

  3. See the answer to #1; you probably have been running OLTP queries. Hadoop-style OLAP graph processing is done via a graph computer. It uses the TitanHBaseInputFormat to read data in from the backend storage, then uses a TinkerPop graph computer (Spark or Giraph) to run the OLAP job. See the left side of the architecture diagram above: GremlinGraphComputer. There is also some documentation of this in the Titan docs.
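A sketch of that OLAP wiring, adapted from the Titan/TinkerPop Hadoop documentation (the hostname is a placeholder):

```properties
# conf/hadoop-graph/read-hbase.properties -- OLAP over the same HBase-backed graph
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=com.thinkaurelius.titan.hadoop.formats.hbase.TitanHBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.outputLocation=output
# tell the input format where the OLTP data lives
titanmr.ioformat.conf.storage.backend=hbase
titanmr.ioformat.conf.storage.hostname=zk1.example.com
```

Opening this with `GraphFactory.open('conf/hadoop-graph/read-hbase.properties')` and binding a traversal to `SparkGraphComputer` (or `GiraphGraphComputer`) runs the query as a parallel scan over all vertices; the intermediate and final results of that job are what end up in HDFS.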
