
What is stored in HDFS, and why does map-reduce compute so fast when Titan executes queries?

I am learning Titan now. I use Titan with HBase in remote mode.

There are three questions that confuse me now, described below:

  1. The TinkerPop3 documentation says, "The results of any OLAP operation are stored in HDFS accessible via hdfs." But I am using Titan for OLTP, so what is stored in HDFS in this situation?

  2. When we successfully connect Titan to HBase (via a Java IDE), we can see the table created in the HBase shell and scan its contents. What is the meaning of the 'column' content in the table? Does it represent the vertex IDs in the graph?

  3. When I tested Titan's performance, I observed that queries ran faster than a normal map-reduce job. How does Titan achieve this? The Titan documentation says the "Titan-Hadoop" engine uses a parallel map-reduce model. Can I get a more detailed introduction to it?

  1. The Titan architecture diagram helps show the difference between OLTP and OLAP usage. See the right side of the architecture diagram: TinkerPop API - Gremlin. OLTP is the most common Titan usage, no matter which backend storage you select (Cassandra, HBase, BerkeleyDB). When you do an OLTP query with Titan-HBase, nothing is stored in HDFS. In fact, HDFS/Hadoop is not required at all for OLTP with Titan-HBase.
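For context, this is what a minimal OLTP-only Titan-over-HBase configuration looks like: nothing in it points at HDFS. The host and table names below are placeholders, not values from the original question:

```properties
# conf/titan-hbase.properties -- minimal remote OLTP setup (no Hadoop/HDFS)
storage.backend=hbase
# ZooKeeper quorum used to locate the HBase cluster (placeholder host)
storage.hostname=zk1.example.com
# HBase table that Titan creates and writes the graph into
storage.hbase.table=titan
```

A graph opened with `TitanFactory.open("conf/titan-hbase.properties")` then serves OLTP traversals directly against HBase.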

  2. When scanning the contents of the Titan table in HBase, you will find the serialized representation of the graph. Titan uses data compression techniques on keys/columns/values, so you will find that the data isn't human-readable. You can read more about the specifics of the storage layout in the Titan docs.
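For example, a quick look from the HBase shell shows this (the table name matches the `storage.hbase.table` setting, `titan` by default):

```
hbase(main):001:0> scan 'titan', {LIMIT => 5}
```

The row keys are binary-encoded vertex IDs, and the edgestore column family (abbreviated to `e` by default) holds each vertex's properties and incident edges as compressed binary values, which is why the scan output is not human-readable.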

  3. See the answer to #1; you probably have been running OLTP queries. Hadoop-style OLAP graph processing is done via a graph computer. It uses the TitanHBaseInputFormat to read data in from the backend storage, then uses a TinkerPop graph computer (Spark or Giraph) to run the OLAP job. See the left side of the architecture diagram above: GremlinGraphComputer. There is also some documentation of this in the Titan docs.
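A sketch of that OLAP wiring, adapted from the Titan/TinkerPop Hadoop documentation (the hostname is a placeholder):

```properties
# conf/hadoop-graph/read-hbase.properties -- OLAP over the same HBase-backed graph
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=com.thinkaurelius.titan.hadoop.formats.hbase.TitanHBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.outputLocation=output
# tell the input format where the OLTP data lives
titanmr.ioformat.conf.storage.backend=hbase
titanmr.ioformat.conf.storage.hostname=zk1.example.com
```

Opening this with `GraphFactory.open('conf/hadoop-graph/read-hbase.properties')` and binding a traversal to `SparkGraphComputer` (or `GiraphGraphComputer`) runs the query as a parallel scan over all vertices; the intermediate and final results of that job are what end up in HDFS.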
