
segmentGranularity in Druid indexing task; exact meaning & implication during indexing

I still don't quite get this "segmentGranularity" in Druid. This page is quite ambiguous: http://druid.io/docs/latest/design/segments.html . It goes on mentioning segmentGranularity, but it talks more about intervals (in the first paragraph).

Anyway, at this point the volume of my data is not that big. That page mentions 300 MB to 700 MB as the "ideal" size of a segment. Actually, I can fit a week of data into one segment. That's why I'm thinking of setting segmentGranularity to "week" in my indexing-task JSON:

  "granularitySpec" : {
    "type" : "uniform",
    "segmentGranularity" : "week",
    "queryGranularity" : "none",
    "intervals" : ["2015-09-12/2015-09-13"]
  },

However, I plan to run the batch indexing every hour (and this will normally only (re)process data within that same day). That's why I put only one interval, spanning one day, in the "intervals" field above.

My question: how would that work when segmentGranularity is set to "week" (instead of "day")? Will it rebuild the cube for the entire (week-long) segment? That is something I don't want; I only want to rebuild the cube for the day.

Thanks, Raka

Yes, the segment granularity period specifies for what duration data should be kept in a particular segment. If your segment granularity is set to "week", then each segment will hold the data of a particular week.
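For illustration (this is not actual Druid output, just a sketch of the idea): Druid aligns weekly segments to ISO week boundaries (Monday to Monday), so a row timestamped 2015-09-12 would land in a segment covering roughly this interval, regardless of the one-day interval given in the ingestion spec:

```json
{
  "segmentInterval": "2015-09-07T00:00:00.000Z/2015-09-14T00:00:00.000Z"
}
```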

Now, if you are going to run the ingestion task every hour, the entire segment gets rebuilt each time. Since you only add data for the current day, it is generally better to keep your segment granularity at "day".
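A daily granularitySpec for that hourly batch job might look like the following sketch (same fields as the question's snippet; only segmentGranularity changes, and the interval value is illustrative):

```json
"granularitySpec" : {
  "type" : "uniform",
  "segmentGranularity" : "day",
  "queryGranularity" : "none",
  "intervals" : ["2015-09-12/2015-09-13"]
},
```

With this setting, the hourly re-index only rewrites the one-day segment that overlaps the given interval, not a whole week of data.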

But you can very well keep the segment granularity at "week" if your data is small; in that case it shouldn't matter that Druid rebuilds the segments.

Since your data set is small, you can also look into the Tranquility server, which can ingest data on the fly without batch ingestion. It should do fine for your use case.
