Two FULLTEXT searches on an ArangoDB cluster: V8 is involved
I am investigating an ArangoDB cluster and found that when a query uses two FULLTEXT() searches, one of them is evaluated by the V8 engine. My data:
[
  {
    "TITL": "Attacks induced by bromocryptin in Parkinson patients",
    "WORD": [
      "hascites",
      "Six patients with Parkinson's disease"
    ],
    "ID": 1
  },
  {
    "TITL": "Linear modeling of possible mechanisms for Parkinson tremor generation",
    "WORD": [
      "hascites",
      "jsubsetIM"
    ],
    "ID": 2
  },
  {
    "TITL": "Drug-induced parkinsonism in the rat- a model for biochemical ...",
    "WORD": [
      "hascites",
      "Following treatment with reserpine or alternatively with ...",
      "hasabstract"
    ],
    "ID": 3
  }
]
The simplest query:
FOR title IN FULLTEXT(pmshort,"TITL","parkinson")
FOR word IN FULLTEXT(pmshort,"WORD","hascites")
FILTER title.ID==word.ID
RETURN title
In other words, I am trying to find all documents that have "parkinson" in TITL and "hascites" in WORD. This example is heavily simplified, so using something like
FILTER word.WORD=='hascites'
is not an option: two or more FULLTEXT searches are required to provide the necessary functionality. The collection contains about 520,000 documents, and a fulltext index is set up on each field.
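The two nested FOR loops plus the FILTER on ID amount to an intersection of two fulltext result sets. The Python sketch below mirrors that logic on the three sample documents. The `tokens` helper is a simplified stand-in for ArangoDB's fulltext word splitting (lowercase, split on non-alphanumeric characters); that tokenizer, and the `fulltext` and `nested_loop` helpers, are assumptions for illustration, not the server's exact behavior.

```python
import re

# The three sample documents from the question.
DOCS = [
    {"TITL": "Attacks induced by bromocryptin in Parkinson patients",
     "WORD": ["hascites", "Six patients with Parkinson's disease"], "ID": 1},
    {"TITL": "Linear modeling of possible mechanisms for Parkinson tremor generation",
     "WORD": ["hascites", "jsubsetIM"], "ID": 2},
    {"TITL": "Drug-induced parkinsonism in the rat- a model for biochemical ...",
     "WORD": ["hascites",
              "Following treatment with reserpine or alternatively with ...",
              "hasabstract"], "ID": 3},
]


def tokens(value):
    """Simplified word splitter (assumption): lowercase, split on non-alphanumerics.
    For array attributes, every string element contributes its words."""
    values = value if isinstance(value, list) else [value]
    words = set()
    for v in values:
        if isinstance(v, str):
            words.update(w for w in re.split(r"[^0-9a-z]+", v.lower()) if w)
    return words


def fulltext(docs, attribute, word):
    """Toy FULLTEXT(collection, attribute, word): documents containing the exact word."""
    return [d for d in docs if word.lower() in tokens(d.get(attribute))]


def nested_loop(docs):
    """Mirrors FOR title ... FOR word ... FILTER title.ID == word.ID RETURN title.
    The inner search is recomputed for every outer row, as in a nested-loop join."""
    result = []
    for title in fulltext(docs, "TITL", "parkinson"):
        for word in fulltext(docs, "WORD", "hascites"):
            if title["ID"] == word["ID"]:
                result.append(title)
    return result


print([d["ID"] for d in nested_loop(DOCS)])  # → [1, 2]
```

Document 3 drops out because "parkinsonism" is a different word from "parkinson", while all three documents contain "hascites".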
I found that each FULLTEXT query, when run on its own, uses the index:
Execution plan:
 Id   NodeType        Site     Est.   Comment
  1   SingletonNode   DBS         1   * ROOT
  5   IndexNode       DBS    526577   - FOR title IN pmshort   /* fulltext index scan */
  8   RemoteNode      COOR   526577   - REMOTE
  9   GatherNode      COOR   526577   - GATHER
  4   ReturnNode      COOR   526577   - RETURN title
But when both FOR loops are used, the first one is evaluated by V8 (JavaScript) and runs on the coordinator instead of on the DB servers (DBS):
Execution plan:
 Id   NodeType            Site       Est.   Comment
  1   SingletonNode       COOR          1   * ROOT
  2   CalculationNode     COOR          1   - LET #2 = FULLTEXT(pmshort /* all collection documents */, "TITL", "parkinson")   /* v8 expression */
  3   EnumerateListNode   COOR        100   - FOR title IN #2   /* list iteration */
 10   ScatterNode         COOR        100   - SCATTER
 11   RemoteNode          DBS         100   - REMOTE
  9   IndexNode           DBS    52657700   - FOR word IN pmshort   /* fulltext index scan */
  6   CalculationNode     DBS    52657700   - LET #6 = (title.`ID` == word.`ID`)   /* simple expression */   /* collections used: word : pmshort */
  7   FilterNode          DBS    52657700   - FILTER #6
 12   RemoteNode          COOR   52657700   - REMOTE
 13   GatherNode          COOR   52657700   - GATHER
  8   ReturnNode          COOR   52657700   - RETURN title
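The estimates in the second plan are consistent with a nested-loop join: the list iteration on the coordinator is estimated at 100 rows, and for each of them the DB servers run the `word` fulltext index scan, whose per-scan estimate of 526,577 rows (taken from the single-search plan, and apparently close to the collection size) gives 100 × 526,577 = 52,657,700. The sketch below checks that arithmetic and compares it with a hash-based intersection that reads each side only once; this cost model is a simplification for illustration, not ArangoDB's optimizer formula.

```python
# Row estimates copied from the execution plans above.
OUTER_ROWS = 100        # node 3: FOR title IN #2 (V8 FULLTEXT result, list iteration)
INNER_ROWS = 526_577    # per-scan estimate of a fulltext index scan (single-search plan)


def nested_loop_estimate(outer, inner):
    """Rows produced if the inner index scan is repeated for every outer row."""
    return outer * inner


def hash_join_estimate(outer, inner):
    """Rows touched if each side is read once and matched via a hash set (simplified)."""
    return outer + inner


print(nested_loop_estimate(OUTER_ROWS, INNER_ROWS))  # → 52657700
print(hash_join_estimate(OUTER_ROWS, INNER_ROWS))    # → 526677
```

The first figure matches the Est. column of nodes 9 through 8 exactly, which suggests that the cost comes from repeating the inner scan rather than from the fulltext lookups themselves.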
Of course, this slows the system down considerably. So my questions are:
1. Why can't the ArangoDB cluster process both conditions on the DB servers (DBS) instead of on the coordinator (COOR)?
2. How can I avoid this situation, given that performance drops by a factor of 300-500?
3. Could somebody point me to additional material to read about this?
Any help is appreciated. Thanks!
It looks like the query optimizer stops looking for further fulltext optimizations once it has applied one fulltext transformation in each query/subquery.
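The behavior described here can be pictured with a toy model of an optimizer rule that walks the FOR loops of one (sub)query and rewrites FULLTEXT calls into index scans. This is a hypothetical Python illustration, not ArangoDB's actual C++ optimizer code: the `ForLoop` type, the `stop_after_first` flag and the back-to-front walking order (chosen so the result matches which loop got the index in the plan above) are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class ForLoop:
    variable: str
    source: str              # e.g. 'FULLTEXT(pmshort, "TITL", "parkinson")'
    uses_index: bool = False


def replace_fulltext_with_index(loops, stop_after_first):
    """Rewrite FULLTEXT(...) loop sources into index scans.

    stop_after_first=True models the reported behavior (one rewrite per query);
    stop_after_first=False models a rule that keeps looking for more candidates.
    Loops are visited back to front (an assumption, see the lead-in).
    """
    for loop in reversed(loops):
        if loop.source.startswith("FULLTEXT(") and not loop.uses_index:
            loop.uses_index = True
            if stop_after_first:
                break
    return loops


def make_query():
    """The two FOR loops of the question's query, in source order."""
    return [
        ForLoop("title", 'FULLTEXT(pmshort, "TITL", "parkinson")'),
        ForLoop("word", 'FULLTEXT(pmshort, "WORD", "hascites")'),
    ]


before = replace_fulltext_with_index(make_query(), stop_after_first=True)
after = replace_fulltext_with_index(make_query(), stop_after_first=False)
print([(l.variable, l.uses_index) for l in before])  # → [('title', False), ('word', True)]
print([(l.variable, l.uses_index) for l in after])   # → [('title', True), ('word', True)]
```

In the "before" case the loop left without an index is the one that ends up as a V8 expression on the coordinator.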
A potential fix for this can be found in this pull request (which targets 3.3.10).
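Until a release containing such a fix is available, one possible workaround (an assumption, not something proposed in the thread) is to run each FULLTEXT search as its own query, so that each can use the fulltext index as in the single-search plan, and intersect the results on the client by ID. The sketch takes the query executor as a parameter; with the python-arango driver it could be `lambda q, b: db.aql.execute(q, bind_vars=b)`. The function, constant and parameter names are hypothetical, it assumes `ID` is unique per document, and the real (more complex) query may need more than two searches.

```python
from typing import Any, Callable, Dict, Iterable, List

# Executes an AQL string with bind variables and yields the results.
Executor = Callable[[str, Dict[str, Any]], Iterable[Any]]

FULLTEXT_DOCS = "FOR d IN FULLTEXT(@@coll, @attr, @words) RETURN d"
FULLTEXT_IDS = "FOR d IN FULLTEXT(@@coll, @attr, @words) RETURN d.ID"


def fulltext_intersection(execute: Executor, collection: str,
                          title_words: str, word_words: str) -> List[dict]:
    """Return TITL hits whose ID also appears among the WORD hits.

    Each FULLTEXT call runs in a separate query, so neither depends on the
    other; the join is done client-side with a set (one pass over each side).
    """
    word_ids = set(execute(FULLTEXT_IDS,
                           {"@coll": collection, "attr": "WORD", "words": word_words}))
    titles = execute(FULLTEXT_DOCS,
                     {"@coll": collection, "attr": "TITL", "words": title_words})
    return [doc for doc in titles if doc["ID"] in word_ids]
```

Fetching only IDs for the WORD side keeps the transfer small, and the set lookup makes the join roughly linear in the two result sizes instead of proportional to their product.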
Thanks a lot! It should be available in 3.3.10 and the upcoming 3.4, right?