
Elasticsearch too many running threads

We have a big problem with our ES cluster. One of our nodes is always at 99% CPU. For some reason it has about 3 times as many threads running for the elasticsearch process as a normal node. I have attached 2 htop screenshots for 2 nodes, one overloaded and the other normal. Please advise!

Thank you!

Overloaded Node

Normal Node

UPDATE

  1. Cluster architecture:

    11 nodes: 2 dedicated masters, 9 data nodes.

  2. Nodes hardware properties:

    Masters:

    • CPU: 8x Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz
    • Memory: 32GB
    • Disk: 120GB

    Slaves (data nodes):

    • CPU: 12x Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
    • Memory: 64GB
    • Disk: 2.7T

  3. Documents in cluster:

    ~200 million

  4. Index conf:

    Each index is split into 10 shards (5 primary, 5 replica)

  5. Queries:

    Search RT: ~250/s, Index RT: ~6K/s

  6. OS:

    Ubuntu 12.04.4 LTS

  7. Java:

    java version "1.7.0_60"
    Java(TM) SE Runtime Environment (build 1.7.0_60-b19)
    Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode)

Figured it out.

[2014-07-07 13:38:42,521][DEBUG][index.search.slowlog.query] [n013.my_cluster] [my_index][3] took[2s], took_millis[2066], types[my_type], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"size":20,"from":0,"sort":{"_score":"desc"},"query":{"filtered":{"query":{"query_string":{"query":"my eight words space separated query","fields":["description","tags"],"default_operator":"OR"}},"filter":{"and":[{"range":{"ats":{"lte":1404730800}}},{"terms":{"aid":[1,2,4]}}]},"_cache":false}}}], extra_source[]

The problem was inside "filter": {"and": ...}. It looks like this kind of filter is heavier for ES than the bool filter. So whenever you want to apply filters, please use bool filters (must, must_not and should).

Ref: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-filter.html
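For illustration, here is a minimal Python sketch of that rewrite, assuming the ES 1.x "filtered" query shape seen in the slow log entry above. The helper name and_filter_to_bool is hypothetical (it is not part of any Elasticsearch client), and it only handles an "and" filter at the top level of the filtered query.

```python
import copy
import json


def and_filter_to_bool(query_body):
    """Return a copy of query_body whose top-level "and" filter is a bool/must filter.

    Hypothetical helper for illustration only; nested "and" filters are left untouched.
    """
    body = copy.deepcopy(query_body)  # do not mutate the caller's dict
    filtered = body.get("query", {}).get("filtered", {})
    flt = filtered.get("filter", {})
    if "and" in flt:
        clauses = flt["and"]
        # The "and" filter may also be written as {"and": {"filters": [...]}}.
        if isinstance(clauses, dict):
            clauses = clauses.get("filters", [])
        filtered["filter"] = {"bool": {"must": list(clauses)}}
    return body


# Query body copied from the slow log entry above.
slow_query = {
    "size": 20,
    "from": 0,
    "sort": {"_score": "desc"},
    "query": {
        "filtered": {
            "query": {
                "query_string": {
                    "query": "my eight words space separated query",
                    "fields": ["description", "tags"],
                    "default_operator": "OR",
                }
            },
            "filter": {
                "and": [
                    {"range": {"ats": {"lte": 1404730800}}},
                    {"terms": {"aid": [1, 2, 4]}},
                ]
            },
            "_cache": False,
        }
    },
}

rewritten = and_filter_to_bool(slow_query)
print(json.dumps(rewritten["query"]["filtered"]["filter"], indent=2))
```

The two clauses end up under "must", so a document still has to match both the range and the terms filter; only the wrapping filter type changes.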

Cheers!

Based on the sparse info at hand, I have a couple of guesses as to what the problem could be:

  • Shards are not well balanced and you are getting hot spots. Ensure that your most heavily used indexes are sharded in such a way that each machine can do its share of the work. Also, look into the index-level setting "index.routing.allocation.total_shards_per_node" to try to force an equal balance.

  • Perhaps on the search side, you are specifying that the search should always go to the "primary" shard. The primary designation isn't something that gets balanced, so basically the first node up holds the primary shards, and the nodes that come up afterwards hold only replicas.
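Both guesses can be checked by counting how many primary and replica shards each node holds. Below is a rough Python sketch, assuming text in the default column order of the _cat/shards API (index, shard, prirep, state, docs, store, ip, node). The sample text, IP addresses, and helper names are made up for illustration, and node names are assumed to contain no spaces.

```python
import math
from collections import Counter

# Hypothetical sample in the default _cat/shards column order (not real cluster output).
SAMPLE_CAT_SHARDS = """\
my_index 0 p STARTED 1000 1gb 10.0.0.1 n011
my_index 0 r STARTED 1000 1gb 10.0.0.2 n012
my_index 1 p STARTED 1000 1gb 10.0.0.1 n011
my_index 1 r STARTED 1000 1gb 10.0.0.3 n013
my_index 2 p STARTED 1000 1gb 10.0.0.1 n011
my_index 2 r STARTED 1000 1gb 10.0.0.2 n012
"""


def shards_per_node(cat_shards_text):
    """Count started primary and replica shards per node name."""
    primaries, replicas = Counter(), Counter()
    for line in cat_shards_text.splitlines():
        cols = line.split()
        # Skip blank lines and shards that are not assigned and started.
        if len(cols) < 8 or cols[3] != "STARTED":
            continue
        prirep, node = cols[2], cols[-1]
        if prirep == "p":
            primaries[node] += 1
        else:
            replicas[node] += 1
    return primaries, replicas


def even_shards_per_node(total_shards, data_nodes):
    """Smallest per-node shard cap that still lets every shard be allocated."""
    return math.ceil(total_shards / data_nodes)


primaries, replicas = shards_per_node(SAMPLE_CAT_SHARDS)
print("primaries:", dict(primaries))
print("replicas:", dict(replicas))
# 10 shards per index on 9 data nodes, as described in the question.
print("cap per node:", even_shards_per_node(10, 9))
```

In this made-up sample every primary sits on n011, which is the pattern the second guess describes. The cap computed by even_shards_per_node is only a starting point for "index.routing.allocation.total_shards_per_node": a cap with no headroom can leave shards unassigned when a node drops out.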
