I'm using `_update_by_query` to update a field across a whole index, which currently holds about 30,000,000 documents and may grow larger. I read the documentation for the `scroll_size` parameter and know it defaults to 1,000, but I couldn't find anything about its upper bound.
So my questions are: * How large can `scroll_size` be? * Does a larger value use more memory? * If it does, are there alternatives?
My request:
```json
POST /myIndex/myType/_update_by_query?conflicts=proceed&scroll_size=20000
{
  "script": {
    "source": "ctx._source['toUserNickname'] = 'test'",
    "lang": "painless"
  },
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "toUserId": "111"
          }
        }
      ]
    }
  }
}
```
There is no documented maximum, but there are several parameters you can tune so that the operation doesn't take up too much memory or time.
Reading up on "pagination" will be helpful: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-from-size.html
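As a sketch of tuning those parameters: each scroll batch is buffered and reindexed as a bulk request, so a larger `scroll_size` does mean larger in-flight batches. `scroll_size`, `requests_per_second` (throttling), and `wait_for_completion` are all documented `_update_by_query` parameters; the specific values below are only illustrative, and the body is the one from the question:

```json
POST /myIndex/myType/_update_by_query?conflicts=proceed&scroll_size=5000&requests_per_second=500&wait_for_completion=false
{
  "script": {
    "source": "ctx._source['toUserNickname'] = 'test'",
    "lang": "painless"
  },
  "query": {
    "match": {
      "toUserId": "111"
    }
  }
}
```

With `wait_for_completion=false` the call returns a task ID immediately, and you can monitor or cancel the long-running update through the Tasks API (`GET _tasks/<task_id>`).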
A similar question: Max scrollable time for elasticsearch
An alternative is parallel scanning: https://hackernoon.com/parallel-scan-scroll-an-elasticsearch-index-db02583d10d1
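On recent Elasticsearch versions, `_update_by_query` also supports parallelization directly via the documented `slices` parameter: each slice runs its own scroll over a partition of the index, so throughput scales without raising the per-batch size. A sketch, reusing the question's script and query (the `bool`/`must` wrapper is collapsed to the equivalent single `match`):

```json
POST /myIndex/myType/_update_by_query?conflicts=proceed&slices=auto
{
  "script": {
    "source": "ctx._source['toUserNickname'] = 'test'",
    "lang": "painless"
  },
  "query": {
    "match": {
      "toUserId": "111"
    }
  }
}
```

`slices=auto` lets Elasticsearch pick one slice per shard; you can also pass an explicit number.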