
ArangoDB WARNING [3ad54] {engines} slow background settings sync

I have an ArangoDB 3.8.7 database running on an AWS instance. It holds ~200 million records and receives ~1,000 new records per minute.

During the day, when user request volume is higher, I keep seeing this warning in the database logs, and request responses become really slow (from the normal ~500 ms to 5-15 s).

WARNING [3ad54] {engines} slow background settings sync

I use a large AWS instance, c5a.12xlarge (48 vCPUs) with 98 GB RAM, and even AWS's own analysis reports that my instance is over-provisioned.

i-0c41xxxxxxxxxxx is over-provisioned
Compute Optimizer found that this instance's CPU, network bandwidth and network PPS are over-provisioned.

I'm running a WAL compaction task every 60 seconds. (I've tried lowering the interval to 15 seconds, and it seems to get a little worse.) With a 10-minute interval it was also terrible.

2022-11-24T14:45:35Z [1303] WARNING [3ad54] {engines} slow background settings sync: 9.240683 s
2022-11-24T14:45:49Z [1303] WARNING [3ad54] {engines} slow background settings sync: 11.222022 s
2022-11-24T14:46:05Z [1303] WARNING [3ad54] {engines} slow background settings sync: 14.198186 s
2022-11-24T14:46:18Z [1303] WARNING [3ad54] {engines} slow background settings sync: 10.272200 s
2022-11-24T14:46:34Z [1303] WARNING [3ad54] {engines} slow background settings sync: 13.703265 s
2022-11-24T14:46:35Z [1303] INFO [99d80] {general} --------------------------
2022-11-24T14:46:35Z [1303] INFO [99d80] {general} Running compaction task...
2022-11-24T14:46:35Z [1303] INFO [99d80] {general} Compacting access...
2022-11-24T14:46:35Z [1303] INFO [99d80] {general} Compacting accounts...
2022-11-24T14:46:35Z [1303] INFO [99d80] {general} Compacting addresses...
2022-11-24T14:46:35Z [1303] INFO [99d80] {general} Compacting products...
2022-11-24T14:46:35Z [1303] INFO [99d80] {general} Compacting phones...
2022-11-24T14:46:35Z [1303] INFO [99d80] {general} Compacting call_log...
2022-11-24T14:46:35Z [1303] INFO [99d80] {general} --------------------------
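To see how the slow syncs line up with the compaction runs and with peak traffic, it can help to pull the durations out of the log and summarize them. The following is a minimal sketch, assuming the log lines have exactly the format shown above; the function names `slow_sync_durations` and `summarize` are made up for illustration and are not part of ArangoDB.

```python
import re
from statistics import mean

# Matches lines in the format shown in the log excerpt above, e.g.
# 2022-11-24T14:45:35Z [1303] WARNING [3ad54] {engines} slow background settings sync: 9.240683 s
SLOW_SYNC = re.compile(
    r"^(?P<ts>\S+) \[\d+\] WARNING \[3ad54\] \{engines\} "
    r"slow background settings sync: (?P<secs>\d+(?:\.\d+)?) s"
)


def slow_sync_durations(lines):
    """Return (timestamp, seconds) pairs for every slow-sync warning."""
    found = []
    for line in lines:
        match = SLOW_SYNC.match(line.strip())
        if match:
            found.append((match.group("ts"), float(match.group("secs"))))
    return found


def summarize(durations):
    """Return count, mean and max of the durations, or None if there are none."""
    secs = [s for _, s in durations]
    if not secs:
        return None
    return {"count": len(secs), "mean": mean(secs), "max": max(secs)}


# Sample input copied from the log excerpt above.
sample = [
    "2022-11-24T14:45:35Z [1303] WARNING [3ad54] {engines} slow background settings sync: 9.240683 s",
    "2022-11-24T14:45:49Z [1303] WARNING [3ad54] {engines} slow background settings sync: 11.222022 s",
    "2022-11-24T14:46:05Z [1303] WARNING [3ad54] {engines} slow background settings sync: 14.198186 s",
    "2022-11-24T14:46:18Z [1303] WARNING [3ad54] {engines} slow background settings sync: 10.272200 s",
    "2022-11-24T14:46:34Z [1303] WARNING [3ad54] {engines} slow background settings sync: 13.703265 s",
    "2022-11-24T14:46:35Z [1303] INFO [99d80] {general} Running compaction task...",
]

summary = summarize(slow_sync_durations(sample))
print(summary)
```

Running the same functions over a full day of logs (for example with `open(path)` as the `lines` argument) would show whether the durations grow only during peak hours or around every compaction run.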

Is there a way to optimize this, given that my instance is more than enough to handle the load? And what exactly does this warning mean?

Edit: Today I upgraded to ArangoDB 3.10.1 and also upgraded my AWS instance to c6a.16xlarge (64 vCPUs), and the problem persists.

BTW: the main issue is not the warning messages themselves. It is the lag, the data corruption/write lock errors, and the huge delays that occur while these warnings are being shown.

Dec 01 01:24:31 sudo[1402]: Caused by: com.arangodb.ArangoDBException: Response: 409, Error: 1200 - AQL: timeout waiting to lock key Operation timed out: Timeout waiting to lock key; key: 12430138595 (while executing)
Dec 01 01:24:31 sudo[1402]:         at com.arangodb.internal.util.ResponseUtils.checkError(ResponseUtils.java:55)
Dec 01 01:24:31 sudo[1402]:         at com.arangodb.internal.velocystream.VstCommunication.checkError(VstCommunication.java:157)
Dec 01 01:24:31 sudo[1402]:         at com.arangodb.internal.velocystream.VstCommunicationSync.execute(VstCommunicationSync.java:144)
Dec 01 01:24:31 sudo[1402]:         at com.arangodb.internal.velocystream.VstCommunicationSync.execute(VstCommunicationSync.java:45)
Dec 01 01:24:31 sudo[1402]:         at com.arangodb.internal.velocystream.VstCommunication.execute(VstCommunication.java:149)
Dec 01 01:24:31 sudo[1402]:         at com.arangodb.internal.velocystream.VstCommunication.execute(VstCommunication.java:144)
Dec 01 01:24:31 sudo[1402]:         at com.arangodb.internal.velocystream.VstProtocol.execute(VstProtocol.java:46)
Dec 01 01:24:31 sudo[1402]:         at com.arangodb.internal.ArangoExecutorSync.execute(ArangoExecutorSync.java:71)
Dec 01 01:24:31 sudo[1402]:         at com.arangodb.internal.ArangoExecutorSync.execute(ArangoExecutorSync.java:57)
Dec 01 01:24:31 sudo[1402]:         at com.arangodb.internal.ArangoDatabaseImpl.query(ArangoDatabaseImpl.java:171)
Dec 01 01:24:31 sudo[1402]:         at com.arangodb.springframework.core.template.ArangoTemplate.query(ArangoTemplate.java:358)
Dec 01 01:24:31 sudo[1402]:         at com.arangodb.springframework.repository.query.AbstractArangoQuery.execute(AbstractArangoQuery.java:83)
Dec 01 01:24:31 sudo[1402]:         at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor$QueryMethodInvoker.invoke(QueryExecutorMethodInterceptor.java:195)
Dec 01 01:24:31 sudo[1402]:         at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.doInvoke(QueryExecutorMethodInterceptor.java:152)
Dec 01 01:24:31 sudo[1402]:         at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.invoke(QueryExecutorMethodInterceptor.java:130)
Dec 01 01:24:31 sudo[1402]:         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
Dec 01 01:24:31 sudo[1402]:         at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:95)
Dec 01 01:24:31 sudo[1402]:         at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
Dec 01 01:24:31 sudo[1402]:         at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
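Error 1200 with HTTP 409 is ArangoDB's write-write conflict error, and a common client-side mitigation is to retry the failed operation with a growing delay. The sketch below only illustrates that pattern in Python and does not address why the locks are held for so long. `LockTimeoutError` and `retry_on_lock_timeout` are hypothetical names, not part of any ArangoDB driver; in a real application the `except` clause would have to match whatever exception the driver raises for error 1200.

```python
import random
import time


class LockTimeoutError(Exception):
    """Hypothetical stand-in for the driver exception that carries error 1200."""


def retry_on_lock_timeout(operation, attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call operation(); on LockTimeoutError, retry with exponential backoff.

    The last failure is re-raised once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except LockTimeoutError:
            if attempt == attempts - 1:
                raise
            # Exponential backoff with jitter so concurrent writers do not
            # all retry at the same moment.
            sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))


# Demo: a fake write that hits the lock timeout twice, then succeeds.
calls = {"n": 0}


def flaky_write():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise LockTimeoutError("timeout waiting to lock key")
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = retry_on_lock_timeout(flaky_write, sleep=delays.append)
print(result, calls["n"], len(delays))
```

Retrying is only safe when the operation can be repeated without side effects, so it suits single-document writes better than multi-step updates.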

There are a few things worth looking into.

If the database is not under heavy load and is running on a fast system, you may need to increase the --server.background-sync-wait-threshold setting in the database configuration (be careful, and first confirm that your ArangoDB version actually offers this option).

This setting determines the maximum amount of time that the database will wait for the background settings sync to complete before issuing a warning. Increasing this value will allow the database to wait longer for the sync to complete, but may result in slower performance.

You can also check whether the database is running on a slow system by looking at the system specifications, such as CPU and disk speed. If the system is slow, you may need to upgrade to a faster one or use a database engine that is optimized for slow systems.

Heavy load on the DB server might also trigger the warning.

You might also check GitHub Issue #15080.
