
Apache Ignite 2.6.0 - High latency in partitioned mode with a 6-node cluster

  1. I have a 6-node Ignite cluster (version 2.6.0) in partitioned mode. I want to use it only for caching, to minimise the load on the database. I am not pre-loading any data.
  2. The app tries to read from the cache; on a miss it goes to the database and then adds the result to the cache.
  3. I am using both key-value and SQL caches. During load testing, we found that both key-value and SQL caches take more than 500 ms to return data from the cache.
  4. With a single Ignite node, however, GET requests take 10-20 ms.

Please let me know if I am missing anything, or if any more data is needed.

The server configuration has 3 data regions (configuration shared below).

Server-side configuration

<beans xmlns="http://www.springframework.org/schema/beans"  
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation=" 
       http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd">

  <bean id="grid.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="igniteInstanceName" value="igniteStart"/>
    <property name="discoverySpi" ref="discoverySpi"/>
    <property name="communicationSpi" ref="communicationSpi"/>
    <property name="dataStorageConfiguration" ref="dataStorageConfiguration" />
    <property name="gridLogger" ref="gridLogger" />
  </bean>

  <bean id="gridLogger" class="org.apache.ignite.logger.log4j2.Log4J2Logger">
    <constructor-arg type="java.lang.String" value="/opt/ignite/apache-ignite-fabric-2.6.0-bin/config/log4j2.xml"/>
  </bean>

  <bean id="discoverySpi" class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">


    <property name="joinTimeout" value="0"/>
    <property name="reconnectCount" value="100"/>
    <property name="reconnectDelay" value="10000"/>

    <property name="ipFinder">
      <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
        <property name="addresses">
          <list>
            <value>127.0.0.1</value>

            <value>127.0.0.1:47500..47509</value>
          </list>
        </property>
      </bean>
    </property>

  </bean>

  <bean id="communicationSpi" class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
    <property name="messageQueueLimit" value="1024"/>
    <property name="slowClientQueueLimit" value="1000"/>
  </bean>

  <bean id="dataStorageConfiguration" class="org.apache.ignite.configuration.DataStorageConfiguration">
    <property name="defaultDataRegionConfiguration">
      <bean class="org.apache.ignite.configuration.DataRegionConfiguration">

        <property name="name" value="Default_Region"/>

        <property name="initialSize" value="#{15L * 1024 * 1024}"/>

        <property name="maxSize" value="#{20L * 1024 * 1024}"/>

        <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
      </bean>
    </property>

    <property name="dataRegionConfigurations">
      <list>
        <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
          <property name="name" value="Big_Region"/>

          <property name="initialSize" value="#{20.0D * 1024 * 1024 *1024}"/>

          <property name="maxSize" value="#{25.0D * 1024 * 1024 *1024}"/>
          <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
        </bean>


        <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
          <property name="name" value="Medium_Data_Region"/>

          <property name="initialSize" value="#{8.0D * 1024 * 1024 * 1024}"/>

          <property name="maxSize" value="#{10.0D * 1024 * 1024 * 1024}"/>
          <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
        </bean>

        <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
          <property name="name" value="Small_Data_Region"/>

          <property name="initialSize" value="#{4.0D * 1024 * 1024 * 1024}"/>

          <property name="maxSize" value="#{5.0D * 1024 * 1024 * 1024}"/>
          <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
        </bean>
      </list>
    </property>
  </bean>
</beans>

Key-value performance shouldn't degrade when you add nodes, as long as each call still goes to a single node. So something is off in your single-node versus multi-node comparison. Make sure that in the single-node scenario the node is remote relative to the client (there is a network hop in between). If it already is and you still see performance degrade as you add nodes, the bottleneck is on the client end.

For SQL operations, make sure the data is collocated properly if you use joins. Also keep in mind that in most cases an SQL query is broadcast to all nodes. So if the data set is small and you add more nodes, you are just adding extra network round trips. More details here: https://apacheignite-sql.readme.io/docs/performance-and-debugging
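As a rough illustration of collocation, an affinity key can be declared in the Spring XML so that entries that are joined together land on the same node. The fragment below is a minimal sketch, not the asker's configuration: the key class `com.example.OrderKey` and its field `customerId` are hypothetical, and the property is assumed to be added inside the `grid.cfg` bean on every node.

```xml
<!-- Hypothetical: goes inside the IgniteConfiguration bean (grid.cfg). -->
<property name="cacheKeyConfiguration">
  <list>
    <bean class="org.apache.ignite.cache.CacheKeyConfiguration">
      <!-- Hypothetical key class used by the cache being joined. -->
      <property name="typeName" value="com.example.OrderKey"/>
      <!-- Hypothetical field: entries with the same customerId map to the same partition. -->
      <property name="affinityKeyFieldName" value="customerId"/>
    </bean>
  </list>
</property>
```

With a mapping like this, a join on `customerId` can be resolved locally on each node instead of pulling rows across the network.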

Finally, consider pinpointing the bottleneck with JFR. It might be that Spring Data or the app itself has a hot spot: https://apacheignite.readme.io/docs/jvm-and-system-tuning#section-flightrecorder-settings

My guess is that you are creating a new connection (an ODBC/JDBC connection or, even worse, a thick client connection) to the cluster every time you need to read something.

This is not the way to go. You should reuse the same connection if you want to see good latency.
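One way to get a single, reused thick client in a Spring application is to declare it once as a bean and inject it wherever a cache is needed. The sketch below assumes the application is Spring-based and uses `IgniteSpringBean` from the `ignite-spring` module. The bean id `igniteClient` and the instance name `appClient` are made up for illustration, and `discoverySpi` is assumed to refer to a discovery bean like the one in the question.

```xml
<!-- Spring beans are singletons by default, so the client node starts once per application. -->
<bean id="igniteClient" class="org.apache.ignite.IgniteSpringBean">
  <property name="configuration">
    <bean class="org.apache.ignite.configuration.IgniteConfiguration">
      <!-- Hypothetical instance name for the application's client node. -->
      <property name="igniteInstanceName" value="appClient"/>
      <!-- Join the cluster as a client; no data is stored on this node. -->
      <property name="clientMode" value="true"/>
      <!-- Assumed: same discovery settings the server nodes use. -->
      <property name="discoverySpi" ref="discoverySpi"/>
    </bean>
  </property>
</bean>
```

Request handlers would then call `cache(...)` on the injected bean rather than starting a new client or opening a new connection per read.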

Please also revisit Cache Store / 3rd Party Persistence to automate database lookups when data is not found in the cache.
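A read-through cache could be configured roughly as follows. This is a sketch under assumptions: the cache name `personCache` and the store class `com.example.PersonStore` are hypothetical, and the store class is assumed to implement `CacheStore` (for example by extending `CacheStoreAdapter`) with the database lookup in its `load` method. The bean would be listed under the `cacheConfiguration` property of `grid.cfg`.

```xml
<bean class="org.apache.ignite.configuration.CacheConfiguration">
  <!-- Hypothetical cache name. -->
  <property name="name" value="personCache"/>
  <property name="cacheMode" value="PARTITIONED"/>
  <!-- On a key-value miss, Ignite calls the store's load() and caches the result. -->
  <property name="readThrough" value="true"/>
  <property name="cacheStoreFactory">
    <bean class="javax.cache.configuration.FactoryBuilder" factory-method="factoryOf">
      <!-- Hypothetical CacheStore implementation that queries the database. -->
      <constructor-arg value="com.example.PersonStore"/>
    </bean>
  </property>
</bean>
```

Note that read-through applies to key-value reads such as `get`; SQL queries only see data that is already in the cache.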
