使用 jdbc 将行批量插入 Spanner 时加载性能低

Question

Background: I am trying to load TSV-formatted data files (dumped from MySQL database) into a GCP Spanner table.背景：我正在尝试将 TSV 格式的数据文件（从 MySQL 数据库转储）加载到 GCP Spanner 表中。

client library: the official Spanner JDBC dependency v1.15.0客户端库：官方 Spanner JDBC 依赖 v1.15.0
table schema: two string-typed columns and ten int-typed columns表模式：两个字符串类型的列和十个 int 类型的列
GCP Spanner instance: configured as multi-region nam6 with 5 nodes GCP Spanner 实例：配置为具有 5 个节点的多区域 nam6

My loading program runs in GCP VM and is the exclusive client accessing the Spanner instance.我的加载程序在 GCP VM 中运行，并且是访问 Spanner 实例的专有客户端。 Auto-commit is enabled.启用自动提交。 Batch insertion is the only DML operation executed by my program and the batch size is around 1500. In each commit, it fully uses up the mutation limit, which is 20000. And at the same time, the commit size is below 5MB (the values of two string-typed columns are small-sized).批量插入是我的程序执行的唯一 DML 操作，批量大小约为 1500。在每次提交中，它完全使用了突变限制，即 20000。同时，提交大小低于 5MB（值两个字符串类型的列是小型的）。 Rows are partitioned based on the first column of the primary key so that each commit can be sent to very few partitions for better performance.根据主键的第一列对行进行分区，以便每次提交都可以发送到很少的分区以获得更好的性能。

With all of the configuration and the optimization above, the insertion rate is only around 1k rows per second.通过上述所有配置和优化，插入速率仅为每秒 1k 行左右。 This really disappoints me because I have more than 800million rows to insert.这真的让我很失望，因为我有超过 8 亿行要插入。 I did notice that the official doc mentioned the approx.我确实注意到官方文档提到了大约。 peak write (QPS total) is 1800 for the multi-region Spanner instance.多区域 Spanner 实例的峰值写入（总 QPS）为 1800。

So I have two questions here:所以我在这里有两个问题：

Considering such low peak write QPS, does it mean GCP doesn't expect or doesn't support customers to migrate large datasets to the multi-region Spanner instance?考虑到如此低的峰值写入 QPS，是否意味着 GCP 不期望或不支持客户将大型数据集迁移到多区域 Spanner 实例？
I was seeing the high read latency from Spanner monitoring.我从 Spanner 监控中看到了高读取延迟。 I don't have any read requests.我没有任何阅读请求。 My guess is that whiling writing rows Spanner needs to first read and check whether a row with the same primary key exists.我的猜测是，在写行时 Spanner 需要首先读取并检查是否存在具有相同主键的行。 If my guess is right, why it takes so much time?如果我的猜测是正确的，为什么要花这么多时间？ If not, could I get any guidance on how these read operations happen?如果没有，我能否获得有关这些读取操作如何发生的任何指导？

Answer 1

It's not quite clear to me exactly how you are setting up the client application that is loading the data.我不太清楚你是如何设置加载数据的客户端应用程序的。 My initial impression is that your client application may not be executing enough transactions in parallel.我最初的印象是您的客户端应用程序可能没有并行执行足够的事务。 You should normally be able to insert significantly more than 1,000 rows/second, but it would require that you do execute multiple transactions in parallel, possibly from multiple VM's.您通常应该能够每秒插入超过 1,000 行，但这需要您并行执行多个事务，可能来自多个 VM。 I used the following simple example to test the load throughput from my local machine to a single node Spanner instance, and that gave me a throughput of approx 1,500 rows/second.我使用以下简单示例来测试从本地机器到单节点 Spanner 实例的负载吞吐量，这给了我大约 1,500 行/秒的吞吐量。

A multi-node setup using a client application running in one or more VM's in the same network region as your Spanner instance should be able to achieve higher volumes than that.使用在与 Spanner 实例相同的网络区域中的一个或多个 VM 中运行的客户端应用程序的多节点设置应该能够实现比这更高的容量。

import com.google.api.client.util.Base64;
import com.google.common.base.Stopwatch;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class TestJdbc {

  public static void main(String[] args) {
    final int threads = 512;
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    watch = Stopwatch.createStarted();
    for (int i = 0; i < threads; i++) {
      executor.submit(new InsertRunnable());
    }
  }

  static final AtomicLong rowCount = new AtomicLong();
  static Stopwatch watch;

  static final class InsertRunnable implements Runnable {
    @Override
    public void run() {
      try (Connection connection =
          DriverManager.getConnection(
              "jdbc:cloudspanner:/projects/my-project/instances/my-instance/databases/my-db")) {
        while (true) {
          try (PreparedStatement ps =
              connection.prepareStatement("INSERT INTO Test (Id, Col1, Col2) VALUES (?, ?, ?)")) {
            for (int i = 0; i < 150; i++) {
              ps.setLong(1, rnd.nextLong());
              ps.setString(2, randomString(100));
              ps.setString(3, randomString(100));
              ps.addBatch();
              rowCount.incrementAndGet();
            }
            ps.executeBatch();
          }
          System.out.println("Rows inserted: " + rowCount);
          System.out.println("Rows/second: " + rowCount.get() / watch.elapsed(TimeUnit.SECONDS));
        }
      } catch (SQLException e) {
        throw new RuntimeException(e);
      }
    }

    private final Random rnd = new Random();

    private String randomString(int maxLength) {
      byte[] bytes = new byte[rnd.nextInt(maxLength / 2) + 1];
      rnd.nextBytes(bytes);
      return Base64.encodeBase64String(bytes);
    }
  }
}

There are also a couple of other things that you could try to tune to get better results:您还可以尝试调整其他一些事情以获得更好的结果：

Reducing the number of rows per batch could yield better overall results.减少每批的行数可以产生更好的整体结果。
If possible, using InsertOrUpdate mutation objects is a lot more efficient than using DML statements (see example below).如果可能，使用InsertOrUpdate突变对象比使用 DML 语句更有效（参见下面的示例）。

Example using Mutation instead of DML:使用Mutation而不是 DML 的示例：

import com.google.api.client.util.Base64;
import com.google.cloud.spanner.Mutation;
import com.google.cloud.spanner.jdbc.CloudSpannerJdbcConnection;
import com.google.common.base.Stopwatch;
import com.google.common.collect.ImmutableList;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class TestJdbc {

  public static void main(String[] args) {
    final int threads = 512;
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    watch = Stopwatch.createStarted();
    for (int i = 0; i < threads; i++) {
      executor.submit(new InsertOrUpdateMutationRunnable());
    }
  }

  static final AtomicLong rowCount = new AtomicLong();
  static Stopwatch watch;

  static final class InsertOrUpdateMutationRunnable implements Runnable {
    @Override
    public void run() {
      try (Connection connection =
          DriverManager.getConnection(
              "jdbc:cloudspanner:/projects/my-project/instances/my-instance/databases/my-db")) {
        CloudSpannerJdbcConnection csConnection = connection.unwrap(CloudSpannerJdbcConnection.class);
        CloudSpannerJdbcConnection csConnection =
            connection.unwrap(CloudSpannerJdbcConnection.class);
        while (true) {
          ImmutableList.Builder<Mutation> builder = ImmutableList.builder();
          for (int i = 0; i < 150; i++) {
            builder.add(
                Mutation.newInsertOrUpdateBuilder("Test")
                    .set("Id")
                    .to(rnd.nextLong())
                    .set("Col1")
                    .to(randomString(100))
                    .set("Col2")
                    .to(randomString(100))
                    .build());
            rowCount.incrementAndGet();
          }
          csConnection.write(builder.build());
          System.out.println("Rows inserted: " + rowCount);
          System.out.println("Rows/second: " + rowCount.get() / watch.elapsed(TimeUnit.SECONDS));
        }
        }
      } catch (SQLException e) {
        throw new RuntimeException(e);
      }
    }

    private final Random rnd = new Random();

    private String randomString(int maxLength) {
      byte[] bytes = new byte[rnd.nextInt(maxLength / 2) + 1];
      rnd.nextBytes(bytes);
      return Base64.encodeBase64String(bytes);
    }
  }
}

The above simple example gives me a throughput of approx 35,000 rows/second without any further tuning.上面的简单示例为我提供了大约 35,000 行/秒的吞吐量，无需进一步调整。

ADDITIONAL INFORMATION 2020-08-21 : The reason that mutation objects are more efficient than (batch) DML statements, is that DML statements are internally converted to read queries by Cloud Spanner, which are then used to create mutations.附加信息 2020-08-21 ：突变对象比（批量）DML 语句更有效的原因是，DML 语句在内部被 Cloud Spanner 转换为读取查询，然后用于创建突变。 This conversion needs to be done for every DML statement in a batch, which means that a DML batch with 1,500 simple insert statements will trigger 1,500 (small) read queries and need to be converted to 1,500 mutations.这种转换需要对批处理中的每个 DML 语句进行，这意味着具有 1,500 个简单插入语句的 DML 批处理将触发 1,500 个（小）读取查询，需要转换为 1,500 个突变。 This is most probably also the reason behind the read latency that you are seeing in your monitoring.这很可能也是您在监控中看到的读取延迟背后的原因。

Would you otherwise mind sharing some more information on what your client application looks like and how many instances of it you are running?您是否介意分享更多关于您的客户端应用程序的外观以及您正在运行的实例数量的信息？

Answer 2

With more than 800million rows to insert, and seeing that you are a Java programmer, can I suggest using Beam on Dataflow?有超过 8 亿行要插入，并且看到您是 Java 程序员，我可以建议在 Dataflow 上使用 Beam 吗？

The spanner writer in Beam is designed to be as efficient as possible with its writes - grouping rows by a similar key, and batching them as you are doing. Beam 中的 spanner writer旨在尽可能高效地进行写入 - 按相似的键对行进行分组，并按照您的操作对它们进行批处理。 Beam on Dataflow can also use several worker VMs to execute multiple file reads and spanner writes in parallel... Beam on Dataflow 还可以使用多个工作虚拟机并行执行多个文件读取和 spanner 写入...

With a multiregion spanner instance, you should be able to get approx 1800 rows per node per second insert speed (more if the rows are small and batched, as Knut's reply suggests) and with 5 spanner nodes, you can probably have between 10 and 20 importer threads running in parallel - whether using your importer program or using Dataflow.使用多区域 spanner 实例，您应该能够获得每节点每秒大约 1800 行的插入速度（如果行小且成批，则更多，正如 Knut 的回复所建议的那样），并且使用 5 个 spanner 节点，您可能有 10 到 20 个并行运行的导入器线程 - 无论是使用您的导入器程序还是使用 Dataflow。

(disclosure: I am the Beam SpannerIO maintainer) （披露：我是 Beam SpannerIO 维护者）

Answer 3

Cloud Spanner has launched a new feature that greatly improves the performance of the use case here and enables more efficient data updates. Cloud Spanner 推出了一项新功能，该功能极大地提高了此处用例的性能，并实现了更高效的数据更新。

If the batch of DML queries have the same SQL text and are parameterized, similar to PreparedStatement(s) generated by JDBC client in this post, the queries in the batch are combined to execute a single server-side action to generate rows followed by another single server-side write action.如果这批 DML 查询具有相同的 SQL 文本并被参数化，类似于本文中 JDBC 客户端生成的 PreparedStatement(s)，则将批处理中的查询组合起来执行单个服务器端操作以生成行，然后再生成行单个服务器端写入操作。 This reduces the number of server-side actions linearly by batch size leading to much improved latency and better throughput.这通过批量大小线性减少了服务器端操作的数量，从而大大改善了延迟和更好的吞吐量。

The improvement in latency ranges where better performance improvement is seen with bigger batch sizes.延迟范围的改进，其中更大的批量大小可以看到更好的性能改进。 The feature is applied automatically in Batch DML APIs.该功能在 Batch DML API 中自动应用。

Official documentation of this performance optimization can be found here: https://cloud.google.com/spanner/docs/dml-best-practices#batch-dml这种性能优化的官方文档可以在这里找到： https://cloud.google.com/spanner/docs/dml-best-practices#batch-dml

使用 jdbc 将行批量插入 Spanner 时加载性能低

问题描述

3 个解决方案

解决方案1
0 已采纳 2020-08-19 08:50:58

解决方案2
0 2020-08-20 16:30:33

解决方案3
0 2022-08-16 20:55:01

使用 jdbc 将行批量插入 Spanner 时加载性能低

问题描述

3 个解决方案

解决方案1 0 已采纳 2020-08-19 08:50:58

解决方案2 0 2020-08-20 16:30:33

解决方案3 0 2022-08-16 20:55:01

解决方案1
0 已采纳 2020-08-19 08:50:58

解决方案2
0 2020-08-20 16:30:33

解决方案3
0 2022-08-16 20:55:01