How do I load data into Apache Cassandra using the DataStax Bulk Loader (Ubuntu)?

Question

When I want to upload data to my "Test Cluster" in Apache Cassandra, I open the terminal and run:

export PATH=/home/mypc/dsbulk-1.7.0/bin:$PATH

source ~/.bashrc

dsbulk load -url /home/mypc/Desktop/test/file.csv -k keyspace_test -t table_test

But I get:

At least 1 record does not match the provided schema.mapping or schema.query. Please check that the connector configuration and the schema configuration are correct.
Operation LOAD_20201105-103000-577734 aborted: Too many errors, the maximum allowed is 100.

total | failed | rows/s | p50ms | p99ms | p999ms | batches
  104 |    104 |      0 |  0,00 |  0,00 |   0,00 |    0,00

Rejected records can be found in the following file(s): mapping.bad
Errors are detailed in the following file(s): mapping-errors.log
Last processed positions can be found in positions.txt

What does it mean? Why can't I load the data?

Thank you!

Answer 1

The error is that you're not providing the mapping between the CSV data and the table. It can be done in two ways:

If the CSV file has a header whose column names match the column names in Cassandra, use -header true.
Provide the mapping explicitly with the -m option (see the docs): you need to map the CSV columns to the Cassandra columns.
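As a rough sketch of the two options above, the commands could look like the following. The path, keyspace, and table come from the question; the column names id, name, and value are hypothetical placeholders, since the actual table schema is not shown, and the function names are only for illustration.

```shell
# Path, keyspace and table are taken from the question.
CSV=/home/mypc/Desktop/test/file.csv

# Option 1: the first line of the CSV holds field names that match
# the column names of table_test.
load_with_header() {
  dsbulk load -url "$CSV" -k keyspace_test -t table_test -header true
}

# Option 2: the CSV has no header line, so fields are mapped by position.
# The column names id, name and value are hypothetical.
load_with_index_mapping() {
  dsbulk load -url "$CSV" -k keyspace_test -t table_test \
    -header false -m '0=id, 1=name, 2=value'
}
```

Call whichever function matches the file, for example load_with_header, after adjusting the column names to the real schema.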

There is a very good series of blog posts about different aspects of DSBulk usage; the first two cover data loading in great detail.

Answer 2

It means that the columns in the CSV input file do not match the columns in your table_test table. You can find the details of the schema mismatch in mapping-errors.log, which tells you which columns are problematic.
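A minimal way to look at those files is sketched below. It assumes DSBulk wrote them to its default location, a logs directory under the working directory with one subdirectory per operation; the helper name show_mapping_errors is made up for this example.

```shell
# Print the first reported errors and the first rejected records
# from one DSBulk operation directory.
show_mapping_errors() {
  log_dir="$1"
  echo "== first errors =="
  head -n 20 "$log_dir/mapping-errors.log"
  echo "== first rejected records =="
  head -n 5 "$log_dir/mapping.bad"
}
```

With the operation ID from the output above, the call would be show_mapping_errors logs/LOAD_20201105-103000-577734, provided the default log directory was not changed.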

Since the CSV columns don't match the table schema, you will need to map them manually by specifying the --schema.mapping flag.

For details, see the DSBulk Common options page. You can also have a look at the schema mapping examples in this blog post. Cheers!
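One possible workflow is to list the CSV header fields first, compare them with the table's columns, and then write a name-based mapping. The sketch below assumes a comma-delimited file whose header contains no quoted commas; the field names csv_id and csv_name, the column names id and name, and both function names are hypothetical.

```shell
# Print the CSV header fields one per line, so they can be compared
# with the column names of the target table.
csv_fields() {
  head -n 1 "$1" | tr ',' '\n'
}

# Map CSV fields to table columns by name ("field = column").
# csv_id, csv_name, id and name are placeholders for the real names.
load_with_named_mapping() {
  dsbulk load -url "$1" -k keyspace_test -t table_test \
    -header true \
    --schema.mapping 'csv_id = id, csv_name = name'
}
```

For example, csv_fields /home/mypc/Desktop/test/file.csv shows the fields to put on the left-hand side of each mapping entry.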
