How to load data into Apache Cassandra with DataStax Bulk Loader (Ubuntu)?

When I want to load data into my "Test Cluster" in Apache Cassandra, I open a terminal and run:

export PATH=/home/mypc/dsbulk-1.7.0/bin:$PATH

source ~/.bashrc

dsbulk load -url /home/mypc/Desktop/test/file.csv -k keyspace_test -t table_test

But I get this error:

At least 1 record does not match the provided schema.mapping or schema.query. Please check that the connector configuration and the schema configuration are correct.
Operation LOAD_20201105-103000-577734 aborted: Too many errors, the maximum allowed is 100.

total | failed | rows/s | p50ms | p99ms | p999ms | batches
  104 |    104 |      0 |  0,00 |  0,00 |   0,00 |    0,00

Rejected records can be found in the following file(s): mapping.bad
Errors are detailed in the following file(s): mapping-errors.log
Last processed positions can be found in positions.txt

What does it mean? Why can't I load the data?

Thank you!

The error occurs because you're not providing a mapping between the CSV data and the table. This can be done in two ways:

  1. If the CSV file has a header line whose column names match the column names in Cassandra, use -header true.
  2. Provide the mapping explicitly with the -m option (see the docs); you need to map the CSV columns to the Cassandra columns.
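The two options above could look roughly like the sketch below. The paths, keyspace and table are taken from the question; the column names id, name and email are hypothetical and need to be replaced with the real columns of table_test. The commands are wrapped in functions so they can be reviewed before anything is run.

```shell
# Sketch only: id, name and email are hypothetical column names.

# Option 1: the first CSV line is a header whose names match the table columns.
load_with_header() {
  dsbulk load -url "$1" -k keyspace_test -t table_test -header true
}

# Option 2: the CSV has no header line; map field positions (0-based)
# to table columns with -m.
load_with_index_mapping() {
  dsbulk load -url "$1" -k keyspace_test -t table_test \
    -header false -m '0=id,1=name,2=email'
}
```

For example, `load_with_header /home/mypc/Desktop/test/file.csv` runs the first variant against the file from the question.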

There is a very good series of blog posts about different aspects of DSBulk usage; the first two of them cover data loading in great detail.

It means that the columns in the CSV input file do not match the columns in your table_test table. You can find the details of the schema mismatch in mapping-errors.log, so you know which columns are problematic.
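A minimal sketch for checking the mismatch by hand. It assumes DSBulk wrote its logs to ./logs/<operation id>/ (its default when no log directory is configured) and that the CSV is comma-delimited with no quoted commas in the first line. The helper names and the LOG_DIR variable are made up for this example.

```shell
# Sketch only: helper names and LOG_DIR are hypothetical.

# Show the first lines of the mapping error log for one operation.
show_mapping_errors() {
  head -n 20 "${LOG_DIR:-logs}/$1/mapping-errors.log"
}

# Print the fields of the first CSV line, one per line, so they can be
# compared with the table's column names.
csv_fields() {
  head -n 1 "$1" | tr ',' '\n'
}
```

With the operation from the question, that would be `show_mapping_errors LOAD_20201105-103000-577734` and `csv_fields /home/mypc/Desktop/test/file.csv`. The table's columns can be listed with `cqlsh -e 'DESCRIBE TABLE keyspace_test.table_test'`.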

Since the CSV columns don't match the table schema, you will need to map them manually by specifying the --schema.mapping flag.
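As a rough illustration, a named mapping with the long-form options might look like this. It assumes the CSV has a header line; user_id and full_name are hypothetical CSV header names, and id and name are hypothetical columns of table_test. Each mapping entry has the CSV field on the left and the table column on the right.

```shell
# Sketch only: user_id/full_name (CSV fields) and id/name (table columns)
# are hypothetical names.
load_with_named_mapping() {
  dsbulk load \
    --connector.csv.url "$1" \
    --schema.keyspace keyspace_test \
    --schema.table table_test \
    --connector.csv.header true \
    --schema.mapping 'user_id = id, full_name = name'
}
```

Calling `load_with_named_mapping /home/mypc/Desktop/test/file.csv` would then load the file from the question with that mapping.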

For details, see the DSBulk Common options page. You can also have a look at the schema mapping examples in this blog post. Cheers!
