How to efficiently insert bulk data into Cassandra using Python?

I have a Python application, built with Flask, that allows importing many data records (anywhere from 10k to 250k+ records at a time). Right now it inserts into a Cassandra database one record at a time, like this:

for transaction in transactions:
    self.transaction_table.insert_record(transaction)

This process is incredibly slow. Is there a best-practice approach I could use to insert this bulk data more efficiently?

You can use batch statements for this; an example and documentation are available in the DataStax documentation. You can also use some child workers and/or async queries on top of this.

In terms of best practices, it is more efficient if each batch contains only one partition key. You do not want a single node acting as coordinator for many different partition keys; it is faster to contact each responsible node directly.
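A minimal sketch of single-partition batches with the DataStax Python driver (`cassandra-driver`). The table `transactions`, its columns, and the choice of `account_id` as partition key are assumptions made for illustration, as are the helper names; the batch size of 50 is an arbitrary starting point, not a recommendation from the driver.

```python
from collections import defaultdict
from itertools import islice


def group_by_partition_key(records, key_field):
    """Group record dicts by the value of their partition key column."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_field]].append(record)
    return dict(groups)


def chunked(items, size):
    """Yield lists of at most `size` items so no single batch grows too large."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def insert_in_partition_batches(session, records, max_batch_size=50):
    """Send one unlogged batch per chunk of rows sharing a partition key."""
    # Imported here so the pure-Python helpers above work without the driver.
    from cassandra.query import BatchStatement, BatchType

    # Hypothetical schema: account_id is assumed to be the partition key.
    insert = session.prepare(
        "INSERT INTO transactions (account_id, tx_id, amount) VALUES (?, ?, ?)"
    )
    for rows in group_by_partition_key(records, "account_id").values():
        for chunk in chunked(rows, max_batch_size):
            batch = BatchStatement(batch_type=BatchType.UNLOGGED)
            for row in chunk:
                batch.add(insert, (row["account_id"], row["tx_id"], row["amount"]))
            session.execute(batch)
```

Here `session` is expected to be an already connected `cassandra.cluster.Session`. Grouping first means every batch touches exactly one partition, which is the case the advice above describes.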

If each record has a different partition key, a single prepared statement executed by some child workers may work out better.
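One way to approximate "a prepared statement plus workers" without managing processes yourself is the driver's `execute_concurrent_with_args` helper, which keeps a bounded number of async requests in flight. This is a sketch that swaps in that helper for explicit child workers; the table, column names, and `rows_to_params` helper are hypothetical.

```python
def rows_to_params(records, columns):
    """Turn record dicts into parameter tuples in the given column order."""
    return [tuple(record[column] for column in columns) for record in records]


def insert_concurrently(session, records, concurrency=100):
    """Run one prepared INSERT per record, many at a time; return the failures."""
    # Imported here so rows_to_params works without the driver installed.
    from cassandra.concurrent import execute_concurrent_with_args

    # Hypothetical schema, used only for illustration.
    columns = ("account_id", "tx_id", "amount")
    insert = session.prepare(
        "INSERT INTO transactions (account_id, tx_id, amount) VALUES (?, ?, ?)"
    )
    results = execute_concurrent_with_args(
        session,
        insert,
        rows_to_params(records, columns),
        concurrency=concurrency,
        raise_on_first_error=False,
    )
    # Each result is a (success, result_or_exception) pair.
    return [outcome for success, outcome in results if not success]
```

Returning the failed outcomes rather than raising on the first error lets a long import continue and report problems at the end; whether that is the right trade-off depends on the application.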

You may also want to consider using a TokenAware load balancing policy, which allows the relevant node to be contacted directly instead of being coordinated through another node.
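As a rough illustration, a token-aware policy can be configured through an execution profile in recent versions of the DataStax Python driver. The function name, contact points, and data center name below are placeholders, and wrapping `DCAwareRoundRobinPolicy` is one common choice rather than the only option.

```python
def build_token_aware_cluster(contact_points, local_dc):
    """Create a Cluster whose default profile routes requests to replica nodes."""
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    profile = ExecutionProfile(
        # TokenAwarePolicy wraps a child policy and prefers replicas for the key.
        load_balancing_policy=TokenAwarePolicy(
            DCAwareRoundRobinPolicy(local_dc=local_dc)
        )
    )
    return Cluster(
        contact_points=contact_points,
        execution_profiles={EXEC_PROFILE_DEFAULT: profile},
    )


# Example usage (placeholder addresses and data center name):
# cluster = build_token_aware_cluster(["10.0.0.1", "10.0.0.2"], "dc1")
# session = cluster.connect("my_keyspace")
```

Token-aware routing needs a routing key for each request; statements created with `session.prepare` carry that information, which is another reason to use prepared statements for the inserts.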

The easiest solution is to generate CSV files from your data and import them with the COPY command. That should work well for up to a few million rows. For more complicated scenarios you could use the sstableloader command.
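A small sketch of the CSV route, using only the standard library to write the file and build the matching cqlsh `COPY ... FROM` statement. The helper names, table name, and column names are assumptions; the generated command is meant to be pasted into or piped to `cqlsh`.

```python
import csv


def write_csv(records, columns, path):
    """Write record dicts to a CSV file with a header row; return the row count."""
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        for record in records:
            writer.writerow([record[column] for column in columns])
            count += 1
    return count


def copy_command(table, columns, path):
    """Build the cqlsh COPY statement that imports the file written above."""
    return "COPY {} ({}) FROM '{}' WITH HEADER = TRUE;".format(
        table, ", ".join(columns), path
    )
```

For example, after `write_csv(transactions, ["account_id", "tx_id", "amount"], "tx.csv")`, running the string returned by `copy_command("my_keyspace.transactions", ["account_id", "tx_id", "amount"], "tx.csv")` in cqlsh would load the file, assuming the table already exists with those columns.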
