
How to connect Python's cosmos_client to Cosmos DB instance using Cassandra API?

I have a Cosmos DB (Cassandra API) instance set up and I'd like to manage its throughput from a Python application. I'm able to create an azure.cosmos.cosmos_client using the Cassandra endpoint and primary password listed in Azure without errors, but all attempted interactions with the client result in "azure.cosmos.errors.HTTPFailure: Status code: 404".

I am already successfully interacting with this database through cassandra-driver in Python, but I'd like access to the cosmos_client to manage throughput provisioning via code. I want to autoscale throughput as database use fluctuates between high levels of utilization and almost no activity.

Creating a cosmos_client requires a valid URI with the scheme (https/http/ftp, etc.) included. The endpoint listed on Azure, which was successfully used to connect via cqlsh as well as the Python cassandra-driver, did not specify a scheme. I added "https://" to the beginning of the provided endpoint and was able to create the client in Python ("http://" results in errors; I also verified that incorrect addresses result in errors even with "https://"). Now that I have a client object created, any interaction I attempt with it gives me 404 errors.

import azure.cosmos.cosmos_client as cosmos_client

client = cosmos_client.CosmosClient(f'https://{COSMOS_CASSANDRA_ENDPOINT}', {'masterKey': COSMOS_CASSANDRA_PASSWORD})

client.ReadEndpoint
        #'https://COSMOS_CASSANDRA_ENDPOINT'

client.GetDatabaseAccount(COSMOS_CASSANDRA_ENDPOINT)
        #azure.cosmos.errors.HTTPFailure: Status code: 404

client.ReadDatabase(EXISTING_KEYSPACE_NAME)
        #azure.cosmos.errors.HTTPFailure: Status code: 404

I'm wondering if using the cosmos_client is the correct way to interact with the Cosmos Cassandra instance to modify throughput from my Python application. If so, how should I set up the cosmos_client properly? Perhaps there is a way to do this directly through database modifications using cassandra-driver.

I could never get this to work after toiling for a while, trying and failing to access the database via CosmosClient or DocumentClient in Python and .NET. Ultimately I found two methods, each of which is unfortunately a bit hacky and presents some challenges that seem unnecessary.

What I ended up doing was accomplishing this via a subprocess call to the Azure CLI to change throughput. This is the command that is executed:

f'az cosmosdb cassandra table throughput update --account-name {__cosmos_instance_name} --keyspace-name {__cassandra_keyspace} --name {table_name} --resource-group {__cosmos_resource_group} --throughput {new_throughput}'
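For context, a minimal sketch of how that command might be invoked from Python is below. The subprocess wrapper and the parameter names are my own illustration rather than the exact code from our service, and it assumes the Azure CLI is installed and already logged in:

import shlex
import subprocess

def set_table_throughput(account_name, resource_group, keyspace, table_name, new_throughput):
    # Build the same 'az cosmosdb cassandra table throughput update' command shown above.
    cmd = (
        f'az cosmosdb cassandra table throughput update '
        f'--account-name {account_name} '
        f'--keyspace-name {keyspace} '
        f'--name {table_name} '
        f'--resource-group {resource_group} '
        f'--throughput {new_throughput}'
    )
    # Run the CLI and raise if it exits non-zero; stdout contains the updated throughput settings as JSON.
    result = subprocess.run(shlex.split(cmd), capture_output=True, text=True, check=True)
    return result.stdout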

What is very unfortunate about both methods that I found to work is that they don't work when the target database is being throttled due to rate limiting. This meant we also had to implement some logic to throttle our own service's interactions with the database before calling the code to perform scaling.

Some other notes about our solution: the service is hosted in Kubernetes, so we had the metric evaluation and scaling execution added to the lifecycle hooks on the pod. The auto-scaler is also executed when we encounter suspected rate limiting during Cassandra interactions, i.e. while handling cassandra.cluster.NoHostAvailable exceptions; a rough sketch of that path is below.
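The sketch below only illustrates the pattern; the scale_up_table_throughput hook and the retry/backoff details are hypothetical stand-ins for our internal code:

import time
from cassandra.cluster import NoHostAvailable

def execute_with_autoscale(session, statement, scale_up_table_throughput, retries=3):
    # Try the query; if every host refuses (which for Cosmos DB often indicates rate limiting),
    # bump the provisioned throughput and retry after a short backoff.
    for attempt in range(retries):
        try:
            return session.execute(statement)
        except NoHostAvailable:
            scale_up_table_throughput()   # hypothetical hook into the auto-scaler
            time.sleep(2 ** attempt)      # simple exponential backoff
    raise RuntimeError('query still failing after scaling throughput')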

...

The other way I could set the provisioned throughput from code was by executing CQL directly through cassandra-driver, by doing the following (in Python):

from cassandra.cqlengine import connection

# Set up the cqlengine connection (hosts, auth, etc.) and grab the underlying session.
connection.setup(<CONNECTION_SETUP_ARGS>)
session = connection.get_session()

# Switch to the target keyspace, then alter the table's provisioned throughput (RU/s).
session.execute("use <CASSANDRA_NAMESPACE>")
session.execute("alter table <CASSANDRA_TABLE_NAME> with cosmosdb_provisioned_throughput=<DESIRED_THROUGHPUT>")

When I get a chance I'll switch to this approach since it doesn't require Azure CLI installation and subprocess calls.
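For anyone filling in the <CONNECTION_SETUP_ARGS> placeholder against Cosmos DB's Cassandra API, the sketch below shows one way it might look. The port 10350, TLS, and account-name/primary-key authentication are the usual Cosmos Cassandra requirements rather than values from my setup, so treat them as assumptions to verify:

import ssl
from cassandra.auth import PlainTextAuthProvider
from cassandra.cqlengine import connection

# Cosmos DB's Cassandra API typically listens on port 10350 and requires TLS plus
# username/password auth (account name / primary key). Adjust to your account.
auth = PlainTextAuthProvider(username=COSMOS_ACCOUNT_NAME, password=COSMOS_CASSANDRA_PASSWORD)
ssl_context = ssl.create_default_context()

# Extra keyword arguments are passed through to cassandra.cluster.Cluster.
connection.setup(
    [COSMOS_CASSANDRA_ENDPOINT],
    default_keyspace=CASSANDRA_KEYSPACE,
    port=10350,
    auth_provider=auth,
    ssl_context=ssl_context,
)
session = connection.get_session()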

I think I got this idea originally from here.
