
Azure python sdk storage table backup table Return more than 1000 rows

I hope I can get some hints about this issue.

I have written the following code to back up a table from one storage account to another.

query_size = 100

#save data to storage2 and check if there is data left in the current table; if yes, recurse
def queryAndSaveAllDataBySize(tb_name,resp_data:ListGenerator ,table_out:TableService,table_in:TableService,query_size:int):
    for item in resp_data:
        #remove etag and Timestamp appended by table service
        del item.etag
        del item.Timestamp
        print("instet data:" + str(item) + "into table:"+ tb_name)
        table_in.insert_or_replace_entity(tb_name,item)
    if resp_data.next_marker:
        data = table_out.query_entities(table_name=tb_name,num_results=query_size,marker=resp_data.next_marker)
        queryAndSaveAllDataBySize(tb_name,data,table_out,table_in,query_size)


tbs_out = table_service_out.list_tables()
print(tbs_out)

for tb in tbs_out:
    table = tb.name + today
    print(target_connection_string)
    #create table with same name in storage2
    table_service_in.create_table(table_name=table, fail_on_exist=False)
    #first query
    data = table_service_out.query_entities(tb.name,num_results=query_size)
    queryAndSaveAllDataBySize(table,data,table_service_out,table_service_in,query_size)

This should be a simple script that loops over the items in a table and copies them into another storage account. I have this exact code already running in an Azure Function and it works just fine.

Today I tried to run it against several storage accounts. For a while it runs just fine, but then it stops and throws this error:

Traceback (most recent call last):
  File "/Users/users/Desktop/AzCopy/blob.py", line 205, in <module>
    queryAndSaveAllDataBySize(table,data,table_service_out,table_service_in,query_size)
  File "/Users/users/Desktop/AzCopy/blob.py", line 191, in queryAndSaveAllDataBySize
    data = table_out.query_entities(table_name=tb_name,num_results=query_size,marker=resp_data.next_marker)
  File "/Users/users/miniforge3/lib/python3.9/site-packages/azure/cosmosdb/table/tableservice.py", line 738, in query_entities
    resp = self._query_entities(*args, **kwargs)
  File "/Users/users/miniforge3/lib/python3.9/site-packages/azure/cosmosdb/table/tableservice.py", line 801, in _query_entities
    return self._perform_request(request, _convert_json_response_to_entities,
  File "/Users/users/miniforge3/lib/python3.9/site-packages/azure/cosmosdb/table/tableservice.py", line 1106, in _perform_request
    return super(TableService, self)._perform_request(request, parser, parser_args, operation_context)
  File "/Users/users/miniforge3/lib/python3.9/site-packages/azure/cosmosdb/table/common/storageclient.py", line 430, in _perform_request
    raise ex
  File "/Users/users/miniforge3/lib/python3.9/site-packages/azure/cosmosdb/table/common/storageclient.py", line 358, in _perform_request
    raise ex
  File "/Users/users/miniforge3/lib/python3.9/site-packages/azure/cosmosdb/table/common/storageclient.py", line 343, in _perform_request
    _http_error_handler(
  File "/Users/users/miniforge3/lib/python3.9/site-packages/azure/cosmosdb/table/common/_error.py", line 115, in _http_error_handler
    raise ex
azure.common.AzureMissingResourceHttpError: Not Found
{"odata.error":{"code":"TableNotFound","message":{"lang":"en-US","value":"The table specified does not exist.\nRequestId:bbdb\nTime:2021-09-29T16:42:17.6078186Z"}}}

I do not understand exactly why this is happening, because all it has to do is copy from one side to the other.

Please, if anyone can help fix this; I am totally burned out and can't think anymore :(

UPDATE: Reading my code again, I figured out that I have this limitation here:

#query 100 items per request, to avoid consuming too much memory by loading all data at once
query_size = 100

When I check my storage table, in fact I have only 100 rows. But I couldn't find anywhere how I can set the query size to load all the data in one go.

As far as I understand, after I reach the query_size limit I need to look for the next x_ms_continuation token to get the next batch.

I have this code right now:

query_size = 100

#save data to storage2 and check if there is data left in the current table; if yes, recurse
def queryAndSaveAllDataBySize(tb_name,resp_data:ListGenerator ,table_out:TableService,table_in:TableService,query_size:int):
    for item in resp_data:
        #remove etag and Timestamp appended by table service
        del item.etag
        del item.Timestamp
        print("instet data:" + str(item) + "into table:"+ tb_name)
        table_in.insert_or_replace_entity(tb_name,item)
    if resp_data.next_marker:
        data = table_out.query_entities(table_name=tb_name,num_results=query_size,marker=resp_data.next_marker)
        queryAndSaveAllDataBySize(tb_name,data,table_out,table_in,query_size)


tbs_out = table_service_out.list_tables()
print(tbs_out)

for tb in tbs_out:
    table = tb.name + today
    print(target_connection_string)
    #create table with same name in storage2
    table_service_in.create_table(table_name=table, fail_on_exist=False)

    #first query
    data = table_service_out.query_entities(tb.name,num_results=query_size)
    queryAndSaveAllDataBySize(table,data,table_service_out,table_service_in,query_size)
    

According to the Microsoft documentation, the marker should be checked for a continuation token, and if one is present the code should run again. But this is not happening in my case; once I reach the query_size, the code throws the error.
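
For reference, a minimal sketch of that documented pattern written as an iterative loop rather than recursion, assuming the same table_service_out / table_service_in objects and query_size used above; the table name "sourcetable" is a placeholder:

marker = None
while True:
    #query one page of up to query_size entities from the source table
    page = table_service_out.query_entities("sourcetable", num_results=query_size, marker=marker)
    for item in page:
        #remove etag and Timestamp appended by table service
        del item.etag
        del item.Timestamp
        table_service_in.insert_or_replace_entity("sourcetable", item)
    #an empty next_marker means there is no continuation token left
    marker = page.next_marker
    if not marker:
        break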

Can anyone help, please?

Try replacing the for block with the one below, which creates the table with the same name in storage 2 (the recursive query runs against the source account using tb_name, so tb_name must be a table name that actually exists in the source, not the date-suffixed destination name):

for tb in tbs_out:
    #create table with same name in storage2
    table_service_in.create_table(tb.name)
    #first query 
    data = table_service_out.query_entities(tb.name,num_results=query_size)
    queryAndSaveAllDataBySize(tb.name,data,table_service_out,table_service_in,query_size)

Below is the full sample code:

from azure.cosmosdb.table.tableservice import TableService,ListGenerator

table_service_out = TableService(account_name='', account_key='')
table_service_in = TableService(account_name='', account_key='')

#query 100 items per request, to avoid consuming too much memory by loading all data at once
query_size = 100

#save data to storage2 and check if there is data left in the current table; if yes, recurse
def queryAndSaveAllDataBySize(tb_name,resp_data:ListGenerator ,table_out:TableService,table_in:TableService,query_size:int):
    for item in resp_data:
        #remove etag and Timestamp appended by table service
        del item.etag
        del item.Timestamp
        print("instet data:" + str(item) + "into table:"+ tb_name)
        table_in.insert_entity(tb_name,item)
    if resp_data.next_marker:
        data = table_out.query_entities(table_name=tb_name,num_results=query_size,marker=resp_data.next_marker)
        queryAndSaveAllDataBySize(tb_name,data,table_out,table_in,query_size)


tbs_out = table_service_out.list_tables()

for tb in tbs_out:
    #create table with same name in storage2
    table_service_in.create_table(tb.name)
    #first query 
    data = table_service_out.query_entities(tb.name,num_results=query_size)
    queryAndSaveAllDataBySize(tb.name,data,table_service_out,table_service_in,query_size)

Up to here this should work properly. If you still have an issue with query_size, fetch the whole table and take the records from the resulting list; instead of setting query_size = 100, we can follow the approach below, which will give us the 100 records:

tasks = table_service.query_entities('tasktable')
lst = list(tasks)
print(lst[99])
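
This works because, when num_results is not given, the ListGenerator returned by query_entities keeps following continuation tokens by itself while you iterate it. A minimal sketch of the backup loop relying on that behaviour, assuming the same table_service_out / table_service_in objects from the full sample above (the table name is a placeholder):

#iterate the generator directly; without num_results it fetches follow-up pages on its own
for entity in table_service_out.query_entities('tasktable'):
    #remove etag and Timestamp appended by table service
    del entity.etag
    del entity.Timestamp
    table_service_in.insert_or_replace_entity('tasktable', entity)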

Also check the sample below from azure-sdk-for-python:

def sample_query_entities_values(self):
    from azure.data.tables import TableClient
    from azure.core.exceptions import HttpResponseError

    print("Entities with 25 < Value < 50")
    # [START query_entities]
    with TableClient.from_connection_string(self.connection_string, self.table_name) as table_client:
        try:
            parameters = {u"lower": 25, u"upper": 50}
            name_filter = u"Value gt @lower and Value lt @upper"
            queried_entities = table_client.query_entities(
                query_filter=name_filter, select=[u"Value"], parameters=parameters
            )

            for entity_chosen in queried_entities:
                print(entity_chosen)

        except HttpResponseError as e:
            print(e.message)
    # [END query_entities]
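
As a side note on the newer azure-data-tables package used in that sample: the pager returned by query_entities (or list_entities) also follows continuation tokens transparently, so no manual marker handling is needed. Below is a minimal, illustrative sketch under that assumption; the connection strings and table name are placeholders, and the destination table is assumed to already exist:

from azure.data.tables import TableClient, UpdateMode

#placeholders: substitute real connection strings and an existing table name
with TableClient.from_connection_string("<source-conn-str>", table_name="tasktable") as src, \
     TableClient.from_connection_string("<target-conn-str>", table_name="tasktable") as dst:
    #the pager requests further pages itself as you iterate
    for entity in src.list_entities():
        #system metadata (etag, Timestamp) is kept in entity.metadata, so nothing needs stripping
        dst.upsert_entity(entity, mode=UpdateMode.REPLACE)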
