
Does fetching data from Azure Table Storage in Python take too long? The data has around 1000 rows per hour, and I am fetching it hour by hour.

import pandas as pd
from azure.cosmosdb.table.tableservice import TableService


def queryAzureTable(azureTableName, filterQuery):
    table_service = TableService(account_name='accountname', account_key='accountkey')
    tasks = table_service.query_entities(azureTableName, filter=filterQuery)
    return tasks

filterQuery = f"PartitionKey eq '{key}' and Timestamp ge datetime'2022-06-15T09:00:00' and Timestamp lt datetime'2022-06-15T10:00:00'"
entities = queryAzureTable("TableName", filterQuery)

for i in entities:
    print(i)

OR

df = pd.DataFrame(entities)

Above is the code I am using. The Azure table has only around 1000 entries, which should not take long, yet extracting them takes more than an hour with this code.

Both approaches, iterating with a 'for' loop or converting the entities directly to a DataFrame, take too long.

Could anyone let me know why it is taking so long, or whether it generally takes that much time? If so, is there any alternative that takes no more than 10-15 minutes to process it, without increasing the number of clusters already in use?

I read that multithreading might resolve it. I tried that too, but it doesn't seem to help; maybe I am writing it wrong. Could anyone help me with the code using multithreading, or any alternative way?
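(A minimal sketch of that multithreading idea, splitting the hour into quarter-hour windows queried in parallel; the account name, key, partition key, and table name below are placeholders:)

import pandas as pd
from concurrent.futures import ThreadPoolExecutor
from azure.cosmosdb.table.tableservice import TableService

# Placeholders: replace account name/key, partition key, and table name.
table_service = TableService(account_name='accountname', account_key='accountkey')
partition = 'key'

def query_window(start, end):
    # Each call queries one sub-window of the hour on its own thread.
    f = (f"PartitionKey eq '{partition}' and Timestamp ge datetime'{start}' "
         f"and Timestamp lt datetime'{end}'")
    return list(table_service.query_entities("TableName", filter=f))

# Split 09:00-10:00 into four quarter-hour windows.
edges = ["2022-06-15T09:00:00", "2022-06-15T09:15:00", "2022-06-15T09:30:00",
         "2022-06-15T09:45:00", "2022-06-15T10:00:00"]
windows = list(zip(edges, edges[1:]))

with ThreadPoolExecutor(max_workers=4) as pool:
    chunks = pool.map(lambda w: query_window(*w), windows)

df = pd.DataFrame([e for chunk in chunks for e in chunk])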

I tried to list all the rows in my table storage. By default, Azure Table Storage returns at most 1000 rows (entities) per query request; larger result sets are returned page by page via continuation tokens.
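As an illustration of that paging, with the newer azure-data-tables SDK the iterator follows continuation tokens transparently, and you can also walk the pages explicitly (connection string and table name are placeholders):

from azure.data.tables import TableClient

table_client = TableClient.from_connection_string("<connection-string>", table_name="myTable")

# Each page is one service round-trip of at most 1000 entities;
# the iterator follows continuation tokens automatically.
total = 0
for page in table_client.list_entities(results_per_page=1000).by_page():
    total += sum(1 for _ in page)
print("entities:", total)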

Also, there are limits on the partition key and row key: each may not exceed 1 KiB in size. The type of table storage account also matters for the latency of your output. As you're trying to query 1000 rows at once:-

  1. Make sure your table storage is in a region close to where you run your code.
  2. Check the scalability targets and limitations for Azure table storage here:- https://learn.microsoft.com/en-us/azure/storage/tables/scalability-targets#scale-targets-for-table-storage

Also, AFAIK, in your code you can directly make use of the list_entities method to list all the entities in the table, instead of writing such a complex query:-

I tried the code below and was able to retrieve all the table entities successfully within a few seconds on a standard general-purpose v2 storage account:-

Code:-

from azure.data.tables import TableClient

table_client = TableClient.from_connection_string(conn_str="<connection-string>", table_name="myTable")

# Query the entities in the table
entities = list(table_client.list_entities())
for i, entity in enumerate(entities):
    print("Entity #{}: {}".format(i, entity))

Result:-

(screenshot of the retrieved entities printed to the console)
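If you still need the hourly time window from the question, the same SDK also supports a filtered query; a sketch, assuming the question's partition key and time range (connection string and table name are placeholders):

from datetime import datetime, timezone
from azure.data.tables import TableClient

table_client = TableClient.from_connection_string("<connection-string>", table_name="myTable")

# Parameterized filter using the question's example window.
parameters = {
    "pk": "key",
    "start": datetime(2022, 6, 15, 9, 0, tzinfo=timezone.utc),
    "end": datetime(2022, 6, 15, 10, 0, tzinfo=timezone.utc),
}
query = "PartitionKey eq @pk and Timestamp ge @start and Timestamp lt @end"
entities = list(table_client.query_entities(query, parameters=parameters))
print(len(entities))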

With Pandas:-

    import pandas as pd
    from azure.cosmosdb.table.tableservice import TableService

    CONNECTION_STRING = "DefaultEndpointsProtocol=https;AccountName=siliconstrg45;AccountKey=<account-key>;EndpointSuffix=core.windows.net"
    SOURCE_TABLE = "myTable"


    def set_table_service():
        """ Set the Azure Table Storage service """
        return TableService(connection_string=CONNECTION_STRING)


    def get_dataframe_from_table_storage_table(table_service):
        """ Create a dataframe from table storage data """
        return pd.DataFrame(get_data_from_table_storage_table(table_service))


    def get_data_from_table_storage_table(table_service):
        """ Retrieve data from Table Storage """
        for record in table_service.query_entities(SOURCE_TABLE):
            yield record


    ts = set_table_service()
    df = get_dataframe_from_table_storage_table(table_service=ts)
    print(df)

Result:-

(screenshot of the printed DataFrame)
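As an aside, the same DataFrame can also be built with the newer azure-data-tables client in a couple of lines (connection string and table name are placeholders):

import pandas as pd
from azure.data.tables import TableClient

table_client = TableClient.from_connection_string("<connection-string>", table_name="myTable")
# list_entities yields dict-like TableEntity objects, which pandas accepts directly.
df = pd.DataFrame(table_client.list_entities())
print(df)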

If your table storage scalability targets are in place, you can consider a few points from this document to increase the IOPS of your table storage:-
Refer here:-
https://learn.microsoft.com/en-us/azure/storage/tables/storage-performance-checklist
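For example, one item on that checklist is to prefer point queries (PartitionKey plus RowKey) over scans wherever possible; a minimal sketch (connection string, table name, and key values are placeholders):

from azure.data.tables import TableClient

table_client = TableClient.from_connection_string("<connection-string>", table_name="myTable")

# A point query addresses exactly one entity and is the cheapest
# Table Storage read pattern.
entity = table_client.get_entity(partition_key="key", row_key="row-001")
print(entity)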

Also, storage quotas and limits vary by Azure subscription type too!
