import pandas as pd
from azure.cosmosdb.table.tableservice import TableService

def queryAzureTable(azureTableName, filterQuery):
    table_service = TableService(account_name='accountname', account_key='accountkey')
    tasks = table_service.query_entities(azureTableName, filter=filterQuery)
    return tasks

filterQuery = f"PartitionKey eq '{key}' and Timestamp ge datetime'2022-06-15T09:00:00' and Timestamp lt datetime'2022-06-15T10:00:00'"
entities = queryAzureTable("TableName", filterQuery)

for i in entities:
    print(i)
OR
df = pd.DataFrame(entities)
Above is the code that I am using. The Azure table has only around 1,000 entries, which should not take long to extract, yet it takes more than an hour with this code.
Both approaches, iterating with a 'for' loop or converting the entities directly to a DataFrame, take too long.
Could anyone tell me why it is so slow, or whether this is simply the expected time? If that's the case, is there an alternate way that takes no more than 10-15 minutes to process, without increasing the number of clusters already in use?
I read that multithreading might resolve it and tried that too, but it doesn't seem to help; maybe I am writing it wrong. Could anyone help me with the code using multithreading, or any alternate way?
I tried to list all the rows in my table storage. By default, Azure Table Storage returns at most 1,000 rows (entities) per request; anything beyond that has to be fetched with continuation tokens.
Also, there are limits on the keys: the PartitionKey and RowKey values may each be up to 1 KiB in size. Unfortunately, the type of storage account also matters for the latency of your output. As you're trying to query 1,000 rows at once:-
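Because the service caps each response at 1,000 entities, larger result sets come back page by page via continuation tokens. Below is a minimal sketch of paged retrieval with the `azure-data-tables` SDK; the connection string and table name are placeholders, and `collect_pages` is just a hypothetical helper name:

```python
def collect_pages(pages):
    """Flatten an iterable of result pages into a single entity list."""
    entities = []
    for page in pages:
        entities.extend(page)  # each page holds up to 1,000 entities
    return entities

def fetch_all_entities(conn_str, table_name):
    """Fetch every entity, following continuation tokens page by page."""
    # Imported here so collect_pages above stays dependency-free
    from azure.data.tables import TableClient
    client = TableClient.from_connection_string(conn_str=conn_str,
                                                table_name=table_name)
    # by_page() exposes the pagination that list_entities handles internally
    return collect_pages(client.list_entities(results_per_page=1000).by_page())
```

The SDK's iterator already follows continuation tokens transparently; `by_page()` only makes the paging visible, which helps when timing where the latency comes from.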
Also, AFAIK, in your code you can directly make use of the
list_entities
method to list all the entities in the table, instead of writing such a complex filter query:-
I tried the below code and was able to retrieve all the table entities successfully within a few seconds with a standard general-purpose v2 storage account:-
Code:-
from azure.data.tables import TableClient
table_client = TableClient.from_connection_string(conn_str="<connection-string>", table_name="myTable")

# Query the entities in the table
entities = list(table_client.list_entities())

for i, entity in enumerate(entities):
    print("Entity #{}: {}".format(i, entity))
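Since the question ultimately wants a DataFrame, the list returned by `list_entities` can also be handed to pandas directly, with no row-by-row loop. A small sketch, using hypothetical sample entities in place of real query results:

```python
import pandas as pd

# Hypothetical sample entities; in real use these would come from
# entities = list(table_client.list_entities())
entities = [
    {"PartitionKey": "pk", "RowKey": "1", "value": 10},
    {"PartitionKey": "pk", "RowKey": "2", "value": 20},
]

# Each entity behaves like a dict, so pandas builds the frame directly
df = pd.DataFrame(entities)
print(df)
```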
Result:-
With Pandas:-
import pandas as pd
from azure.cosmosdb.table.tableservice import TableService

CONNECTION_STRING = "DefaultEndpointsProtocol=https;AccountName=siliconstrg45;AccountKey=<account-key>;EndpointSuffix=core.windows.net"
SOURCE_TABLE = "myTable"

def set_table_service():
    """ Set the Azure Table Storage service """
    return TableService(connection_string=CONNECTION_STRING)

def get_data_from_table_storage_table(table_service):
    """ Retrieve data from Table Storage """
    for record in table_service.query_entities(SOURCE_TABLE):
        yield record

def get_dataframe_from_table_storage_table(table_service):
    """ Create a dataframe from table storage data """
    return pd.DataFrame(get_data_from_table_storage_table(table_service))

ts = set_table_service()
df = get_dataframe_from_table_storage_table(table_service=ts)
print(df)
Result:-
If you have your table storage scalability targets in place, you can consider a few points from this document to increase the I/Ops of your table storage:-
Refer here:-
https://learn.microsoft.com/en-us/azure/storage/tables/storage-performance-checklist
Also, storage quotas and limits vary by Azure subscription type too!
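On the multithreading part of the question: one common pattern is to query each partition on its own thread and merge the results, since Table Storage parallelizes well across partitions. This is only a sketch under assumptions: `query_fn` stands in for whatever query call you use (e.g. a wrapper around `table_service.query_entities`), and the partition keys are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def partition_filter(partition_key):
    """Build the OData filter string for a single partition."""
    return "PartitionKey eq '{}'".format(partition_key)

def query_partitions_concurrently(query_fn, partition_keys, max_workers=4):
    """Run query_fn(filter_string) for each partition on its own thread
    and merge the results in partition-key order.

    query_fn is a hypothetical callable, e.g.
        lambda f: table_service.query_entities("myTable", filter=f)
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with partition_keys
        results = pool.map(lambda pk: list(query_fn(partition_filter(pk))),
                           partition_keys)
    merged = []
    for chunk in results:
        merged.extend(chunk)
    return merged
```

Note that threads only help when the data spans several partitions; a single hot partition is still served sequentially by the service.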