
Read a text file in Azure Blob Storage line by line using Python

I need to read text files from blob storage line by line, perform some operations, and put specific lines into a data frame. I have tried various ways to read the file line by line. Is there a way to read a text file from a blob line by line, perform operations, and output specific lines, the way readlines() works when the data is in local storage?

candidate_resume = 'candidateresumetext'
block_blob_service = BlockBlobService(account_name='nam', account_key='key')
generator2 = block_blob_service.list_blobs(candidate_resume)
#for blob in generator2:
   #print(blob.name)
for blob in generator2:
    blob2 = block_blob_service.get_blob_to_text(candidate_resume,blob.name)
    #print(blob2)

    #blob_url=block_blob_service.make_blob_url(candidate_resume, blob.name)
    #print(blob_url)

    #blob3 = block_blob_service.get_blob_to_stream(candidate_resume,blob.name,range)
    blob3 = blob2.split('.')
    with open(blob.name,encoding = 'utf-8') as file:
        lines = file.readlines()
        for line in blob3:      
            if any(p in years_list for p in line ):
                if any(p in months_list for p in line):    
                    print(line)

get_blob_to_text is the right method. You can follow the sample code below and adapt it if it does not meet your needs. Note that it returns a Blob object, so the text itself is in its content attribute. You cannot use with open(...) as file here, because the blob is not a file on the local disk.

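# Legacy SDK (azure-storage-blob 2.x and earlier), as used in the question.
from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(account_name='nam', account_key='key')

# Download the blob as text (assumes it is a .txt file).
# container_name and blob_name identify the container and the blob to read.
str1 = block_blob_service.get_blob_to_text(container_name, blob_name)

# Split the downloaded string into a list of lines.
arr1 = str1.content.splitlines()

# Process one line at a time.
for a1 in arr1:
    print(a1)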
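
Once the blob content is a plain string, the year/month filter from the question can run on its lines without touching the local file system. The following is only a minimal sketch: the function name find_matching_lines, the sample text, and the contents of years_list and months_list are assumptions, since the question does not show how those lists are defined. It compares whole words rather than single characters, which is probably what the original any(...) checks intended.

```python
import re


def find_matching_lines(text, years, months):
    """Return the lines of `text` that contain both a year and a month word."""
    year_set = set(years)
    month_set = {m.lower() for m in months}
    matches = []
    for line in text.splitlines():
        # Tokenize into alphabetic words and digit runs so that whole
        # words are compared, not individual characters.
        words = re.findall(r"[A-Za-z]+|\d+", line)
        has_year = any(w in year_set for w in words)
        has_month = any(w.lower() in month_set for w in words)
        if has_year and has_month:
            matches.append(line)
    return matches


# Hypothetical stand-in for the string held in the .content attribute
# of the object returned by get_blob_to_text.
sample_text = (
    "Software Engineer\n"
    "Jan 2015 - Mar 2018 at Contoso\n"
    "Skills: Python, SQL\n"
    "Graduated in 2014"
)

# Assumed definitions; the question does not show these lists.
years_list = [str(y) for y in range(1990, 2031)]
months_list = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

result = find_matching_lines(sample_text, years_list, months_list)
print(result)
```

In the loop over blobs, the sample string would be replaced by the content of each downloaded blob. If a data frame is needed, the returned list can be passed to pandas, for example pandas.DataFrame({"line": result}).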
