How to count the lines of a file in Azure Data Lake Gen2

Question: On my local machine, I can get the line count of a data file with the following Python code. How can I do the same when the file is stored in a container, say myContainer, in an Azure Data Lake Gen2 storage account?

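count = 0  # stays 0 if the file is empty
with open('PPPLoanHoldStatus_AprilData.txt', 'r') as fp:
    # enumerate from 1 so that count is the number of lines read so far
    for count, line in enumerate(fp, start=1):
        pass
print('Total Lines', count)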

Remark: When I run the following code in an Azure Databricks notebook, I get the error shown below:

with open('abfss://myContainer@myAzureDLGen2.dfs.core.windows.net/MyDataFile.txt', 'r') as fp:
    for count, line in enumerate(fp):
        pass
print('Total Lines', count + 1)

ERROR :

No such file or directory: 'abfss://myContainer@myAzureDLGen2.dfs.core.windows.net/MyDataFile.txt'

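Python's built-in open() only reads local file system paths, so it cannot resolve an abfss:// URI. If you want to read the file without mounting the storage, you can try Azure Data Lake credential passthrough.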

This requires an Azure Databricks workspace on the Premium plan.

Step-1: To log the identities that are passed through, run the Set-AzStorageServiceLoggingProperty command against the ADLS storage account.

Step-2: Enable ADLS credential passthrough on the cluster. There are two options: enable it for a High Concurrency cluster, or enable it for a Standard cluster.

High Concurrency cluster:

Select High Concurrency as the cluster mode and enable credential passthrough.

Standard Cluster:

Select Standard as the cluster mode, enable credential passthrough, and choose the user who gets access from the drop-down.

Either option works.

After creating the cluster, you can access the ADLS Gen2 files from a notebook attached to this cluster by using the abfss:// path.
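
To tie this back to the line count in the question, here is a minimal sketch of one way to do it. It assumes the notebook is attached to a passthrough-enabled cluster and uses the predefined SparkSession `spark` that Databricks notebooks provide. The helper name `count_lines` is hypothetical, and the path is the one from the question. Spark's text reader returns one row per line, so the row count is the line count.

```python
def count_lines(spark, path):
    """Return the number of lines in the text file at `path`.

    spark.read.text produces a DataFrame with one row per line of the
    file, so counting rows gives the line count. The file is read by
    Spark, which understands abfss:// paths, not by Python's open().
    """
    return spark.read.text(path).count()


# In a Databricks notebook, `spark` is already defined (assumption:
# the cluster has credential passthrough enabled as described above):
# path = "abfss://myContainer@myAzureDLGen2.dfs.core.windows.net/MyDataFile.txt"
# print("Total Lines", count_lines(spark, path))
```

Passing `spark` in as an argument keeps the helper easy to try outside a notebook with any object that exposes the same `read.text(...).count()` chain.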

NOTE:
Please make sure you have the Storage Blob Data Contributor role on the ADLS account. Prefer creating a new cluster, and avoid clusters that were previously set up with ADLS credentials.

Reference:

https://learn.microsoft.com/en-us/azure/databricks/security/credential-passthrough/adls-passthrough
