Read and query Parquet files from Azure Data Lake using an Azure Function in C#, without downloading them locally

We have a requirement to read all the Parquet files available in Azure Data Lake and load them into a SQL database. Because of some business rules, and to limit the amount of data, I want to filter the dataset without actually downloading the files to my local machine. Is there a NuGet package or library for .NET that can do this, ideally with some sample code? Any suggestions?

Here is a successful workaround available in Java:

 StorageCredentials credentials = new StorageCredentialsAccountAndKey(accountName, accountKey);
 CloudStorageAccount connection = new CloudStorageAccount(credentials, true);
 CloudBlobClient blobClient = connection.createCloudBlobClient();
 CloudBlobContainer container = blobClient.getContainerReference(containerName);

 CloudBlob blob = container.getBlockBlobReference(fileName);

 Configuration config = new Configuration();
 config.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem");
 config.set("fs.azure.sas.<containerName>.<accountName>.blob.core.windows.net", token);
 URI uri = new URI("wasbs://<containerName>@<accountName>.blob.core.windows.net/" + blob.getName());
 InputFile file = HadoopInputFile.fromPath(new org.apache.hadoop.fs.Path(uri),
                config);
 ParquetReader<GenericRecord> reader = AvroParquetReader.<GenericRecord> builder(file).build();

 GenericRecord record;
 while ((record = reader.read()) != null) {
     System.out.println(record);
 }
 reader.close();

Here is one workaround in C# that you can try:

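// Namespaces assume the Microsoft.Azure.Storage.Blob and Parquet.Net 3.x packages;
// the older WindowsAzure.Storage package uses Microsoft.WindowsAzure.Storage instead.
using System;
using System.IO;
using System.Net.Http;
using Microsoft.Azure.Storage;
using Microsoft.Azure.Storage.Blob;
using Parquet;

var connectionString = "<YOUR CONNECTION STRING>";

var storageAccount = CloudStorageAccount.Parse(connectionString);
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("<YOUR CONTAINER NAME>");

// Reading the file only needs Read permission, so Write and List are not requested.
SharedAccessBlobPolicy sasConstraints = new SharedAccessBlobPolicy();
sasConstraints.SharedAccessExpiryTime = DateTime.UtcNow.AddDays(2);
sasConstraints.Permissions = SharedAccessBlobPermissions.Read;

CloudBlockBlob blob = container.GetBlockBlobReference("<YOUR PARQUET FILE>");
var blobUrlWithSAS = blob.Uri + blob.GetSharedAccessSignature(sasConstraints);

// Parquet.Net needs a readable, seekable stream because the file metadata sits in the footer.
// The HTTP response stream is forward-only, so buffer it in memory (not on disk) first.
var client = new HttpClient();
var stream = new MemoryStream();
using (var httpStream = await client.GetStreamAsync(blobUrlWithSAS))
{
    await httpStream.CopyToAsync(stream);
}
stream.Position = 0;

var options = new ParquetOptions {
    TreatByteArrayAsString = true
};
var reader = new ParquetReader(stream, options);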
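
To filter rather than read everything, one option is to decode only the column that the business rule depends on, one row group at a time. The following is a minimal sketch, assuming Parquet.Net 3.x (the version that has the new ParquetReader(stream, options) constructor used above) and a readable, seekable stream. ParquetFilterSketch, ReadMatchingValues, columnName and predicate are hypothetical names chosen for illustration; they are not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Parquet;
using Parquet.Data;

// Hypothetical helper, not part of Parquet.Net.
public static class ParquetFilterSketch
{
    // Collects the values of one column that satisfy the predicate.
    // Assumption: seekableStream is readable and seekable (for example a MemoryStream).
    public static List<object> ReadMatchingValues(
        Stream seekableStream, string columnName, Func<object, bool> predicate)
    {
        var matches = new List<object>();
        var options = new ParquetOptions { TreatByteArrayAsString = true };

        using (var reader = new ParquetReader(seekableStream, options))
        {
            // First() throws InvalidOperationException if the column does not exist.
            DataField field = reader.Schema.GetDataFields()
                                    .First(f => f.Name == columnName);

            for (int i = 0; i < reader.RowGroupCount; i++)
            {
                using (ParquetRowGroupReader group = reader.OpenRowGroupReader(i))
                {
                    // Only the requested column of this row group is decoded.
                    DataColumn column = group.ReadColumn(field);

                    // column.Data is a System.Array; its element type depends on the schema.
                    foreach (object value in column.Data)
                    {
                        if (predicate(value))
                        {
                            matches.Add(value);
                        }
                    }
                }
            }
        }

        return matches;
    }
}
```

A call could look like ParquetFilterSketch.ReadMatchingValues(stream, "amount", v => v is int n && n > 100), where "amount" is a made-up column name. This limits what gets decoded and kept in memory; whether it also reduces what travels over the network depends on how the stream you pass in is backed.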
