Read and Query Parquet files from Azure Data Lake Using Azure Function without downloading locally C#

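We need to read all the Parquet files available in Azure Data Lake and load them into a SQL database. Because of some business rules and restrictions on my data, I want to filter the data set without actually downloading the files to my local machine. Is there a NuGet package or a .NET library for this, ideally with sample code? Any suggestions?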

Here is a working approach that is available in Java:

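 import java.net.URI;
 import com.microsoft.azure.storage.*;
 import com.microsoft.azure.storage.blob.*;
 import org.apache.avro.generic.GenericRecord;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.parquet.avro.AvroParquetReader;
 import org.apache.parquet.hadoop.ParquetReader;
 import org.apache.parquet.hadoop.util.HadoopInputFile;
 import org.apache.parquet.io.InputFile;

 // accountName, accountKey, containerName, fileName and token (a SAS token) are assumed to be defined.
 StorageCredentials credentials = new StorageCredentialsAccountAndKey(accountName, accountKey);
 CloudStorageAccount connection = new CloudStorageAccount(credentials, true);
 CloudBlobClient blobClient = connection.createCloudBlobClient();
 CloudBlobContainer container = blobClient.getContainerReference(containerName);
 CloudBlob blob = container.getBlockBlobReference(fileName);

 // Point Hadoop at the container and authorize it with the SAS token.
 String accountHost = accountName + ".blob.core.windows.net";
 Configuration config = new Configuration();
 config.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem");
 config.set("fs.azure.sas." + containerName + "." + accountHost, token);
 URI uri = new URI("wasbs://" + containerName + "@" + accountHost + "/" + blob.getName());
 InputFile file = HadoopInputFile.fromPath(new org.apache.hadoop.fs.Path(uri), config);

 // Read the records directly from the blob; try-with-resources closes the reader.
 try (ParquetReader<GenericRecord> reader = AvroParquetReader.<GenericRecord> builder(file).build()) {
     GenericRecord record;
     while ((record = reader.read()) != null) {
         System.out.println(record);
     }
 }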
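
The loop above prints every record, while the question asks for filtering by business rules before loading into SQL. One possible shape for that step is sketched below, using only the Java standard library. The names `BatchFilter` and `filterInBatches`, the batch size, and the integer stand-in data are all hypothetical and are not part of any Azure or Parquet library; with the reader above, the element type would be `GenericRecord` and the sink would be whatever performs the batched SQL insert.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical helper for illustration; not part of any Azure or Parquet library.
final class BatchFilter {

    /**
     * Pulls records one at a time, keeps those that pass the rule, and hands
     * them to the sink in batches. Returns the number of records kept.
     */
    static <T> int filterInBatches(Iterator<T> source, Predicate<T> keep,
                                   int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int kept = 0;
        while (source.hasNext()) {
            T record = source.next();
            if (!keep.test(record)) {
                continue; // rejected by the business rule
            }
            batch.add(record);
            kept++;
            if (batch.size() == batchSize) {
                sink.accept(new ArrayList<>(batch)); // copy so the sink owns its list
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(new ArrayList<>(batch)); // flush the final partial batch
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in data; in the reader loop these would be GenericRecord values.
        List<Integer> amounts = Arrays.asList(5, 120, 40, 300, 250);
        List<List<Integer>> batches = new ArrayList<>();
        int kept = filterInBatches(amounts.iterator(), a -> a >= 100, 2, batches::add);
        System.out.println(kept + " kept in " + batches.size() + " batches: " + batches);
    }
}
```

Only one batch of kept records is held in memory at a time, so the whole file never has to be materialized before the rule is applied.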

Here is one workaround you can try in C#:

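// Assumes the legacy WindowsAzure.Storage package and Parquet.Net 3.x;
// with Microsoft.Azure.Storage.Blob, use the Microsoft.Azure.Storage namespaces instead.
using System;
using System.IO;
using System.Net.Http;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Parquet;

var connectionString = "<YOUR CONNECTION STRING>";

var storageAccount = CloudStorageAccount.Parse(connectionString);
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = blobClient.GetContainerReference("<YOUR CONTAINER NAME>");

// SAS policy valid for two days, used to fetch the blob over HTTPS.
SharedAccessBlobPolicy sasConstraints = new SharedAccessBlobPolicy();
sasConstraints.SharedAccessExpiryTime = DateTime.UtcNow.AddDays(2);
sasConstraints.Permissions = SharedAccessBlobPermissions.Read | SharedAccessBlobPermissions.Write | SharedAccessBlobPermissions.List;

CloudBlockBlob blob = container.GetBlockBlobReference("<YOUR PARQUET FILE>");
var blobUrlWithSAS = blob.Uri + blob.GetSharedAccessSignature(sasConstraints);

var client = new HttpClient();
var httpStream = await client.GetStreamAsync(blobUrlWithSAS);

// Parquet.Net needs a seekable stream, so buffer the response in memory (nothing is written to disk).
var stream = new MemoryStream();
await httpStream.CopyToAsync(stream);
stream.Position = 0;

var options = new ParquetOptions {
    TreatByteArrayAsString = true
};
var reader = new ParquetReader(stream, options);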

