
Decompress a .gz file in Azure Data Lake

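How can I decompress and read a .gz file stored in Azure Data Lake using C# / ASP.NET?

I tried the code below, but it throws an exception.

Exception: Could not find a part of the path 'D:\\xxxxxx\\filename'.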

public static void Main(string[] args)
{
    // Obtain AAD token
    var creds = new ClientCredential(applicationId, clientSecret);
    var clientCreds = ApplicationTokenProvider.LoginSilentAsync(tenantId, creds).GetAwaiter().GetResult();

    // Create ADLS client object
    AdlsClient client = AdlsClient.CreateClient(adlsAccountFQDN, clientCreds);

    try
    {
        // Enumerate directory
        foreach (var entry in client.EnumerateDirectory("/Test/"))
        {
            try
            {
                string filename = entry.Name;
                using (Stream fileStream = File.OpenRead(filename), zippedStream = new GZipStream(fileStream, CompressionMode.Decompress))
                {
                    using (StreamReader reader = new StreamReader(zippedStream))
                    {
                        // work with reader
                        reader.ReadLine();
                    }
                }
            }
            catch (Exception ex)
            {
            }
        }
    }
    catch (AdlsException e)
    {
        PrintAdlsException(e);
    }

    Console.WriteLine("Done. Press ENTER to continue ...");
    Console.ReadLine();
}

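I found the solution. Instead of File.OpenRead(filename), use client.GetReadStream(entry.FullName). File.OpenRead looks for the file on the local disk, which is why the ADLS file name ended up being resolved against a local D:\ path; GetReadStream opens the file in the Data Lake store itself.

The code is: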

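foreach (var entry in client.EnumerateDirectory("/Test/"))
{
    StringBuilder lines = new StringBuilder();
    try
    {
        // Open the file in ADLS (not on the local disk) and decompress it on the fly
        using (Stream fileStream = client.GetReadStream(entry.FullName))
        using (Stream zippedStream = new GZipStream(fileStream, CompressionMode.Decompress))
        using (StreamReader reader = new StreamReader(zippedStream))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.AppendLine(line);
            }
        }

        // Print the decompressed content once per file
        Console.Write(lines);
    }
    catch (Exception ex)
    {
        // Report the failure instead of silently swallowing it
        Console.WriteLine($"Failed to read {entry.FullName}: {ex.Message}");
    }
}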
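
The fix works because GZipStream only needs a readable Stream; it does not care whether the bytes come from the local disk, ADLS, or memory. Below is a minimal sketch of that idea using only the .NET base class library, with a MemoryStream standing in for the stream returned by client.GetReadStream. The names GzipDemo, Compress, and ReadLines are hypothetical and chosen purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GzipDemo
{
    // Hypothetical helper: gzip-compress a string so the demo has input to read.
    public static byte[] Compress(string text)
    {
        using (var buffer = new MemoryStream())
        {
            // leaveOpen: true keeps 'buffer' open after the GZipStream is disposed;
            // disposing the GZipStream flushes the final compressed block.
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress, leaveOpen: true))
            {
                byte[] raw = Encoding.UTF8.GetBytes(text);
                gzip.Write(raw, 0, raw.Length);
            }
            return buffer.ToArray();
        }
    }

    // Hypothetical helper: decompress any readable stream and return its lines.
    public static List<string> ReadLines(Stream compressed)
    {
        var lines = new List<string>();
        using (var gzip = new GZipStream(compressed, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    public static void Main()
    {
        byte[] data = Compress("first line\nsecond line\n");

        // The MemoryStream stands in for client.GetReadStream(entry.FullName).
        foreach (string line in ReadLines(new MemoryStream(data)))
        {
            Console.WriteLine(line);
        }
    }
}
```

With the ADLS client from the question, the same ReadLines helper could presumably be called as ReadLines(client.GetReadStream(entry.FullName)), since it accepts any Stream.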

In U-SQL, the built-in extractors (Text, Csv, Tsv) now natively support gzip-compressed files, so you don't need to do anything special beyond reading them:

@data =
    EXTRACT Timestamp DateTime,
            Event string,
            Value int
    FROM "/input/input.csv.gz"
    USING Extractors.Csv();

This also works with custom extractors:

@data =
    EXTRACT Timestamp DateTime,
            Event string,
            Value int
    FROM "/input/input.csv.gz"
    USING new USQLworking.MyExtractor();

See the further explanation from Michael Rys.

