
Amazon S3 SELECT returning garbage data from a .csv file in S3 Bucket (using .NET SDK)

Below are two methods from my AWS state machine.

First, the method that uses S3 SELECT to get the data from the CSV file.

/// <summary>
/// Use S3 Select in order to obtain the data from the source and return it
/// </summary>
/// <param name="s3Object"></param>
/// <param name="s3Client"></param>
/// <param name="definition"></param>
/// <returns></returns>
private static async Task<ISelectObjectContentEventStream> GetSelectObjectContentEventStream(S3Object s3Object,
    AmazonS3Client s3Client, ObjectDefinition definition)
{
    var response = await s3Client.SelectObjectContentAsync(new SelectObjectContentRequest()
    {
        Bucket = s3Object.BucketName,
        Key = s3Object.Key,
        ExpressionType = ExpressionType.SQL,
        Expression = "select * from S3Object",
        InputSerialization = new InputSerialization()
        {
            CSV = new CSVInput()
            {
                FileHeaderInfo = FileHeaderInfo.Ignore,
                FieldDelimiter = ",",
            }
        },
        OutputSerialization = new OutputSerialization()
        {
            JSON = new JSONOutput()
        }
    });

    return response.Payload;
}

Now, the method that calls it:

public async Task<StaticDataConsumerDefinition> ConvertFromSourceS3Async(StaticDataConsumerDefinition staticDataConsumer, ILambdaContext context)
{
    using (var s3Client = new AmazonS3Client())
    {
        foreach (ObjectDefinition definition in staticDataConsumer.TargetList.Objects)
        {
            var listRequest = new ListObjectsV2Request
            {
                BucketName = definition.FilePath,
                MaxKeys = 1000
            };

            ListObjectsV2Response listResponse;
            listResponse = s3Client.ListObjectsV2Async(listRequest).Result; // Force synchronous

            if (definition.LogActivity)
            {
                context.Logger.LogLine($"Response from S3 Request: {listResponse.HttpStatusCode} ({listResponse.HttpStatusCode.ToString()})");
            }

            foreach (var entity in listResponse.S3Objects.Where(n => n.Key.Contains(definition.FilePrefix)))
            {
                if (entity.Key.Contains(definition.FileExtension))
                {
                    context.Logger.LogLine($"entity {entity.Key}");

                    using (var s3Events = await GetSelectObjectContentEventStream(entity, s3Client, definition))
                    {
                        foreach (var ev in s3Events)
                        {
                            context.Logger.LogLine($"Received {ev.GetType().Name}!");
                            if (ev is RecordsEvent records)
                            {
                                context.Logger.LogLine("The contents of the Records Event is...");
                                using (var reader = new StreamReader(records.Payload))
                                {
                                    context.Logger.Log(reader.ReadToEnd());
                                }
                            }
                        }
                    }
                }
            }
        }
    }
    context.Logger.Log($"Passing ConvertSourceData {ConvertToIndentedJson(staticDataConsumer)}");
    return staticDataConsumer;
}

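However, the data I get in the CloudWatch logs is garbage. It looks a bit like ASCII characters, or perhaps encoded characters? Not what I expected! Does anyone have any ideas?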

{"_1":"\u00001\u00000\u00005\u0000","_2":"\u0000K\u0000a\u0000t\u0000e\u0000 \u0000F\u0000a\u0000r\u0000d\u0000e\u0000l\u0000l\u0000\r\u0000"}
{"_1":"\u00001\u00000\u00006\u0000","_2":"\u0000S\u0000h\u0000o\u0000n\u0000a\u0000 \u0000M\u0000a\u0000r\u0000i\u0000n\u0000o\u0000\r\u0000"}
{"_1":"\u00001\u00000\u00008\u0000","_2":"\u0000S\u0000h\u0000o\u0000n\u0000a\u0000 \u0000M\u0000a\u0000r\u0000i\u0000n\u0000o\u0000\r\u0000"}
{"_1":"\u00001\u00001\u00001\u0000","_2":"\u0000S\u0000h\u0000o\u0000n\u0000a\u0000 \u0000M\u0000a\u0000r\u0000i\u0000n\u0000o\u0000\r\u0000"}
{"_1":"\u00001\u00001\u00002\u0000","_2":"\u0000L\u0000i\u0000n\u0000a\u0000 \u0000H\u0000a\u0000n\u0000n\u0000a\u0000w\u0000e\u0000\r\u0000"}
{"_1":"\u00001\u00001\u00003\u0000","_2":"\u0000J\u0000e\u0000n\u0000n\u0000i\u0000f\u0000e\u0000r\u0000 \u0000H\u0000a\u0000l\u0000e\u0000\r\u0000"}
{"_1":"\u00001\u00001\u00004\u0000","_2":"\u0000S\u0000t\u0000a\u0000n\u0000 \u0000K\u0000a\u0000k\u0000k\u0000a\u0000s\u0000i\u0000s\u0000\r\u0000"}
{"_1":"\u00001\u00001\u00006\u0000","_2":"\u0000S\u0000t\u0000a\u0000n\u0000 \u0000K\u0000a\u0000k\u0000k\u0000a\u0000s\u0000i\u0000s\u0000\r\u0000"}
{"_1":"\u00001\u00001\u00008\u0000","_2":"\u0000S\u0000t\u0000a\u0000n\u0000 \u0000K\u0000a\u0000k\u0000k\u0000a\u0000s\u0000i\u0000s\u0000\r\u0000"}
{"_1":"\u00001\u00001\u00009\u0000","_2":"\u0000S\u0000h\u0000o\u0000n\u0000a\u0000 \u0000M\u0000a\u0000r\u0000i\u0000n\u0000o\u0000\r\u0000"}
{"_1":"\u00001\u00002\u00007\u0000","_2":"\u0000A\u0000y\u0000d\u0000i\u0000n\u0000 \u0000T\u0000e\u0000b\u0000y\u0000a\u0000n\u0000i\u0000a\u0000n\u0000\r\u0000"}
{"_1":"\u00001\u00002\u00008\u0000","_2":"\u0000C\u0000a\u0000m\u0000e\u0000r\u0000o\u0000n\u0000 \u0000P\u0000a\u0000l\u0000m\u0000e\u0000r\u0000\r\u0000"}
{"_1":"\u00001\u00009\u00007\u0000","_2":"\u0000S\u0000h\u0000a\u0000r\u0000o\u0000n\u0000 \u0000B\u0000e\u0000r\u0000g\u0000e\u0000r\u0000\r\u0000"}
{"_1":"\u00002\u00000\u00001\u0000","_2":"\u0000K\u0000a\u0000r\u0000e\u0000e\u0000n\u0000a\u0000 \u0000D\u0000a\u0000v\u0000i\u0000e\u0000s\u0000\r\u0000"}
{"_1":"\u00002\u00000\u00002\u0000","_2":"\u0000L\u0000i\u0000n\u0000a\u0000 \u0000H\u0000a\u0000n\u0000n\u0000a\u0000w\u0000e\u0000\r\u0000"}
{"_1":"\u00002\u00000\u00003\u0000","_2":"\u0000S\u0000a\u0000m\u0000 \u0000V\u0000i\u0000t\u0000a\u0000n\u0000z\u0000a\u0000\r\u0000"}
{"_1":"\u00002\u00000\u00006\u0000","_2":"\u0000K\u0000a\u0000r\u0000e\u0000e\u0000n\u0000a\u0000 \u0000D\u0000a\u0000v\u0000i\u0000e\u0000s\u0000\r\u0000"}
{"_1":"\u00002\u00000\u00008\u0000","_2":"\u0000K\u0000a\u0000r\u0000e\u0000e\u0000n\u0000a\u0000 \u0000D\u0000a\u0000v\u0000i\u0000e\u0000s\u0000\r\u0000"}
{"_1":"\u00002\u00001\u00004\u0000","_2":"\u0000K\u0000a\u0000r\u0000e\u0000e\u0000n\u0000a\u0000 \u0000D\u0000a\u0000v\u0000i\u0000e\u0000s\u0000\r\u0000"}
{"_1":"\u00002\u00001\u00007\u0000","_2":"\u0000K\u0000y\u0000l\u0000i\u0000e\u0000 \u0000B\u0000r\u0000a\u0000d\u0000l\u0000e\u0000y\u0000\r\u0000"}
{"_1":"\u00002\u00001\u00008\u0000","_2":"\u0000K\u0000a\u0000t\u0000e\u0000 \u0000F\u0000a\u0000r\u0000d\u0000e\u0000l\u0000l\u0000\r\u0000"}
{"_1":"\u00002\u00001\u00009\u0000","_2":"\u0000C\u0000a\u0000m\u0000e\u0000r\u0000o\u0000n\u0000 \u0000P\u0000a\u0000l\u0000m\u0000e\u0000r\u0000\r\u0000"}
{"_1":"\u00002\u00002\u00003\u0000","_2":"\u0000S\u0000a\u0000m\u0000 \u0000V\u0000i\u0000t\u0000a\u0000n\u0000z\u0000a\u0000\r\u0000"}
{"_1":"\u00002\u00002\u00005\u0000","_2":"\u0000K\u0000a\u0000r\u0000e\u0000e\u0000n\u0000a\u0000 \u0000D\u0000a\u0000v\u0000i\u0000e\u0000s\u0000\r\u0000"}
{"_1":"\u00002\u00002\u00006\u0000","_2":"\u0000K\u0000a\u0000t\u0000e\u0000 \u0000F\u0000a\u0000r\u0000d\u0000e\u0000l\u0000l\u0000\r\u0000"}
{"_1":"\u00002\u00002\u00008\u0000","_2":"\u0000K\u0000a\u0000r\u0000e\u0000e\u0000n\u0000a\u0000 \u0000D\u0000a\u0000v\u0000i\u0000e\u0000s\u0000\r\u0000"}
{"_1":"\u00002\u00002\u00009\u0000","_2":"\u0000K\u0000a\u0000r\u0000e\u0000e\u0000n\u0000a\u0000 \u0000D\u0000a\u0000v\u0000i\u0000e\u0000s\u0000\r\u0000"}
{"_1":"\u00002\u00003\u00000\u0000","_2":"\u0000K\u0000a\u0000r\u0000e\u0000e\u0000n\u0000a\u0000 \u0000D\u0000a\u0000v\u0000i\u0000e\u0000s\u0000\r\u0000"}

This is the actual CSV data:

Retail Store,Store Retail Business Manager
105,Kate Fardell
106,Shona Marino
108,Shona Marino
111,Shona Marino
112,Lina Hannawe
113,Jennifer Hale
114,Stan Kakkasis
116,Stan Kakkasis
118,Stan Kakkasis
119,Shona Marino
127,Aydin Tebyanian
128,Cameron Palmer
197,Sharon Berger
201,Kareena Davies
202,Lina Hannawe
203,Sam Vitanza
206,Kareena Davies
208,Kareena Davies
214,Kareena Davies
217,Kylie Bradley
218,Kate Fardell
219,Cameron Palmer
223,Sam Vitanza
225,Kareena Davies
226,Kate Fardell
228,Kareena Davies
229,Kareena Davies
230,Kareena Davies

I thought it might be the encoding?

Change the following line:

using (var reader = new StreamReader(records.Payload))

to this:

using (var reader = new StreamReader(records.Payload, System.Text.Encoding.UTF8))
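
The encoding guess can be checked offline. The sketch below assumes (the question does not confirm this) that the CSV was saved as UTF-16 little-endian, for example by a "Unicode text" export. Decoding UTF-16LE bytes as UTF-8 yields a `\u0000` after every ASCII character, which resembles the pattern in the log. The class and method names (`EncodingCheck`, `MisreadUtf16AsUtf8`, `ConvertUtf16ToUtf8`) are hypothetical helpers, not part of the AWS SDK.

```csharp
using System;
using System.Text;

// Hypothetical helper for testing the "wrong encoding" theory locally.
public static class EncodingCheck
{
    // Encodes the text as UTF-16LE (no BOM) and then decodes those bytes
    // as UTF-8, mimicking a reader that assumes UTF-8 for a UTF-16 file.
    public static string MisreadUtf16AsUtf8(string text)
    {
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(text);
        return Encoding.UTF8.GetString(utf16Bytes);
    }

    // Re-encodes UTF-16LE bytes as UTF-8. Strip any byte order mark first;
    // otherwise it is carried over as a U+FEFF character.
    public static byte[] ConvertUtf16ToUtf8(byte[] utf16Bytes)
    {
        return Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);
    }

    // Replaces NUL characters with a visible marker so they show up in logs.
    public static string ShowNulls(string text)
    {
        return text.Replace("\0", "\\u0000");
    }
}
```

For example, `EncodingCheck.ShowNulls(EncodingCheck.MisreadUtf16AsUtf8("105"))` returns `1\u00000\u00005\u0000`. `StreamReader(Stream)` already defaults to UTF-8, so if the source file does turn out to be UTF-16, re-encoding it to UTF-8 before uploading is one option to try, since the S3 Select documentation lists UTF-8 as the supported input encoding.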
