U-SQL: How to process an Avro file with multiple JSON arrays with multiple objects?
I receive an Avro file in Data Lake Store via Stream Analytics and an Event Hub with Capture enabled.
The structure of the file looks like this:
[{"id":1,"pid":"ABC","value":"1","utctimestamp":1537805867},{"id":6569,"pid":"1E014000","value":"-5.8","utctimestamp":1537805867}]
[{"id":2,"pid":"cde","value":"77","utctimestamp":1537772095},{"id":6658,"pid":"02002001","value":"77","utctimestamp":1537772095}]
I used the following script:
@rs =
    EXTRACT
        SequenceNumber long,
        Offset string,
        EnqueuedTimeUtc string,
        Body byte[]
    FROM @input_file
    USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@"
{
""type"": ""record"",
""name"": ""EventData"",
""namespace"": ""Microsoft.ServiceBus.Messaging"",
""fields"": [
{
""name"": ""SequenceNumber"",
""type"": ""long""
},
{
""name"": ""Offset"",
""type"": ""string""
},
{
""name"": ""EnqueuedTimeUtc"",
""type"": ""string""
},
{
""name"": ""SystemProperties"",
""type"": {
""type"": ""map"",
""values"": [
""long"",
""double"",
""string"",
""bytes""
]
}
},
{
""name"": ""Properties"",
""type"": {
""type"": ""map"",
""values"": [
""long"",
""double"",
""string"",
""bytes"",
""null""
]
}
},
{
""name"": ""Body"",
""type"": [
""null"",
""bytes""
]
}
]
}
");
@jsonify =
    SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(Encoding.UTF8.GetString(Body)) AS message
    FROM @rs;

@cnt =
    SELECT message["id"] AS id,
           message["id2"] AS pid,
           message["value"] AS value,
           message["utctimestamp"] AS utctimestamp,
           message["extra"] AS extra
    FROM @jsonify;

OUTPUT @cnt TO @output_file USING Outputters.Text(quoting: false);
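The script generates a file, but it contains only the delimiting commas and no values.
How can I extract/transform this structure so that I can output it as a flattened 4-column CSV file?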
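A likely reason for the empty output is that each Body holds a JSON array rather than a single object, so the top-level map has no "id" key. The sketch below is a plain-Python model of that behaviour, not U-SQL. The helper `json_tuple` is hypothetical, and the "[0]" key format for array elements is an assumption about how the sample library labels them.

```python
import json


def json_tuple(text):
    """Rough model of JsonTuple: map each top-level child to its text form."""
    root = json.loads(text)
    if isinstance(root, dict):
        # Object root: the keys are the property names.
        items = root.items()
    else:
        # Array root: the keys are element positions.
        # The "[i]" spelling is an assumption made for this sketch.
        items = ((f"[{i}]", child) for i, child in enumerate(root))
    return {
        key: child if isinstance(child, str) else json.dumps(child)
        for key, child in items
    }


# One event body taken from the sample structure in the question.
body = b'[{"id":1,"pid":"ABC","value":"1","utctimestamp":1537805867}]'
message = json_tuple(body.decode("utf-8"))

print(list(message))       # the only key is the array position
print(message.get("id"))   # None, because "id" lives one level down
```

In this model every lookup such as message["id"] misses, which matches a CSV row made of nothing but delimiters.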
I got this to work by exploding the JSON column again and applying the JsonTuple function a second time (although I suspect it could be simplified):
@jsonify =
    SELECT JsonFunctions.JsonTuple(Encoding.UTF8.GetString(Body)) AS message
    FROM @rs;

// Explode the tuple as key-value pairs
@working =
    SELECT key,
           JsonFunctions.JsonTuple(value) AS value
    FROM @jsonify
         CROSS APPLY
             EXPLODE(message) AS y(key, value);
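The two-step expansion can be pictured with a small Python model (again not U-SQL, only an illustration of the data flow). The `bodies` list is a hypothetical stand-in for the Body column of two captured events, filled with the sample data from the question. The outer loop plays the role of the rows in @rs, the inner loop plays the role of CROSS APPLY EXPLODE, and the dictionary built per element corresponds to the second JsonTuple call.

```python
import json

# Hypothetical stand-ins for the Body column of two captured events.
bodies = [
    b'[{"id":1,"pid":"ABC","value":"1","utctimestamp":1537805867},'
    b'{"id":6569,"pid":"1E014000","value":"-5.8","utctimestamp":1537805867}]',
    b'[{"id":2,"pid":"cde","value":"77","utctimestamp":1537772095},'
    b'{"id":6658,"pid":"02002001","value":"77","utctimestamp":1537772095}]',
]

rows = []
for body in bodies:
    # First step: decode the bytes and parse the array held in the body.
    elements = json.loads(body.decode("utf-8"))
    for element in elements:
        # Second step: one output row per array element, with every
        # value kept as a string, as a string-to-string map would do.
        rows.append({key: str(val) for key, val in element.items()})

for row in rows:
    print(row["id"], row["pid"], row["value"], row["utctimestamp"])
```

Each array element ends up as its own row, so two bodies with two objects each give four rows.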
Full script:
REFERENCE ASSEMBLY Avro;
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

DECLARE @input_file string = @"\input\input21.avro";
DECLARE @output_file string = @"\output\output.csv";

@rs =
    EXTRACT
        Body byte[]
    FROM @input_file
    USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@"{
""type"": ""record"",
""name"": ""EventData"",
""namespace"": ""Microsoft.ServiceBus.Messaging"",
""fields"": [
{
""name"": ""SequenceNumber"",
""type"": ""long""
},
{
""name"": ""Offset"",
""type"": ""string""
},
{
""name"": ""EnqueuedTimeUtc"",
""type"": ""string""
},
{
""name"": ""SystemProperties"",
""type"": {
""type"": ""map"",
""values"": [
""long"",
""double"",
""string"",
""bytes""
]
}
},
{
""name"": ""Properties"",
""type"": {
""type"": ""map"",
""values"": [
""long"",
""double"",
""string"",
""bytes"",
""null""
]
}
},
{
""name"": ""Body"",
""type"": [
""null"",
""bytes""
]
}
]
}");
@jsonify =
    SELECT JsonFunctions.JsonTuple(Encoding.UTF8.GetString(Body)) AS message
    FROM @rs;

// Explode the tuple as key-value pairs
@working =
    SELECT key,
           JsonFunctions.JsonTuple(value) AS value
    FROM @jsonify
         CROSS APPLY
             EXPLODE(message) AS y(key, value);

@cnt =
    SELECT value["id"] AS id,
           // The sample objects use the key "pid"; the earlier script looked up "id2"
           value["pid"] AS pid,
           value["value"] AS value,
           value["utctimestamp"] AS utctimestamp,
           // "extra" does not appear in the sample data shown in the question
           value["extra"] AS extra
    FROM @working;

OUTPUT @cnt TO @output_file USING Outputters.Text(quoting: false);
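For the flattened 4-column CSV asked for in the question, the final projection only needs the four keys that actually occur in the objects. The following Python sketch models that last step under stated assumptions: `working` is a hypothetical list of already exploded rows built from the sample data, and the `csv` module stands in for the text outputter with quoting turned off. A key that is missing from a row would come out as an empty field.

```python
import csv
import io

# Hypothetical rows as they might look after the second expansion:
# string-to-string maps, one per JSON object.
working = [
    {"id": "1", "pid": "ABC", "value": "1", "utctimestamp": "1537805867"},
    {"id": "6569", "pid": "1E014000", "value": "-5.8", "utctimestamp": "1537805867"},
]

# The four columns requested in the question.
columns = ["id", "pid", "value", "utctimestamp"]

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_NONE, lineterminator="\n")
for row in working:
    # Missing keys fall back to an empty string instead of raising.
    writer.writerow([row.get(name, "") for name in columns])

csv_text = buffer.getvalue()
print(csv_text)
```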
My results: