简体   繁体   中英

Nested JSON - Azure Data Lake - U-SQL Extraction to CSV

I've tried different methods to extract data from my JSON files and convert to CSV in U-SQL but they all seem to generate empty files or just output the header row.

I was previously trying JSON tuples but since that generated empty files I am now trying to use the MultiLevelJsonExtractor.

My JSON files are structured like this:

{
   "responseHeader":{
      "status":0,
      "QTime":25,
      "params":{
         "q":"query",
         "rows":"7000",
         "wt":"json"
      }
   },
   "response":{
      "docs":[
         {
            "uri":"www.google.com",
            "date_dt":"2017-05-30T23:00:00Z",
            "title":"Google"
         },
         {
            "uri":"www.yahoo.com",
            "date_dt":"2017-03-30T23:00:00Z",
            "title":"Yahoo"
         }
      ]
   }
}

My current code:

REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

DECLARE @INPUT_FILE string = @"{*}.json";
DECLARE @OUTPUT_FILE string = @"/Output/Output.csv";

@json =
EXTRACT
    uri string,
    date_dt string,
    title string,
FROM
    @INPUT_FILE
USING new MultiLevelJsonExtractor("docs[*]", true, "uri", "date_dt", "title");

OUTPUT @json
TO @OUTPUT_FILE
USING Outputters.Csv(outputHeader:true,quoting:true);

This just outputs the header row. Is there an issue with the way I am referencing multiple files? eg {*}.json?

I'm just trying to pull out the three fields within the docs nodeset from each JSON file and output to CSV.

How exactly do you want the .csv to look on output? I just changed the json path to response.docs[*] and got two rows for your sample json, ie

@json =
    EXTRACT uri string,
            date_dt string,
            title string
    FROM @INPUT_FILE
    USING new MultiLevelJsonExtractor("response.docs[*]", true, "uri", "date_dt", "title");

My results:

我的结果

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM