
How to Condense & Nest a (CSV) Payload in DataWeave 2.0?

I have a CSV payload of TV Programs & Episodes that I want to transform (nest and condense) into JSON, with the following conditions:

  • Merge consecutive Program lines (that are not followed by an Episode line) into a single Program carrying the Start date of the first instance and the sum of the Durations; e.g. the first two "Broke Girls" lines (600 + 3000) become one Program starting at 2018-02-01T00:00:00 with Duration 3600.
  • Episode lines that follow a Program line are nested under that Program.

INPUT

Channel|Name|Start|Duration|Type
ACME|Broke Girls|2018-02-01T00:00:00|600|Program
ACME|Broke Girls|2018-02-01T00:10:00|3000|Program
ACME|S03_8|2018-02-01T00:13:05|120|Episode
ACME|S03_9|2018-02-01T00:29:10|120|Episode
ACME|S04_1|2018-02-01T00:44:12|120|Episode
ACME|Lost In Translation|2018-02-01T02:01:00|1800|Program
ACME|Lost In Translation|2018-02-01T02:30:00|1800|Program
ACME|The Demolition Man|2018-02-01T03:00:00|1800|Program
ACME|The Demolition Man|2018-02-01T03:30:00|1800|Program
ACME|The Demolition Man|2018-02-01T04:00:00|1800|Program
ACME|The Demolition Man|2018-02-01T04:30:00|1800|Program
ACME|Photon|2018-02-01T05:00:00|1800|Program
ACME|Photon|2018-02-01T05:30:00|1800|Program
ACME|Miles & Smiles|2018-02-01T06:00:00|3600|Program
ACME|S015_1|2018-02-01T06:13:53|120|Episode
ACME|S015_2|2018-02-01T06:29:22|120|Episode
ACME|S015_3|2018-02-01T06:46:28|120|Episode
ACME|Ice Age|2018-02-01T07:00:00|300|Program
ACME|Ice Age|2018-02-01T07:05:00|600|Program
ACME|Ice Age|2018-02-01T07:15:00|2700|Program
ACME|S01_4|2018-02-01T07:17:17|120|Episode
ACME|S01_5|2018-02-01T07:32:11|120|Episode
ACME|S01_6|2018-02-01T07:47:20|120|Episode
ACME|My Girl Friday|2018-02-01T08:00:00|3600|Program
ACME|S05_7|2018-02-01T08:17:28|120|Episode
ACME|S05_8|2018-02-01T08:31:59|120|Episode
ACME|S05_9|2018-02-01T08:44:42|120|Episode
ACME|Pirate Bay|2018-02-01T09:00:00|3600|Program
ACME|S01_1|2018-02-01T09:33:12|120|Episode
ACME|S01_2|2018-02-01T09:46:19|120|Episode
ACME|Broke Girls|2018-02-01T10:00:00|1200|Program
ACME|S05_3|2018-02-01T10:13:05|120|Episode
ACME|S05_4|2018-02-01T10:29:10|120|Episode

OUTPUT

 { "programs": [ { "StartTime": "2018-02-01T00:00:00", "Duration": 3600, "Name": "Broke Girls", "episode": [ { "name": "S03_8", "startDateTime": "2018-02-01T00:13:05", "duration": 120 }, { "name": "S03_9", "startDateTime": "2018-02-01T00:29:10", "duration": 120 }, { "name": "S04_1", "startDateTime": "2018-02-01T00:44:12", "duration": 120 } ] }, { "StartTime": "2018-02-01T06:00:00", "Duration": 3600, "Name": "Miles & Smiles", "episode": [ { "name": "S015_1", "startDateTime": "2018-02-01T06:13:53", "duration": 120 }, { "name": "S015_2", "startDateTime": "2018-02-01T06:29:22", "duration": 120 }, { "name": "S015_3", "startDateTime": "2018-02-01T06:46:28", "duration": 120 } ] }, { "StartTime": "2018-02-01T07:00:00", "Duration": 3600, "Name": "Ice Age", "episode": [ { "name": "S01_4", "startDateTime": "2018-02-01T07:17:17", "duration": 120 }, { "name": "S01_5", "startDateTime": "2018-02-01T07:32:11", "duration": 120 }, { "name": "S01_6", "startDateTime": "2018-02-01T07:47:20", "duration": 120 } ] }, { "StartTime": "2018-02-01T08:00:00", "Duration": 3600, "Name": "My Girl Friday", "episode": [ { "name": "S05_7", "startDateTime": "2018-02-01T08:17:28", "duration": 120 }, { "name": "S05_8", "startDateTime": "2018-02-01T08:31:59", "duration": 120 }, { "name": "S05_9", "startDateTime": "2018-02-01T08:44:42", "duration": 120 } ] }, { "StartTime": "2018-02-01T09:00:00", "Duration": 3600, "Name": "Pirate Bay", "episode": [ { "name": "S01_1", "startDateTime": "2018-02-01T09:33:12", "duration": 120 }, { "name": "S01_2", "startDateTime": "2018-02-01T09:46:19", "duration": 120 } ] }, { "StartTime": "2018-02-01T10:00:00", "Duration": 1200, "Name": "Broke Girls", "episode": [ { "name": "S05_3", "startDateTime": "2018-02-01T10:13:05", "duration": 120 }, { "name": "S05_4", "startDateTime": "2018-02-01T10:29:10", "duration": 120 } ] } ] }

Give this a try; comments are embedded:

%dw 2.0
output application/dw
var data = readUrl("classpath://data.csv","application/csv",{separator:"|"})
var firstProgram = data[0].Name
---
// Identify the programs by adding a field
(data reduce (e,acc={l: firstProgram, c:0, d: []}) -> do {
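    // acc.l = last Program name seen, acc.c = running program counter,
    // acc.d = rows accumulated so far, each tagged with its program id (pc)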
    var next = acc.l != e.Name and e.Type == "Program" 
    var counter = if (next) acc.c+1 else acc.c
    ---
    {
        l: if (next) e.Name else acc.l,
        c: counter,
        d: acc.d + {(e), pc: counter}
    }
}).d 
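// At this point each row carries a pc tag identifying its program, e.g.
// {Channel: "ACME", Name: "S03_8", Start: "2018-02-01T00:13:05", Duration: "120", Type: "Episode", pc: 0}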
// group by the identifier of individual programs
groupBy $.pc
// Get just the program groups; throw away the grouping keys
pluck $
// Throw away the programs with no episodes
filter ($.*Type contains "Episode")
// Iterate over the programs
map do {
    // sum the program duration
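    // (CSV fields are strings; DataWeave coerces Duration to a number for sumBy)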
    var d = $ dw::core::Arrays::sumBy (e) -> if (e.Type == "Program") e.Duration else 0
    // Get the episodes and do a little cleanup
    var es = $ map ($ - "pc") filter ($.Type == "Episode")
    ---
    // Form the desired structure
    {
        ($[0] - "pc" - "Duration"),
        Duration: d,
        Episode: es 
    }
}
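
The script above emits application/dw and keeps the original CSV field names. If you need the exact JSON shape from the question (a programs array with StartTime/Duration/Name and a nested episode list), here is a minimal sketch of the same pipeline with the output switched to application/json, the result wrapped in {programs: ...}, and the final map reshaped; the renamed keys are copied from the question's expected OUTPUT:

%dw 2.0
output application/json
var data = readUrl("classpath://data.csv","application/csv",{separator:"|"})
var firstProgram = data[0].Name
---
{
    programs: (data reduce (e,acc={l: firstProgram, c:0, d: []}) -> do {
            var next = acc.l != e.Name and e.Type == "Program"
            var counter = if (next) acc.c+1 else acc.c
            ---
            {
                l: if (next) e.Name else acc.l,
                c: counter,
                d: acc.d + {(e), pc: counter}
            }
        }).d
        groupBy $.pc
        pluck $
        filter ($.*Type contains "Episode")
        map do {
            // sum the Program rows' durations (coerced from CSV strings)
            var d = $ dw::core::Arrays::sumBy (e) -> if (e.Type == "Program") (e.Duration as Number) else 0
            ---
            {
                StartTime: $[0].Start,
                Duration: d,
                Name: $[0].Name,
                episode: $ filter ($.Type == "Episode") map {
                    name: $.Name,
                    startDateTime: $.Start,
                    duration: $.Duration as Number
                }
            }
        }
}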

NOTE1: I stored the contents in a file and read it with readUrl; you need to adjust this to wherever you get your data from.
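
For instance, if the CSV arrives as the Mule message payload rather than a file, a minimal sketch of the setup could look like this (the "|" separator is declared as a reader property on the input directive; the transformation chain itself stays the same):

%dw 2.0
// Sketch: read the CSV from the message payload instead of a classpath file
input payload application/csv separator="|"
output application/dw
var data = payload
---
data // then feed `data` through the same reduce/groupBy/pluck/filter/map chain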

NOTE2: Maybe you need to rethink your inputs and organize them better, if possible.

NOTE3: Studio will show errors (at least Studio 7.5.1 does). They are false positives; the code runs.

NOTE4: There are lots of steps because of the non-trivial input. Potentially the code could be optimized, but I've spent enough time on it; I'll let you deal with the optimization, or somebody else from the community can help.
