
Apache NiFi JOLTTransformJSON Processor

I have a flow that pulls tweets with the GetTwitter processor, and I use the JOLTTransformJSON processor to extract a few attributes, including the hashtags. My Jolt specification is as follows:

[
  {
    "operation": "shift",
    "spec": {
      "entities": {
        "hashtags": {
          "*": "hashtags"
        }
      },
      "text": "content",
      "id": "id",
      "timestamp_ms": "timestamp",
      "retweet_count": "retweetcount",
      "url": "url"
    }
  },
  {
    "operation": "default",
    "spec": {
      "type": "twitter"
    }
  },
  {
    "operation": "cardinality",
    "spec": {
      "hashtags": "MANY"
    }
  }
]

When the Twitter output contains hashtags, the JOLTTransformJSON processor output gives me those hashtags in the following way:

{
  "hashtags": [
    {
      "text": "Venus",
      "indices": [16, 22]
    },
    {
      "text": "Cancer",
      "indices": [69, 76]
    },
    {
      "text": "ascendant",
      "indices": [86, 96]
    }
  ],
  "content": "acmc_clock_euro #Venus is now (16h58m01s UT) setting at 10°32'50.2'' #Cancer opposite #ascendant at Helsinki, SF",
  "id": 895332436975931393,
  "timestamp": "1502298862104",
  "retweetcount": 0,
  "url": "https://twitter.com/pe602/status/895332436975931393",
  "type": "twitter"
}

But if the hashtags array is empty, as follows:

"entities": {
    "hashtags": []

the output does not contain hashtags at all. How can I make the output include a hashtags element holding an empty array when the Twitter output has no hashtags, using the JOLT processor?

Try this in your second operation, the "default" one:

{
  "operation": "default",
  "spec": {
    "hashtags": [],
    "type": "twitter"
  }
}

I resolved this by using the "modify-default-beta" operation, as follows:

{ "operation": "modify-default-beta", "spec": { "hashtags": [] } }
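For reference, here is a sketch of how the whole chain might look with that operation added. The placement, right after the existing "default" step and before "cardinality", is an assumption; the post above does not say where the operation was inserted, so treat this as one plausible arrangement rather than the confirmed spec.

```json
[
  {
    "operation": "shift",
    "spec": {
      "entities": {
        "hashtags": {
          "*": "hashtags"
        }
      },
      "text": "content",
      "id": "id",
      "timestamp_ms": "timestamp",
      "retweet_count": "retweetcount",
      "url": "url"
    }
  },
  {
    "operation": "default",
    "spec": {
      "type": "twitter"
    }
  },
  {
    "operation": "modify-default-beta",
    "spec": {
      "hashtags": []
    }
  },
  {
    "operation": "cardinality",
    "spec": {
      "hashtags": "MANY"
    }
  }
]
```

With this arrangement, a tweet whose entities.hashtags array is empty should come out with "hashtags": [] next to the other fields, while tweets that do carry hashtags keep the values produced by the shift step, since a default-style operation only fills in keys that are missing.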


 