
How to transform a JSON message to valid JSON for an Avro schema with nullable fields?

I would like to send a JSON message to a Kafka topic that uses an Avro schema.

The Avro schema allows multiple types for a field:

{  
   "name":"typeA",
   "type":[  
      "int",
      "null"
   ],
   "default":null
}

If the value is null, everything is fine. If the value is an int, as in this case, then the type must be specified explicitly. See ticket AVRO-1582.
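The rule described above follows Avro's JSON encoding of unions: a null value is written as plain null, while any other value is wrapped in a one-field object keyed by the name of the branch type. Below is a minimal sketch of that rule for the ["int","null"] field, using only the JDK. The names UnionEncoding and encodeNullableInt are made up for illustration and are not part of the Avro API.

```java
public class UnionEncoding {

    // Renders a nullable int the way Avro's JSON encoding expects for a
    // ["int","null"] union: null stays null, a value becomes {"int":<value>}.
    // Hypothetical helper for illustration; not an Avro library call.
    static String encodeNullableInt(Integer value) {
        if (value == null) {
            return "null";
        }
        return "{\"int\":" + value + "}";
    }

    public static void main(String[] args) {
        System.out.println("\"typeA\":" + encodeNullableInt(12345)); // "typeA":{"int":12345}
        System.out.println("\"typeA\":" + encodeNullableInt(null));  // "typeA":null
    }
}
```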

I have this JSON:

{
   "typeA":12345,
   "typeB":[
      {
         "feature":1,
         "value":"1"
      },
      {
         "feature":2,
         "value":"2"
      }
   ],
   "typeC":[
      {
         "a":12345,
         "data":[
            {
               "a":12345,
               "b":[
                  12345,
                  12345,
                  12345
               ]
            }
         ]
      }
   ]
}

I would like to transform it into this JSON:

{
   "typeA":{
      "int":12345
   },
   "typeB":{
      "array":[
         {
            "feature":1,
            "value":"1"
         },
         {
            "feature":2,
            "value":"2"
         }
      ]
   },
   "typeC":{
      "array":[
         {
            "a":12345,
            "data":[
               {
                  "a":12345,
                  "b":[
                     12345,
                     12345,
                     12345
                  ]
               }
            ]
         }
      ]
   }
}

Is it possible to transform "typeA":12345 into "typeA":{"int":12345}? Is there an easy way to handle this?

I know the type of every field, so I could use a regex in Java:

json.replaceAll("typeA\":([^,]*),\"", "typeA\":{\"int\":$1},\"");

But it is hard to handle arrays or the last JSON element this way. How can I solve this problem?
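One possible way to make the regex independent of the trailing comma, so that it also works when the field is the last element, is to match the integer itself instead of everything up to the next comma. This is only a sketch: it assumes the field holds a plain integer literal and that the quoted key does not also occur inside a string value. The names WrapIntField and wrapInt are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WrapIntField {

    // Rewrites every occurrence of "key":<integer> as "key":{"int":<integer>}.
    // A null value does not match the pattern and is therefore left as null.
    static String wrapInt(String json, String key) {
        // Group 1: the quoted key, the colon and any whitespace around it.
        // Group 2: the integer literal, with an optional minus sign.
        Pattern p = Pattern.compile("(\"" + Pattern.quote(key) + "\"\\s*:\\s*)(-?\\d+)");
        Matcher m = p.matcher(json);
        return m.replaceAll("$1{\"int\":$2}");
    }

    public static void main(String[] args) {
        // typeA is the last element here, so no comma follows the value.
        String json = "{\"typeB\":[],\"typeA\":12345}";
        System.out.println(wrapInt(json, "typeA")); // {"typeB":[],"typeA":{"int":12345}}
    }
}
```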

I can transform typeA to:

"typeA":{
    "int":12345
},

But typeB and typeC were too difficult for me, because I couldn't match them precisely. When I tried to wrap typeB in an array object, another part of the JSON was replaced as well, which is not what we want.

If you or someone else can solve that for typeB, typeC can be fixed the same way, because the two are similar. I'm also curious what the solution is, so let me know!

I will now share how I fixed typeA. Here is the Java code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    public static void main(String[] args) {
        String line = "{\r\n" +
                "   \"typeA\":12345,\r\n" +
                "   \"typeB\":[\r\n" +
                "      {\r\n" +
                "         \"feature\":1,\r\n" +
                "         \"value\":\"1\"\r\n" +
                "      },\r\n" +
                "      {\r\n" +
                "         \"feature\":2,\r\n" +
                "         \"value\":\"2\"\r\n" +
                "      }\r\n" +
                "   ],\r\n" +
                "   \"typeC\":[\r\n" +
                "      {\r\n" +
                "         \"a\":12345,\r\n" +
                "         \"data\":[\r\n" +
                "            {\r\n" +
                "               \"a\":12345,\r\n" +
                "               \"b\":[\r\n" +
                "                  12345,\r\n" +
                "                  12345,\r\n" +
                "                  12345\r\n" +
                "               ]\r\n" +
                "            }\r\n" +
                "         ]\r\n" +
                "      }\r\n" +
                "   ]\r\n" +
                "}";

        String regex = "(\\\"type[A-Z]\\\"):(\\d*)|(\\[.*)|(.*\\])";
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(line);
        while (matcher.find()) {
             if(matcher.group().equals("\"typeA\":12345")) {
                 String regex2 = "(\\\"typeA\\\"):(\\d*)";
                 line = line.replaceAll(regex2, "$1:{\"int\":$2}");
             }

             if(matcher.group().equals("\"typeB\":") ) {
                 //I couldn't finish this typeB, because it's too difficult
//                 String regex3 = "(\\\"type[B]\\\"):|(\\s{3}],)";
//                 line = line.replaceAll(regex3, "$1 :{ array: $2 ");
             }
        }
         System.out.println("line: " + line);
    }
}

First I used the regex (\\"type[A-Z]\\"):(\\d*)|(\\[.*)|(.*\\]). That regex yields several matches that we want to inspect.

Eventually the while loop encounters "typeA":12345, and that is where we use the regex ("typeA"):(\\d*). We use that regex to transform typeA to:

"typeA":{
    "int":12345
},
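The array fields that the regex approach above left unfinished can be handled by counting brackets instead of using a regular expression, because a regex cannot reliably find the closing bracket of a nested array. The following is a hand-rolled sketch using only the JDK. It assumes the first occurrence of the quoted key is the field to wrap, and the names WrapArrayField and wrapArray are hypothetical.

```java
public class WrapArrayField {

    // Wraps the array value of the first occurrence of "key" as {"array":[...]}.
    // Returns the input unchanged if the key is missing, its value is not an
    // array, or the brackets are unbalanced.
    static String wrapArray(String json, String key) {
        String quotedKey = "\"" + key + "\"";
        int keyPos = json.indexOf(quotedKey);
        if (keyPos < 0) return json;

        // Skip whitespace, the colon, and more whitespace after the key.
        int i = keyPos + quotedKey.length();
        while (i < json.length() && Character.isWhitespace(json.charAt(i))) i++;
        if (i >= json.length() || json.charAt(i) != ':') return json;
        i++;
        while (i < json.length() && Character.isWhitespace(json.charAt(i))) i++;
        if (i >= json.length() || json.charAt(i) != '[') return json;

        int start = i;
        int depth = 0;
        boolean inString = false;
        for (; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (c == '\\') i++;                  // skip the escaped character
                else if (c == '"') inString = false; // end of string literal
            } else if (c == '"') {
                inString = true;                     // brackets inside strings are ignored
            } else if (c == '[') {
                depth++;
            } else if (c == ']') {
                depth--;
                if (depth == 0) {                    // found the matching closing bracket
                    int end = i + 1;
                    return json.substring(0, start)
                            + "{\"array\":" + json.substring(start, end) + "}"
                            + json.substring(end);
                }
            }
        }
        return json;
    }

    public static void main(String[] args) {
        String json = "{\"typeA\":12345,\"typeB\":[{\"feature\":1,\"value\":\"1\"}],"
                + "\"typeC\":[{\"a\":1,\"b\":[1,2]}]}";
        System.out.println(wrapArray(wrapArray(json, "typeB"), "typeC"));
    }
}
```

Because the scanner stops at the matching closing bracket, nested arrays such as the inner "b" list are left untouched. For production use, a JSON library's tree model is likely more robust than scanning the text by hand.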

This wasn't super fun, due to the names of all the values, but Jackson worked just fine.

I pasted the JSON that you wanted into json2pojo:

{
  "typeA":{
     "int":12345
  },
  "typeB":{
     "array":[
        {
           "feature":1,
           "value":"1"
        },
        {
           "feature":2,
           "value":"2"
        }
     ]
  },
  "typeC":{
     "array":[
        {
           "a":12345,
           "data":[
              {
                 "a":12345,
                 "b":[
                    12345,
                    12345,
                    12345
                 ]
              }
           ]
        }
     ]
  }
}

I used the download-as-zip feature to get the classes into my local development environment.

Then I created this monstrosity of a class that tests Jackson with the classes generated by json2pojo.

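import java.io.ByteArrayOutputStream;
import java.io.File;
import java.util.ArrayList;

import com.fasterxml.jackson.databind.MapperFeature;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;

// TypeA, TypeB, TypeC, Array, Array_, Datum and Example are the classes generated by json2pojo.
public class JacksonSerialization {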

   public static void main(String... args) throws Exception {

       TypeA typeA = new TypeA();
       typeA.setInt(12345);

       TypeB typeB = new TypeB();
       ArrayList<Array> arrays = new ArrayList<>();
       arrays.add(createArray(1, "1"));
       arrays.add(createArray(2, "2"));
       typeB.setArray(arrays);

       TypeC typeC = new TypeC();

       ArrayList<Integer> integers = new ArrayList<>();
       integers.add(12345);
       integers.add(12345);
       integers.add(12345);

       ArrayList<Datum> data = new ArrayList<>();
       Datum datum = new Datum();
       datum.setA(12345);
       datum.setB(integers);
       data.add(datum);

       Array_ array_ = new Array_();
       array_.setA(12345);
       array_.setData(data);

       ArrayList<Array_> array_s = new ArrayList<>();
       array_s.add(array_);
       typeC.setArray(array_s);

       Example example = new Example();
       example.setTypeA(typeA);
       example.setTypeB(typeB);
       example.setTypeC(typeC);

       ObjectMapper mapper = new ObjectMapper();
       mapper.configure(MapperFeature.SORT_PROPERTIES_ALPHABETICALLY, false);
       mapper.configure(SerializationFeature.INDENT_OUTPUT, true);

       // Serialize once into memory for printing and once to a file.
       ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
       mapper.writeValue(byteArrayOutputStream, example);
       mapper.writeValue(new File("target/Example.json"), example);

       String json = byteArrayOutputStream.toString();

       System.out.println(json);
   }

   private static Array createArray(Integer feature, String value) {

       Array array = new Array();
       array.setFeature(feature);
       array.setValue(value);

       return array;
   }
}

When this class runs, it produces the following JSON.

{
  "typeA" : {
    "int" : 12345
  },
  "typeB" : {
    "array" : [ {
      "feature" : 1,
      "value" : "1"
    }, {
      "feature" : 2,
      "value" : "2"
    } ]
  },
  "typeC" : {
    "array" : [ {
      "a" : 12345,
      "data" : [ {
        "a" : 12345,
        "b" : [ 12345, 12345, 12345 ]
      } ]
    } ]
  }
}

Which, I think, is fairly close to what you asked for.
