Extract JSON string from unstructured string

I have an unstructured string and would like to use a regex to extract the JSON object with the "restaurant" key from it. The data below is only an example, but the format and the "restaurant" key are correct.

{
    "restaurant": {
        "id": "abcd-efgh-ijkl",
        "created_at": "2020-12-31",
        "cashier_payments": []
    }
}

I came up with the regex String findMe = "\"restaurant\": {(\\n.*?)+}"; however, it matches all the data up to the last }.

How do I correct the regex?

As requested, this is how I get the unstructured string using Jsoup:

        // Parse the HTML and collect every <script> element.
        String htmlString = contentBuilder.toString();
        Document doc = Jsoup.parse(htmlString);
        Elements elements = doc.getElementsByTag("script");

        for (Element element : elements) {
            // The body of a script element is exposed as DataNode children.
            for (DataNode node : element.dataNodes()) {
                String s = node.getWholeData();
                if (s.contains("\"restaurant\":")) {
                    System.out.println(s);
                }
            }
            System.out.println("-------------------");
        }

So I would like to extract the JSON from the String s.

If the objects you intend to extract do not contain nested objects (otherwise, you'll need a proper JSON parser), you can use the following regex: "restaurant":\s*\{[^}]*\}
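
As a rough illustration, the regex above could be applied with java.util.regex as in the sketch below. The class name FlatRestaurantRegex, the helper extract, and the sample input are made up for this example, and the approach assumes the "restaurant" value contains no nested objects and no } inside string values.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlatRestaurantRegex {
    // [^}]* stops at the first '}', so this only works when the
    // "restaurant" value has no nested objects (assumption).
    private static final Pattern RESTAURANT =
            Pattern.compile("\"restaurant\":\\s*\\{[^}]*\\}");

    // Hypothetical helper: returns the first match, if any.
    static Optional<String> extract(String text) {
        Matcher m = RESTAURANT.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up stand-in for the script text obtained via Jsoup.
        String s = "var a = 1; window.data = {\"restaurant\": {\"id\": \"abcd-efgh-ijkl\", "
                + "\"created_at\": \"2020-12-31\", \"cashier_payments\": []}}; var b = {\"other\": {}};";
        System.out.println(extract(s).orElse("no match"));
    }
}
```

Note that the match includes the "restaurant": prefix, so it is not a complete JSON document on its own; wrapping it in { and } would be needed before handing it to a parser.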
Edit: It seems the value object does indeed contain other objects, so I suggest using a JSON library such as Jackson.
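
A JSON library still needs a well-formed JSON substring to work on, and the surrounding script text is not JSON. One possible way to isolate that substring when nested objects are present is to count braces instead of using a regex. This is a swapped-in technique, not part of the answer above, and the names BalancedJsonExtractor and extractObject are hypothetical. The sketch assumes the first { after the key opens the key's value, and it skips over braces that appear inside string literals.

```java
class BalancedJsonExtractor {
    // Returns the object that follows "key" in text, or null if the key
    // is missing or the braces never balance.
    static String extractObject(String text, String key) {
        int keyPos = text.indexOf("\"" + key + "\"");
        if (keyPos < 0) return null;
        // Assumption: the value is an object, so the next '{' opens it.
        int start = text.indexOf('{', keyPos);
        if (start < 0) return null;

        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        for (int i = start; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inString) {
                // Inside a string literal: only track escapes and the closing quote.
                if (escaped) escaped = false;
                else if (c == '\\') escaped = true;
                else if (c == '"') inString = false;
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) return text.substring(start, i + 1);
            }
        }
        return null; // unbalanced input
    }

    public static void main(String[] args) {
        // Made-up input with a nested object inside "restaurant".
        String s = "window.data = {\"restaurant\": {\"id\": \"abcd-efgh-ijkl\", "
                + "\"owner\": {\"name\": \"n\"}, \"cashier_payments\": []}, \"z\": {}};";
        System.out.println(extractObject(s, "restaurant"));
    }
}
```

The returned string is only the value of the key (from its opening { to the matching }), so it can be passed directly to a JSON library, for example Jackson's ObjectMapper.readTree, for actual parsing and validation.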
