Use regex to extract a JSON object from a large text input file

I have a large text file full of junk text with a JSON object somewhere inside it. I know that the JSON object contains a keyword that is unique across the whole file, so I search for that keyword. The keyword is always present in the object, and always at its root level. Here is an example JSON string:

....
{
  "key0":"value0",
  "key1":"value0",
  "key2":"value0",
  "uniqueKey":"value0",
  "key0":[
   {"key0":"value0","key1":"value1"}

   ]
}
....

So I wrote this method to extract the JSON object. It works fine, but I wondered: could a regex do this?

private JsonObject parse(String text, String keywordInJsonFile) {

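        int index = text.indexOf(keywordInJsonFile);
        if (index < 0) {
            // Without this check the backward scan below would call charAt(-2).
            throw new IllegalArgumentException("Keyword not found in input: " + keywordInJsonFile);
        }
        int lastIndex = text.lastIndexOf(keywordInJsonFile);

        if (index != lastIndex) {
            log.warn("The keyword '{}' isn't unique, please check your input file", keywordInJsonFile);
            log.warn("Continuing with the first match at index {}", index);
        }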

        int indexJsonStart;
        int indexJsonStop;
        int currentIndex = index;
        int bracketCounter = 0;
        
        // loop and find the first '{' from the json Object
        while (true) {
            currentIndex--;
            char c = text.charAt(currentIndex);
            if (c == '}') bracketCounter++;
            if (c == '{') bracketCounter--;
            if (c == '{' && bracketCounter == -1)
            {
                indexJsonStart = currentIndex;
                break;
            }
        }

        currentIndex = index;
        bracketCounter = 0;

        // loop and find the last '}' from the json Object
        while (true) {
            currentIndex++;
            char c = text.charAt(currentIndex);
            if (c == '}') bracketCounter++;
            if (c == '{') bracketCounter--;
            if (c == '}' && bracketCounter == 1)
            {
                indexJsonStop = currentIndex +1;
                break;
            }
        }
        // Gson -> JsonObject has to be between the { } 
        return new JsonParser().parse(text.substring(indexJsonStart, indexJsonStop)).getAsJsonObject();
    }

So I asked myself: is it possible to do this with a regex? One Saturday evening later, I don't think so. I can't figure out how to express "give me the first opening brace that hasn't been closed yet" or "give me the first closing brace that hasn't been opened yet". Can someone help me out?
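For comparison, here is a minimal sketch of the same brace-counting idea with bounds checks. The forward scan also skips string literals, so a `}` inside a value does not end the object early. The class and method names (`JsonObjectLocator`, `extractEnclosingObject`) are hypothetical, and the backward scan still assumes that no braces appear inside string values between the opening brace and the keyword.

```java
final class JsonObjectLocator {

    /**
     * Returns the text of the nearest {...} block enclosing the first
     * occurrence of keyword, or null if the keyword or a balanced block
     * cannot be found. Hypothetical helper, not part of the original post.
     */
    static String extractEnclosingObject(String text, String keyword) {
        int keywordIndex = text.indexOf(keyword);
        if (keywordIndex < 0) {
            return null;
        }

        // Walk backwards to the first '{' that has not been closed yet.
        // Assumption: no braces inside string values before the keyword.
        int depth = 0;
        int start = -1;
        for (int i = keywordIndex - 1; i >= 0; i--) {
            char c = text.charAt(i);
            if (c == '}') {
                depth++;
            } else if (c == '{') {
                if (depth == 0) {
                    start = i;
                    break;
                }
                depth--;
            }
        }
        if (start < 0) {
            return null;
        }

        // Walk forwards from that '{' until it is closed again,
        // ignoring braces that appear inside string literals.
        depth = 0;
        boolean inString = false;
        for (int i = start; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++; // skip the escaped character
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return text.substring(start, i + 1);
            }
        }
        return null; // unbalanced input
    }
}
```

The returned substring can then be handed to the JSON parser, as in the method above.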

Alternative - regex:

"^\\{\n^\\s+\"[^\"]+\":\"[^\"]+\",\n.*?^\\}\n"

See regex in context:

public static void main(String[] args) {
    String input = "dfga gsdgdf fdgdfsgfd asdfgf\n"
            + "AAAA SSSSSS ddddddddd ffffffff ggggggg\n"
            + "{\n"
            + "  \"key0\":\"value0\",\n"
            + "  \"key1\":\"value0\",\n"
            + "  \"key2\":\"value0\",\n"
            + "  \"uniqueKey\":\"value0\",\n"
            + "  \"key0\":[\n"
            + "   {\"key0\":\"value0\",\"key1\":\"value1\"}\n"
            + "\n"
            + "   ]\n"
            + "}\n"
            + "dfga gsdgdf fdgdfsgfd asdfgf\n"
            + "BBBB cccccccc ZZZZZZZ xxxxxxxxxxx cccccccccccc\n";

    Matcher matcher = Pattern
            .compile("^\\{\n^\\s+\"[^\"]+\":\"[^\"]+\",\n.*?^\\}\n"
                    , Pattern.MULTILINE|Pattern.DOTALL).matcher(input);

    while(matcher.find()) {
        String result = matcher.group();

        //Output
        System.out.println(result);
    }
}

Output:

{
  "key0":"value0",
  "key1":"value0",
  "key2":"value0",
  "uniqueKey":"value0",
  "key0":[
   {"key0":"value0","key1":"value1"}

   ]
}
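Note that this pattern depends on the layout of the input rather than on brace nesting: the opening and closing braces of the object must each sit alone at the start of a line, nested braces must be indented, the first member must be a string-valued pair followed by a comma, and lines must end in `\n`. If the file may contain more than one such block, the matches can be filtered by the unique keyword. A minimal sketch, where `KeywordBlockFinder` and `findBlockContaining` are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeywordBlockFinder {

    // Same pattern as above: '{' alone on a line, one "key":"value", line,
    // then everything (lazily) up to the next '}' alone on a line.
    private static final Pattern BLOCK = Pattern.compile(
            "^\\{\n^\\s+\"[^\"]+\":\"[^\"]+\",\n.*?^\\}\n",
            Pattern.MULTILINE | Pattern.DOTALL);

    /** Returns the first matched block that contains "keyword" as a quoted name, or null. */
    static String findBlockContaining(String input, String keyword) {
        String quotedKey = "\"" + keyword + "\"";
        Matcher matcher = BLOCK.matcher(input);
        while (matcher.find()) {
            String candidate = matcher.group();
            if (candidate.contains(quotedKey)) {
                return candidate.trim(); // drop the trailing newline
            }
        }
        return null;
    }
}
```

Calling `KeywordBlockFinder.findBlockContaining(input, "uniqueKey")` returns only the block that contains the keyword, which can then be passed to the JSON parser.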
