
Regex to match a list of alphanumeric ids of a JSON fragment inside HTML

I'm trying to write a regular expression for the following situation:

In a Node.js project I have a multiline string containing a large chunk of HTML mixed with some JS, with this structure:

<html>
  <head>
  </head>
  <body>
    <script type="text/javascript">
      ... more code ...
      },
      "bookIds" : [
        "abc123",
        "qwe456",
        "asd789"
      ],
      ... more code, and in another json:
      },
      "bookIds" : [
        "foo111",
        "bar222",
        "baz333"
      ],
      ... more code ...
    </script>
  </body>
</html>

My goal is to get the first list of bookIds:

abc123
qwe456
asd789

So, as you can see, the conditions I'm working with for now are:

  • Search for the first appearance of "bookIds" : [ and stop at the next ]

I got something close with: /bookIds" : \[([\S\s]*?)\]/ . Conceptually, I thought about finding the first string bookIds , starting after the first [ that follows it, and stopping before the next ] , but I don't know how to do it. I'm now reading up on lookaheads and lookbehinds.

  • Then I need to search (or loop) inside that match and get what's inside the quotes (I know how to do that individually: /"(.*?)"/ )

Unfortunately, I've spent hours googling and experimenting and still can't get it to work (neither in my Node project nor in the tests I'm running on regex101.com).

Any suggestions will be much appreciated!

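You can use "bookIds"\s*:\s*\[([^\]]+?)] . The negated class [^\]] cannot run past the first closing bracket, and match() without the g flag returns only the first occurrence, so the capture group holds the contents of the first list.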

let str = `<html>
  <head>
  </head>
  <body>
    <script type="text/javascript">
      "bookIds" : [ "abc123", "qwe456", "asd789" ],
      "bookIds" : [ "foo111", "bar222", "baz333" ],
    <\/script>
  <\/body>
<\/html>`

// No g flag, so match() stops at the first "bookIds" array
let op = str.match(/"bookIds"\s*:\s*\[([^\]]+?)]/)
// Strip whitespace and quotes from the captured list
console.log(op[1].replace(/[\s"]+/g, ''))
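If you would rather end up with an array of ids than one comma-joined string, the two steps from the question (find the first list, then pick out the quoted values) can be combined in a small helper. This is only a sketch: the names firstBookIds and sample are made up for illustration, and it assumes a Node version that supports String.prototype.matchAll (12 or later) and ids that contain no escaped quotes.

```javascript
// Hypothetical helper: returns the ids of the first "bookIds" array in `source`.
function firstBookIds(source) {
  // Without the g flag, match() returns only the first occurrence.
  const block = source.match(/"bookIds"\s*:\s*\[([^\]]*)\]/);
  if (block === null) {
    return []; // no "bookIds" array found
  }
  // Collect the contents of every double-quoted string inside the brackets.
  return Array.from(block[1].matchAll(/"([^"]*)"/g), (m) => m[1]);
}

// Shortened stand-in for the HTML string in the question.
const sample = `
  "bookIds" : [
    "abc123",
    "qwe456",
    "asd789"
  ],
  "bookIds" : [
    "foo111",
    "bar222",
    "baz333"
  ],`;

// Logs the three ids of the first list only.
console.log(firstBookIds(sample));
```

Returning an empty array when nothing matches avoids the TypeError that op[1] would throw on a null match. If the bracketed fragment is always valid JSON, wrapping the capture in brackets and passing it to JSON.parse may be a simpler second step.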
