
Regex: match only the first script's content

I have the following HTML code.

<script type="application/ld+json">
    {"foo" : "bar"}
</script>

<script type="application/ld+json">
    {"foo" : "bar"}
</script>

<script type="application/ld+json">
    {"foo" : "bar"}
</script>

I am trying to grab the JSON content from inside the first script. But if I use

/<script type="application\/ld\+json">{.*}<\/script>/

it gives me everything between the first script's opening tag and the last script's closing tag. If I use

/<script type="application\/ld\+json">{.*?}<\/script>/

For some reason, I only get the second part.

Is there any way to get only the {} JSON part from the first tag?

That should not even compile (in practice PCRE accepts it, because a { that does not start a valid quantifier is treated as a literal character). Anyway, it looks like you confused greedy and non-greedy matching, and also the types of brackets: {X} means that the preceding token can be repeated X times. It does not mean "any of the characters in X, any number of times"; that is []*.
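A small sketch of the distinction described above, using a hypothetical two-script input: greedy .* runs to the last closing tag, lazy .*? stops at the first one, {2} is a repetition count, [ab]* is a character class repeated any number of times, and a { that cannot start a quantifier is matched literally.

```php
<?php
// Hypothetical input: two ld+json blocks on one line (no newlines).
$html = '<script type="application/ld+json">{"a":1}</script>'
      . '<script type="application/ld+json">{"b":2}</script>';

// Greedy: .* grabs as much as possible, so the match runs from the first
// opening tag to the LAST closing tag.
preg_match('/<script type="application\/ld\+json">(.*)<\/script>/', $html, $greedy);

// Lazy: .*? stops at the FIRST closing tag that lets the pattern finish.
preg_match('/<script type="application\/ld\+json">(.*?)<\/script>/', $html, $lazy);

echo $greedy[1], "\n"; // {"a":1}</script><script type="application/ld+json">{"b":2}
echo $lazy[1], "\n";   // {"a":1}

// Braces as a quantifier: a{2} means "exactly two a's",
// while [ab]* means "any run of a's and b's".
var_dump(preg_match('/^a{2}$/', 'aa'));    // int(1)
var_dump(preg_match('/^[ab]*$/', 'abba')); // int(1)

// A { that does not form a valid quantifier is matched literally.
preg_match('/x{.*?}/', 'x{y}z', $lit);
echo $lit[0], "\n"; // x{y}
```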

What you need is something like this:

/<script type="application\/ld\+json">[^\{]*?{(.*?)\}[^\}]*?<\/script>/s

Index 1 of the $matches array that preg_match fills in holds the captured group, and that is your JSON. Note that the group captures only what is between the outer braces, so add the { and } back before decoding it.

A running PHP example on repl.it (code below): https://repl.it/GNdD/0

Link to try the regex out: https://regex101.com/r/AouzRm/10

<?php
// Build three ld+json script blocks, two of them with nested objects.
$in = '<script type="application/ld+json">';
$in .= '{"foo" : "bar"}';
$in .= '</script>';

$in .= '<script type="application/ld+json">';
$in .= '    {"foo" : { "bar" : "boo" } }';
$in .= '</script>';

$in .= '<script type="application/ld+json">';
$in .= '    {"foo" : { "bar" : { "boo" : "goo" }}}';
$in .= '</script>';

// Group 1 captures everything between the outer { and } of a block.
$pattern = '/<script type="application\/ld\+json">[^\{]*?{(.*?)\}[^\}]*?<\/script>/s';

$matches = [];
$allMatches = [];

preg_match($pattern, $in, $matches);        // first block only
preg_match_all($pattern, $in, $allMatches); // every block

echo "from the preg_match:\n";
print_r("$matches[1]\n\n");

echo "from the preg_match_all:\n";
print_r($allMatches[1]);
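Because the capture group above excludes the outer braces, a decoding step has to put them back. A minimal sketch, assuming you want the first block as a PHP array (the two-block input is a trimmed version of the example):

```php
<?php
// Trimmed version of the input above: two ld+json blocks on one line.
$in = '<script type="application/ld+json">{"foo" : "bar"}</script>'
    . '<script type="application/ld+json">    {"foo" : { "bar" : "boo" } }</script>';

$pattern = '/<script type="application\/ld\+json">[^\{]*?{(.*?)\}[^\}]*?<\/script>/s';

$data = null;
if (preg_match($pattern, $in, $matches) === 1) {
    // Group 1 is the text INSIDE the outer braces, so re-wrap it before
    // decoding; true asks json_decode() for associative arrays.
    $data = json_decode('{' . $matches[1] . '}', true);
}

echo $data['foo'], "\n"; // bar
```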

As @Denziloe said, your regex looks all right.

The problem might be that you are not accounting for newlines and whitespace inside the script tags.

Check this example and see if it fixes the problem; otherwise, there is probably something wrong with your implementation. I also think you want to add a capture group, as I did, for easier access to the JSON part itself:

<script type="application\/ld\+json">\s*({.*?})\s*<\/script>

(working example)
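The \s* parts of that regex absorb the newlines and indentation around the JSON, but . still does not cross a newline inside the JSON unless the s (DOTALL) modifier is set. A sketch with a hypothetical input whose second object spans several lines:

```php
<?php
// Hypothetical input: the second JSON object is spread over several lines.
$html = "<script type=\"application/ld+json\">\n    {\"foo\" : \"bar\"}\n</script>\n"
      . "<script type=\"application/ld+json\">\n    {\n      \"multi\" : \"line\"\n    }\n</script>";

$noDotAll = '/<script type="application\/ld\+json">\s*({.*?})\s*<\/script>/';
$dotAll   = $noDotAll . 's'; // same pattern with the s modifier appended

preg_match_all($noDotAll, $html, $m1);
preg_match_all($dotAll, $html, $m2);

echo count($m1[1]), "\n"; // 1: without s, the multi-line object is skipped
echo count($m2[1]), "\n"; // 2: with s, both objects are captured
```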

Try using the following regex:

(?s)>.*?(?={)\K.*?}

See the regex demo / explanation.

PHP (demo):

<?php
// \K drops everything matched so far, so $o[0] holds only the JSON.
$r = '/(?s)>.*?(?={)\K.*?}/';
$s = '<script type="application/ld+json">
    {"foo1" : "bar1"}
</script>

<script type="application/ld+json">
    {"foo2" : "bar2"}
</script>

<script type="application/ld+json">
    {"foo3" : "bar3"}
</script>';
preg_match($r, $s, $o);
print_r($o);
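A sketch of how the \K version behaves on a flat object and on a hypothetical nested one. Because the final .*?} is lazy, it stops at the first closing brace, so nested JSON comes back truncated:

```php
<?php
$r = '/(?s)>.*?(?={)\K.*?}/';

// Hypothetical single-block inputs: one flat object, one nested object.
$flat   = "<script type=\"application/ld+json\">\n    {\"foo1\" : \"bar1\"}\n</script>";
$nested = "<script type=\"application/ld+json\">\n    {\"foo\" : {\"bar\" : \"boo\"}}\n</script>";

preg_match($r, $flat, $o1);
preg_match($r, $nested, $o2);

echo $o1[0], "\n"; // {"foo1" : "bar1"}

// Caveat: .*?} ends at the FIRST }, so the outer brace is left out.
echo $o2[0], "\n"; // {"foo" : {"bar" : "boo"}
```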

From a PHP standpoint, maybe you're not accessing $matches correctly? Assuming you want {"one" : "bar"} from the following example:

<?php

$html = '<script type="application/ld+json">
    {"one" : "bar"}
</script>

<script type="application/ld+json">
    {"two" : "bar"}
</script>

<script type="application/ld+json">
    {"three" : "bar"}
</script>';

$pattern = '/<script type="application\/ld\+json">\s*(\{.*?\})\s*<\/script>/s';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds group 1 of every match; [0] is the first script.
$whatYouWant = $matches[1][0];

echo $whatYouWant;

You can see this code running here.
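Since the question may come down to how $matches is indexed, here is a sketch contrasting the array shapes of preg_match and preg_match_all (default PREG_PATTERN_ORDER), on a shortened two-block input:

```php
<?php
// Shortened two-block input.
$html = '<script type="application/ld+json">{"one" : "bar"}</script>'
      . '<script type="application/ld+json">{"two" : "bar"}</script>';
$pattern = '/<script type="application\/ld\+json">\s*(\{.*?\})\s*<\/script>/s';

// preg_match stops after the first match: $m[0] is the whole match and
// $m[1] is group 1 of that single match (a string).
preg_match($pattern, $html, $m);
echo $m[1], "\n"; // {"one" : "bar"}

// preg_match_all: $all[1] is an ARRAY of group-1 values, one per match,
// so the first script's JSON is $all[1][0].
preg_match_all($pattern, $html, $all);
echo $all[1][0], "\n";     // {"one" : "bar"}
echo count($all[1]), "\n"; // 2

// Decode the JSON into an associative array.
$data = json_decode($m[1], true);
echo $data['one'], "\n"; // bar
```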
