A regex to extract the text between two tags that also captures the tag name

Question

I need a simple markup language to store different parts of a string in a TEXT field and then extract those parts, so basically I want some kind of simple XML. Storing them in the table field is easy, but extracting them is another matter. I managed to do so using a simple regex written for regular HTML:

|<[^>]+>(.*)</[^>]+>|U

But in order to recompose the original array (and use the markup more generally) I also need to know the tag names, and that regex doesn't capture them.

Example:

Input text:

<user_input>Hello! my name is Williams</user_input>

The preg_match_all() function using the above regex returns:

array
  0 => 
    array
      0 => string '<user_input>Hello! my name is Williams</user_input>' (length=34)

  1 => 
    array
      0 => string 'Hello! my name is Williams' (length=34)

I need it to also return the name of the tag, "user_input". Yes, I know, I'm bad at regex. Yes, I know, "use an XML parser", but that is too big for what I am doing.

Answer 1

How is an XML parser "too big"? PHP has built-in native functions that let you do this easily.

Regex doesn't fit the job.

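<?php

// The fields are wrapped in a single root element so the string is well-formed XML.
$string = '
<root>
<input_name>blah</input_name>
</root>
';

$x = new DOMDocument();
$x->loadXML($string);
$root = $x->documentElement;

// '*' matches every descendant element of the root.
$elements = $root->getElementsByTagName('*');
// DOMNodeList::length is already the number of nodes; wrapping it in count() is a bug.
$count = $elements->length;

for ($i = 0; $i < $count; $i++) {
    $el = $elements->item($i);
    echo $el->nodeName . '<br>';   // tag name, here input_name
    echo $el->nodeValue . '<br>';  // text content, here blah
}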

Answer 2

"So basically I want some kind of simple XML"

Then you want an XML parser. And hey, PHP has an XML parsing extension you can install.

Seriously, trying to hack your way there with regexes is only going to end in pain and frustration. Use an XML parser, and save yourself hours of work.

"but that is too big for what I am doing."

No, it's not. You want to parse something, hence you should use a parser.
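As a rough illustration of the parser route, here is a minimal sketch using PHP's SimpleXML extension. The stored value, the <root> wrapper, the city field and the variable names are assumptions made for the example; they do not come from the question.

```php
<?php
// Hypothetical stored value: the fields are wrapped in one root element,
// which XML requires for a well-formed document.
$stored = '<root><user_input>Hello! my name is Williams</user_input><city>Lima</city></root>';

$xml = simplexml_load_string($stored);
if ($xml === false) {
    exit("Could not parse the stored markup\n");
}

// Each child element's name becomes a key and its text content the value.
$fields = [];
foreach ($xml->children() as $child) {
    $fields[$child->getName()] = (string) $child;
}

print_r($fields);
```

This rebuilds the name-to-text array in a few lines, and malformed input is reported as a parse failure instead of being silently skipped.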

Answer 3

|<([^>]+)>(.*)</[^>]+>|U

Will do what you want. I merely added a pair of parentheses. It is a very brittle hack, though. You want to use a parser, especially as you apparently don't understand regexps.
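A minimal sketch of how this pattern might be used, assuming the input from the question. The PREG_SET_ORDER flag and the variable names are choices made for the example, not part of the answer.

```php
<?php
// Sample input taken from the question.
$text = '<user_input>Hello! my name is Williams</user_input>';

// Group 1 captures the opening tag's name, group 2 the text between the tags.
// The U modifier makes the quantifiers ungreedy by default.
$pattern = '|<([^>]+)>(.*)</[^>]+>|U';

// PREG_SET_ORDER groups the results per match rather than per capture group.
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    echo $match[1], ': ', $match[2], "\n";
}
```

The closing tag is not compared with the opening one here, so a mismatched pair such as <a>x</b> would match as well, which is part of what makes the pattern brittle.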

Answer 4

Just use a capturing group like you did with the content:

|<([^>]+)>([^<]*)</\1>|

As an added bonus, you can use the captured name to make sure the closing tag has the same name.
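A minimal sketch of the backreference in action, rebuilding an array of tag name to text. The input string (including the city field and the deliberately mismatched pair) and the variable names are assumptions made for the example.

```php
<?php
// Hypothetical input: two valid fields plus one mismatched pair.
$text = '<user_input>Hello! my name is Williams</user_input>'
      . '<city>Lima</city>'
      . '<a>broken</b>';

// \1 refers back to whatever group 1 captured (the opening tag's name),
// so the closing tag must carry the same name.
$pattern = '|<([^>]+)>([^<]*)</\1>|';

preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

// Rebuild an associative array of tag name => text.
$fields = [];
foreach ($matches as $match) {
    $fields[$match[1]] = $match[2];
}

print_r($fields);
```

The mismatched <a>broken</b> pair is skipped. Because [^<]* excludes "<", content containing nested tags will not match either, and a repeated tag name overwrites the earlier entry in $fields.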

4 answers

Answer 1: score 6, 2010-07-09 01:46:29
Answer 2: score 1, 2010-07-09 01:46:17
Answer 3: score 0, 2010-07-09 01:48:27
Answer 4: score 0, accepted, 2010-07-09 01:56:18
