
How can I parse out specific “tags” from a string in PHP

I like how StackOverflow lets you search for tags by specifying [tagname] in the search field. How could I go about writing a parser that would help me separate tags from normal text? I can think of the manual way, which would be to use some combination of substring and/or regex to get the positions of the opening and closing square brackets and then extract those strings, but I'm curious whether there's a better way (my regex skill is subpar at best).

// example
$query = 'How to use [jQuery] [selector] selectors';
$tags = getTags($query); // $tags == 'jQuery, selector'
$text = getText($query); // $text == 'How to use selectors'

Regex would probably work best, just don't try to parse HTML with it. https://www.debuggex.com/ is a really good site for visually seeing what your regex is doing. I would recommend reading up on the PHP regex functions to learn more; there is a cheat sheet at the bottom of that site.

\[([^\]]+)\]

This would work to get the tags, using a capture group. The square brackets are escaped so that they match literally instead of starting a character class. The preg_match_all function is really good for working with multiple results; just make sure to read the official documentation to get it working how you need it.
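As a rough illustration of how preg_match_all organises its results, here is a minimal sketch. It assumes a pattern in which the brackets are escaped and the group captures everything up to the closing bracket; the sample string comes from the question, and the variable names are only illustrative.

```php
<?php
// Sample input taken from the question.
$query = 'How to use [jQuery] [selector] selectors';

// Assumed pattern: a literal "[", one or more non-"]" characters (captured), a literal "]".
// With the default flags, $matches[0] holds the full matches and
// $matches[1] holds the contents of the first capture group.
$count = preg_match_all('/\[([^\]]+)\]/', $query, $matches);

print_r($matches[0]); // full matches, brackets included
print_r($matches[1]); // captured tag names only
```

For the sample string, $count should be 2, $matches[0] should contain "[jQuery]" and "[selector]", and $matches[1] should contain "jQuery" and "selector".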

For parsing more complex or irregular things (like HTML, which is extremely difficult to do reliably), it is better to do it manually. Regex has worked for all my non-HTML parsing needs in the past.

Regular expressions are probably the way to go. The more you can specify about how the tags are formed, the easier it will be to capture the right ones (in the expression below I limit them to word characters \w or digits \d; note that \w already covers letters, digits, and the underscore). The function uses a capture group (enclosed in parens) to pull out the relevant tags.

function getTags($query) {
    // $matches[0] holds the full "[tag]" matches; $matches[1] holds the captured names.
    preg_match_all('/\[([\w\d]+)\]/', $query, $matches);
    return $matches[1];
}
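The question also asks for a getText() counterpart that returns the string without its tags. One possible sketch, assuming tags consist only of word characters and that leftover whitespace should be collapsed, is shown below. The function name comes from the question, but the body is an assumption rather than anything given in the answers.

```php
<?php
// getText() is the name used in the question; this implementation is an assumption.
function getText(string $query): string {
    // Drop every [tag] together with the whitespace that follows it.
    $text = preg_replace('/\[\w+\]\s*/', '', $query);
    // Collapse any remaining runs of whitespace and trim both ends.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$query = 'How to use [jQuery] [selector] selectors';

// Collect the tag names inline so this snippet runs on its own.
preg_match_all('/\[(\w+)\]/', $query, $matches);

echo implode(', ', $matches[1]), "\n"; // jQuery, selector
echo getText($query), "\n";            // How to use selectors
```

Joining the captured names with implode() gives the comma-separated string the question expects. Keeping the names as an array is usually easier to work with if the tags are going to be processed further.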
