
Regex for matching token wrapped in %

I have user-entered text with potentially mistyped "tokens" that I'm trying to find using PHP.

A valid "token" is one or more word characters wrapped in percent signs, such as %blah% or %blah_moreblah%. Basically, I'm looking for tokens where the user may have forgotten the leading or trailing '%'. I'm also looking for tokens in the valid format because, at this point in my code, all replaceable tokens have already been replaced.

So, the 3 situations I'm looking for are (to borrow regex syntax): %\w+, %\w+%, or \w+%.

In English, what I'm looking for is "a string that starts with a % and/or ends with a % and contains only word characters".

The regex I have so far is (%*\w+%*), but you'll notice it matches every single word. I'm stuck on making a match require at least a leading or a trailing %.

Edit: Initially I tried to find each of the 3 situations with its own regex. However, I found that the regex for the first situation would also find tokens in the second situation, just without the trailing %. For example, /(%\w+)/, when checked against %before %both%, would match %before and %both.
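The two problems described in the question can be reproduced with a short sketch. The variable names and the second sample string are made up for illustration; only the patterns and the string %before %both% come from the question.

```php
<?php
// Sample string taken from the question.
$text = '%before %both%';

// The "leading % only" pattern run on its own also hits the valid token,
// but stops before its trailing %.
preg_match_all('/%\w+/', $text, $leadingOnly);
print_r($leadingOnly[0]); // %before, %both

// The combined attempt makes both % signs optional, so plain words match too.
// 'plain %token% word' is a hypothetical sample string.
preg_match_all('/%*\w+%*/', 'plain %token% word', $everything);
print_r($everything[0]); // plain, %token%, word
```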

To match tokens enclosed in %, or having a % on either side, use:

(?=\w*%)%*\w+%*

See another regex demo.

This is your pattern with a positive lookahead added. The (?=\w*%) restricts matches to positions where a % appears after zero or more word characters.
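As a rough check of how the lookahead behaves in PHP, the pattern can be run with preg_match_all. The sample sentence and variable names here are assumptions chosen for illustration, not part of the original answer.

```php
<?php
// Hypothetical sample text with one token of each kind plus plain words.
$text = 'Send %first to last% and %both% but not plain words.';

// (?=\w*%) only lets a match start where a % follows zero or more word
// characters, so words without any % sign are skipped.
preg_match_all('/(?=\w*%)%*\w+%*/', $text, $matches);

print_r($matches[0]); // %first, last%, %both%
```

A match that starts on a % satisfies the lookahead immediately, while a match that starts on a word character needs the trailing % to be present.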

Note also that %* matches zero or more percent signs, so it may match %%%word%%. If that is not what you need and you want to match 0 or 1 % signs, just replace the * quantifier with ?.
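The difference between the two quantifiers can be seen on the %%%word%% example from the note. This is a small sketch with made-up variable names:

```php
<?php
// Input from the note above: a token with repeated percent signs.
$text = '%%%word%%';

// With *, every adjacent % is absorbed into the match.
preg_match_all('/(?=\w*%)%*\w+%*/', $text, $greedy);
print_r($greedy[0]); // %%%word%%

// With ?, at most one % is taken on each side.
preg_match_all('/(?=\w*%)%?\w+%?/', $text, $single);
print_r($single[0]); // %word%
```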

Try this:

$input_lines = "Hello this is a %string% with %some_words in it just for demo% purposes.";

preg_match_all("/\s[\w_\-]+%\.?|%[\w_\-]+(%|\s|\.)/", $input_lines, $output_array);

That will output this (the full matches, stored in $output_array[0]):

array(
    0   =>  %string%
    1   =>  %some_words 
    2   =>   demo%
)

Note that this will catch the valid cases as well as the typos you are looking for. The matches can include an adjacent space or period, as in the last two entries above.
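Because this pattern consumes the whitespace next to a token, the matches may need cleaning before they are used. One possible way, assuming surrounding whitespace is never wanted, is to trim each full match; the $tokens name is an assumption for this sketch.

```php
<?php
$input_lines = "Hello this is a %string% with %some_words in it just for demo% purposes.";

preg_match_all('/\s[\w_\-]+%\.?|%[\w_\-]+(%|\s|\.)/', $input_lines, $output_array);

// $output_array[0] holds the full matches; trim() strips the captured spaces.
$tokens = array_map('trim', $output_array[0]);

print_r($tokens); // %string%, %some_words, demo%
```

trim() does not remove a trailing period, so a match that ends a sentence would still need rtrim($token, '.') or similar.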
