Regex: replace part of a string that starts with a character and ends with either of two characters

Question

I'm trying to match a string that starts with #1-9 (that is, # followed by a digit from 1 to 9) and ends at the next #1-9, if there is one.

Full string: "#1Lorem Ipsum is simply dummy text#2printing and typesetting industry"

Idea:

Replace #1Lorem Ipsum is simply dummy text with Lorem Ipsum is simply dummy text wrapped in a tag,

and #2printing and typesetting industry with printing and typesetting industry wrapped in a tag,

that is, replace each #1-9 with an opening tag and append the closing tag at the end of each segment.

But:

suppose the string contains only one segment starting with #1-9, like this:

"#1Lorem Ipsum is simply dummy text". How could the closing tag be placed at the end to close the opening tag?

I'm guessing I could use the last " at the end of the string and prepend the closing tag before it, since there is no further #1-9 to stop at, but without losing or replacing that final ".

So it becomes "Lorem Ipsum is simply dummy text", with the text between the quotes wrapped in the tag.

Regex I've tried: (#[0-9])(.*?)(#|") but this matches only the first part (#1) of the string and ignores the #2 part (see the full string above).

I will be using PHP to match and replace, probably with preg_replace; I just need to work out the regex part first.

How can I achieve this?

Answer 1

What you are looking for is a negative look-ahead. It's very powerful: it succeeds only if the pattern inside it does not match at the current position.

#([0-9])((?:(?!$|#[0-9]).)+)

This looks for #0-9 and stops when another #0-9 occurs, or at end of line. The negative look-ahead is this part: (?!$|#[0-9]). It says: only continue if neither $ nor #0-9 can be matched here. The check has to run before every character, so each time the look-ahead succeeds, match the next character with . and collect all of those characters in a capture group.

Railroad diagram of the pattern (generated using regexper.com).
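As a rough illustration of the difference (a sketch, not part of the original answer): the pattern from the question consumes the # that ends a segment, so the next segment can no longer start there, while the look-ahead only peeks at it. The sample string is the one from the question; the n1/n2 class names in the replacement are made up for the example.

```php
<?php
$input = '#1Lorem Ipsum is simply dummy text#2printing and typesetting industry';

// The attempt from the question consumes the "#" that ends a segment,
// so the following "#2" can no longer start a match of its own.
$consuming = preg_match_all('/(#[0-9])(.*?)(#|")/', $input, $m1);

// The look-ahead version only peeks at the next "#digit"; nothing is consumed.
$lookahead = preg_match_all('/#([0-9])((?:(?!$|#[0-9]).)+)/', $input, $m2);

// Hypothetical markup: the "n1"/"n2" class names are illustrative only.
$html = preg_replace(
    '/#([0-9])((?:(?!$|#[0-9]).)+)/',
    '<span class="n$1">$2</span>',
    $input
);

echo "matches with consumed delimiter: $consuming", PHP_EOL;
echo "matches with look-ahead: $lookahead", PHP_EOL;
echo $html, PHP_EOL;
```

Because the look-ahead also stops at $, a string with a single #1 segment gets its closing tag at the end without any extra handling.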

Answer 2

<?php
function convert($str) {
    static $numberNamesMap = [
        1 => 'one',
        2 => 'two',
        3 => 'three',
        4 => 'four',
        5 => 'five',
        6 => 'six',
        7 => 'seven',
        8 => 'eight',
        9 => 'nine',
    ];
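    return preg_replace_callback(
        // "#", a digit 1-9, then every character up to (not including) the next "#[1-9]".
        '~#([1-9])((?:(?!#[1-9]).)*)~',
        function($matches) use ($numberNamesMap) {
            $class = $numberNamesMap[$matches[1]];
            // Escape the text so it is safe to embed in HTML.
            $htmlText = htmlentities($matches[2]);
            return "<span class=\"$class\">$htmlText</span>";
        },
        $str
    );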
}

Examples

echo convert('#1Lorem Ipsum is simply dummy text');

outputs:

<span class="one">Lorem Ipsum is simply dummy text</span>

echo convert('#1Lorem Ipsum is simply dummy text#2printing and typesetting industry');

outputs:

<span class="one">Lorem Ipsum is simply dummy text</span><span class="two">printing and typesetting industry</span>

echo convert('#1Lorem Ipsum is simply dummy text#0printing and typesetting industry');

outputs:

<span class="one">Lorem Ipsum is simply dummy text#0printing and typesetting industry</span>

Answer 3

preg_replace_callback() is the right tool for this job. To avoid manually declaring a number-mapping array, you can use the NumberFormatter class. Using sprintf() in the callback body helps separate the data from the HTML and makes maintenance easier.

Code:

$string = '#1Lorem Ipsum is simply dummy text#2printing and typesetting industry#0nothing#35That\'s a big one!';

echo preg_replace_callback(
         '/#(\d+)((?:(?!#\d).)+)/',
         fn($m) => sprintf(
             '<span class="%s">%s</span>',
             (new NumberFormatter("en", NumberFormatter::SPELLOUT))->format($m[1]),
             htmlentities($m[2])
         ),
         $string
     );

Output:

<span class="one">Lorem Ipsum is simply dummy text</span><span class="two">printing and typesetting industry</span><span class="zero">nothing</span><span class="thirty-five">That&#039;s a big one!</span>

Note that if the text after #[number] never contains a # symbol, you can dramatically improve the regex performance by using a greedy negated character class as the second capture group: #(\d+)([^#]+). This reduces the step count from 283 steps to just 16 steps on your sample string.

To be perfectly honest, even a lazy pattern like #(\d+)(.+?(?=#\d|$)) will process the sample string in 213 steps. Performance might not be a factor, so use whatever regex you are most comfortable reading.
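To sanity-check the three patterns, here is a small sketch (the segments() helper and the $withHash input are made up for illustration). On the sample string all three should capture the same pairs, but the negated character class cuts the text short as soon as a lone # appears, which is why it is only safe when the text never contains one.

```php
<?php
$patterns = [
    'tempered' => '/#(\d+)((?:(?!#\d).)+)/',  // look-ahead before every character
    'negated'  => '/#(\d+)([^#]+)/',          // greedy negated character class
    'lazy'     => '/#(\d+)(.+?(?=#\d|$))/',   // lazy quantifier with a look-ahead
];

// Hypothetical helper: return the [number, text] pairs a pattern captures.
function segments(string $pattern, string $subject): array {
    preg_match_all($pattern, $subject, $sets, PREG_SET_ORDER);
    return array_map(fn($set) => [$set[1], $set[2]], $sets);
}

$sample = '#1Lorem Ipsum is simply dummy text#2printing and typesetting industry#0nothing#35That\'s a big one!';
$withHash = '#1a # b#2c'; // made-up input with a "#" that is not followed by a digit

// All three patterns agree when the text contains no stray "#".
$sameOnSample = segments($patterns['tempered'], $sample) === segments($patterns['negated'], $sample)
    && segments($patterns['negated'], $sample) === segments($patterns['lazy'], $sample);

$tempered = segments($patterns['tempered'], $withHash);
$negated = segments($patterns['negated'], $withHash);

var_dump($sameOnSample);
echo json_encode($tempered), PHP_EOL;
echo json_encode($negated), PHP_EOL;
```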

3 answers

Answer 1
3 2022-08-02 22:02:20

Answer 2
1 accepted 2022-08-02 22:24:15

Answer 3
1 2022-08-03 04:35:16
