PHP regex to split a string at the first occurrence of a character

Question

This may be a lame question, but I am a total novice with regular expressions. I have some text data in the format:

Company Name: Name of the company, place.
Company Address: Some, address, here.
Link: http://www.somelink.com

Now, I want to use a regex to split these into an array of name: value pairs. The regular expression I am trying is /(.*):(.*)/ with preg_match_all(). It works well for the first two lines, but on the third line it returns "Link: http:" in one part and "//www.somelink.com" in the other.

So, is there any way to split the line only at the first occurrence of the character ':'?

Answer 1

Use a negated character class (see on rubular.com):

/^([^:]*):(.*)$/m

The […] is a character class. Something like [aeiou] matches any one of the lowercase vowels. [^…] is a negated character class. [^aeiou] matches any one character other than the lowercase vowels.
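As a small illustration of the two kinds of character class, here is a minimal sketch. The sample word and variable names are made up for this example and are not part of the original answer.

```php
<?php
// [aeiou] matches a single lowercase vowel; [^aeiou] matches any single
// character that is NOT a lowercase vowel.
$word = 'regex'; // hypothetical sample input

preg_match_all('/[aeiou]/', $word, $vowels);
preg_match_all('/[^aeiou]/', $word, $others);

echo implode(',', $vowels[0]), "\n"; // e,e
echo implode(',', $others[0]), "\n"; // r,g,x
```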

The ^ and $ at the beginning and end of the pattern are the beginning-of-line and end-of-line anchors. The m modifier turns on multi-line mode, so that they match at every line rather than only at the start and end of the whole string.
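To see what the m modifier changes, the following sketch counts matches of an anchored pattern with and without it. The two-line input and the variable names are assumptions chosen for illustration.

```php
<?php
$text = "first: 1\nsecond: 2"; // hypothetical two-line input

// Without m, ^ matches only at the very start of the string.
$withoutM = preg_match_all('/^\w+/', $text, $a);

// With m, ^ also matches immediately after each newline.
$withM = preg_match_all('/^\w+/m', $text, $b);

echo $withoutM, "\n"; // 1
echo $withM, "\n";    // 2
```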

The problem with your original pattern is that you're (ab)using . when you could've been a lot more specific, and since * is greedy, the first group overmatched. It's tempting to try to "fix" that by making the repetition reluctant, but it's MUCH better to be more specific and say that the first group matches anything but :.
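The overmatching can be reproduced on the third line of the question's data. This is only a sketch; the variable names are invented for the comparison.

```php
<?php
$line = 'Link: http://www.somelink.com';

// Greedy .* grabs as much as it can, then backtracks to the LAST colon.
preg_match('/(.*):(.*)/', $line, $greedy);

// [^:]* cannot cross a colon, so the first group stops at the FIRST one.
preg_match('/^([^:]*):(.*)$/', $line, $negated);

echo $greedy[1], "\n";  // Link: http
echo $negated[1], "\n"; // Link
```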

Note, however, that this is a matching pattern with captures. It's not actually a splitting pattern that matches only the delimiter. The delimiter pattern really is just :.
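If an actual split on the delimiter is wanted, one option not mentioned in the answer is to cap the number of pieces at two, so that only the first : acts as a separator. The sketch below uses explode() with its limit argument; preg_split('/:/', $line, 2) should behave the same way for this input. It assumes PHP 7.1+ for the [ ] destructuring and assumes the line contains at least one colon.

```php
<?php
$line = 'Link: http://www.somelink.com';

// A limit of 2 means: split at the first ':' only, keep the rest intact.
// Assumes the line contains a colon; otherwise only one piece comes back.
[$name, $value] = explode(':', $line, 2);
$value = trim($value); // drop the space that followed the colon

echo $name, ' => ', $value, "\n"; // Link => http://www.somelink.com
```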

PHP snippet

Given this:

$text = <<<EOT
Company Name: Name of the company, place.
Company Address: Some, address, here.
Link: http://www.somelink.com
EOT;

preg_match_all('/^([^:]*):(.*)$/m', $text, $matches, PREG_SET_ORDER);

print_r($matches);

The output is (as seen on ideone.com):

Array
(
    [0] => Array
        (
            [0] => Company Name: Name of the company, place.
            [1] => Company Name
            [2] =>  Name of the company, place.
        )

    [1] => Array
        (
            [0] => Company Address: Some, address, here.
            [1] => Company Address
            [2] =>  Some, address, here.
        )

    [2] => Array
        (
            [0] => Link: http://www.somelink.com
            [1] => Link
            [2] =>  http://www.somelink.com
        )

)
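Since the question asks for name: value pairs, the match sets can be folded into an associative array. This is one possible follow-up step, not part of the original answer; $pairs is a name chosen here, and trim() is used because the captured values keep the space after the colon.

```php
<?php
$text = <<<EOT
Company Name: Name of the company, place.
Company Address: Some, address, here.
Link: http://www.somelink.com
EOT;

preg_match_all('/^([^:]*):(.*)$/m', $text, $matches, PREG_SET_ORDER);

// Build name => value pairs; trim() removes the whitespace around each part.
$pairs = [];
foreach ($matches as $m) {
    $pairs[trim($m[1])] = trim($m[2]);
}

print_r($pairs);
```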

Answer 2

You probably want something like /(.*?):(.*)/. The ? after the * will make it "non-greedy", so it will consume as little text as possible. I think that will work for your situation. By default, * is "greedy" and tries to match as many repetitions as it can.
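Applied to the data from the question, the non-greedy pattern can be tried as follows. This is a sketch; the heredoc and variable names are assumptions. No m modifier is needed here because . does not match a newline by default, so each match stays within one line.

```php
<?php
$text = <<<EOT
Company Name: Name of the company, place.
Company Address: Some, address, here.
Link: http://www.somelink.com
EOT;

// (.*?) expands one character at a time and stops at the first colon.
preg_match_all('/(.*?):(.*)/', $text, $m);

print_r($m[1]); // the names: Company Name, Company Address, Link
print_r($m[2]); // the values, each still with its leading space
```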

Edit: See here for more about matching repetition using the * and + operators.
