PHP Regex for splitting a string at the first occurrence of a character
This may be a lame question, but I am a total novice with regular expressions. I have some text data in the format:
Company Name: Name of the company, place.
Company Address: Some, address, here.
Link: http://www.somelink.com
Now, I want to use a regex to split these into an array of name : value pairs. The regular expression I am trying is /(.*):(.*)/ with preg_match_all(), and it works well for the first two lines, but on the third line it returns "Link: http:" in one part and "//www.somelink.com" in the other.
So, is there any way to split the line only at the first occurrence of the character ':'?
Use a negated character class (see on rubular.com):
/^([^:]*):(.*)$/m
The […] is a character class. Something like [aeiou] matches any one of the lowercase vowels. [^…] is a negated character class. [^aeiou] matches any one character other than a lowercase vowel.
The ^ and $ at the beginning and end of the pattern are the beginning-of-line and end-of-line anchors. The m modifier turns on multi-line mode.
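To see what the m modifier changes, here is a small sketch that runs the same pattern with and without it. The two-line $text is a shortened version of the sample data, and the variable names are only for illustration.

```php
<?php
$text = "Company Name: Name of the company, place.\nLink: http://www.somelink.com";

// Without /m, ^ and $ anchor only to the start and end of the whole string,
// so the pattern cannot match this two-line subject line by line.
$withoutM = preg_match_all('/^([^:]*):(.*)$/', $text, $a);

// With /m, ^ and $ also match at every line boundary.
$withM = preg_match_all('/^([^:]*):(.*)$/m', $text, $b);

var_dump($withoutM); // int(0)
var_dump($withM);    // int(2)
```

Without the modifier there is no match at all here, because . does not cross the newline and $ cannot match in the middle of the string.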
The problem with your original pattern is that you're (ab)using . when you could've been a lot more specific, and since * is greedy, the first group overmatched. It's tempting to try to "fix" that by making the repetition reluctant, but it's MUCH better to be more specific and say that the first group is matching anything but :.
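As a rough comparison, the sketch below applies the greedy pattern, a reluctant variant, and the negated character class to the problematic line. The $patterns and $results arrays are just illustrative names.

```php
<?php
$line = 'Link: http://www.somelink.com';

$patterns = [
    'greedy'  => '/(.*):(.*)/',      // original attempt
    'lazy'    => '/(.*?):(.*)/',     // reluctant repetition
    'negated' => '/^([^:]*):(.*)$/', // first group can never contain ':'
];

$results = [];
foreach ($patterns as $label => $pattern) {
    preg_match($pattern, $line, $m);
    $results[$label] = [$m[1], $m[2]];
}

print_r($results);
// greedy:  "Link: http" and "//www.somelink.com" (split at the LAST colon)
// lazy:    "Link" and " http://www.somelink.com"
// negated: "Link" and " http://www.somelink.com"
```

The reluctant and negated versions give the same result on this input. The negated class states the intent directly in the pattern, rather than relying on how the engine backtracks.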
Note, however, that this is a matching pattern, with captures. It's not actually a splitting pattern that matches only the delimiter. The delimiter pattern really is just :.
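If you do want to treat this as a real split on the delimiter, one possible approach (not part of the pattern above) is to pass a limit of 2 to preg_split(), or to explode() since the delimiter is a literal character. This sketch assumes you process one line at a time.

```php
<?php
$line = 'Link: http://www.somelink.com';

// A limit of 2 means "at most two pieces", so only the first ':' splits.
$parts = preg_split('/:/', $line, 2);

// explode() behaves the same way for a literal delimiter.
$same = explode(':', $line, 2);

// Trim the whitespace left over around the delimiter.
[$name, $value] = array_map('trim', $parts);

var_dump($name);  // string(4) "Link"
var_dump($value); // string(23) "http://www.somelink.com"
```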
Given this:
$text = <<<EOT
Company Name: Name of the company, place.
Company Address: Some, address, here.
Link: http://www.somelink.com
EOT;
preg_match_all('/^([^:]*):(.*)$/m', $text, $matches, PREG_SET_ORDER);
print_r($matches);
The output is (as seen on ideone.com):
Array
(
[0] => Array
(
[0] => Company Name: Name of the company, place.
[1] => Company Name
[2] => Name of the company, place.
)
[1] => Array
(
[0] => Company Address: Some, address, here.
[1] => Company Address
[2] => Some, address, here.
)
[2] => Array
(
[0] => Link: http://www.somelink.com
[1] => Link
[2] => http://www.somelink.com
)
)
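Since the goal was an array of name : value pairs, the matches can be folded into an associative array. This is only a sketch: the $pairs name is made up, and it assumes every line contains a colon and that names are unique. The second group keeps the space that follows the colon, hence the trim().

```php
<?php
$text = <<<EOT
Company Name: Name of the company, place.
Company Address: Some, address, here.
Link: http://www.somelink.com
EOT;

preg_match_all('/^([^:]*):(.*)$/m', $text, $matches, PREG_SET_ORDER);

// Build name => value pairs from capture groups 1 and 2 of each match.
$pairs = [];
foreach ($matches as $match) {
    $pairs[trim($match[1])] = trim($match[2]);
}

print_r($pairs);
```

If some lines might lack a colon, note that [^:] also matches a newline, so the first group could run across lines; using [^:\n] instead would keep each match on a single line.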
You probably want something like /(.*?):(.*)/. The ? after the * will make it "non-greedy", so it will consume as little text as possible. I think that will work for your situation. By default, * is "greedy", and tries to match as many repetitions as it can.
Edit: See here for more about matching repetition using the * and + operators.