PHP Regex Help for Parsing a String
I have a string such as the following:
Are you looking for a quality real estate company?
<s>Josh's real estate firm specializes in helping people find homes from
[city][State].</s>
<s>Josh's real estate company is a boutique real estate firm serving clients
locally.</s>
In [city][state] I am sure you know how difficult it is
to find a great home, but we work closely with you to give you exactly
what you need
I would like to split this paragraph into an array based on the <s> </s> tags, so that I get the following array as the result:
[0] Are you looking for a quality real estate company?
[1] Josh's real estate firm
specializes in helping people find homes from [city][State].
[2] Josh's real estate company is a boutique real estate firm serving clients
locally.
[3] In [city][state] I am sure you know how difficult it is
to find a great home, but we work closely with you to give you exactly
what you need
This is the regex I'm currently using:
$matches = array();
preg_match_all(":<s>(.*?)</s>:is", $string, $matches);
$result = $matches[1];
print_r($result);
But this only returns an array containing the text found between the <s> </s> tags; it ignores the text before and after them. (In the example above, it would only return array elements 1 and 2.)
Any ideas?
The closest I could get was using preg_split() instead:
$string = <<< STR
Are you looking for a quality real estate company? <s>Josh's real estate firm
specializes in helping people find homes from [city][State].</s>
<s>Josh's real estate company is a boutique real estate firm serving clients
locally.</s> In [city][state] I am sure you know how difficult it is
to find a great home, but we work closely with you to give you exactly
what you need
STR;
print_r(preg_split(':</?s>:is', $string));
This gives the following output:
Array
(
[0] => Are you looking for a quality real estate company?
[1] => Josh's real estate firm
specializes in helping people find homes from [city][State].
[2] =>
[3] => Josh's real estate company is a boutique real estate firm serving clients
locally.
[4] => In [city][state] I am sure you know how difficult it is
to find a great home, but we work closely with you to give you exactly
what you need
)
Except that it produces an extra array element (index 2) for the newline between the fragments [city][State].</s> and <s>Josh's real estate company.
It would be trivial to add some code that removes these whitespace-only elements, but I'm not sure whether you want that.
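One possible way to do that, sketched here rather than offered as the only approach: make the delimiter pattern also swallow the whitespace around each tag, and pass PREG_SPLIT_NO_EMPTY so the empty string left between "</s>" and "<s>" is dropped. This assumes you don't need to keep the whitespace that sits next to the tags.

```php
<?php
// Sample input from the question.
$string = <<<STR
Are you looking for a quality real estate company? <s>Josh's real estate firm
specializes in helping people find homes from [city][State].</s>
<s>Josh's real estate company is a boutique real estate firm serving clients
locally.</s> In [city][state] I am sure you know how difficult it is
to find a great home, but we work closely with you to give you exactly
what you need
STR;

// \s* on both sides makes surrounding whitespace part of the delimiter, so the
// newline between "</s>" and "<s>" turns into an empty piece; PREG_SPLIT_NO_EMPTY
// then discards it. The limit -1 means "no limit".
$parts = preg_split(':\s*</?s>\s*:is', trim($string), -1, PREG_SPLIT_NO_EMPTY);

print_r($parts);
```

The line breaks inside each element are kept; if you want single-line strings, you could additionally run preg_replace('/\s+/', ' ', ...) over each piece.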
I'd suggest looking into DOM: http://php.net/manual/en/book.dom.php
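A minimal sketch of the DOM route, assuming the input is well-formed enough to be wrapped in a made-up <root> element and parsed as XML (a bare & or other stray markup would make loadXML() fail). The untagged text becomes text nodes, each <s> becomes an element node, and the top-level child nodes come out in document order.

```php
<?php
// Sample input from the question.
$string = <<<STR
Are you looking for a quality real estate company? <s>Josh's real estate firm
specializes in helping people find homes from [city][State].</s>
<s>Josh's real estate company is a boutique real estate firm serving clients
locally.</s> In [city][state] I am sure you know how difficult it is
to find a great home, but we work closely with you to give you exactly
what you need
STR;

$doc = new DOMDocument();
// Wrap the fragment in a hypothetical <root> element so it parses as one XML document.
if (!$doc->loadXML('<root>' . $string . '</root>')) {
    throw new RuntimeException('Input is not well-formed XML');
}

$result = [];
// Children of <root>: text nodes (untagged text) and <s> elements, in order.
foreach ($doc->documentElement->childNodes as $node) {
    $text = trim($node->textContent);
    // Skip whitespace-only nodes such as the newline between </s> and <s>.
    if ($text !== '') {
        $result[] = $text;
    }
}

print_r($result);
```

Compared with a regex, this keeps working if the tags ever carry attributes (e.g. <s class="x">), but it is stricter about the input being well-formed.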