
PHP Regex Help for Parsing a String

I have a string such as the following:

Are you looking for a quality real estate company? 

<s>Josh's real estate firm specializes in helping people find homes from          
[city][State].</s>

<s>Josh's real estate company is a boutique real estate firm serving clients 
locally.</s> 

In [city][state] I am sure you know how difficult it is
to find a great home, but we work closely with you to give you exactly 
what you need

I would like to split this paragraph into an array based on the <s> </s> tags, so that I get the following array as the result:

[0] Are you looking for a quality real estate company?
[1] Josh's real estate firm 
    specializes in helping people find homes from [city][State].
[2] Josh's real estate company is a boutique real estate firm serving clients 
    locally.
[3] In [city][state] I am sure you know how difficult it is
    to find a great home, but we work closely with you to give you exactly 
    what you need

This is the regex I'm currently using:

$matches = array();
preg_match_all(":<s>(.*?)</s>:is", $string, $matches);
$result = $matches[1];
print_r($result);

But this only returns an array containing the text found between the <s> </s> tags; it ignores the text found before and after these tags. (In the example above it would only return array elements 1 and 2.)

Any ideas?

The closest I could get was using preg_split() instead:

$string = <<< STR
Are you looking for a quality real estate company? <s>Josh's real estate firm 
specializes in helping people find homes from [city][State].</s>
<s>Josh's real estate company is a boutique real estate firm serving clients 
locally.</s> In [city][state] I am sure you know how difficult it is
to find a great home, but we work closely with you to give you exactly 
what you need
STR;

print_r(preg_split(':</?s>:is', $string));

And got this output:

Array
(
    [0] => Are you looking for a quality real estate company? 
    [1] => Josh's real estate firm 
specializes in helping people find homes from [city][State].
    [2] => 

    [3] => Josh's real estate company is a boutique real estate firm serving clients 
locally.
    [4] =>  In [city][state] I am sure you know how difficult it is
to find a great home, but we work closely with you to give you exactly 
what you need
)

However, that produces an extra array element (index 2) because there is a newline between the fragments [city][State].</s> and <s>Josh's real estate company.

It would be trivial to add some code to remove the whitespace-only matches, but I'm not sure whether you want that.
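One possible way to drop the whitespace-only element is sketched below. It is not the original answerer's code: it assumes it is acceptable to let the split pattern swallow any whitespace directly around each tag, and it relies on the standard PREG_SPLIT_NO_EMPTY flag to discard the empty pieces that remain. The variable name $parts is only illustrative.

```php
<?php
// Sample input from the question, laid out as in the preg_split() example.
$string = <<<STR
Are you looking for a quality real estate company? <s>Josh's real estate firm
specializes in helping people find homes from [city][State].</s>
<s>Josh's real estate company is a boutique real estate firm serving clients
locally.</s> In [city][state] I am sure you know how difficult it is
to find a great home, but we work closely with you to give you exactly
what you need
STR;

// Split on <s> or </s> and consume the whitespace on either side of the tag,
// so a piece that held only a newline becomes an empty string.
// PREG_SPLIT_NO_EMPTY then removes those empty strings from the result.
$parts = preg_split(':\s*</?s>\s*:i', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts);
```

With this input the result should contain four elements, one per sentence group. Line breaks inside a fragment are kept as they are; only whitespace directly next to a tag is removed.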

I'd suggest you look into the DOM extension: http://php.net/manual/en/book.dom.php
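As a rough illustration of the DOM route, the sketch below parses the text as an HTML fragment and collects the top-level pieces in order. It makes several assumptions that are not in the original post: the input is plain ASCII, the <s> tags are well formed and not nested, and a wrapping <div> (added here purely as a container) is acceptable. The shortened $string is a stand-in for the real paragraph.

```php
<?php
// Shortened stand-in for the paragraph in the question (hypothetical sample).
$string = "Intro text. <s>First sentence.</s>\n<s>Second sentence.</s> Closing text.";

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Wrap the fragment in a <div> so all pieces share one parent element.
$doc->loadHTML('<div>' . $string . '</div>');
$container = $doc->getElementsByTagName('div')->item(0);

$parts = [];
foreach ($container->childNodes as $node) {
    // Each child is either a text node or an <s> element;
    // textContent gives the plain text in both cases.
    $text = trim($node->textContent);
    if ($text !== '') {
        $parts[] = $text; // skip whitespace-only nodes between tags
    }
}

print_r($parts);
```

Compared with the regex split, this approach leans on the parser to handle the tags, which may be more robust if the markup grows beyond simple <s> pairs, at the cost of more setup.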
