
Regex to match from first uppercase to end of sentence of querystring

I need to find the sentence or sentences surrounding a given string. Each one should run from the first capital letter or line break to the closing full stop or line break.

This is what I have so far, but of course it does not work at all:

$search_string='example';

$regex = '\[A-Z]{1}[a-z]*\s*'.$search_string.'\s*[a-zA-Z]*\i';

preg_match_all($regex, $content, $matches);  

If the word appears in more than one sentence, I need to retrieve all of those sentences. I'm not sure I'm explaining this well; please comment and I will try to explain it again.


EDIT

I have a WordPress website with a lot of posts, and PDFs, docs, etc. inside those posts. I'm using a search engine called swish-e to index everything and display results. When someone searches for a string, I want to display a summary around that string instead of the full post or PDF.

So if a user searches for the string "example", I need to show all the sentences, or at least a few of them, where the word "example" appears. That's why I asked for a capital letter at the beginning and a full stop at the end. I know this won't be perfect, but I need to cover at least some scenarios (capital letters, line breaks, etc.).

I hope that's clearer. Once again, thanks a lot.

Your search_string should be preg_quote'd; otherwise users can manipulate the results with special characters like |

$search_string='example';
$regex = '/[A-Z][a-z ]*\b'.preg_quote($search_string,"/").'\b.*?(?:[.!?]|$)/i';
preg_match_all($regex, $content, $matches);  

I've assumed the sentence can be terminated by . or ? or !
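
To see why the quoting matters, here is a minimal sketch. The input value and the sample text are made up for illustration; preg_quote and preg_match_all are the only library calls involved.

```php
<?php
// Hypothetical user input containing a regex metacharacter.
$search_string = 'example|the';
$content = 'The first example is short.';

// Unquoted: "|" acts as alternation, so the pattern means
// "\bexample" OR "the\b" and matches more than intended.
$unsafe = '/\b' . $search_string . '\b/i';

// Quoted: preg_quote() escapes "|" (and "/" because it is passed as the delimiter).
$safe = '/\b' . preg_quote($search_string, '/') . '\b/i';

$unsafe_count = preg_match_all($unsafe, $content, $unsafe_matches);
$safe_count = preg_match_all($safe, $content, $safe_matches);

print_r($unsafe_matches[0]); // the unquoted pattern finds "The" and "example"
print_r($safe_matches[0]);   // the quoted pattern finds nothing: "example|the" is not in the text
```

With quoting, the user's input is always treated as literal text, which is what a search box normally wants.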

PHP does not accept a backslash as a pattern delimiter, so use something like / instead. You also have the i pattern modifier applied to your pattern, so [a-z] will also match capital letters, and [A-Z] will match lower-case characters.
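
The effect of the i modifier on the character classes can be checked with a short sketch. The sample strings are invented, and the last pattern is only one possible workaround (an inline (?i:...) group that limits case-insensitivity to the search word); it is not part of the answer above.

```php
<?php
$lower = 'here is an example.';
$upper = 'Here is an EXAMPLE.';

// With /i, [A-Z] also matches "h", so the capital-letter requirement is lost.
$with_i = preg_match('/[A-Z][a-z ]*\bexample\b/i', $lower);

// Without /i, the sentence must really start with a capital letter.
$without_i = preg_match('/[A-Z][a-z ]*\bexample\b/', $lower);

// Assumed alternative: keep [A-Z] case-sensitive, make only the search word case-insensitive.
$scoped = '/[A-Z][a-z ]*\b(?i:example)\b/';
$scoped_upper = preg_match($scoped, $upper);
$scoped_lower = preg_match($scoped, $lower);

var_dump($with_i, $without_i, $scoped_upper, $scoped_lower);
```

Whether the scoped form is worth using depends on whether you care more about catching every sentence or about enforcing the capital letter.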

Edit:

This solution is more flexible, though it doesn't require the sentence to start with a capital letter. Up to you if you want to use it:

$search_string='example';
$regex = '/[^.!?\n]*\b'.preg_quote($search_string,"/").'\b[^.!?\n]*/i';
preg_match_all($regex, $content, $matches);  
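
As a rough usage sketch of this second pattern: the helper name find_sentences and the sample text are assumptions made for illustration. The matches keep the whitespace that followed the previous sentence, so the sketch trims them.

```php
<?php
// Hypothetical helper; the pattern is the flexible one from this answer.
function find_sentences(string $content, string $search_string): array
{
    $regex = '/[^.!?\n]*\b' . preg_quote($search_string, '/') . '\b[^.!?\n]*/i';
    preg_match_all($regex, $content, $matches);

    // Each match starts right after the previous terminator, so strip the leading space.
    return array_map('trim', $matches[0]);
}

$content = "Sentence one. This is an example sentence. Another one! Is this an Example too?\nNo match here.";
$sentences = find_sentences($content, 'example');
print_r($sentences);
```

Note that the terminating ., ! or ? is not part of the match, and that \b means a word such as "examples" will not be found.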

How about:

$search=preg_quote('example');

$regex = '/[A-Z][^\.]+\s+'.$search.'\s[^\.]+/i';

preg_match_all($regex, $content, $matches);  

Basically:

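  • Capital letter
  • One or more of anything that isn't a .
  • One or more whitespace characters
  • Your search string
  • One whitespace character
  • One or more of anything that's not a dot.

This should match the sentence, excluding the trailing full stop. Note that the i modifier also lets [A-Z] match a lower-case letter.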
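
The breakdown above can be written directly into the pattern with the x modifier, which lets the regex carry its own comments. This is only a sketch: the sample text is invented, and the i modifier is dropped here on purpose so that the capital letter is actually enforced.

```php
<?php
// Assumption: the search word contains no whitespace, because the x modifier
// ignores unescaped whitespace in the pattern and preg_quote() does not escape spaces.
$search = preg_quote('example', '/');

$regex = '/
    [A-Z]   # capital letter
    [^.]+   # one or more of anything that is not a dot
    \s+     # one or more whitespace characters
    ' . $search . '
    \s      # one whitespace character
    [^.]+   # one or more of anything that is not a dot
/x';

$content = 'Sentence one. This is an example sentence. Sentence two.';
preg_match_all($regex, $content, $matches);
print_r($matches[0]);
```

Because the pattern needs whitespace on both sides of the word, a sentence that starts or ends with the search word is not matched.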


This is a more complete solution that (checked and working) handles the 'over to the next line' issue, as well as words surrounded by quotes:

$content = "Sentence one. This is an example sentence. Sentence two. Sentence with the word 'example' in it\nthat goes over multiple lines. this isn't starting with a capital letter, for example.";
$search=preg_quote('example');
$regex = '/[A-Z][^\.\n]+\W'.$search.'\W[^\.\n]+/';

preg_match_all($regex, $content, $matches);  
print_r($matches);

Prints:

Array
(
    [0] => Array
        (
            [0] => This is an example sentence
            [1] => Sentence with the word 'example' in it
        )
)

This regex will do what you want:

$regex = '/[A-Z\n]{1}([a-z]*?\s*)+'.$search_string.'(\s*?[a-zA-Z]*)+[\.\n]/';

and here you can see how it works:

http://ideone.com/aCJJZ

