Regex to match from first uppercase to end of sentence of querystring

I need to find the sentence or sentences surrounding a given string. Each sentence runs from the first capital letter or line break to the closing period or line break.

This is what I have so far, but it does not work at all:

$search_string='example';

$regex = '\[A-Z]{1}[a-z]*\s*'.$search_string.'\s*[a-zA-Z]*\i';

preg_match_all($regex, $content, $matches);  

If the word appears in more than one sentence, I need to retrieve all of those sentences. I'm not sure I'm explaining this well; please comment and I will try to explain it again.


EDIT

I have a WordPress website with a lot of posts, and PDFs, docs, etc. inside those posts. I'm using a search engine called swish-e to index everything and display results. When someone searches for a string, I want to display a summary around that string instead of the full post or PDF.

So if a user searches for the string "example", I need to show all the sentences, or at least a few of them, where the word "example" appears. That's why I asked for a capital letter at the beginning and a period at the end. I know this won't be perfect, but I need to cover at least some scenarios (capital letters, line breaks, etc.).

I hope that's clearer. Once again, thanks a lot.

Your search_string should be passed through preg_quote; otherwise users can manipulate the results with special characters like |.

$search_string='example';
$regex = '/[A-Z][a-z ]*\b'.preg_quote($search_string,"/").'\b.*?(?:[.!?]|$)/i';
preg_match_all($regex, $content, $matches);  

I've assumed the sentence can be terminated by ., ? or !
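To illustrate why the quoting matters, here is a minimal sketch. The input `a|b` and the sample `$content` are made up for the demonstration; only `preg_quote` and `preg_match_all` come from the answer above.

```php
<?php
// Hypothetical user input containing a regex metacharacter.
$search_string = 'a|b';
$content = 'abc a|b';

// Unquoted: "|" acts as alternation, so the pattern means "a" OR "b".
$unsafe = '/' . $search_string . '/';
// Quoted: "|" is escaped, so only the literal text "a|b" can match.
$safe = '/' . preg_quote($search_string, '/') . '/';

$unsafe_count = preg_match_all($unsafe, $content, $unsafe_matches);
$safe_count = preg_match_all($safe, $content, $safe_matches);

echo $unsafe_count, "\n"; // 4 (every single "a" and "b")
echo $safe_count, "\n";   // 1 (the literal "a|b")
```

Passing "/" as the second argument to preg_quote also escapes the delimiter, so a search term containing a slash cannot end the pattern early.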

You probably don't want to use \ characters as your pattern delimiters: if it works at all, it is likely to give odd behaviour. You also have the i pattern modifier applied to your pattern, so [a-z] will also match capital letters, and [A-Z] will match lower-case characters.

Edit:

This solution is more flexible, though it doesn't require the sentence to start with a capital letter. It's up to you whether you want to use it:
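The effect of the i modifier on character classes can be checked with a short sketch. The sample strings are made up, and the last pattern is only one possible variation (not the answer's regex): it uses the inline group (?i:...) so that just the search term is case-insensitive while [A-Z] still requires a real capital letter.

```php
<?php
$lower = 'this starts in lower case, for example.';

// With the i modifier, [A-Z] also matches lower-case letters.
$with_i = preg_match('/^[A-Z]/i', $lower);
// Without it, [A-Z] requires an actual capital letter.
$without_i = preg_match('/^[A-Z]/', $lower);

var_dump($with_i);    // int(1)
var_dump($without_i); // int(0)

// Assumed variation: limit case-insensitivity to the search term only.
$content = 'Sentence one. This is an example sentence. this is another example.';
$search = preg_quote('EXAMPLE', '/');
$regex = '/[A-Z][^.!?\n]*\b(?i:' . $search . ')\b[^.!?\n]*/';
preg_match_all($regex, $content, $matches);

print_r($matches[0]); // only "This is an example sentence"
```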

$search_string='example';
$regex = '/[^.!?\n]*\b'.preg_quote($search_string,"/").'\b[^.!?\n]*/i';
preg_match_all($regex, $content, $matches);  
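As a rough sketch of how this pattern could feed a search summary, the helper below wraps it, trims the leading whitespace each match picks up, and caps the number of sentences returned. The function name find_sentences, the $limit parameter and the sample text are assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: returns up to $limit sentences containing the search term.
function find_sentences(string $content, string $search_string, int $limit = 3): array
{
    $regex = '/[^.!?\n]*\b' . preg_quote($search_string, '/') . '\b[^.!?\n]*/i';
    preg_match_all($regex, $content, $matches);

    // Matches start right after the previous terminator, so trim the spaces.
    $sentences = array_map('trim', $matches[0]);

    return array_slice($sentences, 0, $limit);
}

$content = "Sentence one. This is an example sentence. Sentence two.\n"
         . "Another Example here! no capital, for example.";

print_r(find_sentences($content, 'example'));
```

With this sample text the helper returns three sentences, including the one that starts in lower case; the terminating punctuation is not part of any match.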

How about:

$search=preg_quote('example', '/');

$regex = '/[A-Z][^\.]+\s+'.$search.'\s[^\.]+/i';

preg_match_all($regex, $content, $matches);  

Basically:

  • Capital letter
  • One or more of anything that isn't a .
  • One or more whitespace characters
  • Your pattern
  • A single whitespace character
  • One or more of anything that isn't a .

This should match the sentence, excluding the trailing period.


This is a more complete solution (checked and working) that handles the 'over to the next line' issue, as well as words surrounded by quotes:

$content = "Sentence one. This is an example sentence. Sentence two. Sentence with the word 'example' in it\nthat goes over multiple lines. this isn't starting with a capital letter, for example.";
$search=preg_quote('example', '/');
$regex = '/[A-Z][^\.\n]+\W'.$search.'\W[^\.\n]+/';

preg_match_all($regex, $content, $matches);  
print_r($matches);

Prints:

Array
(
    [0] => Array
        (
            [0] => This is an example sentence
            [1] => Sentence with the word 'example' in it
        )
)

This regex will do what you want:

$regex = '/[A-Z\n]{1}([a-z]*?\s*)+'.$search_string.'(\s*?[a-zA-Z]*)+[\.\n]/';

And here you can see how it works:

http://ideone.com/aCJJZ
