Parse search string
I have search strings similar to the one below:
energy food "olympics 2010" Terrorism OR "government" OR cups NOT transport
and I need to parse it with PHP5 to detect if the content belongs to any of the following clusters:
These are the rules I have set:
So the end result should be something similar to:
AllWords: (energy, food, "olympics 2010")
AnyWords: (terrorism, "government", cups)
NotWords: (Transport)
What would be a good way to do this?
If you want to do this with regex, be aware that your parsing will break on stupid user input (the user, not the input =) ).
I'd try the following regexes.
NotWords:
(?<=NOT\s)\b((?!NOT|OR)\w+|"[^"]+")\b
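As a rough illustration, the NotWords pattern can be tried against the sample query with `preg_match_all`. This is only a sketch: the variable names are made up for the example, and the `/` delimiters are an assumption about how the pattern would be wrapped for PCRE.

```php
<?php
// Sample query taken from the question.
$query = 'energy food "olympics 2010" Terrorism OR "government" OR cups NOT transport';

// The NotWords pattern from above, wrapped in / delimiters for PCRE.
$pattern = '/(?<=NOT\s)\b((?!NOT|OR)\w+|"[^"]+")\b/';

// $matches[1] holds the contents of the first capture group for every hit.
preg_match_all($pattern, $query, $matches);
$notWords = $matches[1];

print_r($notWords);
```

Note that a quoted phrase directly after NOT would probably not be picked up this way, because `\b` cannot match between a space and a double quote (both are non-word characters).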
AllWords:
(?<!OR\s)\b((?!NOT|OR)\w+|"[^"]+")\b(?!\s+OR)
AnyWords: Well.. the rest. =) They are not that easy to spot, since I do not know how to put "OR behind it or OR in front of it" into regex. Maybe you could join the results from the three regexes:
(?<=OR\s)\b((?!NOT|OR)\w+|"[^"]+")\b(?!\s+OR)
(?<=OR\s)\b((?!NOT|OR)\w+|"[^"]+")\b(?=\s+OR)
(?<!OR\s)\b((?!NOT|OR)\w+|"[^"]+")\b(?=\s+OR)
Problems: These require exactly one space between modifier words and expressions. PHP only supports lookbehinds for fixed-length expressions, so I see no way around that, sorry. You could just use
\\b(\\w+|"[^"]+")\\b
to split the input, and parse the resulting array manually.
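The split-then-parse idea could look roughly like the sketch below. It is not the answerer's code: `parseSearchString` is a hypothetical helper, and the tokenising pattern is a swapped-in variant (`"[^"]+"|\S+`) rather than the one above, since a `\b` in front of a quote would not match after a space. Quotes are kept on phrases, as in the expected output from the question, and operators are assumed to be the upper-case words OR and NOT.

```php
<?php
// Hypothetical helper: tokenise the query, then sort tokens into buckets
// by looking at the operator (if any) next to each token.
function parseSearchString($query)
{
    // A token is either a quoted phrase or a run of non-space characters.
    preg_match_all('/"[^"]+"|\S+/', $query, $m);
    $tokens = $m[0];

    $result = array('all' => array(), 'any' => array(), 'not' => array());
    $count = count($tokens);

    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];
        if ($token === 'OR' || $token === 'NOT') {
            continue; // operators are only inspected as neighbours
        }
        $prev = $i > 0 ? $tokens[$i - 1] : null;
        $next = $i + 1 < $count ? $tokens[$i + 1] : null;

        if ($prev === 'NOT') {
            $result['not'][] = $token;   // directly preceded by NOT
        } elseif ($prev === 'OR' || $next === 'OR') {
            $result['any'][] = $token;   // OR on either side
        } else {
            $result['all'][] = $token;   // plain term
        }
    }
    return $result;
}

$parsed = parseSearchString(
    'energy food "olympics 2010" Terrorism OR "government" OR cups NOT transport'
);
print_r($parsed);
```

Because each token only looks at its immediate neighbours, extra spaces between words and operators no longer matter, which sidesteps the fixed-length lookbehind limitation. Case is left untouched here; any normalisation would be a separate step.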
This is an excellent example of how a test-first approach can help you arrive at a solution. It might not be the very best one, but having tests written allows you to refactor with confidence and instantly see if you break any of the existing tests. Anyway, you could set up a few tests like:
public function setUp () {
    $this->searchParser = new App_Search_Parser();
}

public function testSingleWordParsesToAllWords () {
    $this->searchParser->parse('Transport');
    $this->assertEquals(
        $this->searchParser->getAllWords(),
        array('Transport')
    );
    $this->assertEquals($this->searchParser->getNotWords(), array());
    // A single plain word should leave the "any" bucket empty.
    $this->assertEquals($this->searchParser->getAnyWords(), array());
}

public function testParseOfCombinedSearchString () {
    $query = 'energy food "olympics 2010" Terrorism ' .
             'OR "government" OR cups NOT transport';
    $this->searchParser->parse($query);
    $this->assertEquals(
        $this->searchParser->getAllWords(),
        array('energy', 'food', 'olympics 2010')
    );
    $this->assertEquals(
        $this->searchParser->getNotWords(),
        array('Transport')
    );
    $this->assertEquals(
        $this->searchParser->getAnyWords(),
        array('terrorism', 'government', 'cups')
    );
}
Other good tests would include:
testParseTwoWords
testParseTwoWordsWithOr
testParseSimpleWithNot
testParseInvalid
testParseEmpty
Then, write the tests one by one, and write a simple solution that passes the test. Then refactor and make it right, and run again to see that you still pass the test. Once a test passes and the code is refactored, write the next test and repeat the procedure. Add more tests as you find special cases and refactor the code so that it passes all tests. If you break a test, back up and rewrite the code (not the test!) such that it passes.
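A first pass through that cycle might produce something like the sketch below: a deliberately naive `App_Search_Parser` that is only meant to satisfy the single-word test. The class and method names come from the tests above; the body is an assumption made for illustration, and it does not yet handle OR, NOT, or quoted phrases.

```php
<?php
// Minimal first iteration: just enough behaviour for the single-word test.
class App_Search_Parser
{
    private $allWords = array();
    private $anyWords = array();
    private $notWords = array();

    public function parse($query)
    {
        // Naive on purpose: every whitespace-separated word is an "all" word.
        $this->allWords = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
        $this->anyWords = array();
        $this->notWords = array();
    }

    public function getAllWords()
    {
        return $this->allWords;
    }

    public function getAnyWords()
    {
        return $this->anyWords;
    }

    public function getNotWords()
    {
        return $this->notWords;
    }
}

$parser = new App_Search_Parser();
$parser->parse('Transport');
print_r($parser->getAllWords());
```

The next test (for example one with OR) would fail against this version, which is the cue to extend `parse()` and then refactor.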
As for how you can solve this problem, look into preg_match, strtok, or simply loop through the string, adding up tokens as you go.
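The plain-loop option could be sketched as follows. `tokenizeByLoop` is a hypothetical name, and the sketch assumes tokens are separated by spaces and that double quotes are never escaped or nested.

```php
<?php
// Hypothetical tokenizer: walk the string one character at a time and
// build up tokens, treating spaces inside double quotes as part of the token.
function tokenizeByLoop($query)
{
    $tokens = array();
    $current = '';
    $inQuotes = false;
    $length = strlen($query);

    for ($i = 0; $i < $length; $i++) {
        $char = $query[$i];
        if ($char === '"') {
            $inQuotes = !$inQuotes;  // entering or leaving a quoted phrase
            $current .= $char;
        } elseif ($char === ' ' && !$inQuotes) {
            if ($current !== '') {   // a space outside quotes ends the token
                $tokens[] = $current;
                $current = '';
            }
        } else {
            $current .= $char;
        }
    }
    if ($current !== '') {
        $tokens[] = $current;        // flush the final token
    }
    return $tokens;
}

$tokens = tokenizeByLoop('energy food "olympics 2010" Terrorism OR cups NOT transport');
print_r($tokens);
```

The resulting array can then be classified by checking whether each token sits next to an OR or directly after a NOT.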