I have search strings, similar to the one bellow:
energy food "olympics 2010" Terrorism OR "government" OR cups NOT transport
and I need to parse it with PHP5 to detect if the content belongs to any of the following clusters:
These are the rules i have set:
So the end result should be something similar to:
AllWords: (energy, food, "olympics 2010")
AnyWords: (terrorism, "government", cups)
NotWords: (Transport)
What would be a good way to do this?
If you want to do this with Regex, be aware that your parsing will break on stupid user input (the user, not the input =) ).
I'd try the following Regexes.
NotWords:
(?<=NOT\s)\b((?!NOT|OR)\w+|"[^"]+")\b
AllWords:
(?<!OR\s)\b((?!NOT|OR)\w+|"[^"]+")\b(?!\s+OR)
AnyWords: Well.. the rest. =) They are not that easy to spot, since I do not know how to put "OR behind it or OR in front of it" into regex. Maybe you could join the results from the three regexes
(?<=OR\s)\b((?!NOT|OR)\w+|"[^"]+")\b(?!\s+OR)
(?<=OR\s)\b((?!NOT|OR)\w+|"[^"]+")\b(?=\s+OR)
(?<!OR\s)\b((?!NOT|OR)\w+|"[^"]+")\b(?=\s+OR)
Problems: These require exactly one space between modifier words and expressions. PHP only supports lookbehinds for fixes length expressions, so I see no way around that, sorry. You could just use \\b(\\w+|"[^"]+")\\b
to split the input, and parse the resulting array manually.
This is an excellent example of how an test-first driven approach can help you arrive at a solution. It might not be the very best one, but having tests written allow you to refactor with confidence and instantly see if you break any of the existing tests. Anyway, you could set up a few tests like:
public function setUp () {
$this->searchParser = new App_Search_Parser();
}
public function testSingleWordParsesToAllWords () {
$this->searchParser->parse('Transport');
$this->assertEquals(
$this->searchParser->getAllWords(),
array('Transport')
);
$this->assertEquals($this->searchParser->getNotWords(), array());
$this->assertEquals($this->searchParser->getAnyWords());
}
public function testParseOfCombinedSearchString () {
$query = 'energy food "olympics 2010" Terrorism ' .
'OR "government" OR cups NOT transport';
$this->searchParser->parse($query);
$this->assertEquals(
$this->searchParser->getAllWords(),
array('energy', 'food', 'olympics 2010')
);
$this->assertEquals(
$this->searchParser->getNotWords(),
array('Transport')
);
$this->assertEquals(
$this->searchParser->getAnyWords(),
array( 'terrorism', 'government', 'cups')
);
}
Other good tests would include:
testParseTwoWords
testParseTwoWordsWithOr
testParseSimpleWithNot
testParseInvalid
testParseEmpty
Then, write the tests one by one, and write a simple solution that passes the test. Then refactor and make it right, and run again to see that you still pass the test. Once a test passes and the code is refactored, then write the next test and repeat the procedure. Add more tests as you find special cases and refactor the code so that it passes all tests. If you break a test, back-up and re-write the code (not the test!) such that it passes.
As for how you can solve this problem, look into preg_match , strtok or rely simply loop through the string adding up tokens as you go.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.