I seem to write one Reg expression a year and always end up asking for help.
Here's a string (it's a search string from Solr) and I want to select every instance of the search word.
Here's the input:-
http://server:8080/solr/app/select?q=(title_st_en%3Atheory+OR+title_st_ar%3Atheory+OR+title_st_da%3Atheory+OR+title_st_fr%3Atheory+OR+title_st_de%3Atheory+OR+title_st_it%3Atheory+OR+title_st_no%3Atheory+OR+title_st_sv%3Atheory+OR+title_st_ru%3Atheory+OR+title_st_es%3Atheory+OR+title_st_bg%3Atheory+OR+title_st_cs%3Atheory+OR+title_st_tr%3Atheory+OR+title_st_nl%3Atheory+OR+title_st_zh-cn%3Atheory+OR+title_st_zh-tw%3Atheory+OR+title_st_hr%3Atheory+OR+title_st_et%3Atheory+OR+title_st_he%3Atheory+OR+title_st_hu%3Atheory+OR+title_st_ja%3Atheory+OR+title_st_ko%3Atheory+OR+title_st_pl%3Atheory+OR+title_st_ro%3Atheory+OR+title_st_th%3Atheory+OR+title_st_vi%3Atheory+OR+content_stemming_en%3Atheory+OR+content_stemming_no%3Atheory+OR+(backfields%3Atheory))+AND+(((virtualPath%3A%22%5C%5CSERVER%5C%5CU_TEST%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_SYSTEM%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_!CONTACTS%22)+AND+-(virtualPath%3A%22%5C%5CSERVER%5C%5CU_TEST%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_SYSTEM%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSF%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSFMAG%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSFRA%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NM%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_INTERNAL%5C%5CL%22+OR+virtualPath%3A
I need to select any text between every ' %3A
' and ' +OR
' as well as the final ' %3Atheory
))' - in this case the word ' theory
' but it will be a different word every time - the only known thing is it'll be any alpha text between the ' %3A
' and the ' +OR
'. And it need to stop at the ' +AND+
'
I've got as far as /%3A(.*?)[+OR]/g
- it's a start I guess... It doesn't find ' %3Atheory))
' and it doesn't stop at ' +AND+
'
I'm struggling with 'find this' OR 'find that' as well as stopping at a string.
anyone offer some guidance?
If you're using c# it might be better to split in two operations using String.Split and the Regex.Matches like so:
string input = @"http://server:8080/solr/app/select?q=(title_st_en%3Atheory+OR+title_st_ar%3Atheory+OR+title_st_da%3Atheory+OR+title_st_fr%3Atheory+OR+title_st_de%3Atheory+OR+title_st_it%3Atheory+OR+title_st_no%3Atheory+OR+title_st_sv%3Atheory+OR+title_st_ru%3Atheory+OR+title_st_es%3Atheory+OR+title_st_bg%3Atheory+OR+title_st_cs%3Atheory+OR+title_st_tr%3Atheory+OR+title_st_nl%3Atheory+OR+title_st_zh-cn%3Atheory+OR+title_st_zh-tw%3Atheory+OR+title_st_hr%3Atheory+OR+title_st_et%3Atheory+OR+title_st_he%3Atheory+OR+title_st_hu%3Atheory+OR+title_st_ja%3Atheory+OR+title_st_ko%3Atheory+OR+title_st_pl%3Atheory+OR+title_st_ro%3Atheory+OR+title_st_th%3Atheory+OR+title_st_vi%3Atheory+OR+content_stemming_en%3Atheory+OR+content_stemming_no%3Atheory+OR+(backfields%3Atheory))+AND+(((virtualPath%3A%22%5C%5CSERVER%5C%5CU_TEST%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_SYSTEM%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_!CONTACTS%22)+AND+-(virtualPath%3A%22%5C%5CSERVER%5C%5CU_TEST%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_SYSTEM%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSF%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSFMAG%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NDSFRA%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_NM%5C%5CL%22+OR+virtualPath%3A%22%5C%5CSERVER%5C%5CP_INTERNAL%5C%5CL%22+OR+virtualPath%3A";
Regex regex = new Regex(@"%3A(.*?)(?:\+OR|\)\))");
var splitted = input.Split(new[] { "AND" }, StringSplitOptions.None);
var matches = regex.Matches(splitted.First());
foreach (Match m in matches)
{
// Or whatever you like to do with your matches
Console.WriteLine(m.Groups[1].Value);
}
Regex.Split
has an option to keep the separating strings. So for the text given in the question, code like that below will split it into pieces:
string[] pieces = Regex.Split(theInputText, "(%3A.*?\\+(?:AND|OR))");
foreach (string ss in pieces)
{
Console.WriteLine(ss);
}
Here is a small section of the output:
+virtualPath
%3A%22%5C%5CSERVER%5C%5CP_SYSTEM%22+OR
+virtualPath
%3A%22%5C%5CSERVER%5C%5CP_!CONTACTS%22)+AND
+-(virtualPath
%3A%22%5C%5CSERVER%5C%5CU_TEST%5C%5CL%22+OR
+virtualPath
Having split the string into pieces it should be a simple matter to screen for the array elements with the correct starting and ending characters, also to find the last %3Atheory...
entry.
Note: The question discusses +OR
and +AND+
but all the +OR
s are followed with a +
so it may be better to include a final +
in the expression, as ...OR)\\\\+)
.
Note: The inner brackets in the regular expression are non capturing, ie (?: )
. If they were capturing brackets then the AND
and OR
captures would be included in the output array.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.