
Which regex can I use to split an XML string before and after each MathML match?

I would like to know which regex I can use to split the text string at each <math xmlns='http://www.w3.org/1998/Math/MathML'>....</math> element, so that each MathML element and each piece of text around it end up as separate strings.

The result should look like this:

[image: expected result]

My current code:

        using System.Collections.Generic;
        using System.Linq;
        using System.Text.RegularExpressions;
        var text = @"{(test&<math xmlns='http://www.w3.org/1998/Math/MathML'><apply><plus></plus><cn>1</cn><cn>2</cn></apply></math>)|(<math xmlns='http://www.w3.org/1998/Math/MathML'><apply><root></root><degree><ci>m</ci></degree><ci>m</ci></apply></math>&nnm)&<math xmlns='http://www.w3.org/1998/Math/MathML'><apply><power></power><cn>1</cn><cn>2</cn></apply></math>#<math xmlns='http://www.w3.org/1998/Math/MathML'><set><ci>l</ci></set></math>}";
        string findTagString = "(<math.*?>)|(.+?(?=<math/>))"; // attempt so far: does not give the split I want
        Regex findTag = new Regex(findTagString);
        List<string> textList = findTag.Split(text).ToList();

I found a similar question, "Using Regex to split XML string before and after match", and I would like some advice on the regular expression itself.

Thank you

Ori

After some testing, I think this does the job:

string findTagString = "(<math.*?></math>)|((.*){}()#&(.*))</math>";
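The pattern above relies on a documented behavior of .NET's Regex.Split: when the pattern contains capturing parentheses, the captured text is included in the result array, so the matched MathML is kept rather than discarded. The minimal C# sketch below isolates that behavior using only the first alternative of the pattern; the class and method names (CaptureSplitDemo, SplitKeepingMath, SplitDroppingMath) are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureSplitDemo
{
    // Delimiter wrapped in a capturing group: Regex.Split also returns the
    // captured <math>...</math> text as its own array element.
    public static string[] SplitKeepingMath(string input) =>
        Regex.Split(input, "(<math.*?></math>)");

    // Same delimiter without the capturing group: the matched MathML is
    // treated as a plain separator and is dropped from the result.
    public static string[] SplitDroppingMath(string input) =>
        Regex.Split(input, "<math.*?></math>");
}
```

For example, SplitKeepingMath("a<math><cn>1</cn></math>b") returns "a", "<math><cn>1</cn></math>" and "b", while SplitDroppingMath on the same input returns only "a" and "b". Note that the ></math> part of the pattern needs a > directly before the closing tag, which holds in the question's input because every <math> element ends with a child element.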

Here is my attempt, based on a zero-length look-ahead and look-behind:

(?=<math[^>]*>)|(?<=</math>)

Code:

using System;
using System.Text.RegularExpressions;

// Split at zero-width positions: just before each opening <math ...> tag
// and just after each closing </math> tag, so no characters are consumed
string findTagString = "(?=<math[^>]*>)|(?<=</math>)";
var text = @"{(test&<math xmlns='http://www.w3.org/1998/Math/MathML'><apply><plus></plus><cn>1</cn><cn>2</cn></apply></math>)|(<math xmlns='http://www.w3.org/1998/Math/MathML'><apply><root></root><degree><ci>m</ci></degree><ci>m</ci></apply></math>&nnm)&<math xmlns='http://www.w3.org/1998/Math/MathML'><apply><power></power><cn>1</cn><cn>2</cn></apply></math>#<math xmlns='http://www.w3.org/1998/Math/MathML'><set><ci>l</ci></set></math>}";
Regex findTag = new Regex(findTagString);
string[] textList = findTag.Split(text);
Console.WriteLine(string.Join("\n", textList));

Output of a sample program:

{(test&
<math xmlns='http://www.w3.org/1998/Math/MathML'><apply><plus></plus><cn>1</cn><cn>2</cn></apply></math>
)|(
<math xmlns='http://www.w3.org/1998/Math/MathML'><apply><root></root><degree><ci>m</ci></degree><ci>m</ci></apply></math>
&nnm)&
<math xmlns='http://www.w3.org/1998/Math/MathML'><apply><power></power><cn>1</cn><cn>2</cn></apply></math>
#
<math xmlns='http://www.w3.org/1998/Math/MathML'><set><ci>l</ci></set></math>
}
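Because both lookarounds are zero-width, this split consumes no characters, so concatenating the pieces gives back the original string. Below is a minimal C# sketch that wraps the same pattern in a reusable helper, assuming you also want to drop the empty strings Regex.Split produces when a split point falls at the very start or end of the input (for example, when the input begins with <math>). The LookaroundSplit class name is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaroundSplit
{
    // Zero-width split points: just before an opening <math ...> tag,
    // and just after a closing </math> tag. No characters are consumed.
    private static readonly Regex MathBoundary =
        new Regex("(?=<math[^>]*>)|(?<=</math>)");

    public static string[] Split(string input) =>
        MathBoundary.Split(input)
            // Regex.Split yields "" when a split point is at the very start
            // or end of the input; those carry no text, so drop them.
            .Where(part => part.Length > 0)
            .ToArray();
}
```

For an input such as <math><ci>a</ci></math>+<math><cn>2</cn></math>, Split returns the two <math> elements and the "+" between them, and string.Concat of the result equals the input. Filtering out empty strings never changes that concatenation.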

I would advise against using regular expressions on XML. XML is not a regular language, so regular expressions are a poor fit for parsing it. In any case, .NET provides such convenient tools for parsing XML that I really don't see the point.

My suggestion is to use LINQ to XML instead of regexes.
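The question's string as a whole is not well-formed XML (it contains bare & characters and text outside a single root element), so LINQ to XML cannot load it directly. A minimal sketch, assuming the only problem characters outside the MathML are bare & (no <) and that the MathML itself contains no entity references such as &amp;: escape &, wrap the string in a hypothetical <root> element, and let XElement.Parse produce alternating XText and XElement nodes. The LinqToXmlSplit class and the root element name are illustrative assumptions, not an established API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LinqToXmlSplit
{
    public static readonly XNamespace MathMl = "http://www.w3.org/1998/Math/MathML";

    // ASSUMPTION: text outside <math> has bare '&' but no '<', and the MathML
    // contains no entity references. Under that assumption, escaping '&' and
    // adding a wrapper root element makes the whole string well-formed XML.
    public static List<XNode> Split(string input)
    {
        XElement root = XElement.Parse(
            "<root>" + input.Replace("&", "&amp;") + "</root>",
            LoadOptions.PreserveWhitespace); // keep whitespace-only text pieces

        // Direct children are XText (plain text, already unescaped back to '&')
        // or XElement (the <math> elements, in the MathML namespace).
        return root.Nodes().ToList();
    }
}
```

Each XElement can then be queried directly, for example math.Descendants(LinqToXmlSplit.MathMl + "cn").Select(e => e.Value). If you need the fragments back as strings, XElement.ToString(SaveOptions.DisableFormatting) works, but the attribute quoting may differ from the original (double instead of single quotes).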

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM