Regex using Multiline and Groups

Hi guys, I just have a quick question about using multiline mode in regex:

The regex:

 string content = Regex.Match(onix.Substring(startIndex,endIndex - startIndex), @">(.+)<", RegexOptions.Multiline).Groups[1].Value;

Here is the text I am reading:

    <Title>
         <TitleType>01</TitleType>
         <TitleText textcase="02">18th Century Embroidery Techniques</TitleText>
    </Title>

Here is what I am getting:

01

What I want is everything between <Title> and </Title>.

This works perfectly when everything is on one line, but since the content starts on a new line, the pattern seems to skip it or fail to include it.
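To see why the original call returns 01, here is a minimal reproduction. It is a sketch that assumes the Substring call yields exactly the sample text above (indentation simplified), with "\n" as the line separator.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample text from the question (indentation simplified).
string text =
    "<Title>\n" +
    "     <TitleType>01</TitleType>\n" +
    "     <TitleText textcase=\"02\">18th Century Embroidery Techniques</TitleText>\n" +
    "</Title>";

// Same pattern and option as the question. Multiline only affects ^ and $,
// which this pattern does not use, so '.' still refuses to match '\n'.
Match m = Regex.Match(text, @">(.+)<", RegexOptions.Multiline);

// The first '>' (end of "<Title>") is followed directly by '\n', so (.+) fails there.
// The next '>' ends "<TitleType>"; (.+) grabs the rest of that line, then backtracks
// to the last '<' on the same line, leaving "01" in group 1.
string content = m.Groups[1].Value;
Console.WriteLine(content); // prints 01
```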

Any assistance is much appreciated.

You must also use the Singleline option; it can be combined with Multiline:

string content = Regex.Match(onix.Substring(startIndex,endIndex - startIndex), @">(.+)<", RegexOptions.Multiline | RegexOptions.Singleline).Groups[1].Value;
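A small sketch of the fix on the sample text (assumed to be exactly what the Substring call returns). It shows that with Singleline the greedy (.+) crosses line breaks and stops at the last '<' in the string, which here belongs to </Title>.

```csharp
using System;
using System.Text.RegularExpressions;

string text =
    "<Title>\n" +
    "     <TitleType>01</TitleType>\n" +
    "     <TitleText textcase=\"02\">18th Century Embroidery Techniques</TitleText>\n" +
    "</Title>";

// Singleline lets '.' match '\n'. Because (.+) is greedy, it runs to the end of
// the string and backtracks to the LAST '<', the one that starts "</Title>".
// Multiline is harmless here but has no effect on this pattern.
string content = Regex.Match(text, @">(.+)<",
    RegexOptions.Multiline | RegexOptions.Singleline).Groups[1].Value;

// The captured text keeps the surrounding line breaks and indentation.
Console.WriteLine(content.Trim());
```

Because the match is greedy, this relies on the substring ending right after </Title>; on a longer string, (.+) would run to the last '<' anywhere in it.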

But do yourself a favor and stop parsing XML with regular expressions! Use an XML parser instead!

You can parse the XML text using the XmlDocument class, and use XPath selectors to get to the element you're interested in:

using System;
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.LoadXml(...);                              // load the XML text

XmlNode root = doc.SelectSingleNode("Title");  // selects the <Title>..</Title> element
                                               // modify the selector depending on your outer XML
Console.WriteLine(root.InnerXml);              // displays the contents of the selected node
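A self-contained version of the XmlDocument approach, assuming the question's fragment is loaded as a standalone document. The "//Title" selector (a choice made for this sketch) finds <Title> at any depth; if your real document declares a default XML namespace, the XPath would also need an XmlNamespaceManager.

```csharp
using System;
using System.Xml;

// Assumption: the fragment from the question, loaded on its own.
string xml =
    "<Title>\n" +
    "     <TitleType>01</TitleType>\n" +
    "     <TitleText textcase=\"02\">18th Century Embroidery Techniques</TitleText>\n" +
    "</Title>";

XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);

// "//Title" matches a <Title> element anywhere in the document.
XmlNode? title = doc.SelectSingleNode("//Title");
if (title == null)
{
    throw new InvalidOperationException("No <Title> element found.");
}

// Everything between <Title> and </Title>, as markup.
string inner = title.InnerXml;

// Individual values, without any index arithmetic or string slicing.
string titleType = title.SelectSingleNode("TitleType")?.InnerText ?? "";
string titleText = title.SelectSingleNode("TitleText")?.InnerText ?? "";
string textCase = title.SelectSingleNode("TitleText/@textcase")?.Value ?? "";

Console.WriteLine(inner);
Console.WriteLine($"{titleType} | {titleText} | {textCase}");
```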

RegexOptions.Multiline only changes the meaning of ^ and $ so that they match at the beginning/end of each line instead of the beginning/end of the entire string.

You want to use RegexOptions.Singleline instead, which makes . match line breaks (as well as every other character).
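A quick sketch contrasting the two options on a made-up two-line string: Multiline changes where ^ and $ can match, while Singleline changes what . can match.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "first line\nsecond line";

// Multiline: ^ and $ match at line boundaries, so each line matches separately.
int withMultiline = Regex.Matches(text, @"^\w+ line$", RegexOptions.Multiline).Count;    // 2

// Default: ^ and $ only match at the start/end of the whole string.
int withoutMultiline = Regex.Matches(text, @"^\w+ line$").Count;                         // 0

// Singleline: '.' also matches '\n', so one match can span both lines.
bool spans = Regex.IsMatch(text, @"^first.+line$", RegexOptions.Singleline);             // true

// Default: '.' stops at '\n', so "first" and "second" cannot be joined by .+
bool noSpan = Regex.IsMatch(text, @"^first.+second");                                    // false

Console.WriteLine($"{withMultiline} {withoutMultiline} {spans} {noSpan}");
```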

You might want to parse what is probably XML with an XML parser instead. If possible, that is the preferred approach over parsing it with regular expressions. Please disregard this if it doesn't apply.
