Java RegExp: Finding the correct regular expression

I am struggling to find the right regular expression for extracting strings that meet the following criteria:

I have an XML fragment with multiple tags. Each element starts with <ABC_xxxx> and ends with </ABC_xxxx>.

The xxxx part is different for each element. For example:

 <ABC_A1S1>1234</ABC_A1S1>
 <ABC_uw3ey>1234</ABC_uw3ey>
 <ABC_PD4frfr5>1234</ABC_PD4frfr5>

etc.

The number of x characters is not fixed!

I want to extract each element, including the tags themselves.

How can I do that?

Thanks.

Assuming that no such elements are nested inside each other, try this:

\<ABC(\w+)\>[^\<]+\<\/ABC(\1)\>

Explanation:

  • \<ABC(\w+)\> is the opening tag, which starts with ABC. The word characters after ABC (letters, digits, and underscores) are captured in a group, hence the parentheses. We need them later.
  • [^\<]+ is the body of the element: one or more characters, none of which is an opening angle bracket.
  • \<\/ABC(\1)\> is the closing tag, which starts with ABC and must continue with exactly the characters that followed ABC in the opening tag. \1 is a backreference to the first captured group.
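
For what it's worth, here is a minimal sketch of how that pattern could be applied in Java. The class name AbcElementExtractor and the method name extract are made up for illustration. Note that every backslash in the pattern has to be doubled inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class AbcElementExtractor {
    // Same pattern as in the answer; backslashes are doubled for the Java string literal.
    private static final Pattern ELEMENT =
            Pattern.compile("\\<ABC(\\w+)\\>[^\\<]+\\<\\/ABC(\\1)\\>");

    // Returns every matching element, tags included, in order of appearance.
    static List<String> extract(String xmlFragment) {
        List<String> elements = new ArrayList<>();
        Matcher matcher = ELEMENT.matcher(xmlFragment);
        while (matcher.find()) {
            elements.add(matcher.group()); // group() with no argument is the whole match
        }
        return elements;
    }

    public static void main(String[] args) {
        String fragment = "<ABC_A1S1>1234</ABC_A1S1>\n"
                + "<ABC_uw3ey>1234</ABC_uw3ey>\n"
                + "<ABC_PD4frfr5>1234</ABC_PD4frfr5>";
        extract(fragment).forEach(System.out::println);
    }
}
```

Because the body is written as [^\<]+, an element with an empty body would not be matched by this sketch.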

Important note: XML is not a regular language, so regular expressions cannot parse it in general. For example, imagine two or more such elements nested inside each other. Use an XML parser to parse XML.
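
A rough sketch of the parser route, using the DOM and Transformer APIs that ship with the JDK. Several things here are assumptions: the fragment is wrapped in a made-up <root> element, because a document with several top-level elements is not well-formed; only direct children whose names start with ABC_ are collected; and the class name AbcElementParser is invented for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class AbcElementParser {
    // Returns each top-level ABC_ element serialized back to text, tags included.
    static List<String> extract(String xmlFragment) {
        try {
            // Assumption: wrap the fragment in a <root> element so it parses as one document.
            String wrapped = "<root>" + xmlFragment + "</root>";
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wrapped)));

            // Serializer that writes a node as text without the <?xml ...?> header.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            List<String> elements = new ArrayList<>();
            for (Node child = doc.getDocumentElement().getFirstChild();
                    child != null; child = child.getNextSibling()) {
                // Skip whitespace text nodes and unrelated elements.
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && child.getNodeName().startsWith("ABC_")) {
                    StringWriter out = new StringWriter();
                    serializer.transform(new DOMSource(child), new StreamResult(out));
                    elements.add(out.toString());
                }
            }
            return elements;
        } catch (Exception e) {
            // Parser and transformer errors are checked; rethrow as unchecked for brevity.
            throw new IllegalArgumentException("Not a well-formed XML fragment", e);
        }
    }
}
```

Unlike the regular expression, this version copes with elements nested inside each other: a nested element comes back as part of its parent.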

Try this:

<ABC_([^>]*)>([^<]*)<\/ABC_([^>]*)>
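
A small sketch of using this second pattern from Java. One caveat: the pattern captures the opening and closing names separately (groups 1 and 3) but does not require them to be equal, so the comparison of the two groups below is an extra check added for this example, not part of the answer. The class name AbcGroupExtractor is likewise made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class AbcGroupExtractor {
    // Group 1: opening name suffix, group 2: body, group 3: closing name suffix.
    private static final Pattern ELEMENT =
            Pattern.compile("<ABC_([^>]*)>([^<]*)<\\/ABC_([^>]*)>");

    // Returns the matched elements whose opening and closing names agree.
    static List<String> extract(String xmlFragment) {
        List<String> elements = new ArrayList<>();
        Matcher matcher = ELEMENT.matcher(xmlFragment);
        while (matcher.find()) {
            // Added check (assumption): drop matches such as <ABC_a>1</ABC_b>.
            if (matcher.group(1).equals(matcher.group(3))) {
                elements.add(matcher.group());
            }
        }
        return elements;
    }
}
```

Since the body is [^<]*, this pattern also accepts elements with an empty body.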

 