Parsing wikiText with regex in Java
Given a wikiText string such as:
{{ValueDescription
|key=highway
|value=secondary
|image=Image:Meyenburg-L134.jpg
|description=A highway linking large towns.
|onNode=no
|onWay=yes
|onArea=no
|combination=
* {{Tag|name}}
* {{Tag|ref}}
|implies=
* {{Tag|motorcar||yes}}
}}
I'd like to parse the templates ValueDescription and Tag in Java/Groovy. I tried the regex /\{\{\s*Tag(.+)\}\}/ and it works fine (it returns |name, |ref and |motorcar||yes), but /\{\{\s*ValueDescription(.+)\}\}/ doesn't work (it should return all the text above).
Is there a way to skip nested templates in the regex?
Ideally I would rather use a simple wikiText-to-XML tool, but I couldn't find anything like that.
Thanks! Mulone
Arbitrarily nested tags won't work with a regex, since nesting makes the grammar non-regular. You need something capable of dealing with a context-free grammar. ANTLR is a fine option.
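For templates as simple as the ones in the question, a full parser generator may be more than is needed. A hand-written scanner that counts brace depth already handles the nesting. The following is only a minimal sketch under assumptions: the class name TemplateScanner and the method extractBodies are made up for illustration, templates are assumed to be delimited by exactly {{ and }}, and triple braces, nowiki sections and comments are not handled.

```java
import java.util.ArrayList;
import java.util.List;

class TemplateScanner {

    /**
     * Returns the body of every {{name ...}} template found in text, i.e. the
     * characters between the template name and its matching closing braces.
     * Nested templates are skipped over by tracking the brace depth.
     */
    public static List<String> extractBodies(String text, String name) {
        List<String> bodies = new ArrayList<>();
        int from = 0;
        while (true) {
            int open = text.indexOf("{{", from);
            if (open < 0) {
                break; // no further templates
            }
            // Allow whitespace between "{{" and the template name.
            int nameStart = open + 2;
            while (nameStart < text.length()
                    && Character.isWhitespace(text.charAt(nameStart))) {
                nameStart++;
            }
            int bodyStart = nameStart + name.length();
            boolean nameMatches = text.startsWith(name, nameStart)
                    // reject longer names such as "Tagging" when looking for "Tag"
                    && !(bodyStart < text.length()
                         && Character.isLetterOrDigit(text.charAt(bodyStart)));
            if (!nameMatches) {
                from = open + 2;
                continue;
            }
            // Walk forward until the depth returns to zero.
            int depth = 1;
            int close = -1;
            int i = bodyStart;
            while (i < text.length() - 1) {
                if (text.startsWith("{{", i)) {
                    depth++;
                    i += 2;
                } else if (text.startsWith("}}", i)) {
                    depth--;
                    if (depth == 0) {
                        close = i;
                        break;
                    }
                    i += 2;
                } else {
                    i++;
                }
            }
            if (close < 0) {
                break; // unbalanced braces: stop rather than guess
            }
            bodies.add(text.substring(bodyStart, close));
            from = bodyStart; // keep scanning inside the body as well
        }
        return bodies;
    }

    public static void main(String[] args) {
        String wiki = "{{ValueDescription\n|key=highway\n|combination=\n"
                + "* {{Tag|name}}\n* {{Tag|ref}}\n}}";
        System.out.println(extractBodies(wiki, "Tag"));              // [|name, |ref]
        System.out.println(extractBodies(wiki, "ValueDescription")); // the whole outer body
    }
}
```

Because the scanner resumes inside each body it has just extracted, the same call also finds templates of the requested name that are nested in another template.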
Create your regex pattern using the Pattern.DOTALL option, like this:
// Pattern and Matcher come from java.util.regex.
// DOTALL lets '.' match line terminators, so (.+) can span the whole template.
Pattern p = Pattern.compile("\\{\\{\\s*ValueDescription(.+)\\}\\}", Pattern.DOTALL);
Matcher m = p.matcher(str); // str holds the wikiText
while (m.find())
    System.out.println("Matched: [" + m.group(1) + ']');
Matched: [
|key=highway
|value=secondary
|image=Image:Meyenburg-L134.jpg
|description=A highway linking large towns.
|onNode=no
|onWay=yes
|onArea=no
|combination=
* {{Tag|name}}
* {{Tag|ref}}
|implies=
* {{Tag|motorcar||yes}}
]
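To see why the flag matters: by default `.` does not match line terminators, so `(.+)` cannot get past the newline that follows ValueDescription, and the match fails. The embedded flag (?s) has the same effect as Pattern.DOTALL and can be written inside the pattern itself. A minimal sketch, where the class name DotallDemo and the shortened sample string are made up for illustration:

```java
import java.util.regex.Pattern;

class DotallDemo {
    // The ValueDescription pattern from the question.
    static final String REGEX = "\\{\\{\\s*ValueDescription(.+)\\}\\}";

    // Shortened sample input (illustrative only).
    static final String SAMPLE =
            "{{ValueDescription\n|key=highway\n|value=secondary\n}}";

    /** Compiles regex with the given flags and reports whether SAMPLE contains a match. */
    public static boolean found(String regex, int flags) {
        return Pattern.compile(regex, flags).matcher(SAMPLE).find();
    }

    public static void main(String[] args) {
        System.out.println(found(REGEX, 0));              // false: '.' stops at '\n'
        System.out.println(found(REGEX, Pattern.DOTALL)); // true
        System.out.println(found("(?s)" + REGEX, 0));     // true: embedded flag
    }
}
```

Note that with DOTALL the greedy `(.+)` runs to the last }} in the input, so text containing several ValueDescription templates would be captured as one match.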
Assuming the closing }} of {{ValueDescription appears on a separate line, the following pattern will capture multiple ValueDescription templates:
Pattern p = Pattern.compile("\\{\\{\\s*ValueDescription(.+?)\n\\}\\}", Pattern.DOTALL);
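As a rough illustration of how the reluctant quantifier behaves, the sketch below runs that pattern over an input with two templates. The class name MultiTemplateDemo, the helper bodies and the sample text are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiTemplateDemo {
    // (.+?) is reluctant: it stops at the first "\n}}" instead of the last one.
    static final Pattern LAZY = Pattern.compile(
            "\\{\\{\\s*ValueDescription(.+?)\n\\}\\}", Pattern.DOTALL);

    /** Collects group 1 of every ValueDescription match in text. */
    public static List<String> bodies(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = LAZY.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String wiki = "{{ValueDescription\n|key=highway\n* {{Tag|name}}\n}}\n"
                + "{{ValueDescription\n|key=railway\n}}";
        for (String body : bodies(wiki)) {
            System.out.println("Matched: [" + body + "]");
        }
    }
}
```

The inline {{Tag|name}} does not end the match early because its }} is not preceded by a newline. A nested template whose closing }} does start a line would still cut the match short, which is the limitation the assumption above refers to.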