Regex to remove round brackets from a string
I have a string:
String s="[[Identity (philosophy)|unique identity]]";
I need to parse it into:
s1 = Identity_philosophy
s2 = unique identity
I have tried the following code:
Pattern p = Pattern.compile("(\\[\\[)(\\w*?\\s\\(\\w*?\\))(\\s*[|])\\w*(\\]\\])");
Matcher m = p.matcher(s);
while(m.find())
{
....
}
But the pattern is not matching.
Please help.
Thanks
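For reference, the attempted pattern fails because the \w* after the pipe only matches "unique" and cannot consume the space before "identity", so the required ]] is never reached. Below is a minimal sketch of one possible fix (not taken from any of the answers). It captures the word before the parentheses, the word inside them, and everything after | up to ]] as separate groups. It assumes the input always has the "word (word)|text" shape from the question, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedAttempt {
    // Group 1: word before "(", group 2: word inside "(...)", group 3: everything after "|" up to "]]".
    // Assumption: the link text has exactly the "word (word)|text" shape from the question.
    private static final Pattern LINK =
            Pattern.compile("\\[\\[(\\w+)\\s\\((\\w+)\\)\\|([^\\]]*)\\]\\]");

    // Returns {s1, s2}, or null if the input does not have the expected shape.
    static String[] parse(String s) {
        Matcher m = LINK.matcher(s);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1) + "_" + m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("[[Identity (philosophy)|unique identity]]");
        System.out.println(parts[0]); // Identity_philosophy
        System.out.println(parts[1]); // unique identity
    }
}
```

The key change is [^\]]* after the pipe, which accepts spaces and stops only at the closing bracket.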
Use:
String s="[[Identity (philosophy)|unique identity]]";
String[] results = s.replaceAll("^\\Q[[\\E|]]$", "") // Delete double brackets at start/end
.replaceAll("\\s+\\(([^()]*)\\)","_$1") // Replace spaces and parens with _
.split("\\Q|\\E"); // Split with pipe
System.out.println(results[0]);
System.out.println(results[1]);
Output:
Identity_philosophy
unique identity
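The sketch below runs the same three calls from the chain above one at a time and shows the intermediate strings, so you can see what each call contributes. The class and method names are illustrative only.

```java
public class ChainSteps {
    // Step 1: strip the leading "[[" and trailing "]]" (\Q...\E quotes the brackets literally).
    static String stripBrackets(String s) {
        return s.replaceAll("^\\Q[[\\E|]]$", "");
    }

    // Step 2: turn "<whitespace>(contents)" into "_contents".
    static String underscoreParens(String s) {
        return s.replaceAll("\\s+\\(([^()]*)\\)", "_$1");
    }

    // Step 3: split on a literal pipe.
    static String[] splitOnPipe(String s) {
        return s.split("\\Q|\\E");
    }

    public static void main(String[] args) {
        String s = "[[Identity (philosophy)|unique identity]]";
        String step1 = stripBrackets(s);
        String step2 = underscoreParens(step1);
        String[] step3 = splitOnPipe(step2);
        System.out.println(step1);    // Identity (philosophy)|unique identity
        System.out.println(step2);    // Identity_philosophy|unique identity
        System.out.println(step3[0]); // Identity_philosophy
        System.out.println(step3[1]); // unique identity
    }
}
```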
Have you tried using something like RegExr to help you?
Using that website, I was able to create a pattern that matches each of the delimiters in your string: [[, (, ), | and ]].
This will work on that website:
(\[\[)|(\()|(\))|(\|)|(\]\])
Insert double backslashes for it to work within Java:
(\\[\\[)|(\\()|(\\))|(\\|)|(\\]\\])
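To check what the Java form of this pattern matches, the sketch below collects every match with Matcher.find() on the sample string. The class and method names are hypothetical. The pattern only locates the delimiters, so building s1 and s2 from it still requires extra code, for example removing these matches or splitting on them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimiterMatches {
    // The Java-escaped form of the RegExr pattern: alternatives for "[[", "(", ")", "|" and "]]".
    private static final Pattern DELIMS =
            Pattern.compile("(\\[\\[)|(\\()|(\\))|(\\|)|(\\]\\])");

    // Collects each delimiter the pattern finds, in order of appearance.
    static List<String> delimiters(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = DELIMS.matcher(s);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints [[[, (, ), |, ]]] (the outer brackets come from List.toString()).
        System.out.println(delimiters("[[Identity (philosophy)|unique identity]]"));
    }
}
```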
You may use:
String s="[[Identity (philosophy)|unique identity]]";
Matcher m = Pattern.compile("\\[{2}(.*)\\|(.*)]]").matcher(s);
if (m.matches()) {
System.out.println(m.group(1).replaceAll("\\W+", " ").trim().replace(" ", "_")); // => Identity_philosophy
System.out.println(m.group(2).trim()); // => unique identity
}
Details
The "\\[{2}(.*)\\|(.*)]]" pattern, used with matches(), behaves like the anchored pattern ^\[{2}(.*)\|(.*)]]\z. It matches a string that starts with [[, then matches and captures any 0 or more chars other than line break chars, as many as possible, into Group 1. It then matches a |, then matches and captures any 0 or more chars other than line break chars, as many as possible, into Group 2, and finally matches ]]. See the regex demo.
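You can check directly that matches() behaves like an anchored ^...\z pattern. The sketch below, with hypothetical class and method names, compares matches() and find() on the sample string and on the same string with extra text appended.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFind {
    private static final Pattern LINK = Pattern.compile("\\[{2}(.*)\\|(.*)]]");

    // matches() must consume the whole input, as if the pattern were wrapped in ^...\z.
    static boolean whole(String s) {
        return LINK.matcher(s).matches();
    }

    // find() succeeds if the pattern occurs anywhere in the input.
    static boolean anywhere(String s) {
        return LINK.matcher(s).find();
    }

    public static void main(String[] args) {
        String exact = "[[Identity (philosophy)|unique identity]]";
        String withTail = exact + " trailing text";
        System.out.println(whole(exact));       // true
        System.out.println(whole(withTail));    // false
        System.out.println(anywhere(withTail)); // true
        Matcher m = LINK.matcher(exact);
        if (m.matches()) {
            System.out.println(m.group(1)); // Identity (philosophy)
            System.out.println(m.group(2)); // unique identity
        }
    }
}
```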
The contents of Group 2 can be trimmed of whitespace and used as is. Group 1 needs preprocessing first. Replace every chunk of one or more non-word characters with a space (.replaceAll("\\W+", " ")), then trim the result (.trim()), and, as the final touch, replace all spaces with _ (.replace(" ", "_")).
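The three preprocessing steps for Group 1 can be traced one at a time. This sketch, with a hypothetical class name, shows the intermediate values in comments, starting from the Group 1 value captured from the sample input.

```java
public class Group1Cleanup {
    // Applies the three steps in order: \W+ -> " ", trim, " " -> "_".
    static String clean(String group1) {
        String spaced = group1.replaceAll("\\W+", " "); // "Identity (philosophy)" -> "Identity philosophy "
        String trimmed = spaced.trim();                 // -> "Identity philosophy"
        return trimmed.replace(" ", "_");               // -> "Identity_philosophy"
    }

    public static void main(String[] args) {
        System.out.println(clean("Identity (philosophy)")); // Identity_philosophy
    }
}
```

Because \W+ collapses a whole run of non-word characters such as " (" into one space, no doubled underscores appear in the result.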