Need Java regex to remove/replace the XML elements from a specific string
I am having trouble getting the correct regular expression. I have the XML below as a string:
<user_input>
<UserInput Question="test Q?" Answer=<value>0</value><sam@testmail.com>"
</user_input>
Now I need to remove the XML characters from the Answer attribute only, so I need the result below:
<user_input>
<UserInput Question="test Q?" Answer=value0value sam@testmail.com"
</user_input>
I tried the regex below, but it did not work:
str1.replaceAll("Answer=.*?<([^<]*)>", "$1");
It removes all of the preceding text.
Can anyone help, please?
Try it like this:
Although Java does not support look-behinds of unbounded width, you can work around that with a bounded quantifier such as .{0,1000}, which should suffice.
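To make the workaround concrete, here is a minimal sketch of a bounded look-behind on its own. The class name, the helper `stripAfterAnswer`, and the sample string are made up for illustration; only the `(?<=Answer=.{0,1000})` idea comes from the answer.

```java
public class BoundedLookBehindDemo {
    // Removes '<', '>' and '/' only where "Answer=" occurs at most 1000
    // characters earlier with no line break in between
    // ('.' does not match line terminators by default).
    static String stripAfterAnswer(String s) {
        return s.replaceAll("(?<=Answer=.{0,1000})[<>/]+", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample: the brackets of <a> come before "Answer=" and are kept.
        String sample = "<a> Answer=<b>1</b>";
        System.out.println(stripAfterAnswer(sample)); // prints: <a> Answer=b1b
    }
}
```

The upper bound of 1000 is arbitrary. It only has to be at least as long as the longest attribute value you expect after Answer=.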
Please check out this approach, which uses either 2 regexes or 1 regex and 1 plain replace. Choose the one that suits you best (I removed the \n line break after <user_input> in the input string to show the flaw in using a simple replace):
String input = "<user_input><UserInput Question=\"test Q?\" Answer=<value>0</value><sam@testmail.com>\"\n</user_input>";
String st = input.replace("><", " ").replaceAll("(?<=Answer=.{0,1000})[<>/]+(?=[^\"]*\")", "");
String st1 = input.replaceAll("(?<=Answer=.{0,1000})><(?=[^\"]*\")", " ").replaceAll("(?<=Answer=.{0,1000})[<>/]+(?=[^\"]*\")", "");
System.out.println(st + "\n" + st1);
Output of the sample program:
<user_input UserInput Question="test Q?" Answer=value0value sam@testmail.com"
</user_input>
<user_input><UserInput Question="test Q?" Answer=value0value sam@testmail.com"
</user_input>
First off, in your sample above there is a trailing " after the email and the >, and I do not know whether it was placed there by mistake.
However, I will keep it, because according to your expected result it still needs to be present.
This is my hack. Use the pattern
(Answer=)(<)(value)(>)(.+?([^<]*))(</)(value)(><)(.+?([^>]*))(>)
and replace the match with
$1$3$5$8 $10
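As a sketch of how this could be applied in Java, the pattern and replacement can be passed straight to `String.replaceAll`. The class name `AnswerCleanupDemo` and the helper `clean` are made up for illustration. The input is the string from the question.

```java
public class AnswerCleanupDemo {
    // Pattern and replacement copied from the answer; '<' and '>' need no
    // escaping in a Java regex, only the double quotes in the input do.
    static final String PATTERN =
        "(Answer=)(<)(value)(>)(.+?([^<]*))(</)(value)(><)(.+?([^>]*))(>)";
    static final String REPLACEMENT = "$1$3$5$8 $10";

    static String clean(String xml) {
        return xml.replaceAll(PATTERN, REPLACEMENT);
    }

    public static void main(String[] args) {
        String input = "<user_input>\n"
            + "<UserInput Question=\"test Q?\" Answer=<value>0</value><sam@testmail.com>\"\n"
            + "</user_input>";
        // Only the Answer attribute changes; the surrounding tags are untouched.
        System.out.println(clean(input));
    }
}
```

Note that this pattern is tied to the exact shape `<value>...</value><...>`. Other tag names or a different number of tags inside Answer would need a different pattern.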
The explanation:
(Answer=)(<)(value)(>)
matches from Answer= up to the start of the value 0.
(.+?([^<]*))
matches the value, from the 0 onward, up to but not including the < that starts the closing value tag.
(</)
I still select this here, because the previous expression excluded it.
(><)
I will later replace this with a space.
(.+?([^>]*))
matches from the start of the email and excludes the > after the .com.
(>)
selects the last >, which I will later drop when replacing.
The trailing " is not selected, because I would rather not touch it, as requested.
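To see which text ends up in which group, a small sketch like the one below can print every capture group. The class name `GroupInspector` and the helper `group` are made up for illustration. The nested parentheses count as groups too ($6 and $11), which is why the email is referenced as $10 in the replacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupInspector {
    static final Pattern P = Pattern.compile(
        "(Answer=)(<)(value)(>)(.+?([^<]*))(</)(value)(><)(.+?([^>]*))(>)");

    // Returns capture group n of the first match, or null if nothing matches.
    static String group(String s, int n) {
        Matcher m = P.matcher(s);
        return m.find() ? m.group(n) : null;
    }

    public static void main(String[] args) {
        String s = "Answer=<value>0</value><sam@testmail.com>\"";
        Matcher m = P.matcher(s);
        if (m.find()) {
            // Groups are numbered by the position of their opening parenthesis.
            for (int i = 1; i <= m.groupCount(); i++) {
                System.out.println("$" + i + " = " + m.group(i));
            }
        }
    }
}
```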