
Need Java regex to remove/replace the XML elements from a specific string

I am having trouble getting the right regular expression. I have the following XML as a string:

<user_input>
<UserInput Question="test Q?" Answer=<value>0</value><sam@testmail.com>"
</user_input>

Now I need to remove the XML characters from the Answer attribute only, so I need the following:

<user_input>
<UserInput Question="test Q?" Answer=value0value sam@testmail.com"
</user_input>

I have tried the regex below, but it did not work:

str1.replaceAll("Answer=.*?<([^<]*)>", "$1");

It's removing all the text before the tag.

Can anyone help, please?
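Running the attempt on the sample shows where the text goes. The sketch below uses a made-up class name and the question's string, with its line breaks written as `\n`. `Answer=` and the lazy `.*?` are part of what the pattern matches, so `replaceAll` replaces them together with the tag and keeps only the captured `$1`. Because `Answer=` occurs only once, only `<value>` is rewritten.

```java
// Hypothetical demo class: reproduces the question's attempt on the sample string.
class AskerAttempt {
    // The question's sample, with its line breaks written as \n.
    static final String INPUT =
        "<user_input>\n<UserInput Question=\"test Q?\" Answer=<value>0</value><sam@testmail.com>\"\n</user_input>";

    static String attempt(String s) {
        // "Answer=" and ".*?" belong to the match, so they are replaced too;
        // only the tag name captured in group 1 survives.
        return s.replaceAll("Answer=.*?<([^<]*)>", "$1");
    }

    public static void main(String[] args) {
        System.out.println(attempt(INPUT));
    }
}
```

The middle output line is `<UserInput Question="test Q?" value0</value><sam@testmail.com>"`. The `Answer=` prefix is gone, while the other tags remain.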

You need to put ? within the first group to make it non-greedy; also, you don't need Answer=.*? :

str1.replaceAll("<([^<]*?)>", "$1")
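Here is a quick check of this replacement on the question's sample (the class and method names are hypothetical). On this input, the lazy `[^<]*?` and the greedy `[^<]*` produce the same string. Each `<...>` here has only one `>` before the next `<`, so backtracking lands on the same `>` either way. What stops the text before the tag from disappearing is leaving `Answer=.*?` out of the match.

The pattern is also not limited to the Answer attribute. It unwraps `<user_input>` and `</user_input>` as well, keeps the `/` of closing tags, and adds no space before the email.

```java
// Hypothetical demo class: applies the answer's pattern, plus a greedy variant for comparison.
class StripAllTags {
    // The question's sample, with its line breaks written as \n.
    static final String INPUT =
        "<user_input>\n<UserInput Question=\"test Q?\" Answer=<value>0</value><sam@testmail.com>\"\n</user_input>";

    // The answer's pattern: unwraps every <...> that contains no other '<', anywhere in the string.
    static String stripLazy(String s) {
        return s.replaceAll("<([^<]*?)>", "$1");
    }

    // Same pattern with a greedy quantifier, to compare the results on this input.
    static String stripGreedy(String s) {
        return s.replaceAll("<([^<]*)>", "$1");
    }

    public static void main(String[] args) {
        System.out.println(stripLazy(INPUT));
        System.out.println(stripLazy(INPUT).equals(stripGreedy(INPUT)));
    }
}
```

The middle line comes out as `<UserInput Question="test Q?" Answer=value0/valuesam@testmail.com"`. This still differs from the expected `Answer=value0value sam@testmail.com"`.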



Although Java does not support look-behinds of unbounded length, you can work around it with the bounded .{0,1000}, which should suffice here. Because . does not match a line break, the look-behind only reaches characters on the same line as Answer=.

Please check out this approach, using either two regexes or one replace and one regex. Choose the one that suits you best. (I removed the first \n line break from the input string to show the flaw of the plain replace.)

String input = "<user_input><UserInput Question=\"test Q?\" Answer=<value>0</value><sam@testmail.com>\"\n</user_input>";
String st = input.replace("><", " ").replaceAll("(?<=Answer=.{0,1000})[<>/]+(?=[^\"]*\")", "");
String st1 = input.replaceAll("(?<=Answer=.{0,1000})><(?=[^\"]*\")", " ").replaceAll("(?<=Answer=.{0,1000})[<>/]+(?=[^\"]*\")", "");
System.out.println(st + "\n" + st1);

Output of a sample program:

<user_input UserInput Question="test Q?" Answer=value0value sam@testmail.com"
</user_input>

<user_input><UserInput Question="test Q?" Answer=value0value sam@testmail.com"
</user_input>
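The replacement can also be confined to the attribute value with a `Matcher` loop instead of a bounded look-behind. This is a different technique from the answer above. It is sketched under the assumption that the Answer value runs from `Answer=` to the next double quote, as in the sample; the class and method names are hypothetical.

Inside that span, the sketch applies the same two steps as the second variant: `><` becomes a space, then the remaining `<`, `>` and `/` are removed. `appendReplacement(StringBuilder, String)` requires Java 9 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: clean only the text between "Answer=" and the next double quote.
class ScopedAnswerCleanup {
    // The sample string from the question, with its line breaks written as \n.
    static final String INPUT =
        "<user_input>\n<UserInput Question=\"test Q?\" Answer=<value>0</value><sam@testmail.com>\"\n</user_input>";

    // Assumption: the Answer value ends at the first '"' after "Answer=".
    private static final Pattern ANSWER = Pattern.compile("(Answer=)([^\"]*)(\")");

    static String cleanAnswer(String input) {
        Matcher m = ANSWER.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = m.group(2)
                .replace("><", " ")        // boundary between two tags becomes a space
                .replaceAll("[<>/]", "");  // drop the remaining tag characters
            // quoteReplacement keeps any '$' or '\' in the value literal
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + value + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(cleanAnswer(INPUT));
    }
}
```

On the sample string, this yields `Answer=value0value sam@testmail.com"` and leaves `<user_input>`, `<UserInput` and `</user_input>` untouched. Like the regexes above, it would also delete any `/` that legitimately belongs to the answer text.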

First off, in your sample there is a trailing " after the email address and its >, and I do not know whether it was placed there by mistake.

However, I will keep it, since your expected result shows it still present.

This is my hack.

(Answer=)(<)(value)(>)(.+?([^<]*))(</)(value)(><)(.+?([^>]*))(>) to replace it with

$1$3$5$8 $10

The explanation:

(Answer=)(<)(value)(>) matches from Answer= up to the start of the value 0.

(.+?([^<]*)) matches the text from 0 up to the < that begins the closing value tag.

(</) I still select this, since the previous group stops just before it; it is dropped in the replacement.

(><) I later replace this with a space.

(.+?([^>]*)) matches from the start of the email and stops before the > after .com.

(>) selects the final >, which I drop when replacing.

The trailing " is not selected, since you want it left untouched.
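Put together in Java, the pattern and replacement string look like the sketch below (the class name is hypothetical). The pattern has 12 groups. Java therefore reads `$10` as group 10 rather than group 1 followed by a literal `0`, because a following digit is absorbed into the group reference whenever it forms a legal group number.

```java
// Hypothetical demo class: the answer's pattern and replacement applied with String.replaceAll.
class AnswerThreeRegex {
    // The question's sample, with its line breaks written as \n.
    static final String INPUT =
        "<user_input>\n<UserInput Question=\"test Q?\" Answer=<value>0</value><sam@testmail.com>\"\n</user_input>";

    // Groups: 1 "Answer=", 3 and 8 the tag name "value", 5 the text between the tags,
    // 10 the email; the space in the replacement stands in for the dropped "><" (group 9).
    static final String PATTERN = "(Answer=)(<)(value)(>)(.+?([^<]*))(</)(value)(><)(.+?([^>]*))(>)";

    static String apply(String s) {
        return s.replaceAll(PATTERN, "$1$3$5$8 $10");
    }

    public static void main(String[] args) {
        System.out.println(apply(INPUT));
    }
}
```

On the sample, the Answer part becomes `Answer=value0value sam@testmail.com"`. The pattern hard-codes the `value` tag name and the `<value>…</value><…>` layout, so input with a different layout is not matched and stays unchanged.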
