Need a Java regular expression to remove/replace XML elements from a specific string

Question

I am having trouble writing the correct regular expression. I have the XML below as a string:

<user_input>
<UserInput Question="test Q?" Answer=<value>0</value><sam@testmail.com>"
</user_input>

Now I need to remove the XML characters from the Answer attribute only, so I need the following:

<user_input>
<UserInput Question="test Q?" Answer=value0value sam@testmail.com"
</user_input>

I have tried the regex below, but it did not work:

str1.replaceAll("Answer=.*?<([^<]*)>", "$1");

It removes all the text before the tag.

Can anyone help, please?

Answer 1

You need to put ? within the first group to make it non-greedy. You also don't need Answer=.*?:

str1.replaceAll("<([^<]*?)>", "$1")

DEMO
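To see what this pattern does to the string from the question, here is a minimal sketch. The class name StripAllTagsDemo and the method stripBrackets are made up for illustration; only the pattern and the replacement come from the answer above.

```java
class StripAllTagsDemo {
    // Pattern from the answer above: '<', a lazy run of non-'<' characters, then '>'.
    // The replacement "$1" keeps only the text between the angle brackets.
    static String stripBrackets(String s) {
        return s.replaceAll("<([^<]*?)>", "$1");
    }

    public static void main(String[] args) {
        // Sample string from the question, including its line breaks.
        String input = "<user_input>\n"
                + "<UserInput Question=\"test Q?\" Answer=<value>0</value><sam@testmail.com>\"\n"
                + "</user_input>";
        System.out.println(stripBrackets(input));
    }
}
```

Note that the pattern is not tied to the Answer attribute: on this sample it also unwraps <user_input> and </user_input>, and the slash of </value> stays in the result because only the angle brackets are dropped.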

Answer 2

httpRequest.send("msg="+data+"&TC="+TC); Try it like this.

Answer 3

Although Java does not support unbounded variable-width look-behinds, you can work around that with a bounded quantifier such as .{0,1000}, which should suffice here.

Please check out this approach, which uses either two regexes or one regex and one plain replace. Choose whichever suits you best (I removed the \n line break from the first input string to show the flaw in using a simple replace):

String input = "<user_input><UserInput Question=\"test Q?\" Answer=<value>0</value><sam@testmail.com>\"\n</user_input>";
String st = input.replace("><", " ").replaceAll("(?<=Answer=.{0,1000})[<>/]+(?=[^\"]*\")", "");
String st1 = input.replaceAll("(?<=Answer=.{0,1000})><(?=[^\"]*\")", " ").replaceAll("(?<=Answer=.{0,1000})[<>/]+(?=[^\"]*\")", "");
System.out.println(st + "\n" + st1);

Output of a sample program:

<user_input UserInput Question="test Q?" Answer=value0value sam@testmail.com"
</user_input>

<user_input><UserInput Question="test Q?" Answer=value0value sam@testmail.com"
</user_input>
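The same idea of limiting the cleanup to the Answer value can also be sketched without a look-behind, by first capturing the value with a Matcher and then cleaning only that capture. This is an illustrative variant, not the code from the answer: the class name AnswerAttributeCleaner and the method clean are hypothetical, and the sketch assumes the value runs from Answer= up to the next double quote, as in the sample string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnswerAttributeCleaner {
    // Assumption: the Answer value is everything between "Answer=" and the next '"'.
    private static final Pattern ANSWER = Pattern.compile("(Answer=)([^\"]*)");

    static String clean(String xml) {
        Matcher m = ANSWER.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = m.group(2)
                    .replace("><", " ")         // adjacent tags become a single space
                    .replaceAll("[<>/]+", "");  // drop the remaining markup characters
            // quoteReplacement guards against '$' or '\' inside the cleaned value.
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "<user_input>\n"
                + "<UserInput Question=\"test Q?\" Answer=<value>0</value><sam@testmail.com>\"\n"
                + "</user_input>";
        System.out.println(clean(input));
    }
}
```

Text outside the Answer value, including the <user_input> wrapper, is left untouched. As with the character class in the answer, any legitimate / inside the value would be removed as well.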

Answer 4

First off, in your sample above there is a trailing " after the email and the >, and I do not know whether it was placed there by mistake.

However, I will keep it, since according to your expected result it needs to remain.

This is my hack.

Match (Answer=)(<)(value)(>)(.+?([^<]*))(</)(value)(><)(.+?([^>]*))(>) and replace it with

$1$3$5$8 $10

The explanation:

(Answer=)(<)(value)(>) matches from Answer= up to the start of the value 0.

(.+?([^<]*)) matches the value itself, stopping just before the < that starts the closing value tag.

(</) is still captured here, since the previous expression stops before it.

(><) I will later replace this with a space.

(.+?([^>]*)) matches from the start of the email and excludes the > after .com.

(>) selects the last >, which is dropped in the replacement.

The trailing " is not matched, since I would rather leave it untouched, as requested.
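The pattern and replacement above can be tried with String.replaceAll, as in this minimal sketch. The class name GroupReplaceDemo and the method apply are made up for illustration; the pattern and replacement strings are copied from this answer. Counting opening parentheses gives 12 groups, so $10 refers to the email group.

```java
class GroupReplaceDemo {
    // Pattern and replacement copied from the answer; no escaping is needed in the Java literal.
    static final String PATTERN =
            "(Answer=)(<)(value)(>)(.+?([^<]*))(</)(value)(><)(.+?([^>]*))(>)";
    static final String REPLACEMENT = "$1$3$5$8 $10";

    static String apply(String s) {
        return s.replaceAll(PATTERN, REPLACEMENT);
    }

    public static void main(String[] args) {
        // Sample string from the question, including its line breaks.
        String input = "<user_input>\n"
                + "<UserInput Question=\"test Q?\" Answer=<value>0</value><sam@testmail.com>\"\n"
                + "</user_input>";
        System.out.println(apply(input));
    }
}
```

Because the pattern spells out the value tag literally, it only fits input shaped like the sample: a differently named tag, or more than two tags inside Answer, would not match.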

4 answers

Answer 1 0 2015-04-22 07:18:44
Answer 2 0 2015-04-22 07:20:14
Answer 3 0 2015-04-22 08:21:51
Answer 4 0 2022-11-30 13:58:05
