
Extract values from invalid/partial XML using regex in Java

I have XML like the sample below. It is obtained directly from a database, so it is not guaranteed to be valid, with proper opening and closing tags. I need to extract data from it, e.g. color, level, prefix, etc. Since well-formed XML is not guaranteed, is regex the only way to do this?

The XML looks something like this:

<indicator label_unit_en="Index points" label_unit_de="Basis punkte">  
<partition id="P_ABC_DEF.3">    
<part color="darkgreen"   level="50"    prefix_en="aaa 111"   prefix_de="unt ü 50">    
<part color="lightgreen"  level="100"   prefix_en="50 to 100"  prefix_de="qwe 100">    
<part color="lightorange" level="200"   prefix_en="100 to 200" prefix_de="100 qw 200">    
<part color="darkorange"  level="300"   prefix_en="200 to 300" prefix_de="20 w0">    
<part color="lightred"    level="500"   prefix_en="300 to 500" prefix_de="rr 0">    
part color="darkred"     level="99999" prefix_en="above 500"  prefix_de="ü 2">  
</partition>
</indicator>

Can anyone suggest a way to extract data from this XML?

I am able to extract color (color=\"(\\w+?)\") and level (level=\"(\\w+?)\"), but not the others.

None of the matchers I created finds anything for prefix_en, prefix_de, label_unit_en, or label_unit_de.

Please suggest a solution for this. Or is there any way other than regex to solve this problem?
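A likely reason the patterns miss these attributes: in Java, `\w` matches only letters, digits and underscore by default, so it cannot match a value containing a space ("Index points", "aaa 111") or, without `Pattern.UNICODE_CHARACTER_CLASS`, a letter such as "ü". A minimal sketch that accepts any character except a double quote inside the value is shown here. The class name `AttributeRegexSketch` and the helper `valuesOf` are made up for illustration, and the sketch assumes values are always double-quoted and never contain an escaped quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributeRegexSketch {

    // name="value": the value group accepts any character except a double quote,
    // so spaces and non-ASCII letters inside the value are captured too.
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([\\w.-]+)\\s*=\\s*\"([^\"]*)\"");

    /** Returns every value of the given attribute, in order of appearance. */
    static List<String> valuesOf(String attributeName, String text) {
        List<String> values = new ArrayList<>();
        Matcher m = ATTRIBUTE.matcher(text);
        while (m.find()) {
            if (m.group(1).equals(attributeName)) {
                values.add(m.group(2));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Shortened copy of the fragment; the last part deliberately lacks its '<'.
        // \u00fc is the letter u with umlaut from the original prefix_de values.
        String fragment =
                "<indicator label_unit_en=\"Index points\" label_unit_de=\"Basis punkte\">"
              + "<part color=\"darkgreen\" level=\"50\" prefix_en=\"aaa 111\" prefix_de=\"unt \u00fc 50\">"
              + "part color=\"darkred\" level=\"99999\" prefix_en=\"above 500\" prefix_de=\"\u00fc 2\">";

        System.out.println(valuesOf("label_unit_en", fragment));
        System.out.println(valuesOf("prefix_en", fragment));
        System.out.println(valuesOf("prefix_de", fragment));
    }
}
```

Because the pattern only looks at `name="value"` pairs, it does not care whether the surrounding tags are opened or closed correctly, which is also why it cannot tell you which element an attribute belonged to.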

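Perhaps the initial XML can be converted to well-formed XML with a library such as http://jtidy.sourceforge.net/ , and the data then extracted using XPath or by scanning the nodes.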

The code you pasted needs considerable reformatting before it can be treated as XML:

<?xml version="1.0" ?> 
<indicator label_unit_en="Index points" label_unit_de="Basis punkte">  
<partition id="P_ABC_DEF.3">    
<part color="darkgreen"   level="50"    prefix_en="aaa 111"   prefix_de="unt ü 50"/>    
<part color="lightgreen"  level="100"   prefix_en="50 to 100"  prefix_de="qwe 100"/>    
<part color="lightorange" level="200"   prefix_en="100 to 200" prefix_de="100 qw 200"/>    
<part color="darkorange"  level="300"   prefix_en="200 to 300" prefix_de="20 w0"/>    
<part color="lightred"    level="500"   prefix_en="300 to 500" prefix_de="rr 0"/>    
<part color="darkred"     level="99999" prefix_en="above 500"  prefix_de="ü 2"/>  
</partition>
</indicator>
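Once the fragment is well-formed like this, the DOM and XPath classes that ship with the JDK can read the attributes without any regex. The following is only a sketch: the class name `XPathSketch` and the helper `partAttribute` are made up for illustration, and it assumes the element layout `/indicator/partition/part` shown above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSketch {

    /** Returns the given attribute of every part element, in document order. */
    static List<String> partAttribute(String xml, String attributeName) {
        try {
            // Parse the (well-formed) XML string into a DOM tree.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Select all part elements under indicator/partition.
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList parts = (NodeList) xpath.evaluate(
                    "/indicator/partition/part", doc, XPathConstants.NODESET);

            List<String> values = new ArrayList<>();
            for (int i = 0; i < parts.getLength(); i++) {
                values.add(((Element) parts.item(i)).getAttribute(attributeName));
            }
            return values;
        } catch (Exception e) {
            // Parsing fails here if the input is not well-formed.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml =
                "<indicator label_unit_en=\"Index points\" label_unit_de=\"Basis punkte\">"
              + "<partition id=\"P_ABC_DEF.3\">"
              + "<part color=\"darkgreen\" level=\"50\" prefix_en=\"aaa 111\" prefix_de=\"unt 50\"/>"
              + "<part color=\"lightgreen\" level=\"100\" prefix_en=\"50 to 100\" prefix_de=\"qwe 100\"/>"
              + "</partition>"
              + "</indicator>";

        System.out.println(partAttribute(xml, "color"));
        System.out.println(partAttribute(xml, "prefix_en"));
    }
}
```

This only works after the repair step. Passing the original fragment with its unclosed part tags makes the parser throw, which the sketch surfaces as an `IllegalArgumentException`.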

If you format it as XML as suggested by pasha701, you can get the values with an XML parser. Otherwise you can use string operations on it, like this:

    String result = "<indicator label_unit_en=\"Index points\" label_unit_de=\"Basis punkte\">"+  
              "<partition id=\"P_ABC_DEF.3\">"+    
              "<part color=\"darkgreen\"   level=\"50\"    prefix_en=\"aaa 111\"   prefix_de=\"unt ü 50\">"+    
              "<part color=\"lightgreen\"  level=\"100\"   prefix_en=\"50 to 100\"  prefix_de=\"qwe 100\">"+    
              "<part color=\"lightorange\" level=\"200\"   prefix_en=\"100 to 200\" prefix_de=\"100 qw 200\">"+           
              "</partition>"+
              "</indicator>";

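    // Prints "darkgreen": the text between color=" and the quote just before the next space.
    System.out.println(result.substring(result.indexOf("color=") + 7, result.indexOf(" ", result.indexOf("color=")) - 1));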
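The one-liner above stops at the first space after the attribute name, so it would cut off values that themselves contain a space, such as prefix_en="aaa 111". A small variation that searches for the closing quote instead might look like the sketch below. The class `AttributeIndexOfSketch` and the helper `firstValue` are hypothetical names, and the sketch assumes double-quoted values with no escaped quotes inside.

```java
class AttributeIndexOfSketch {

    /**
     * Returns the first value of name="..." found in text, or null if the
     * attribute is absent or its closing quote is missing.
     * Note: this is a plain substring search, so a longer attribute name that
     * ends with the same characters would also match.
     */
    static String firstValue(String text, String name) {
        String key = name + "=\"";
        int start = text.indexOf(key);
        if (start < 0) {
            return null;
        }
        start += key.length();               // first character of the value
        int end = text.indexOf('"', start);  // closing quote of the value
        return end < 0 ? null : text.substring(start, end);
    }

    public static void main(String[] args) {
        String result =
                "<indicator label_unit_en=\"Index points\" label_unit_de=\"Basis punkte\">"
              + "<partition id=\"P_ABC_DEF.3\">"
              + "<part color=\"darkgreen\" level=\"50\" prefix_en=\"aaa 111\" prefix_de=\"unt 50\">";

        System.out.println(firstValue(result, "color"));
        System.out.println(firstValue(result, "prefix_en"));
        System.out.println(firstValue(result, "label_unit_en"));
    }
}
```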

Tell us which approach you would like, and we can help accordingly.
