How to filter XML string by element “paths” in Groovy or Java

I have an object that's currently mapped from a Java POJO to XML using JAXB. Once I have that XML, I occasionally need to whittle it down to a select set of elements based on user input. The result should be XML containing ONLY the specified "fields".

I've come across a number of similar use cases which use SAX filters, but they seem very complicated and the answers don't quite get me where I need to be. The closest example is this one, which excludes a single path from the result. I want the opposite: whitelist a select list of elements.

Example object: School.xml

<SchoolInfo RefId="34060F68BE3942F1B1264E6D2CC3C353">
    <LocalId>57</LocalId>
    <SchoolName>Foobar School of Technology</SchoolName>
    <Principal>
        <FirstName>Bob</FirstName>
        <LastName>Smith</LastName>
    </Principal>
    <StateProvinceId>34573</StateProvinceId>
    <LEAInfoRefId>340666687E3942F1B1264E1223453C353</LEAInfoRefId>
    <PhoneNumberList>
        <PhoneNumber Type="0096">
            <Number>555-832-5555</Number>
        </PhoneNumber>
        <PhoneNumber Type="0096">
            <Number>555-999-5555</Number>
        </PhoneNumber>
    </PhoneNumberList>
</SchoolInfo>

Given the following input as a "filter":

List<String> filter = [ 
    "LocalId",
    "SchoolName",
    "Principal/FirstName",
    "PhoneNumberList/PhoneNumber/Number",
 ]

I need the output to be:

<SchoolInfo RefId="34060F68BE3942F1B1264E6D2CC3C353">
    <LocalId>57</LocalId>
    <SchoolName>Foobar School of Technology</SchoolName>
    <Principal>
       <FirstName>Bob</FirstName>
    </Principal>
    <PhoneNumberList>
        <PhoneNumber Type="0096">
            <Number>555-832-5555</Number>
        </PhoneNumber>
        <PhoneNumber Type="0096">
            <Number>555-999-5555</Number>
        </PhoneNumber>
    </PhoneNumberList>
</SchoolInfo>

What is the best library to achieve this? SAX filtering feels too complicated, and XSLT doesn't seem like a good fit given that the filter is dynamic.

Examples to help me get closer would be highly appreciated.

This is the code that does the whitelisting. It is based on XPath and VTD-XML. Its output has indentation issues, since this is a first pass that emphasizes correctness.

import com.ximpleware.*;
import java.io.*;
import java.util.*;

public class WhiteList {

    public static void main(String[] s) throws VTDException, IOException{
        VTDGen vg = new VTDGen();
        List <String> filter = Arrays.asList("LocalId",
                "SchoolName",
                "Principal/FirstName",
                "PhoneNumberList/PhoneNumber/Number");
        if (!vg.parseFile("d:\\xml\\schoolInfo.xml", false)){
            return;
        }
        VTDNav vn = vg.getNav();
        FastIntBuffer fib = new FastIntBuffer();
        // Build one flag per token: 0x1 marks an element start tag (a removal
        // candidate), 0 marks any other token. Whitelisted elements are raised to 0x3 below.
        int i,k;
        for (i=0;i<vn.getTokenCount();i++){
            if (vn.getTokenType(i)==VTDNav.TOKEN_STARTING_TAG){
                fib.append(0x1);
            }else{
                fib.append(0);
            }
        }
        AutoPilot ap = new AutoPilot(vn);
        AutoPilot ap1= new AutoPilot(vn);
        ap1.selectXPath("descendant::*");// used to mark descendants as keep
        for (int j=0;j<filter.size();j++){
            // each filter entry is evaluated as an XPath relative to the root element
            ap.selectXPath(filter.get(j));
            while((i=ap.evalXPath())!=-1){
                // keep the matched element itself
                fib.modifyEntry(i, 0x3);
                // keep every ancestor of the match, walking up to the root
                vn.push();
                do{
                    if( vn.getTokenDepth(vn.getCurrentIndex())>=0)
                       fib.modifyEntry(vn.getCurrentIndex(), 0x3);
                    else
                        break;
                }while(vn.toElement(VTDNav.P));
                vn.pop();
                // keep every descendant of the match
                vn.push();
                while((k=ap1.evalXPath())!=-1){
                    fib.modifyEntry(k, 0x3);
                }
                ap1.resetXPath();
                vn.pop();
            }
            ap.resetXPath();
        }

        // remove the elements still flagged 0x1, i.e. those not on the whitelist
        XMLModifier xm = new XMLModifier(vn);
        for (int j=0;j<fib.size();j++){
            if (fib.intAt(j)==0x1){
                vn.recoverNode(j);
                xm.remove();
            }
        }
        xm.output("d:\\xml\\newSchoolInfo.xml");                    
    }
}
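
If adding VTD-XML is not an option, the same mark-then-remove idea can be sketched with only the JDK's DOM and XPath APIs. This is a rough sketch rather than a tuned implementation: the class name DomWhitelist and the methods filter, markDescendants, and prune are made up for illustration. It assumes every filter entry selects elements and is trusted enough to be evaluated as an XPath expression relative to the root.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
class DomWhitelist {

    /** Keeps only elements that are on, above, or below a whitelisted path. */
    static String filter(String xml, List<String> paths) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();

        // Identity-based set: DOM nodes are compared by reference here.
        Set<Node> keep = Collections.newSetFromMap(new IdentityHashMap<>());
        XPath xpath = XPathFactory.newInstance().newXPath();
        for (String path : paths) {
            // Assumption: each entry is a relative XPath that selects elements.
            NodeList hits = (NodeList) xpath.evaluate(path, root, XPathConstants.NODESET);
            for (int i = 0; i < hits.getLength(); i++) {
                // Mark the match and all of its element ancestors.
                for (Node n = hits.item(i); n instanceof Element; n = n.getParentNode()) {
                    keep.add(n);
                }
                markDescendants(hits.item(i), keep);
            }
        }
        prune(root, keep);

        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(root), new StreamResult(out));
        return out.toString();
    }

    /** Marks every element below the given node as kept. */
    static void markDescendants(Node node, Set<Node> keep) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                keep.add(c);
                markDescendants(c, keep);
            }
        }
    }

    /** Removes unmarked child elements, plus the whitespace that indented them. */
    static void prune(Node parent, Set<Node> keep) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child instanceof Element) {
                if (keep.contains(child)) {
                    prune(child, keep);
                } else {
                    Node prev = child.getPreviousSibling();
                    if (prev != null && prev.getNodeType() == Node.TEXT_NODE
                            && prev.getTextContent().trim().isEmpty()) {
                        parent.removeChild(prev);
                    }
                    parent.removeChild(child);
                }
            }
            child = next;
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<SchoolInfo RefId=\"A1\"><LocalId>57</LocalId>"
                + "<Principal><FirstName>Bob</FirstName><LastName>Smith</LastName></Principal>"
                + "</SchoolInfo>";
        System.out.println(filter(xml, Arrays.asList("LocalId", "Principal/FirstName")));
    }
}
```

Dropping the whitespace-only text node in front of each removed element is one way to sidestep the indentation problem mentioned above; it assumes the document has no mixed content where that whitespace matters.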

All Groovy:

import groovy.xml.XmlUtil

def xml = '''<SchoolInfo RefId="34060F68BE3942F1B1264E6D2CC3C353">
    <LocalId>57</LocalId>
    <SchoolName>Foobar School of Technology</SchoolName>
    <Principal>
       <FirstName>Bob</FirstName>
       <LastName>Smith</LastName>
    </Principal>
    <StateProvinceId>34573</StateProvinceId>
    <LEAInfoRefId>340666687E3942F1B1264E1223453C353</LEAInfoRefId>
    <PhoneNumberList>
       <PhoneNumber Type="0096">
          <Number>555-832-5555</Number>
       </PhoneNumber>
       <PhoneNumber Type="0096">
          <Number>555-999-5555</Number>
       </PhoneNumber>
    </PhoneNumberList>
 </SchoolInfo>'''

// On newer Groovy versions where XmlParser lives in groovy.xml, also add: import groovy.xml.XmlParser
def node = new XmlParser().parseText(xml)

// Split each whitelisted path into its list of element names
def whitelist = [ 'LocalId', 'SchoolName', 'Principal/FirstName', 'PhoneNumberList/PhoneNumber/Number' ]*.split('/')*.toList()

// Removes every child element that no whitelisted path passes through
void loveRemovalMachine(Node node, List whitelist) {
    // children() also returns text content as Strings, so keep only element nodes
    def elements = node.children().findAll { it instanceof Node }
    elements.each { Node child ->
        // Only the paths that start with this child's name apply below it
        def matching = whitelist.findAll { it.head() == child.name() }
        if (!matching) {
            node.remove(child)
            return
        }
        def remaining = matching*.tail()
        // A path that ends at this child keeps its whole subtree; otherwise recurse
        if (remaining.every { it }) {
            loveRemovalMachine(child, remaining)
        }
    }
}

loveRemovalMachine(node, whitelist)

println XmlUtil.serialize(node)
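
For readers on plain Java, the same recursive pruning can be sketched with the JDK's DOM API. The sketch below first groups the paths into a small trie, so that each branch of the document is only checked against the paths that actually pass through it. PathTrieFilter, PathNode, buildTrie, prune, and parse are illustrative names, not an existing API, and the sketch assumes simple slash-separated element names with no namespaces or wildcards.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
class PathTrieFilter {

    /** One trie node per path segment; leaf means a whitelisted path ends here. */
    static final class PathNode {
        final Map<String, PathNode> children = new HashMap<>();
        boolean leaf;
    }

    /** Turns paths such as "Principal/FirstName" into a trie keyed by element name. */
    static PathNode buildTrie(List<String> paths) {
        PathNode root = new PathNode();
        for (String path : paths) {
            PathNode current = root;
            for (String segment : path.split("/")) {
                current = current.children.computeIfAbsent(segment, k -> new PathNode());
            }
            current.leaf = true;
        }
        return root;
    }

    /** Removes child elements with no trie entry; recurses until a path ends. */
    static void prune(Element element, PathNode allowed) {
        Node child = element.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child instanceof Element) {
                PathNode match = allowed.children.get(child.getNodeName());
                if (match == null) {
                    element.removeChild(child);
                } else if (!match.leaf) {
                    // The path continues, so filter this child's own children.
                    prune((Element) child, match);
                }
                // A leaf match keeps the child together with its whole subtree.
            }
            child = next;
        }
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<S><P><F/><G/></P><Q><F/><G/></Q></S>");
        prune(doc.getDocumentElement(), buildTrie(Arrays.asList("P/F", "Q/G")));
        Element p = (Element) doc.getElementsByTagName("P").item(0);
        // Only P/F was whitelisted under P, so G is gone from that branch.
        System.out.println(p.getElementsByTagName("G").getLength());
    }
}
```

Keeping the trie per branch matters once two whitelisted paths share a child name at the same depth under different parents: with the paths P/F and Q/G, the element P/G is removed even though G is allowed under Q.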
