
Conversion from Flat-file to XML in a generic way

I am writing a generic program that generates XML from a CCB/flat file. It is not fully generic; it makes two assumptions: (1) every element is mandatory, and (2) any repeating segment occurs exactly 'n' times. For now the input is a string standing in for the content of the flat file, plus a configuration XML that is essentially a picture of the XSD in XML form.

The configuration XML is shown below (the input string appears in the main method further down):

<complex name="PARENT">
<complex name="CHILD">
    <complex name="GRANT-CHILD" count="2">
        <field name="A" length="7"/>
        <field name="B" length="11"/>
        <field name="C" length="7"/>
        <field name="D" length="7"/>
        <field name="E" length="1"/>
        <field name="F" length="20"/>
        <field name="G" length="10"/>
        <field name="H" length="10"/>
        <field name="I" length="7"/>
        <field name="J" length="7"/>
        <field name="K" length="7"/>
        <field name="L" length="7"/>
    </complex>
</complex>
</complex>

The expected XML looks like this:

<PARENT>
<CHILD>
    <GRANT-CHILD>
        <A />
        <B />
        <C />
        <D />
        <E />
        <F />
        <G />
        <H />
        <I />
        <J />
        <K />
        <L />
    </GRANT-CHILD>
    <GRANT-CHILD>
        <A />
        <B />
        <C />
        <D />
        <E />
        <F />
        <G />
        <H />
        <I />
        <J />
        <K />
        <L />
    </GRANT-CHILD>
</CHILD>
</PARENT>

My logic is: whenever the node is a complex type, I generate a tag named after its name attribute. When it is a field, I read the value of its length attribute, take that many characters from the input string, emit them inside a tag in the XML, and remove those characters from the string. I have two classes, shown below:

package x.y.z;

import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ChildEle {

public static Element getFirstChildElement(Node parent)
{
    Node child = parent.getFirstChild();
    while (child != null)
    {
        if (child.getNodeType() == Node.ELEMENT_NODE)
            return (Element)child;
        child = child.getNextSibling();
    }
    return null;
}
public static Node getNextSiblingElement(Node present)
{
    Node node = present.getNextSibling();
    while (node != null && !(node instanceof Element))
        node = node.getNextSibling();
    return node;
}
}

The second one is:

package x.y.z;

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class FlatFileConversion {

public static void realmethod(Node node,String sto)
{
    if(node.getNodeName()=="complex")
    {   
        Element eElement = (Element)node;
        if(eElement.hasAttribute("count"))
        {
            String st=eElement.getAttribute("count");
            int x=Integer.parseInt(st);
            for(int i=0;i<x;i++)
            {
                System.out.println("<"+node.getAttributes().getNamedItem("name").getNodeValue()+">");
                realmethod((Node)ChildEle.getFirstChildElement(node),sto);
                System.out.println("</"+node.getAttributes().getNamedItem("name").getNodeValue()+">");
            }
        }
        else
        {
            System.out.println("<"+node.getAttributes().getNamedItem("name").getNodeValue()+">");
            realmethod((Node)ChildEle.getFirstChildElement(node),sto);
            System.out.println("</"+node.getAttributes().getNamedItem("name").getNodeValue()+">");
        }
    }
    if(node.getNodeName()=="field")
    {
        String str2=sto.substring(0, Math.min(sto.length(),Integer.parseInt(node.getAttributes().getNamedItem("length").getNodeValue())));
        System.out.print("<"+node.getAttributes().getNamedItem("name").getNodeValue()+">");
        System.out.print(str2.trim());
        System.out.println("</"+node.getAttributes().getNamedItem("name").getNodeValue()+">");
        sto=sto.replace(str2, "");
        try
        {
            realmethod(ChildEle.getNextSiblingElement(node),sto);
        }
        catch(Exception e)
        {

        }
    }
}
public static void main(String[] args) {

    String inp="74c83tjrl1nd7jmko3hg8octgitmicte3m0eq8mzmw7zae0sqgwrj4ylzueb9lzabc3hcu78lly3nwbi18ncw1mvu039ruvz5cju2vcyeq5upzsks9rn7jz75edrh2cbcxxh758ztvpkhyjb61al5eczc57bcizfoo1dhtdljd1gfzs69tqo9vqhiqt44gmbfdq7oddjfa";
    try
    {
         File inputFile = new File("E:\\test\\input.txt");
         DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
         DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
         Document doc = dBuilder.parse(inputFile);
         doc.getDocumentElement().normalize();
         realmethod((Node)doc.getDocumentElement(),inp);
    }
    catch(Exception e)
    {
        e.printStackTrace();
    }
}
}

The output is as follows:

<PARENT>
<CHILD>
    <GRANT-CHILD>
        <A>74c83tj</A>
        <B>rl1nd7jmko3</B>
        <C>hg8octg</C>
        <D>itmicte</D>
        <E>3</E>
        <F>m0eq8mzmw7zae0sqgwrj</F>
        <G>4ylzueb9lz</G>
        <H>abchcu78ll</H>
        <I>ynwbi18</I>
        <J>ncw1mvu</J>
        <K>09ruvz5</K>
        <L>cju2vcy</L>
    </GRANT-CHILD>
    <GRANT-CHILD>
        <A>74c83tj</A>
        <B>rl1nd7jmko3</B>
        <C>hg8octg</C>
        <D>itmicte</D>
        <E>3</E>
        <F>m0eq8mzmw7zae0sqgwrj</F>
        <G>4ylzueb9lz</G>
        <H>abchcu78ll</H>
        <I>ynwbi18</I>
        <J>ncw1mvu</J>
        <K>09ruvz5</K>
        <L>cju2vcy</L>
    </GRANT-CHILD>
</CHILD>

The GRANT-CHILD segment, which occurs twice, is generated with exactly the same content both times. For the second occurrence, my code fails to pick up the next characters from the input string and place them as the text node of the corresponding element node.

Please help me find what is wrong with the logic.
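
One plausible explanation, offered as a sketch rather than a definitive fix: Java strings are immutable and the reference is passed by value, so sto=sto.replace(str2, "") only rebinds the local variable inside that call. The count loop therefore hands the same, unmodified sto to every iteration, and both GRANT-CHILD segments start from the beginning of the input. Separately, replace removes every occurrence of the substring, not just the leading one, which is probably why H prints as abchcu78ll although the input contains abc3hcu78l: extracting E (3) deleted every later 3. The version below keeps a shared read position instead. The names FlatFileCursorSketch, Cursor, take, emit and convert are hypothetical, the output is collected in a StringBuilder rather than printed, and XML special characters in the data are not escaped.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class FlatFileCursorSketch {

    // Shared, mutable read position over the flat-file content.
    // Hypothetical helper; it replaces the "cut the string and pass it on" approach.
    static final class Cursor {
        final String data;
        int pos = 0;

        Cursor(String data) { this.data = data; }

        // Returns the next `length` characters (fewer at end of input) and advances.
        String take(int length) {
            int end = Math.min(data.length(), pos + length);
            String piece = data.substring(pos, end);
            pos = end;
            return piece;
        }
    }

    // Emits one config element ("complex" or "field"), consuming input via the cursor.
    static void emit(Element cfg, Cursor in, StringBuilder out) {
        String name = cfg.getAttribute("name");
        if ("field".equals(cfg.getNodeName())) {   // equals, not ==, for string comparison
            int length = Integer.parseInt(cfg.getAttribute("length"));
            out.append('<').append(name).append('>')
               .append(in.take(length).trim())
               .append("</").append(name).append(">\n");
            return;
        }
        // "complex": repeat `count` times (default 1) and visit every child each time.
        int count = cfg.hasAttribute("count")
                ? Integer.parseInt(cfg.getAttribute("count")) : 1;
        for (int i = 0; i < count; i++) {
            out.append('<').append(name).append(">\n");
            for (Node c = cfg.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c instanceof Element) {        // skip whitespace text nodes
                    emit((Element) c, in, out);
                }
            }
            out.append("</").append(name).append(">\n");
        }
    }

    // Parses the configuration XML from a string and converts the flat content.
    static String convert(String configXml, String flat) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            configXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            emit(root, new Cursor(flat), out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        // Tiny made-up config and data, chosen so that a value ("3") repeats in the input.
        String config = "<complex name=\"ROOT\"><complex name=\"REC\" count=\"2\">"
                + "<field name=\"A\" length=\"3\"/><field name=\"B\" length=\"1\"/>"
                + "</complex></complex>";
        System.out.print(convert(config, "ab3c3d3e"));
    }
}
```

Because every call shares one Cursor, the second repetition of a segment continues from where the first one stopped, and no characters are ever deleted from the input. Looping over the children explicitly also means no exception has to be swallowed at the end of a sibling chain.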

Not really answering your question, but it may be useful to know since you are solving the same problem...

There is a standard hosted by the Open Grid Forum called 'Data Format Description Language' (DFDL). IBM has implemented DFDL in their integration software: https://en.wikipedia.org/wiki/Data_Format_Description_Language

and there is an independent open source implementation available: https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL

DFDL can describe flat files, but can handle all kinds of delimited and tagged data as well.
