
Best way to read data from a flat file and write it to XML

I have a flat .txt file with comma-separated values on each row, something like:

1,name1,department1
2,name2,department2
3,name3,department3
...
...

Now I want to read these records from the .txt file and write them to XML. The output should look something like:

<Employees>
     <Employee>
          <Code>1</Code>
          <Name>name1</Name>
          <Department>department1</Department>
     </Employee>
     <Employee>
          <Code>2</Code>
          <Name>name2</Name>
          <Department>department2</Department>
     </Employee>
     <Employee>
          <Code>3</Code>
          <Name>name3</Name>
          <Department>department3</Department>
     </Employee>
</Employees>

To achieve this I have gone through various questions and posts, but I am still confused about which approach to follow and which XML builder to use. XStream, for example?

Can anybody tell me which approach I should follow to get the best performance?

I would use a CSV library such as OpenCSV to read the file, then use JAXB to create the XML file.

You can create an Employees class with a List<Employee>, where Employee has the fields Code, Name, etc. Fill it in using the CSV library. Then use one of the JAXB.marshal methods to write the whole thing out to a file in a single line of code.

Simple example code

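import java.util.Arrays;
import java.util.List;
import javax.xml.bind.JAXB;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

// JAXB (javax.xml.bind) ships with Java 6 to 10; on Java 11+ add it as a dependency.
@XmlRootElement
@XmlAccessorType(XmlAccessType.PUBLIC_MEMBER)
public class XmlWriterTest
{
    // Public fields are bound to XML elements automatically
    public String foo;
    public List<String> bars;

    public static void main(String[] args)
    {
        XmlWriterTest test = new XmlWriterTest();
        test.foo = "hi";
        test.bars = Arrays.asList("yo", "oi");
        // Marshal the object as XML to standard output
        JAXB.marshal(test, System.out);
    }
}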

Here's the simplest way, in pseudocode (in is the flat file, out is the XML file):

out.write("<Employees>");
foreach(String line : in)
{
    String[] parts = line.split(",");
    out.write("<Employee><Code>" + parts[0] + "</Code><Name>" + parts[1] + "</Name><Department>" + parts[2] + "</Department></Employee>");
}
out.write("</Employees>");

Obviously this solution is very naive: it assumes that no field in your flat file contains a comma, that every line has exactly 3 columns, and that no value contains a character that must be escaped in XML, such as & or <.
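The column-count assumption can be checked, and XML special characters escaped, with little extra code. The following is only a minimal sketch: the class name EmployeeLineConverter and the helpers escapeXml and toEmployeeXml are hypothetical names made up for this illustration, not part of any library. Quoted fields that contain commas would still need a real CSV parser.

```java
public class EmployeeLineConverter {

    // Hypothetical helper: escapes the characters that are special in XML text content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")   // must run first so later entities are not re-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // Hypothetical helper: converts one "code,name,department" line into an <Employee> element.
    static String toEmployeeXml(String line) {
        String[] parts = line.split(",", -1); // -1 keeps trailing empty fields
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                    "Expected 3 columns but found " + parts.length + ": " + line);
        }
        return "<Employee>"
                + "<Code>" + escapeXml(parts[0].trim()) + "</Code>"
                + "<Name>" + escapeXml(parts[1].trim()) + "</Name>"
                + "<Department>" + escapeXml(parts[2].trim()) + "</Department>"
                + "</Employee>";
    }

    public static void main(String[] args) {
        // The ampersand in the department is escaped in the output
        System.out.println(toEmployeeXml("1,name1,R&D"));
    }
}
```

Rejecting malformed lines with an exception is one possible choice; skipping or logging them may suit a batch job better.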

Judging from your comments, the easiest way is to skip the XML builder entirely and use plain print/write calls:

  1. Read the .txt file line by line
  2. Split the fields using "," as the separator
  3. Write the XML markup to a file or stdout using a simple System.out.print call

Don't forget the XML header.
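The three steps above, together with the XML header, might look roughly like this in Java. It is a sketch under stated assumptions rather than a finished tool: the class name FlatFileToXml and the method convert are made up for this example, the input is assumed to be UTF-8, and the fields are assumed to contain no commas and no XML special characters.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FlatFileToXml {

    // Reads the input one line at a time and writes each record immediately,
    // so memory use does not grow with the size of the file.
    static void convert(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            writer.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            writer.write("<Employees>\n");
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] f = line.split(",", -1);
                if (f.length != 3) {
                    throw new IOException("Expected 3 columns: " + line);
                }
                // Assumption: field values contain no &, < or > characters
                writer.write("  <Employee>\n");
                writer.write("    <Code>" + f[0].trim() + "</Code>\n");
                writer.write("    <Name>" + f[1].trim() + "</Name>\n");
                writer.write("    <Department>" + f[2].trim() + "</Department>\n");
                writer.write("  </Employee>\n");
            }
            writer.write("</Employees>\n");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: java FlatFileToXml <input.txt> <output.xml>");
            return;
        }
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Buffered readers and writers keep the number of system calls low, which is usually what matters most for throughput on a job like this.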

If your format changed frequently, you would write an .xsd schema and use JAXB to generate a class hierarchy and the marshalling/unmarshalling code, but in this case that would be overkill.

How about a one-line awk solution?

awk -F, 'BEGIN{printf "<Employees>\n"}END{printf "</Employees>\n"}{printf"<Employee><Code>%s</Code><Name>%s</Name><Department>%s</Department></Employee>\n",$1,$2,$3}' data.txt 

Writing a Java program would seem to be overkill for such a simple problem.

Update

If you want the output formatted, you can pipe it into the xmllint command:

$ awk -F, 'BEGIN{printf "<Employees>"}END{printf "</Employees>"}{printf"<Employee><Code>%s</Code><Name>%s</Name><Department>%s</Department></Employee>",$1,$2,$3}' data.txt | xmllint --format -
<?xml version="1.0"?>
<Employees>
  <Employee>
    <Code>1</Code>
    <Name>name1</Name>
    <Department>department1</Department>
  </Employee>
  <Employee>
    <Code>2</Code>
    <Name>name2</Name>
    <Department>department2</Department>
  </Employee>
  <Employee>
    <Code>3</Code>
    <Name>name3</Name>
    <Department>department3</Department>
  </Employee>
</Employees>

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint one, please cite the site URL or the original address. For any questions, please contact yoyou2525@163.com.

 