
XML to CSV using Hadoop

Guys, I am trying to convert my XML file to CSV using Hadoop, so I am using the following code in my mapper class:

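    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.XMLStreamReader;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Class name is illustrative; the question shows only the map() method.
    public class XmlToCsvMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        String document = value.toString();
        System.out.println("'" + document + "'"); // debug: print the raw record
        try {
          XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(
              new ByteArrayInputStream(document.getBytes(StandardCharsets.UTF_8)));
          String propertyName = "";
          String currentElement = "";
          while (reader.hasNext()) {
            int code = reader.next();
            switch (code) {
              case XMLStreamConstants.START_ELEMENT:
                currentElement = reader.getLocalName();
                break;
              case XMLStreamConstants.CHARACTERS:
                if (currentElement.equalsIgnoreCase("author")) {
                  propertyName += reader.getText();
                } else if (currentElement.equalsIgnoreCase("price")) {
                  // Strings are immutable: keep the value that trim() returns.
                  propertyName += reader.getText().trim();
                }
                break;
              default:
                break;
            }
            // This write sits inside the while loop, so it runs once per parser event.
            context.write(NullWritable.get(), new Text(propertyName));
          }
        } catch (XMLStreamException e) {
          throw new IOException("Could not parse record: " + document, e);
        }
      }
    }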

But the output I am getting is in this form:

Gambardella, Matthew
      XML Developer's Guide
      44.95
      2000-10-01


Ralls, Kim
      Midnight Rain
      5.95
      2000-12-16

Can you help me with this?

The output of the program depends on how you collect and write records from the mapper.

In this case you should use TextOutputFormat, with NullWritable as the KeyOut type and Text as the ValueOut type. The output value should be a concatenation of the values you extracted from the XML.
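A plain string concatenation is not quite enough for CSV here, because the author field itself contains a comma (`Gambardella, Matthew`). Below is a minimal sketch of the concatenation step, assuming RFC 4180-style quoting is what the consumer of the CSV expects. `CsvLine`, `escape` and `join` are hypothetical names for illustration, not Hadoop APIs.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper (not part of Hadoop): builds one CSV line
// from the values extracted for a single XML record.
final class CsvLine {

  private CsvLine() {}

  /** Quotes a field if it contains a comma, double quote, CR or LF (RFC 4180 style). */
  static String escape(String field) {
    boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
        || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
    if (!needsQuotes) {
      return field;
    }
    // Inside a quoted field, embedded double quotes are doubled.
    return '"' + field.replace("\"", "\"\"") + '"';
  }

  /** Concatenates the (escaped) fields into a single comma-separated line. */
  static String join(List<String> fields) {
    StringJoiner line = new StringJoiner(",");
    for (String field : fields) {
      line.add(escape(field));
    }
    return line.toString();
  }

  public static void main(String[] args) {
    List<String> book =
        List.of("Gambardella, Matthew", "XML Developer's Guide", "44.95", "2000-10-01");
    // prints: "Gambardella, Matthew",XML Developer's Guide,44.95,2000-10-01
    System.out.println(join(book));
  }
}
```

In the mapper, the joined string would then be written once per record, for example `context.write(NullWritable.get(), new Text(CsvLine.join(values)))`. With `TextOutputFormat`, a `NullWritable` key is not written, so each output line is just the CSV row.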

From your code, it looks like you are writing output after reading each value from the XML: the write call is inside the while loop, so it runs for every parser event instead of once per record.
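To write once per record, the parser can collect every wanted field first and build the row only after the reader has consumed the whole record. Below is a minimal StAX sketch. It assumes each value passed to `map()` is one complete `<book>` element (the question's output suggests so; with the default `TextInputFormat` each value would be a single line instead), and that the wanted elements are `author`, `title`, `price` and `publish_date` (inferred from the sample output, not confirmed by the question). `BookRecord.extract` is a hypothetical helper.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: parses ONE record (e.g. one <book> element given to map())
// and returns the requested child-element values in a fixed column order.
final class BookRecord {

  private BookRecord() {}

  static List<String> extract(String recordXml, List<String> columns) throws XMLStreamException {
    Map<String, StringBuilder> text = new HashMap<>();
    XMLStreamReader reader =
        XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(recordXml));
    String currentElement = null;
    try {
      while (reader.hasNext()) {
        switch (reader.next()) {
          case XMLStreamConstants.START_ELEMENT:
            currentElement = reader.getLocalName();
            break;
          case XMLStreamConstants.CHARACTERS:
            // One element's text may arrive in several CHARACTERS events, so append.
            if (currentElement != null && columns.contains(currentElement)) {
              text.computeIfAbsent(currentElement, k -> new StringBuilder())
                  .append(reader.getText());
            }
            break;
          case XMLStreamConstants.END_ELEMENT:
            // Forget the element once it closes, so whitespace between tags is ignored.
            currentElement = null;
            break;
          default:
            break;
        }
      }
    } finally {
      reader.close();
    }
    // Build the row only after the whole record has been read.
    List<String> row = new ArrayList<>();
    for (String column : columns) {
      StringBuilder value = text.get(column);
      row.add(value == null ? "" : value.toString().trim());
    }
    return row;
  }

  public static void main(String[] args) throws XMLStreamException {
    String record = "<book id=\"bk101\">\n"
        + "  <author>Gambardella, Matthew</author>\n"
        + "  <title>XML Developer's Guide</title>\n"
        + "  <price>44.95</price>\n"
        + "  <publish_date>2000-10-01</publish_date>\n"
        + "</book>";
    List<String> row = extract(record, List.of("author", "title", "price", "publish_date"));
    // prints: [Gambardella, Matthew, XML Developer's Guide, 44.95, 2000-10-01]
    System.out.println(row);
  }
}
```

Besides moving the write out of the loop, this clears `currentElement` on `END_ELEMENT`, which keeps the whitespace between tags out of the values. Inside `map()` you would call `extract(value.toString(), columns)` once, join the row into a single CSV line (quoting fields such as the author that contain commas), and call `context.write` a single time.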
