Java read XML with SAX Parsing

So I started to work with XML and the SAX parser, and now I'm trying to figure out how it works. I am familiar with JSON, but this doesn't seem to work like JSON. Here is the code I'm working with:

package com.myalbion.gamedataextractor.handlers;

import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import com.myalbion.gamedataextractor.Main;
import com.myalbion.gamedataextractor.datatables.Language;
import com.myalbion.gamedataextractor.datatables.Localized;
import com.myalbion.gamedataextractor.datatables.XMLFile;

public class LocalizationXMLFileHandler extends DefaultHandler {

    private String temp;
    Localized localized;
    List<Localized> localizedList;
    Map<Language, String> tempMap;

    /*
     * When the parser encounters plain text (not XML elements),
     * it calls this method, which accumulates them in a string buffer
     */
    public void characters(char[] buffer, int start, int length) {
           temp = new String(buffer, start, length);
    }


    /*
     * Every time the parser encounters the beginning of a new element,
     * it calls this method, which resets the string buffer
     */ 
    public void startElement(String uri, String localName,
                  String qName, Attributes attributes) throws SAXException {
           temp = "";
           if (qName.equalsIgnoreCase("tu")) {
               localized = new Localized();
               localized.setUniqueName(attributes.getValue("tuid"));

           } else if(qName.equalsIgnoreCase("tuv")) {
               tempMap.put(Language.getLanguageFromCode(attributes.getValue("xml:lang")), )
           }
    }

    /*
     * When the parser encounters the end of an element, it calls this method
     */
    public void endElement(String uri, String localName, String qName)
                  throws SAXException {

           if (qName.equalsIgnoreCase("tu")) {
                  // add it to the list
                  accList.add(acct);

           } else if (qName.equalsIgnoreCase("Name")) {
                  acct.setName(temp);
           } else if (qName.equalsIgnoreCase("Id")) {
                  acct.setId(Integer.parseInt(temp));
           } else if (qName.equalsIgnoreCase("Amt")) {
                  acct.setAmt(Integer.parseInt(temp));
           }

    } 

}

I am trying to extract the data from this XML file into tempMap, which holds the Language enum and the localized name.

<?xml version="1.0"?>
<tmx version="1.4">
  <body>
    <tu tuid="@ACCESS_RIGHTS_ACCESS_MODE">
      <tuv xml:lang="EN-US">
        <seg>Access Mode</seg>
      </tuv>
      <tuv xml:lang="DE-DE">
        <seg>Zugriffsmodus</seg>
      </tuv>
      <tuv xml:lang="FR-FR">
        <seg>Mode d'accès</seg>
      </tuv>
      <tuv xml:lang="RU-RU">
        <seg>Доступ</seg>
      </tuv>
      <tuv xml:lang="PL-PL">
        <seg>Tryb dostępu</seg>
      </tuv>
      <tuv xml:lang="ES-ES">
        <seg>Modo de acceso</seg>
      </tuv>
      <tuv xml:lang="PT-BR">
        <seg>Modo de acesso</seg>
      </tuv>
      <tuv xml:lang="ZH-CN">
        <seg>权限模式</seg>
      </tuv>
      <tuv xml:lang="KO-KR">
        <seg>접근 모드</seg>
      </tuv>
    </tu>
  </body>
</tmx>

Now at line 49 of the Java code I am getting the language code from the tuv attribute, but I'm missing the localized name, which is in the seg element below tuv. Is there a way to receive the parent's attribute and get the seg value on the same line?

You're overwriting your text buffer every time you hit a new text node, including whitespace-only text nodes like the ones between </seg> and </tuv>. You need to save the contents of the text buffer when processing the seg end tag, and pick it up when processing the tuv end tag.
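
A minimal sketch of that approach, assuming plain strings in place of the question's Language and Localized classes (which aren't shown). The class name TmxSketchHandler and its parse helper are made up for illustration:

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the question's handler; Strings replace Language/Localized.
class TmxSketchHandler extends DefaultHandler {
    // tuid -> (language code -> segment text)
    final Map<String, Map<String, String>> units = new LinkedHashMap<>();

    private final StringBuilder text = new StringBuilder();
    private Map<String, String> currentUnit;
    private String currentLang;
    private String currentSeg;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // start with an empty buffer for each element
        if ("tu".equals(qName)) {
            currentUnit = new LinkedHashMap<>();
            units.put(attributes.getValue("tuid"), currentUnit);
        } else if ("tuv".equals(qName)) {
            currentLang = attributes.getValue("xml:lang");
            currentSeg = null;
        }
    }

    @Override
    public void characters(char[] buffer, int start, int length) {
        text.append(buffer, start, length); // append, never overwrite
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("seg".equals(qName)) {
            // Save the text now; the whitespace after </seg> arrives later.
            currentSeg = text.toString();
        } else if ("tuv".equals(qName) && currentSeg != null) {
            // Language code and segment text are both known at this point.
            currentUnit.put(currentLang, currentSeg);
        }
    }

    // Convenience helper for trying the handler on an in-memory document.
    static Map<String, Map<String, String>> parse(String xml) throws Exception {
        TmxSketchHandler handler = new TmxSketchHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.units;
    }
}
```

The language code is remembered in startElement and only combined with the text once the tuv end tag is reached, so nothing needs to happen "on the same line" as the attribute lookup.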

Also, you should be aware that the content of a single text node can be supplied in a sequence of calls to characters(): the parser can break it up any way it likes (many parsers do this on entity boundaries). You need to accumulate the content by appending to a buffer.
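
To see the difference, here is a small sketch that keeps both the last chunk (what the question's code does) and an appended buffer. The class name ChunkedTextDemo and the sample input are made up for illustration, and how many calls a given parser makes is implementation-dependent, so the demo prints the count instead of assuming it:

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: compares "keep last chunk" with "append every chunk".
class ChunkedTextDemo extends DefaultHandler {
    final StringBuilder appended = new StringBuilder();
    String lastChunk = "";
    int calls = 0;

    @Override
    public void characters(char[] buffer, int start, int length) {
        calls++;
        lastChunk = new String(buffer, start, length); // overwrites, as in the question
        appended.append(buffer, start, length);        // accumulates across calls
    }

    static ChunkedTextDemo parse(String xml) throws Exception {
        ChunkedTextDemo handler = new ChunkedTextDemo();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // Entity and character references are places where a parser may split the text.
        ChunkedTextDemo demo = parse("<seg>Mode d&apos;acc&#232;s</seg>");
        System.out.println("calls to characters(): " + demo.calls);
        System.out.println("last chunk only:       " + demo.lastChunk);
        System.out.println("appended buffer:       " + demo.appended);
    }
}
```

Whatever the number of calls turns out to be, the appended buffer holds the complete text, while the last-chunk field is only complete if the parser happened to deliver everything in one call.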

Also note that XML is case-sensitive; you shouldn't really ignore case when testing element names.
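
As a quick illustration (the class name CaseSensitiveNames and the input document are invented for this sketch), seg and Seg are two different elements as far as XML is concerned, but equalsIgnoreCase treats them as the same:

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: counts matches for exact and case-insensitive name tests.
class CaseSensitiveNames extends DefaultHandler {
    int exact = 0;
    int ignoringCase = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("seg".equals(qName)) {
            exact++;        // matches only <seg>
        }
        if ("seg".equalsIgnoreCase(qName)) {
            ignoringCase++; // also matches the unrelated <Seg>
        }
    }

    static CaseSensitiveNames parse(String xml) throws Exception {
        CaseSensitiveNames handler = new CaseSensitiveNames();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        CaseSensitiveNames h = parse("<root><seg>a</seg><Seg>b</Seg></root>");
        System.out.println("exact matches: " + h.exact);
        System.out.println("ignore-case matches: " + h.ignoringCase);
    }
}
```

With the exact comparison, a differently-cased element is left alone instead of being silently processed as if it were seg.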

And when asking questions on SO, it helps to get the terminology right: referring to elements as attributes is going to confuse people.
