
How to read and write non-English characters (for example Marathi, Tamil, Hindi) in Java?

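I need to read non-English text (Marathi) from an Excel file and then write it to an XML file. When I read the Marathi text from Excel and inspect it in my Java code, it is displayed correctly. But when I write it to the XML file from Java and then read that file, I get a series of symbols in place of the Marathi text. Please suggest how to handle this. The code is below.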

public void excelToXML(String path) {

        FileWriter fostream;

        PrintWriter out = null;

        String strOutputPath = "C:\\Temp\\";

        try {

            File file = new File(path);

            InputStream inputStream = new FileInputStream(file);

            Workbook wb = WorkbookFactory.create(inputStream);

            List<String> sheetNames = new ArrayList<String>();

            for (int i = 0; i < wb.getNumberOfSheets(); i++) {

                sheetNames.add(wb.getSheetName(i));

            }

            fostream = new FileWriter(strOutputPath + "\\" + "iTicker" + ".xml");

            out = new PrintWriter(new BufferedWriter(fostream));

            // out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");

            out.println("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>");

            out.println("<root xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">");

            for (String sheetName : sheetNames) {
                if(sheetName.equals("Sheet3")){
                    System.out.println(sheetName);
                    break;
                }


                Sheet sheet = wb.getSheet(sheetName);

                boolean firstRow = true;

                ArrayList<String> myStringArray = new ArrayList<String>();

                Iterator<Cell> cells = sheet.getRow(0).cellIterator();

                while (cells.hasNext()) {

                    myStringArray.add(cells.next().toString());

                }

                for (Row row : sheet) {

                    if (firstRow == true) {
                        firstRow = false;
                        continue;
                    }

                    if (!sheetName.equals("Sheet1")) {
                        out.println("\t<element>");
                    }

                    for (int i = 0; i < myStringArray.size(); i++) {
                        if (row.getCell(i) != null && !(row.getCell(i)).toString().isEmpty()
                                && row.getCell(i).toString().length() > 0) {
                            if(!(myStringArray.get(i) != null && myStringArray.get(i).toString().equals("Start_Epoch_Time") || myStringArray.get(i).toString().equals("End_Epoch_Time"))){
                            out.println(formatElement("\t\t", myStringArray.get(i), formatCell(row.getCell(i))));
                            } else{
                                long ePochValue=EpochConverter.getepochValue(row.getCell(i).toString());
                                out.println(formatElement("\t\t", myStringArray.get(i), String.valueOf(ePochValue)));
                            }
                        } else {
                            blankValues.add(sheetName +":" + "column header" +":" +myStringArray.get(i)+":"+"row no:"+row.getRowNum()+" " +"is blank.");
                        }
                    }

                    if (!sheetName.equals("Sheet1")) {
                        out.println("\t</element>");
                    }

                }
            }
            out.write("</root>");

            out.flush();

            out.close();
            if(blankValues != null && blankValues.size() >0){
            FileUploadController.writeErrorLog(blankValues + "Please fill all the mandatory values.");
            }

        } catch (Exception e) {
            new DTHException(e.getMessage());
            e.printStackTrace();

        }

    }

    private static String formatCell(Cell cell)

    {
        if (cell == null) {
            return "";
        }

        switch (cell.getCellType()) {

        case Cell.CELL_TYPE_BLANK:

            return "";

        case Cell.CELL_TYPE_BOOLEAN:

            return Boolean.toString(cell.getBooleanCellValue());

        case Cell.CELL_TYPE_ERROR:

            return "*error*";

        case Cell.CELL_TYPE_NUMERIC:

            return df.format(cell.getNumericCellValue());

        case Cell.CELL_TYPE_STRING:

            return cell.getStringCellValue();

        default:

            return "<unknown value>";

        }

    }

    private static String formatElement(String prefix, String tag, String value) {

        StringBuilder sb = new StringBuilder(prefix);
        sb.append("<");

        sb.append(tag);

        if (value != null && value.length() > 0) {

            sb.append(">");

            sb.append(value);

            sb.append("</");

            sb.append(tag);

            sb.append(">");

        } else {

            sb.append("/>");

        }
        return sb.toString();

    }

In the line below, when I inspect the value of row.getCell(i) I see the exact Marathi value, but after writing it out I get different output.

out.println(formatElement("\t\t", myStringArray.get(i), formatCell(row.getCell(i))));

Your code has two big problems.

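1) You are obviously on Windows (the path C:\Temp), but, as Axel Richter already mentioned in the comments, you are using the default encoding for the output file. Creating a FileWriter directly from a file name gives you the platform's default encoding, which on Windows is Windows ANSI. That is not what you want, because you then write an XML declaration that announces UTF-8 as the encoding.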
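To illustrate why symbols appear in place of the Marathi text, here is a minimal sketch that is not part of the original code. The class name EncodingDemo and the method roundTrip are made up for illustration, and the sketch assumes the JVM provides the windows-1252 charset as a stand-in for "Windows ANSI" on a Western European system. Devanagari letters have no mapping in windows-1252, so String.getBytes replaces each of them with the charset's replacement byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Encodes the text with the given charset and decodes the bytes again
    // with the same charset. Unmappable characters are lost on the way.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The Marathi word "मराठी", written with Unicode escapes so the
        // source file compiles the same way under any source encoding.
        String marathi = "\u092E\u0930\u093E\u0920\u0940";

        // Assumption: windows-1252 stands in for the Windows ANSI code page.
        Charset ansi = Charset.forName("windows-1252");

        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8 keeps the text: "
                + marathi.equals(roundTrip(marathi, StandardCharsets.UTF_8)));
        System.out.println("windows-1252 keeps the text: "
                + marathi.equals(roundTrip(marathi, ansi)));
    }
}
```

The first printed line shows which charset a FileWriter created from a bare file name would pick up on the machine running the code.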

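You should never rely on the platform's default encoding. Always create the PrintWriter with an explicit encoding, via an OutputStreamWriter wrapped around a FileOutputStream, like this: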

PrintWriter writer = new PrintWriter(new BufferedWriter(
    new OutputStreamWriter(
      new FileOutputStream("iTicker.xml"), StandardCharsets.UTF_8)));
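A self-contained variant of the snippet above might look like the following sketch. The class name Utf8WriterDemo, the helper roundTripThroughFile and the temporary file are assumptions made for illustration. The helper writes a string through the explicitly encoded PrintWriter and reads the file back as UTF-8, so you can check that the Marathi text survives.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8WriterDemo {

    // Writes the text to a temporary file as UTF-8, reads the file back
    // as UTF-8 and returns what was read.
    static String roundTripThroughFile(String text) {
        try {
            Path file = Files.createTempFile("iTicker-demo", ".xml");
            try {
                // The encoding is stated explicitly, so the platform default is irrelevant.
                try (PrintWriter writer = new PrintWriter(new BufferedWriter(
                        new OutputStreamWriter(
                                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)))) {
                    writer.print(text);
                }
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "<name>मराठी</name>", with the Marathi word given as Unicode escapes.
        String text = "<name>\u092E\u0930\u093E\u0920\u0940</name>";
        System.out.println("Round trip intact: " + text.equals(roundTripThroughFile(text)));
    }
}
```

Whatever tool later reads the file must also decode it as UTF-8. A viewer that assumes the ANSI code page shows symbols even when the file itself is correct.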

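2) It is bad practice to write XML by hand the way you do. If you do it anyway, you have to take care of special characters such as "<", ">" and "&". It is always advisable to use a library that escapes them automatically. The Java standard library, for example, includes an implementation of the XMLStreamWriter interface.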

Here is a simple example:

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class WriteXml {

  public static void main(String[] args) {
    try {
      File outFile = new File("iTicker.xml");
      // Output stream for the XML document. The encoding is passed explicitly so that
      // the XMLStreamWriter does not fall back to the platform default.
      OutputStream out = new BufferedOutputStream(new FileOutputStream(outFile));

      XMLStreamWriter xmlWriter =
        XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
      xmlWriter.writeStartDocument("UTF-8", "1.0");
      xmlWriter.writeCharacters("\n");
      xmlWriter.writeStartElement("root");
      xmlWriter.writeNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");

      xmlWriter.writeCharacters("\n  ");
      xmlWriter.writeStartElement("element");
      // Some special characters and (I hope) some Marathi letters
      xmlWriter.writeCharacters("<>&\": मराठी वर्णमाला"); 
      xmlWriter.writeEndElement(); // element

      xmlWriter.writeCharacters("\n");
      xmlWriter.writeEndElement(); // root
      xmlWriter.writeEndDocument();
      xmlWriter.close(); // should be better in a finally block
      out.close(); // should be better handled automatically by try-with-resources
    } catch(Exception e) {
      e.printStackTrace();
    }
  }

}

This creates the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <element>&lt;&gt;&amp;": मराठी वर्णमाला</element>
</root>
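Applied to the question's loop over header names and row values, the same approach could be sketched roughly as follows. The class name RowsToXml, the method toXml and the in-memory lists are hypothetical stand-ins for the values read from the workbook. The sketch assumes every header is a valid XML element name, and it writes to a ByteArrayOutputStream only so that the result is easy to inspect.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class RowsToXml {

    // Writes each row as an <element> whose children are named after the headers.
    // Escaping of '<', '>' and '&' is left to the XMLStreamWriter.
    static String toXml(List<String> headers, List<List<String>> rows) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            XMLStreamWriter xml =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            try {
                xml.writeStartDocument("UTF-8", "1.0");
                xml.writeStartElement("root");
                for (List<String> row : rows) {
                    xml.writeStartElement("element");
                    for (int i = 0; i < headers.size() && i < row.size(); i++) {
                        xml.writeStartElement(headers.get(i));
                        xml.writeCharacters(row.get(i));
                        xml.writeEndElement(); // header element
                    }
                    xml.writeEndElement(); // element
                }
                xml.writeEndElement(); // root
                xml.writeEndDocument();
            } finally {
                xml.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical sample data; the second value is "A & मराठी" with Unicode escapes.
        List<String> headers = Arrays.asList("Id", "Title");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1", "A & \u092E\u0930\u093E\u0920\u0940"));
        System.out.println(toXml(headers, rows));
    }
}
```

In the real method the ByteArrayOutputStream would be replaced by a FileOutputStream for the target file, and the lists by the header names and cell values taken from the sheet.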
