How to read and write non-English characters (e.g. Marathi, Tamil, Hindi) in Java?
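I need to read non-English characters from an Excel file, specifically Marathi text, and then write that text to an XML file. When I read the Marathi text from Excel and inspect it in my Java code, it shows exactly the Marathi text. But after I write it to XML with Java and read the file, I get some symbols instead of the Marathi text. Please suggest how to handle this. My code is below.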
public void excelToXML(String path) {
    FileWriter fostream;
    PrintWriter out = null;
    String strOutputPath = "C:\\Temp\\";
    try {
        File file = new File(path);
        InputStream inputStream = new FileInputStream(file);
        Workbook wb = WorkbookFactory.create(inputStream);
        List<String> sheetNames = new ArrayList<String>();
        for (int i = 0; i < wb.getNumberOfSheets(); i++) {
            sheetNames.add(wb.getSheetName(i));
        }
        fostream = new FileWriter(strOutputPath + "\\" + "iTicker" + ".xml");
        out = new PrintWriter(new BufferedWriter(fostream));
        // out.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        out.println("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>");
        out.println("<root xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">");
        for (String sheetName : sheetNames) {
            if (sheetName.equals("Sheet3")) {
                System.out.println(sheetName);
                break;
            }
            Sheet sheet = wb.getSheet(sheetName);
            boolean firstRow = true;
            ArrayList<String> myStringArray = new ArrayList<String>();
            Iterator<Cell> cells = sheet.getRow(0).cellIterator();
            while (cells.hasNext()) {
                myStringArray.add(cells.next().toString());
            }
            for (Row row : sheet) {
                if (firstRow == true) {
                    firstRow = false;
                    continue;
                }
                if (!sheetName.equals("Sheet1")) {
                    out.println("\t<element>");
                }
                for (int i = 0; i < myStringArray.size(); i++) {
                    if (row.getCell(i) != null && !(row.getCell(i)).toString().isEmpty()
                            && row.getCell(i).toString().length() > 0) {
                        if (!(myStringArray.get(i) != null && myStringArray.get(i).toString().equals("Start_Epoch_Time") || myStringArray.get(i).toString().equals("End_Epoch_Time"))) {
                            out.println(formatElement("\t\t", myStringArray.get(i), formatCell(row.getCell(i))));
                        } else {
                            long ePochValue = EpochConverter.getepochValue(row.getCell(i).toString());
                            out.println(formatElement("\t\t", myStringArray.get(i), String.valueOf(ePochValue)));
                        }
                    } else {
                        blankValues.add(sheetName + ":" + "column header" + ":" + myStringArray.get(i) + ":" + "row no:" + row.getRowNum() + " " + "is blank.");
                    }
                }
                if (!sheetName.equals("Sheet1")) {
                    out.println("\t</element>");
                }
            }
        }
        out.write("</root>");
        out.flush();
        out.close();
        if (blankValues != null && blankValues.size() > 0) {
            FileUploadController.writeErrorLog(blankValues + "Please fill all the mandatory values.");
        }
    } catch (Exception e) {
        new DTHException(e.getMessage());
        e.printStackTrace();
    }
}
private static String formatCell(Cell cell) {
    if (cell == null) {
        return "";
    }
    switch (cell.getCellType()) {
        case Cell.CELL_TYPE_BLANK:
            return "";
        case Cell.CELL_TYPE_BOOLEAN:
            return Boolean.toString(cell.getBooleanCellValue());
        case Cell.CELL_TYPE_ERROR:
            return "*error*";
        case Cell.CELL_TYPE_NUMERIC:
            return df.format(cell.getNumericCellValue());
        case Cell.CELL_TYPE_STRING:
            return cell.getStringCellValue();
        default:
            return "<unknown value>";
    }
}

private static String formatElement(String prefix, String tag, String value) {
    StringBuilder sb = new StringBuilder(prefix);
    sb.append("<");
    sb.append(tag);
    if (value != null && value.length() > 0) {
        sb.append(">");
        sb.append(value);
        sb.append("</");
        sb.append(tag);
        sb.append(">");
    } else {
        sb.append("/>");
    }
    return sb.toString();
}
In the line below, when I inspect the value of row.getCell(i) I get the exact Marathi value, but after writing it I get different output:
out.println(formatElement("\t\t", myStringArray.get(i), formatCell(row.getCell(i))));
Your code has two big problems.
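1) You are apparently on Windows (path C:\Temp), but, as Axel Richter already mentioned in the comments, you are using the default encoding for the output file. Creating a FileWriter directly from a file name gives you the platform's default encoding, which on Windows is the Windows ANSI code page (JDK 18 and later default to UTF-8 instead, per JEP 400). That is not what you want, because you later write an XML declaration that states UTF-8 as the encoding. An ANSI code page such as windows-1252 cannot represent Devanagari at all, so the writer replaces the Marathi characters, typically with '?'.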
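The mismatch can be reproduced in memory without any files. The sketch below assumes windows-1252 as the ANSI code page (the real one depends on the Windows locale); the class and the roundTrip helper are hypothetical names for illustration. Keep in mind that System.out on a Windows console may itself be unable to display Devanagari, so comparing strings is more reliable than looking at console output.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Encodes text with one charset and decodes the bytes with another,
    // mimicking a file written with the platform default but read as UTF-8.
    public static String roundTrip(String text, Charset writeAs, Charset readAs) {
        // getBytes(Charset) replaces unmappable characters with the charset's
        // replacement byte, which is '?' for windows-1252
        byte[] bytes = text.getBytes(writeAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String marathi = "मराठी";
        // Assumption: windows-1252 stands in for the Windows ANSI code page
        Charset ansi = Charset.forName("windows-1252");

        // Every Devanagari character is lost and becomes '?'
        System.out.println(roundTrip(marathi, ansi, StandardCharsets.UTF_8));

        // Same explicit charset on both sides: the text survives unchanged
        System.out.println(roundTrip(marathi, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
    }
}
```

The information is destroyed at write time, so no choice of encoding when reading the file can bring the Marathi text back.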
You should never rely on the platform's default encoding. Always create the PrintWriter with an explicit encoding via an OutputStreamWriter and a FileOutputStream, like this:
PrintWriter writer = new PrintWriter(new BufferedWriter(
new OutputStreamWriter(
new FileOutputStream("iTicker.xml"), StandardCharsets.UTF_8)));
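A self-contained version of that snippet, with a read-back check, might look like the sketch below; the class and method names are made up for illustration. The first variant works on any Java version. The second uses the FileWriter(File, Charset) constructor, which exists only in Java 11 and later. Both use try-with-resources, so the writer is flushed and closed even if an exception is thrown.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8WriterDemo {

    // Variant 1: explicit charset via OutputStreamWriter (any Java version)
    public static void writeWithStreamWriter(Path file, String text) throws IOException {
        try (PrintWriter out = new PrintWriter(new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)))) {
            out.println(text);
        } // closing the PrintWriter flushes and closes the whole chain
    }

    // Variant 2: FileWriter with a Charset argument (Java 11+ only)
    public static void writeWithFileWriter(Path file, String text) throws IOException {
        try (PrintWriter out = new PrintWriter(new BufferedWriter(
                new FileWriter(file.toFile(), StandardCharsets.UTF_8)))) {
            out.println(text);
        }
    }

    // Reads the file back with the same explicit charset
    public static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("iTicker", ".xml");
        writeWithStreamWriter(file, "<name>मराठी</name>");
        System.out.println(readUtf8(file).trim());
        writeWithFileWriter(file, "<name>मराठी</name>");
        System.out.println(readUtf8(file).trim());
        Files.delete(file);
    }
}
```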
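2) Writing XML by hand, as you do, is bad practice. If you do it anyway, you have to take care of special characters such as '<', '>' and '&'. It is always advisable to use a library for this that escapes automatically. The Java standard library, for example, includes an implementation of the XMLStreamWriter interface.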
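If you prefer to keep the hand-written formatElement() from the question, the value must at least be escaped before it is appended. A minimal sketch of such a helper follows; escapeXml is a hypothetical name, not a library method. It only handles the five predefined XML entities: it does not reject characters that are illegal in XML 1.0 (such as most control characters), and it does nothing for the element names, which come from the header row and must themselves be valid XML names.

```java
public class XmlEscape {

    // Hypothetical helper: replaces the characters that must not appear
    // literally in XML text content or attribute values.
    public static String escapeXml(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c); // non-ASCII text such as Marathi passes through unchanged
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("A & B <मराठी>")); // prints: A &amp; B &lt;मराठी&gt;
    }
}
```

In the question's code this would be used as sb.append(escapeXml(value)) inside formatElement(). Escaping fixes markup errors only; the encoding problem from point 1 still needs the explicit UTF-8 writer.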
Here is a simple example using XMLStreamWriter:
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class WriteXml {
    public static void main(String[] args) {
        try {
            File outFile = new File("iTicker.xml");
            // Output stream for the XML document; the XMLStreamWriter encodes the text.
            OutputStream out = new BufferedOutputStream(new FileOutputStream(outFile));
            // Pass the encoding explicitly so that it matches the XML declaration below.
            XMLStreamWriter xmlWriter =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            xmlWriter.writeStartDocument("UTF-8", "1.0");
            xmlWriter.writeCharacters("\n");
            xmlWriter.writeStartElement("root");
            xmlWriter.writeNamespace("xsi", "http://www.w3.org/2001/XMLSchema-instance");
            xmlWriter.writeCharacters("\n  ");
            xmlWriter.writeStartElement("element");
            // Some special characters and (I hope) some Marathi letters
            xmlWriter.writeCharacters("<>&\": मराठी वर्णमाला");
            xmlWriter.writeEndElement(); // element
            xmlWriter.writeCharacters("\n");
            xmlWriter.writeEndElement(); // root
            xmlWriter.writeEndDocument();
            xmlWriter.close(); // should be better in a finally block
            out.close(); // should be better handled automatically by try-with-resources
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
This creates the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <element>&lt;&gt;&amp;": मराठी वर्णमाला</element>
</root>
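Applied to the question's loop, XMLStreamWriter can replace formatElement() entirely. The sketch below leaves out Apache POI so that it runs on its own: it assumes the header row and the data rows have already been converted to strings (for example with the question's formatCell()), and writeRows is a hypothetical name. It writes to any OutputStream: for the real file, pass a FileOutputStream; for a quick check, a ByteArrayOutputStream as in main. The writer escapes text content, not element names, so the header texts must already be valid XML names (no spaces, for example).

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class RowsToXml {

    // Hypothetical replacement for the formatElement() loop: each row becomes an
    // <element>, and each non-empty cell a child named after its column header.
    public static void writeRows(OutputStream out, List<String> headers,
                                 List<List<String>> rows) throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance()
                .createXMLStreamWriter(out, "UTF-8"); // explicit encoding, not the platform default
        xml.writeStartDocument("UTF-8", "1.0");
        xml.writeStartElement("root");
        for (List<String> row : rows) {
            xml.writeStartElement("element");
            for (int i = 0; i < headers.size() && i < row.size(); i++) {
                String value = row.get(i);
                if (value == null || value.isEmpty()) {
                    continue; // the question's code records blank cells in blankValues instead
                }
                xml.writeStartElement(headers.get(i)); // header must be a valid XML name
                xml.writeCharacters(value);            // escapes '<' and '&' automatically
                xml.writeEndElement();
            }
            xml.writeEndElement(); // element
        }
        xml.writeEndElement(); // root
        xml.writeEndDocument();
        xml.flush();
        xml.close(); // does not close the underlying OutputStream
    }

    public static void main(String[] args) throws XMLStreamException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeRows(buffer, Arrays.asList("Name", "City"),
                Arrays.asList(Arrays.asList("मराठी", "A & B")));
        System.out.println(new String(buffer.toByteArray(), StandardCharsets.UTF_8));
    }
}
```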