I've been looking online and trying to understand how to do this. I am parsing some HTML files that are encoded in ISO-8859-1. Once parsed, I want all the output to be in the standard Java encoding (UTF-something).
Here is how I do this:
currentDocument = Jsoup.parse(new File("thing.htm"), "ISO-8859-1");
Element elt = currentDocument.getElementById("bim");
String title = elt.select("h1,h2,h3,h4,h5,h6").first().text();
System.out.println(title);
The string in the file is:
G18 Legemiddeløkonomi – pasientens venn eller fiende
The output is:
G18?Legemiddel?konomi ? pasientens venn eller fiende
I guess I'm doing something wrong somewhere. I know this is possible with Jsoup, I just don't know what I'm missing. By the way, I'm on Mac OS X. Can somebody help me?
Thanks
OK, so after investigating further, and thanks to @Esailija, I found that my console wasn't outputting in UTF-8, which was solved by:
PrintStream stdout = new PrintStream(System.out, true, "UTF-8");
System.setOut(stdout);
I also used:
currentDocument.outputSettings().charset("UTF-8");
but I am not sure whether this is needed.
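To check that the question marks come from the output stream rather than from Jsoup, here is a minimal sketch that needs only the JDK. It writes the same text through a PrintStream with two different charsets and captures the bytes in memory instead of sending them to a terminal. The class name ConsoleEncodingDemo and the helper printed are made up for illustration, and US-ASCII is only an assumption standing in for whatever encoding my console was really using.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class ConsoleEncodingDemo {

    // Hypothetical helper: returns the bytes that a PrintStream using the
    // named charset writes for the given text. The ByteArrayOutputStream
    // stands in for the real console stream.
    public static byte[] printed(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // \u00f8 is the letter o with stroke, \u2013 is the en dash from the title.
        String title = "Legemiddel\u00f8konomi \u2013 pasientens venn eller fiende";

        // A stream whose charset cannot represent a character writes '?' instead.
        byte[] ascii = printed(title, "US-ASCII");
        System.out.println(new String(ascii, StandardCharsets.US_ASCII));
        // prints: Legemiddel?konomi ? pasientens venn eller fiende

        // A UTF-8 stream keeps every character, so decoding gives the title back.
        byte[] utf8 = printed(title, "UTF-8");
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(title));
        // prints: true
    }
}
```

The String held by the program is the same in both cases. Only the charset of the PrintStream differs, which is why wrapping System.out in a UTF-8 PrintStream was enough to fix what I saw.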