I'm trying to remove hard spaces (from &nbsp; entities in the HTML). I can't remove them with .trim() or .replace(" ", ""), etc. I don't get it.
I even found a suggestion on Stack Overflow to try a backslash-escaped sequence, but that didn't work either.
I tried this (since text() returns actual hard space characters, U+00A0):
System.out.println( "'"+fields.get(6).text().replace("\\u00a0", "")+"'" ); //'94,00 '
System.out.println( "'"+fields.get(6).text().replace(" ", "")+"'" ); //'94,00 '
System.out.println( "'"+fields.get(6).text().trim()+"'"); //'94,00 '
System.out.println( "'"+fields.get(6).html().replace("&nbsp;", "")+"'"); //'94,00' works
But I can't figure out why I can't remove the hard space from the result of .text().
Your first attempt was very nearly it; you're quite right that Jsoup maps &nbsp; to U+00A0. You just don't want the double backslash in your string:
System.out.println( "'"+fields.get(6).text().replace("\u00a0", "")+"'" ); //'94,00'
// Just one ------------------------------------------^
replace doesn't use regular expressions, so you aren't trying to pass a literal backslash through to the regex level. You just want to specify the character U+00A0 in the string.
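To make the difference between the two string literals concrete, here is a minimal sketch that does not need Jsoup. The class name, the helper names, and the sample value "94,00" followed by U+00A0 are assumptions chosen for illustration; the sample stands in for whatever fields.get(6).text() returns.

```java
public class NbspReplaceDemo {
    // Removes every NO-BREAK SPACE (U+00A0) using a plain, non-regex replace.
    static String removeNbsp(String s) {
        return s.replace("\u00a0", "");
    }

    // The double-backslash variant searches for six literal characters
    // (backslash, u, 0, 0, a, 0), which the text does not contain.
    static String removeEscapedLiteral(String s) {
        return s.replace("\\u00a0", "");
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for fields.get(6).text()
        String text = "94,00\u00a0";

        System.out.println("'" + removeNbsp(text) + "'");           // '94,00'
        System.out.println("'" + removeEscapedLiteral(text) + "'"); // unchanged, hard space still present
        System.out.println("\\u00a0".length());                     // 6
    }
}
```

The last line shows why the double-backslash version never matches: the search string is six ordinary characters long, not one hard space.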
The question has been edited to reflect the true problem.
New answer: The hard space, i.e. the entity &nbsp; (Unicode character NO-BREAK SPACE, U+00A0), can be represented in Java by the escape \u00a0, so the code becomes the following, where str is the string returned by the text() method:
str = str.replaceAll("\u00a0", ""); // strings are immutable, so keep the returned value
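The replaceAll call above removes every hard space in the string. If only the leading and trailing ones should go (what trim() was expected to do), something along these lines may help. This is a sketch, not part of the original answers: the class name, the helper trimWithNbsp, and the sample value are assumptions. It relies on two facts: trim() only strips characters up to U+0020, and Java regular expressions accept \uXXXX escapes themselves, so the double backslash is fine inside a regex.

```java
public class NbspTrimDemo {
    // Trims ordinary whitespace and NO-BREAK SPACE from both ends only,
    // leaving any interior spacing untouched.
    static String trimWithNbsp(String s) {
        return s.replaceAll("^[\\s\\u00a0]+|[\\s\\u00a0]+$", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample: hard space, normal space, digits, hard space
        String str = "\u00a0 94,00\u00a0";

        // U+00A0 is not "whitespace" by Java's definition
        System.out.println(Character.isWhitespace('\u00a0')); // false

        // trim() leaves the string unchanged because it starts and ends with U+00A0
        System.out.println(str.trim().equals(str));           // true

        System.out.println("'" + trimWithNbsp(str) + "'");    // '94,00'
    }
}
```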
Old answer: Using the Jsoup library,
import org.jsoup.parser.Parser;
String str1 = Parser.unescapeEntities("last&nbsp;week,&nbsp;Ovokerie&nbsp;Ogbeta", false);
String str2 = Parser.unescapeEntities("Entered&nbsp;&raquo;&nbsp;Here", false);
System.out.println(str1 + " " + str2);
Prints out:
last week, Ovokerie Ogbeta Entered » Here