
Remove extra whitespace and invisible characters from a string using Java

I have a long conversation that I handle as a String. The string contains many runs of whitespace, and possibly invisible non-word characters as well. Below is an example string:

public static void main(String[] args) {
  String str = " TWD day count Spot                              6-Sep / 2-Sep 2016 1W7d                        13-Sep / 9-Sep 2016 1M30d                      6-Oct / 4-Oct 2016 2M62d                      7-Nov / 3-Nov 2016 3M91d                      6-Dec / 2-Dec 2016 6M181d                    6-Mar / 2-Mar 2017 9M273d                    6-Jun / 2-Jun 2017 12M365d                  6-Sep / 4-Sep 2017 18M546d                  6-Mar / 2-Mar 2018 24M730d                  6-Sep / 4-Sep 2018";
  str = str.toString().replaceAll(" +", "");
  System.out.println("str="+str.toString().trim().replaceAll(" ", ""));
}

I tried many string functions to remove the whitespace, such as trim(), replaceAll(" ",""), replaceAll("\\s",""), replaceAll(" +",""), replaceAll("\\s\ ",""), and stringUtils.normalize(), but none of them worked as expected.

I am expecting the output below:

"String str = " TWD day count Spot 6-Sep / 2-Sep 2016 1W7d 13-Sep / 9-Sep 2016 1M30d 6-Oct / 4-Oct 2016 2M62d 7-Nov / 3-Nov 2016 3M91d 6-Dec / 2-Dec 2016 6M181d 6-Mar / 2-Mar 2017 9M273d "

Just one space instead of the long runs of repeated spaces.

Please help.

I found the answer, shown below:

System.out.println("str="+str.replaceAll("(?U)\\s+", " "));
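A likely reason this works where the earlier attempts did not: the embedded flag (?U) enables Pattern.UNICODE_CHARACTER_CLASS, so \\s follows the Unicode White_Space property instead of the ASCII-only set. The sketch below illustrates the difference, assuming the invisible characters are no-break spaces (U+00A0). That is only a guess about the original data, and the class name, method name, and sample string are made up for illustration.

```java
import java.util.regex.Pattern;

public class UnicodeWhitespaceDemo {
    // (?U) turns on Pattern.UNICODE_CHARACTER_CLASS, so \s also covers
    // Unicode whitespace such as the no-break space (U+00A0).
    private static final Pattern UNICODE_WS = Pattern.compile("(?U)\\s+");

    // Collapse every run of Unicode whitespace into one ordinary space.
    public static String collapse(String input) {
        return UNICODE_WS.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Hypothetical sample: the gap mixes ordinary spaces with no-break spaces.
        String sample = "Spot \u00A0\u00A0 \u00A0 6-Sep";

        // Default \s is ASCII-only, so the no-break spaces survive and the gap stays wide.
        System.out.println("[" + sample.replaceAll("\\s+", " ") + "]");

        // With (?U) the whole gap is one match.
        System.out.println("[" + collapse(sample) + "]"); // prints [Spot 6-Sep]
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex on every call, which matters if many long strings are cleaned.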

If your text contains non-standard spaces, such as characters from the Unicode separator categories Zs (space separator), Zl (line separator), and Zp (paragraph separator), use this:

str = str.replaceAll("[\\s\\p{Z}]+", " ").trim();

Here \\s matches the whitespace characters [ \\t\\n\\x0B\\f\\r], and \\p{Z} is shorthand for the union of the separator categories, [\\p{Zs}\\p{Zl}\\p{Zp}].

This replaces every whitespace and separator character with a space, collapses consecutive spaces into one, and removes leading and trailing spaces.
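As a rough, self-contained illustration of that behaviour, the sketch below runs the same expression over a string that mixes a tab with several Unicode separators. The class name, method name, and sample string are hypothetical and only stand in for the real conversation text.

```java
public class SeparatorNormalizeDemo {
    // Replace each run of whitespace or Unicode separators with one space,
    // then strip the single space that may remain at either end.
    public static String normalize(String input) {
        return input.replaceAll("[\\s\\p{Z}]+", " ").trim();
    }

    public static void main(String[] args) {
        // Hypothetical sample containing a tab, a no-break space (U+00A0),
        // em spaces (U+2003) and an ideographic space (U+3000).
        String sample = "\u3000TWD day count\t\u00A0Spot\u2003\u2003 6-Sep / 2-Sep 2016 \u00A0";
        System.out.println("[" + normalize(sample) + "]"); // prints [TWD day count Spot 6-Sep / 2-Sep 2016]
    }
}
```

The order of the two calls matters: trim() only strips characters up to U+0020, so it would leave a leading U+3000 or a trailing U+00A0 in place if it ran before the replacement. Zero-width characters such as U+200B belong to the format category Cf, so neither \\s nor \\p{Z} matches them; they would need separate handling.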

public static void main(String []args){
    String str = " TWD day count Spot                              6-Sep / 2-Sep 2016 1W7d                        13-Sep / 9-Sep 2016 1M30d                      6-Oct / 4-Oct 2016 2M62d                      7-Nov / 3-Nov 2016 3M91d                      6-Dec / 2-Dec 2016 6M181d                    6-Mar / 2-Mar 2017 9M273d                    6-Jun / 2-Jun 2017 12M365d                  6-Sep / 4-Sep 2017 18M546d                  6-Mar / 2-Mar 2018 24M730d                  6-Sep / 4-Sep 2018";
    str = str.replaceAll("\\s+", " ");
    System.out.println(str);
}

Output:

TWD day count Spot 6-Sep / 2-Sep 2016 1W7d 13-Sep / 9-Sep 2016 1M30d 6-Oct / 4-Oct 2016 2M62d 7-Nov / 3-Nov 2016 3M91d 6-Dec / 2-Dec 2016 6M181d 6-Mar / 2-Mar 2017 9M273d 6-Jun / 2-Jun 2017 12M365d 6-Sep / 4-Sep 2017 18M546d 6-Mar / 2-Mar 2018 24M730d 6-Sep / 4-Sep 2018

Use StringUtils.normalizeSpace(str) from Apache Commons Lang:

Output:

TWD day count Spot 6-Sep / 2-Sep 2016 1W7d 13-Sep / 9-Sep 2016 1M30d 6-Oct / 4-Oct 2016 2M62d 7-Nov / 3-Nov 2016 3M91d 6-Dec / 2-Dec 2016 6M181d 6-Mar / 2-Mar 2017 9M273d 6-Jun / 2-Jun 2017 12M365d 6-Sep / 4-Sep 2017 18M546d 6-Mar / 2-Mar 2018 24M730d 6-Sep / 4-Sep 2018
