
Replace substring of matched regex

I fetch some HTML, do some string manipulation, and end up with a string like this:

String sample = "\n    \n   2 \n      \n  \ndl. \n \n    \n flour\n\n     \n 4   \n    \n cups of    \n\nsugar\n";

I would like to find all ingredient lines and remove the extra whitespace and line breaks, so that I get:

2 dl. flour and 4 cups of sugar

My approach so far is the following.

Pattern p = Pattern.compile("[\\d]+[\\s\\w\\.]+");
Matcher m = p.matcher(sample);

while(m.find()) {
  // This is where i need help to remove those pesky whitespaces
}

sample = sample.replaceAll("[\\n ]+", " ").trim();

Output:

2 dl. flour 4 cups of sugar

With no spaces at the beginning and no spaces at the end.

It first replaces each run of spaces and newlines with a single space, and then trims off the extra space at the beginning and end.
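One caveat: the character class [\\n ] only covers line feeds and plain spaces. If the fetched HTML might also contain tabs or carriage returns (an assumption on my part, the sample above has neither), \\s is the safer class. A minimal sketch, where the class name and the input string are made up for illustration:

```java
class WhitespaceClassDemo {
    // Collapse runs of line feeds and spaces only, as in the one-liner above.
    static String collapseNarrow(String s) {
        return s.replaceAll("[\\n ]+", " ").trim();
    }

    // Collapse runs of any whitespace; \s also covers \t and \r.
    static String collapseAny(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Hypothetical input containing a tab and a Windows line ending.
        String input = "2\t\r\n dl.";
        // The narrow class leaves the tab and carriage return in place.
        System.out.println(collapseNarrow(input).equals("2 dl.")); // false
        System.out.println(collapseAny(input));                    // 2 dl.
    }
}
```

For the sample string in the question both variants give the same result, so this only matters if the real input is messier than the sample.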

The following code should work for you:

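// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String sample = "\n    \n   2 \n      \n  \ndl. \n \n    \n flour\n\n     \n 4   \n    \n cups of    \n\nsugar\n";
Pattern p = Pattern.compile("(\\s+)");
Matcher m = p.matcher(sample);
StringBuffer sb = new StringBuffer();
// Replace every run of whitespace with a single space.
while (m.find()) {
    m.appendReplacement(sb, " ");
}
// Copy whatever follows the last match.
m.appendTail(sb);
System.out.println("Final: [" + sb.toString().trim() + ']');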

OUTPUT

Final: [2 dl. flour 4 cups of sugar]
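If the goal is one cleaned-up string per ingredient rather than a single line, the same find() loop can normalize each match separately. Note that the pattern from the question, [\\d]+[\\s\\w\\.]+, matches the whole sample in one go, because \\w also matches digits. The sketch below swaps in \\d+\\D+ instead, which assumes that every ingredient starts with a number and contains no other digits; that assumption, the class name and the method name are mine, not part of the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IngredientSplitter {
    // Assumption: an ingredient is a run of digits followed by everything up to the next digit.
    private static final Pattern INGREDIENT = Pattern.compile("\\d+\\D+");

    static List<String> split(String raw) {
        List<String> result = new ArrayList<>();
        Matcher m = INGREDIENT.matcher(raw);
        while (m.find()) {
            // Collapse whitespace inside this one match only, then trim its ends.
            result.add(m.group().replaceAll("\\s+", " ").trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "\n    \n   2 \n      \n  \ndl. \n \n    \n flour\n\n     \n 4   \n    \n cups of    \n\nsugar\n";
        for (String ingredient : split(sample)) {
            System.out.println(ingredient);
        }
        // Prints:
        // 2 dl. flour
        // 4 cups of sugar
    }
}
```

This would break on quantities such as "1/2" or "1.5", so treat the pattern as a starting point and adjust it to the real data.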

I think something like this will work for you:

String test = "\n    \n   2 \n      \n  \ndl. \n \n    \n flour\n\n     \n 4   \n    \n cups of    \n\nsugar\n";

/* convert all sequences of whitespace into a single space, and trim the ends */
test = test.replaceAll("\\s+", " ").trim();

I assumed that the \n are not actual line feeds but the literal two characters, backslash and n. This also works with real line feeds. This should work fine:

test = test.replaceAll("(?:\\s|\\\\n)+", " ");

If there is no textual \n, it can be simpler:

test=test.replaceAll ("\\s+"," ");

And you need to trim the leading/trailing spaces.
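To make the difference between a textual \n and a real line feed concrete, here is a small sketch. In Java source, matching a literal backslash followed by n takes four backslashes in the string literal ("\\\\n"), because both the compiler and the regex engine consume one level of escaping. The class name and both input strings are invented for the example.

```java
class TextualNewlineDemo {
    // Collapses runs of real whitespace and/or the two-character text backslash + 'n'.
    static String normalize(String s) {
        return s.replaceAll("(?:\\s|\\\\n)+", " ").trim();
    }

    public static void main(String[] args) {
        String textual = "2 \\n dl.\\n flour"; // backslash and 'n' as text, no line feed
        String real = "2 \n dl.\n flour";      // actual line feed characters
        System.out.println(normalize(textual)); // 2 dl. flour
        System.out.println(normalize(real));    // 2 dl. flour
    }
}
```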

I use the RegexBuddy tool to check every regex; it is very handy and supports many languages.

You should be able to use the standard String.replaceAll(String, String). The first parameter will take your pattern, the second will take an empty string.

s/^\s+//s
s/\s+$//s
s/\s+/ /g

Run those three substitutions (replace leading whitespace with nothing, replace trailing whitespace with nothing, replace each run of whitespace with a single space).
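Since the question is about Java, the three substitutions could be translated roughly as below. This is only a sketch; the class and method names are made up. String.replaceAll already replaces every occurrence, so no extra flag is needed for the last step.

```java
class ThreeSubstitutions {
    static String clean(String s) {
        return s.replaceAll("^\\s+", "")   // drop leading whitespace
                .replaceAll("\\s+$", "")   // drop trailing whitespace
                .replaceAll("\\s+", " ");  // collapse each remaining run to one space
    }

    public static void main(String[] args) {
        String sample = "\n    \n   2 \n      \n  \ndl. \n \n    \n flour\n\n     \n 4   \n    \n cups of    \n\nsugar\n";
        System.out.println("[" + clean(sample) + "]"); // [2 dl. flour 4 cups of sugar]
    }
}
```

In practice the first two steps can be replaced by a single trim() call after collapsing, as the other answers do.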
