I have a string like this
"a a a a aaa b c d e f a g a aaa aa a a"
I want to turn it into either
"a b c d e f a g a"
or
"a b c d e f a g a "
(whichever's easier, it doesn't matter since it'll be HTML)
The "a"s are line breaks (\r\n), in case that changes anything.
Generally your code should be:
s.replace(new RegExp("(\\S)(?:\\s*\\1)+","g"), "$1");
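As a quick check, the snippet below runs that pattern against the sample string from the question. It is only a sketch; the helper name collapseRepeats is made up for illustration.

```javascript
// Hypothetical helper wrapping the answer's pattern: a non-whitespace
// character followed by one or more repeats of itself, optionally
// separated by whitespace, is replaced by a single copy.
function collapseRepeats(s) {
  return s.replace(new RegExp("(\\S)(?:\\s*\\1)+", "g"), "$1");
}

const sample = "a a a a aaa b c d e f a g a aaa aa a a";
console.log(JSON.stringify(collapseRepeats(sample)));
// prints "a b c d e f a g a"
```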
However, depending on what the characters a, b, c, ... represent in your case, you might need to change \\S to another class, such as [^ ], and \\s to [ ], if you want \r and \n to be collapsed as well:
s.replace(new RegExp("([^ ])(?:[ ]*\\1)+","g"), "$1");
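To illustrate the difference, here is a small sketch of the space-only variant. With [^ ] as the token class, a line feed counts as a token, so repeated line feeds separated by spaces collapse as well. The helper name collapseRepeatsSpaceOnly is an assumption for the example.

```javascript
// Hypothetical helper: any non-space character (including \r and \n)
// is treated as a token; only literal spaces count as separators.
function collapseRepeatsSpaceOnly(s) {
  return s.replace(new RegExp("([^ ])(?:[ ]*\\1)+", "g"), "$1");
}

const withBreaks = "x\n \n  \ny";
console.log(JSON.stringify(collapseRepeatsSpaceOnly(withBreaks)));
// prints "x\ny"
```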
However, if a represents the string \r\n, then you need a slightly more complicated pattern:
s.replace(new RegExp("(\\r\\n|\\S)(?:[^\\S\\r\\n]*\\1)+","g"), "$1");
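A minimal sketch of this last pattern, assuming Windows-style \r\n line breaks in the input (the helper name collapseCrlf is hypothetical):

```javascript
// Hypothetical helper: the token is either a full \r\n pair or a single
// non-whitespace character; separators are whitespace other than \r and \n.
function collapseCrlf(s) {
  return s.replace(new RegExp("(\\r\\n|\\S)(?:[^\\S\\r\\n]*\\1)+", "g"), "$1");
}

console.log(JSON.stringify(collapseCrlf("x\r\n \r\n\t\r\ny")));
// prints "x\r\ny"
```

One thing to keep in mind: because the \S alternative matches any non-whitespace character, doubled letters in ordinary text are collapsed too, so "hello" becomes "helo". If the input is real text rather than placeholder tokens, dropping the \S alternative is probably what you want.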
If I understand the problem correctly, the goal is to remove duplicate copies of a specific character/string, possibly separated by spaces. You can do that by replacing the regular expression (a\s*)+ with "a ": the + matches multiple consecutive copies, and a\s* matches an a followed by optional whitespace. How precisely you do that depends on the language: in Perl it's $str =~ s/(a\s*)+/a /g, in Ruby it's str.gsub(/(a\s*)+/, "a "), and so on.
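For comparison with the JavaScript answer above, the same replacement can be sketched in JavaScript as follows. This is a port for illustration, not code from the answer; the function name collapseA is made up.

```javascript
// Hypothetical JavaScript port of the Perl/Ruby one-liners:
// every run of "a" tokens (each optionally followed by whitespace)
// becomes a single "a" plus one space.
function collapseA(s) {
  return s.replace(/(?:a\s*)+/g, "a ");
}

const input = "a a a a aaa b c d e f a g a aaa aa a a";
console.log(JSON.stringify(collapseA(input)));
// prints "a b c d e f a g a " (note the trailing space)
```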
The fact that a is actually \r\n shouldn't complicate things, but might mean that the replacement would work better as s/(\r\n[ \t]*)+/\r\n/g (since \s overlaps with \r and \n).
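A rough JavaScript equivalent of that substitution, assuming the separators between line breaks are only spaces and tabs (the name collapseLineBreaks is hypothetical):

```javascript
// Hypothetical helper: a run of \r\n pairs, each optionally followed by
// spaces or tabs, is replaced by a single \r\n.
function collapseLineBreaks(s) {
  return s.replace(/(?:\r\n[ \t]*)+/g, "\r\n");
}

console.log(JSON.stringify(collapseLineBreaks("x\r\n \r\n\t\r\ny")));
// prints "x\r\ny"
```

Using [ \t] instead of \s keeps the separator class from swallowing the \r and \n characters that the pattern is trying to count.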
If you need C# code and you want to collapse just \r\n strings, together with any leading and trailing whitespace, then the solution is pretty simple:
string result = Regex.Replace(input, @"\s*\r\n\s*", "\r\n");
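The same idea can be tried out in JavaScript. The sketch below is a port for experimentation, assuming only ASCII whitespace in the input; the function name collapseAroundCrlf is made up.

```javascript
// Hypothetical JavaScript port of the C# one-liner: any whitespace run
// that contains at least one \r\n is replaced by a single \r\n.
function collapseAroundCrlf(s) {
  return s.replace(/\s*\r\n\s*/g, "\r\n");
}

console.log(JSON.stringify(collapseAroundCrlf("a  \r\n \r\n  b")));
// prints "a\r\nb"
```

Because \s also matches \r and \n, the leading and trailing \s* absorb neighbouring line breaks, which is what collapses several breaks into one.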
Went with this:
private string GetDescriptionFor(HtmlDocument document)
{
    string description = CrawlUsingMetadata(XPath.ResourceDescription, document);
    Regex regex = new Regex(@"(\r\n(?:[ ])*|\n(?:[ ])*){3,}", RegexOptions.Multiline | RegexOptions.IgnoreCase); //(?:[^\S\r\n|\n]*\1)+
    string result = regex.Replace(description, "\n\n");
    string decoded = HttpUtility.HtmlDecode(result);
    return decoded;
}
As intended, it leaves line breaks alone unless there are three or more consecutive line breaks (ignoring spaces between them), and replaces each such run with \n\n.
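The regex part of this can be checked in isolation. Below is a minimal JavaScript sketch of the same rule, with the pattern condensed to an equivalent form; the HTML decoding step is left out and the name limitBlankLines is hypothetical.

```javascript
// Hypothetical helper: three or more consecutive line breaks (\r\n or \n),
// each optionally followed by spaces, are reduced to exactly two \n.
function limitBlankLines(s) {
  return s.replace(/(?:\r?\n *){3,}/g, "\n\n");
}

console.log(JSON.stringify(limitBlankLines("a\n\n\n\nb")));       // prints "a\n\nb"
console.log(JSON.stringify(limitBlankLines("a\r\n \r\n  \r\nb"))); // prints "a\n\nb"
console.log(JSON.stringify(limitBlankLines("a\n\nb")));           // prints "a\n\nb" (unchanged)
```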
Try this:
Regex.Replace(inputString, @"(\r\n\s+)", " ");
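For reference, a JavaScript sketch of this last suggestion (a port, with the made-up name breaksToSpace) shows what it does and does not touch:

```javascript
// Hypothetical JavaScript port: a \r\n followed by at least one more
// whitespace character is replaced by a single space.
function breaksToSpace(s) {
  return s.replace(/\r\n\s+/g, " ");
}

console.log(JSON.stringify(breaksToSpace("a\r\n \r\nb"))); // prints "a b"
console.log(JSON.stringify(breaksToSpace("a\r\nb")));      // prints "a\r\nb"
```

Note that a lone \r\n directly followed by text is left as-is, since \s+ requires at least one whitespace character after the line break.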