
Regular Expression to Replace Unwanted Letters

I wrote a small program in C# to capture in-game text. My issue is that the text also contains color codes, which I don't want. I read about the Regex.Replace function, which I think would suit this.

I have the following string (line) that I want to clean up. I used the small tool espresso to play a bit with regular expressions, but I never really figured it out.

This is the string I am working with:

|c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r |cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R

I tried to use ^|( [a-zA-Z0-9]{9})

which gave me these matches: c001177ff cff00AA00 cff00AA00 cff00AA00 cffff69b4 cff00AA00 cff40e0d0 cffffff00 cffffff00 cff40e0d0 cffff69b4 cff00AA00

Well, I am not good at regex; I have only just started with it. I am not asking anybody to present a complete solution (though you are more than welcome to), just a little help with how I can solve this. I want to filter the text.

Input Code

 |c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r |cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R

Should be filtered to this

Save Code = AGQg R9$# 4fR

I think these are hexadecimal color codes: |c marks the beginning and |r the end of a colored string. I think the "|r |" sequence just indicates that the first colored string ends, then there is a space, and the next | starts the next code.

How about a simple Linq?

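// Requires: using System.Linq;
// A segment of exactly 10 characters (9-character code + 1 visible character)
// contributes its last character; any other segment, including longer text
// such as "Save Code =", becomes a single space.
var output = String.Join("", input.Split('|')
                             .Select(s => s.Length != 10 ? ' ' : s.Last()))
             .Trim();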

I think the problem you were having was not escaping your |. The following regex works for me:

var replaced = Regex.Replace(input, @"\|c[0-9a-zA-Z]{8}|\|r", "");
  • \|c[0-9a-zA-Z]{8} - match a literal "|c" followed by any 8 letters or digits
  • | - or
  • \|r - match a literal "|r"
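For reference, here is a minimal self-contained sketch of that replacement wrapped in a helper. The class name ColorCodes and the method name Strip are made up for illustration; only the pattern and the Regex.Replace call come from the answer above.

```csharp
using System.Text.RegularExpressions;

public static class ColorCodes
{
    // \|c[0-9a-zA-Z]{8}  a literal "|c" followed by eight letters or digits
    // \|r                a literal "|r"
    private const string Pattern = @"\|c[0-9a-zA-Z]{8}|\|r";

    // Hypothetical helper: removes every color code and every "|r" marker.
    public static string Strip(string input)
    {
        return Regex.Replace(input, Pattern, "");
    }
}
```

Calling ColorCodes.Strip on the sample line from the question should yield "Save Code = AGQg R9$# 4fR".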

You're on the right track. Your regex

^|( [a-zA-Z0-9]{9})

has two problems: the ^ start-of-line anchor forces the match to be only at the start of your input string, and the | needs to be escaped because, unescaped, it is the special "or" (alternation) operator, which completely changes the meaning of your regex.

In addition, the space after the | is undesired, and the capture group is unnecessary, as you only want to eliminate this portion.
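The effect of the unescaped pipe can be seen in a small sketch. The class and method names here are hypothetical and exist only for the demonstration; the point is that "|r" as a pattern means "empty pattern or r", while "\|r" means the literal two characters.

```csharp
using System.Text.RegularExpressions;

public static class PipeDemo
{
    // Escaped: \| is a literal pipe, so the two-character marker "|r" is removed.
    public static string RemoveEscaped(string input)
    {
        return Regex.Replace(input, @"\|r", "");
    }

    // Unescaped: "|r" means "empty pattern OR r". Neither alternative can
    // match a literal '|' character, so every pipe stays in the output.
    public static string RemoveUnescaped(string input)
    {
        return Regex.Replace(input, "|r", "");
    }
}
```

PipeDemo.RemoveEscaped("a|rb") gives "ab", while the result of PipeDemo.RemoveUnescaped("a|rb") still contains the pipe.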

If you replace all instances of this

\|[a-zA-Z0-9]{9}

with nothing (the empty string)

You will achieve most of your goal. Try it here: http://regex101.com/r/rF6yB6/1

But it seems you want to eliminate not exactly nine characters after the pipe, but anywhere from one to nine, so that the two-character |r is removed as well. So use the {1,9} range quantifier instead:

\|[a-zA-Z0-9]{1,9}

Try it: http://regex101.com/r/rF6yB6/2

This seems to achieve your goal exactly.
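The difference between the two quantifiers can be checked with a short sketch. The class and method names are invented for illustration, and the character class is written as [a-zA-Z0-9] (an A-z range would also match a few punctuation characters).

```csharp
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // {9}: exactly nine letters or digits after the pipe.
    // This removes the color codes but leaves every "|r" in place.
    public static string StripExactlyNine(string input)
    {
        return Regex.Replace(input, @"\|[a-zA-Z0-9]{9}", "");
    }

    // {1,9}: between one and nine letters or digits after the pipe (greedy).
    // This also removes "|r" when it is followed by a space or the end of input.
    public static string StripOneToNine(string input)
    {
        return Regex.Replace(input, @"\|[a-zA-Z0-9]{1,9}", "");
    }
}
```

On the sample line, StripOneToNine should return "Save Code = AGQg R9$# 4fR", whereas StripExactlyNine leaves the "|r" markers behind. One caveat of the range form: because it is greedy, a "|r" immediately followed by letters (for example "|rHello") would lose those letters too. The sample line never has that, since every "|r" is followed by a space.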


Please consider bookmarking the Stack Overflow Regular Expressions FAQ for future reference.

string input = "[The example input from your question]";
string output = input.Replace("|r", "");
while (output.Contains("|c"))
    output = output.Remove(output.IndexOf("|c"), 10);
// output = "Save Code = AGQg R9$# 4fR"

I like this much more than using regexes, simply because it is much clearer to me.
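One thing to watch with this approach: Remove(index, 10) throws if a "|c" appears fewer than 10 characters before the end of the string. A possible variant with a guard is sketched below; the class name, method name, and constant are assumptions made for the example, not part of the answer.

```csharp
using System;

public static class PlainStrip
{
    // "|c" plus eight hex digits, as in the sample line.
    private const int ColorCodeLength = 10;

    // Hypothetical helper: same idea as above, without regular expressions.
    public static string Strip(string input)
    {
        string output = input.Replace("|r", "");
        int index = output.IndexOf("|c", StringComparison.Ordinal);
        while (index >= 0)
        {
            // Clamp the count so a truncated code at the end does not throw.
            int count = Math.Min(ColorCodeLength, output.Length - index);
            output = output.Remove(index, count);
            index = output.IndexOf("|c", StringComparison.Ordinal);
        }
        return output;
    }
}
```

PlainStrip.Strip on the sample line should give "Save Code = AGQg R9$# 4fR", and a truncated input such as "abc|c12" is reduced to "abc" instead of raising an exception.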

var str1 = "|c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r |cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R";
var str2 = Regex.Replace(str1, @"\|(r|[a-zA-Z0-9]{9})", ""); // "Save Code = AGQg R9$# 4fR"

In addition to this answer about escaping the pipe character, you're starting your regex with the caret (^) character, which matches the beginning of a line.

A correct regex would be:

\|c[0-9a-zA-Z]{8}

This regex should match all of the characters you want to remove:

([|]c([0-9]|[a-f]|[A-F]){8})|[|]r

Here's the breakdown...

The vertical pipe is an OR marker, so to search for it, place it in square brackets [ and ].

The parentheses make a group. So you're searching for ([|]c([0-9]|[a-f]|[A-F]){8}) OR [|]r, which is all of your color codes OR |r.

The color-code part is the group that begins with |c and is followed by exactly 8 characters, each of which can be 0 through 9, a through f, or A through F.

I tested it at RegexPal.com.
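As a sketch of how this hex-only pattern might be used from C#: the three alternatives ([0-9]|[a-f]|[A-F]) can be folded into the single character class [0-9a-fA-F], which matches the same characters. The class and method names below are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class HexColorCodes
{
    // [|]c[0-9a-fA-F]{8}  a literal "|c" followed by exactly eight hex digits
    // [|]r                a literal "|r"
    private const string Pattern = @"[|]c[0-9a-fA-F]{8}|[|]r";

    // Hypothetical helper: strips only codes whose eight characters are hex digits.
    public static string Strip(string input)
    {
        return Regex.Replace(input, Pattern, "");
    }
}
```

Restricting the class to hex digits means ordinary text that happens to follow "|c" is left alone: HexColorCodes.Strip("|cSaveCode") returns the input unchanged, whereas a pattern that allows any eight letters or digits would remove it. On the sample line the result should still be "Save Code = AGQg R9$# 4fR".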
