Regular Expression to Replace Unwanted Letters

I wrote a small program in C# to capture in-game text. My issue is that the text also contains color codes, which I want to get rid of. I read about the Regex.Replace function, which I think is going to suit this.

I have the following string (line) that I want to clean up. I used the small tool Expresso to play around with regular expressions a bit, but I never really figured it out.

This is the string I am going to work with:

|c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r |cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R

I tried to use ^|( [a-zA-Z0-9]{9})

which gave me these matches: c001177ff cff00AA00 cff00AA00 cff00AA00 cffff69b4 cff00AA00 cff40e0d0 cffffff00 cffffff00 cff40e0d0 cffff69b4 cff00AA00

Well, I am not good at regex; I only just started with it. I don't need anybody to present me with a complete solution (though you are more than welcome to), just a little help with how I can solve this issue. I want to filter the text.

Input code:

 |c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r |cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R

should be filtered to this:

Save Code = AGQg R9$# 4fR

I think these are hexadecimal color codes: |c marks the beginning and |r the end of the colored string. I think the |r | sequence is just used to indicate that the first color string ends; then we get a space, and the | indicates the next start.
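The structure described above can be made visible with a small sketch (not from the question) that lists each colored run together with its code. It assumes that every |c is followed by exactly 8 hexadecimal digits and that a run's text ends at the next |; the group names color and text are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample line from the question.
string input = "|c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r |cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R";

// Assumption: "|c" + 8 hex digits opens a colored run, and the run's text lasts
// until the next '|' (another "|c" or a "|r" reset).
var coloredRun = new Regex(@"\|c(?<color>[0-9a-fA-F]{8})(?<text>[^|]*)");

MatchCollection runs = coloredRun.Matches(input);
foreach (Match run in runs)
{
    // First line printed: 001177ff -> "Save Code ="
    Console.WriteLine($"{run.Groups["color"].Value} -> \"{run.Groups["text"].Value}\"");
}
```

Concatenating only the text groups would lose the spaces that sit after each |r, which is why the answers below remove the markers rather than extract the runs.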

How about a simple LINQ query?

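using System.Linq;
var output = String.Concat(input.Split('|').Select(s =>
                 s.StartsWith("c") && s.Length >= 9 ? s.Substring(9)   // drop "c" + 8 color digits
               : s.StartsWith("r") ? s.Substring(1)                    // drop the "r" reset marker
               : s))                                                   // plain text, e.g. before the first '|'
             .Trim();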

So I think the problem you were having was that you did not escape your |. The following regex works for me:

var replaced = Regex.Replace(input, @"\|c[0-9a-zA-Z]{8}|\|r", "");
  • \|c[0-9a-zA-Z]{8} - match "|c" followed by any 8 letters or digits
  • | - or
  • \|r - match "|r"
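A self-contained version of this replacement, run on the sample line from the question, might look like the following sketch.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "|c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r |cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R";

// Remove "|c" followed by 8 letters/digits (color start) or "|r" (color reset).
string replaced = Regex.Replace(input, @"\|c[0-9a-zA-Z]{8}|\|r", "");

Console.WriteLine(replaced); // Save Code = AGQg R9$# 4fR
```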

You're on the right track. Your regex

^|( [a-zA-Z0-9]{9})

Both forces the match to occur only at the start of your input string, because of the ^ start-of-line anchor, and contains a | that needs to be escaped: unescaped, | is the special "or" operator, which completely changes the meaning of your regex.

In addition, the space after the | is undesired, and the capture group is unnecessary, as you only want to eliminate this portion.
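The difference between an escaped and an unescaped pipe can be checked with a few Regex.IsMatch calls. This is only an illustration with toy strings; Regex.Escape is included because it produces the escaped form of a literal for you.

```csharp
using System;
using System.Text.RegularExpressions;

// Unescaped, '|' is alternation: "a|b" means "a" OR "b".
bool unescaped = Regex.IsMatch("b", "a|b");     // true: the "b" alternative matches
// Escaped, "\|" is a literal pipe character.
bool escaped = Regex.IsMatch("b", @"a\|b");     // false: needs the literal text "a|b"
bool literal = Regex.IsMatch("a|b", @"a\|b");   // true
// Regex.Escape escapes '|' (among other metacharacters) for you.
string escapedPipe = Regex.Escape("|");

Console.WriteLine($"{unescaped} {escaped} {literal} {escapedPipe}"); // True False True \|
```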

If you replace all instances of this

\|[a-zA-Z0-9]{9}

with nothing (the empty string),

You will achieve most of your goal. Try it here: http://regex101.com/r/rF6yB6/1
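Running this replacement on the sample line shows which part of the goal is left: the |r resets survive, because each |r is followed by a space rather than by nine letters or digits. The sketch also checks a related pitfall: a character-class range written as A-z (instead of A-Z) spans ASCII 65 to 122 and therefore also matches [ \ ] ^ _ and `.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "|c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r |cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R";

// Removes every '|' followed by exactly nine letters or digits.
string partial = Regex.Replace(input, @"\|[a-zA-Z0-9]{9}", "");
Console.WriteLine(partial); // Save Code =|r AGQg|r R9$#|r 4fR

// '_' (ASCII 95) lies between 'Z' (90) and 'a' (97), so [A-z] matches it.
bool underscoreInAz = Regex.IsMatch("_", "[A-z]");
bool underscoreInAZ = Regex.IsMatch("_", "[A-Z]");
Console.WriteLine($"{underscoreInAz} {underscoreInAZ}"); // True False
```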

But it seems you really want to eliminate not exactly nine characters after the pipe, but up to nine characters. So use the {1,9} range quantifier instead:

\|[a-zA-Z0-9]{1,9}

Try it: http://regex101.com/r/rF6yB6/2

This seems to achieve your goal exactly.
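A quick check of the {1,9} version, plus one caveat worth knowing: because the quantifier is greedy, a |r that is directly followed by letters or digits swallows them too. The second input string below is hypothetical (the sample line always has a space after |r); the pattern that spells out \|c plus 8 characters and \|r separately keeps that text.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "|c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r |cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R";

string cleaned = Regex.Replace(input, @"\|[a-zA-Z0-9]{1,9}", "");
Console.WriteLine(cleaned); // Save Code = AGQg R9$# 4fR

// Hypothetical line where "|r" is immediately followed by text.
string glued = "|cff00AA00Hi|rthere";
string greedy = Regex.Replace(glued, @"\|[a-zA-Z0-9]{1,9}", "");
string explicitMarkers = Regex.Replace(glued, @"\|c[0-9a-zA-Z]{8}|\|r", "");
Console.WriteLine(greedy);          // Hi       ("there" was removed with "|r")
Console.WriteLine(explicitMarkers); // Hithere
```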


Please consider bookmarking the Stack Overflow Regular Expressions FAQ for future reference.

string input = "[The example input from your question]";
string output = input.Replace("|r", "");
while (output.Contains("|c"))
    output = output.Remove(output.IndexOf("|c"), 10);
// output = "Save Code = AGQg R9$# 4fR"

I like this much more than using regexes, simply because it's so much clearer to me.
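If you prefer this non-regex approach, a slightly more defensive sketch follows. Two changes are assumptions rather than part of the answer: IndexOf is called with StringComparison.Ordinal (the string overload of IndexOf is culture-sensitive, while Contains and Replace are ordinal), and the count passed to Remove is clamped so that a line cut off in the middle of a color code does not throw an ArgumentOutOfRangeException. The helper name StripColorCodes is hypothetical.

```csharp
using System;

// Hypothetical helper: strips "|r" resets and "|c" + 8-character color codes.
static string StripColorCodes(string text)
{
    string result = text.Replace("|r", "");   // Replace(string, string) is ordinal
    int start;
    while ((start = result.IndexOf("|c", StringComparison.Ordinal)) >= 0)
    {
        // "|c" plus 8 color digits is 10 characters; clamp in case the line is cut off.
        int length = Math.Min(10, result.Length - start);
        result = result.Remove(start, length);
    }
    return result;
}

string input = "|c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r |cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R";
string output = StripColorCodes(input);
Console.WriteLine(output); // Save Code = AGQg R9$# 4fR
```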

var str1 = "|c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r |cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R";
var str2 = Regex.Replace(str1, @"\|(r|[a-zA-Z0-9]{9})", ""); // "Save Code = AGQg R9$# 4fR"
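Since the program captures game text continuously, you may want to build this pattern once and reuse it for every line. The static Regex.Replace already caches recently used patterns, so this is mostly a matter of style; RegexOptions.Compiled trades slower start-up for faster matching. The second captured line below is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Build the pattern once: '|' followed by either "r" or nine letters/digits.
var colorMarkup = new Regex(@"\|(r|[a-zA-Z0-9]{9})", RegexOptions.Compiled);

string[] capturedLines =
{
    "|c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r |cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R",
    "|cffff0000Game Over|r",   // hypothetical second line
};

foreach (string line in capturedLines)
{
    // Prints "Save Code = AGQg R9$# 4fR", then "Game Over".
    Console.WriteLine(colorMarkup.Replace(line, ""));
}
```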

In addition to this answer regarding escaping the "pipe" character, you're starting your regex with the caret (^) character, which matches the beginning of a line.

A correct regex for the color codes would be:

\|c[0-9a-zA-Z]{8}
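To see the effect of the caret, compare the pattern with and without ^ on the sample line: with ^, only the code at the very start is removed. This pattern matches only the |c codes, so the |r resets remain and still have to be removed separately; a plain string Replace is used for that here as one possible choice.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "|c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r |cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R";

// With ^ the pattern can only match at the very start of the string.
string withCaret = Regex.Replace(input, @"^\|c[0-9a-zA-Z]{8}", "");

// Without ^ every "|c" + 8 characters is removed; the "|r" resets remain.
string withoutCaret = Regex.Replace(input, @"\|c[0-9a-zA-Z]{8}", "");

// One possible way to finish the job: strip the "|r" resets as plain text.
string cleaned = withoutCaret.Replace("|r", "");

Console.WriteLine(withCaret);    // starts with: Save Code =|r |cff00AA00A
Console.WriteLine(withoutCaret); // Save Code =|r AGQg|r R9$#|r 4fR
Console.WriteLine(cleaned);      // Save Code = AGQg R9$# 4fR
```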

This regex should match all of the characters you want to remove:

([|]c([0-9]|[a-f]|[A-F]){8})|[|]r

Here's the breakdown:

The vertical pipe is an OR marker, so to search for a literal pipe, place it in square brackets [ and ].

The parentheses make a group. So you're searching for ([|]c([0-9]|[a-f]|[A-F]){8}) OR [|]r, which is all of your color codes OR |r.

The color-code part is the group that begins with |c and is followed by exactly 8 characters, each of which can be 0 through 9, a through f, or A through F.
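As a sanity check, the bracketed pattern can be compared with a more compact spelling that uses \| for the literal pipe and a single character class [0-9a-fA-F] for the hex digits. The compact form is only an alternative written for this comparison, not something the answer proposed; on the sample line both give the same result.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "|c001177ffSave Code =|r |cff00AA00A|cff00AA00G|cff00AA00Q|cffff69b4g|r |cff00AA00R|cff40e0d09|cffffff00$|cffffff00#|r |cff40e0d04|cffff69b4f|cff00AA00R";

// The answer's pattern: [|] is a literal pipe; each of the 8 digits is 0-9, a-f or A-F.
string bracketed = Regex.Replace(input, @"([|]c([0-9]|[a-f]|[A-F]){8})|[|]r", "");

// Compact equivalent: escaped pipe and one hex character class.
string compact = Regex.Replace(input, @"\|c[0-9a-fA-F]{8}|\|r", "");

Console.WriteLine(bracketed); // Save Code = AGQg R9$# 4fR
Console.WriteLine(compact);   // Save Code = AGQg R9$# 4fR
```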

I tested it at RegexPal.com.

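Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.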

 
© 2020-2024 STACKOOM.COM