
How do you remove repeated characters in a string

I have a website that allows users to comment on photos. Of course, users leave comments like:

'OMGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG!!!!!!!!!!!!!!!' ''OMGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG !!!!!!!!!!!!!!!'

or

'YOU SUCCCCCCCCCCCCCCCCCKKKKKKKKKKKKKKKKKK' 'YOU SUCCCCCCCCCCCCCCCCCCCKKKKKKKKKKKKKKKKKKKK'

You get the idea.

Basically, I want to shorten those comments by removing at least most of the excess repeated characters. I'm sure there's a way to do it with a regex; I just can't figure it out.

Any ideas?

Keep in mind that English often uses double letters, so you probably don't want to eliminate them blindly. Here is a regex that removes anything beyond a double.

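using System.Text.RegularExpressions;

// Matches any character that is the third (or later) in a run of identical characters, ignoring case
Regex r = new Regex("(.)(?<=\\1\\1\\1)", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);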

var x = r.Replace("YOU SUCCCCCCCCCCCCCCCCCKKKKKKKKKKKKKKKKKK", String.Empty);
// x = "YOU SUCCKK"

var y = r.Replace("OMGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG!!!!!!!!!!!!!!!", String.Empty);
// y = "OMGG!!"
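The lookbehind above hard-codes "at most two". As a sketch of a variant (a different technique from the lookbehind, not the answerer's code), the hypothetical `LimitRuns` helper below matches each whole run with a quantified backreference and rewrites it to a configurable number of copies. It assumes C# 9 or later (top-level statements) and is case-sensitive; `Regex.Replace` also has an overload that takes `RegexOptions` if you want `RegexOptions.IgnoreCase` as in the original.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: any character repeated more than maxRepeat times in a row
// is cut back to exactly maxRepeat copies.
static string LimitRuns(string input, int maxRepeat)
{
    if (maxRepeat < 1)
        throw new ArgumentOutOfRangeException(nameof(maxRepeat), "maxRepeat must be at least 1");

    // (.)      captures one character
    // \1{n,}   requires at least n more copies of it, so a match is a run of n+1 or more
    var pattern = @"(.)\1{" + maxRepeat + ",}";

    // Replace each over-long run with maxRepeat copies of the captured character.
    return Regex.Replace(input, pattern, m => new string(m.Groups[1].Value[0], maxRepeat));
}

Console.WriteLine(LimitRuns("YOU SUCCCCCCCCCCCCCCCCCKKKKKKKKKKKKKKKKKK", 2)); // YOU SUCCKK
Console.WriteLine(LimitRuns("OMGGGGGGGG!!!!!!!!", 2));                        // OMGG!!
Console.WriteLine(LimitRuns("OMGGGGGGGG!!!!!!!!", 1));                        // OMG!
```

Words with legitimate doubles, such as "book", pass through unchanged when `maxRepeat` is 2.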

Do you specifically want to shorten the strings in code, or would it be enough to simply fail validation and show the form to the user again with a validation error, something like "Too many repeated characters."?

If the latter is acceptable, @"(\w)\1{2}" should match any word character that appears three or more times in a row (the character followed by at least two more copies of itself).

Edit: As @Piskvor pointed out, each match of this pattern covers exactly 3 characters. That works fine for detecting a run, but not for replacing one. His version, @"(\w)\1{2,}", works better for replacing. However, I don't think replacing is the best practice here. It's better to have the form fail validation than to try to scrub the submitted text, because there will likely be edge cases where you turn otherwise readable (if unreasonable) text into nonsense.
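A minimal sketch of the validation-only approach, assuming C# 9+ top-level statements; `HasExcessiveRepeats` is a hypothetical name, and the two patterns are the ones from this answer. The last two lines illustrate the edit: with `{2}` each match stops after three characters, so a six-character run is replaced in two pieces, while `{2,}` consumes the whole run.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical validation check: true if any word character occurs three or more times in a row.
static bool HasExcessiveRepeats(string comment) =>
    Regex.IsMatch(comment, @"(\w)\1{2}");

Console.WriteLine(HasExcessiveRepeats("OMGGGGGG!!!")); // True
Console.WriteLine(HasExcessiveRepeats("Good book"));   // False

// {2} matches "GGG" twice, so the run is only halved; {2,} matches all six characters at once.
Console.WriteLine(Regex.Replace("GGGGGG", @"(\w)\1{2}", "$1$1"));  // GGGG
Console.WriteLine(Regex.Replace("GGGGGG", @"(\w)\1{2,}", "$1$1")); // GG
```

Note that this check also flags legitimate triples such as "www", which is one of the edge cases to weigh before rejecting a comment.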

A regex would be overkill. Try this:

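using System.Text;

// Keeps at most maxRepeat consecutive copies of any character.
public static string RemoveRepeatedChars(string input, int maxRepeat)
{
    if (string.IsNullOrEmpty(input)) return input;

    var b = new StringBuilder(input.Length);
    char lastChar = input[0];
    int repeat = 1;      // length of the current run
    b.Append(lastChar);  // the first character is always kept

    for (int i = 1; i < input.Length; i++)
    {
        char c = input[i];
        if (c == lastChar)
        {
            // Same character as before: keep it only while the run is short enough
            if (++repeat <= maxRepeat)
                b.Append(c);
        }
        else
        {
            // A different character starts a new run
            b.Append(c);
            repeat = 1;
            lastChar = c;
        }
    }
    return b.ToString();
}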

var nonRepeatedChars = new string(myString.ToCharArray().Distinct().Where(c => !char.IsWhiteSpace(c) || !myString.Contains(c)).ToArray());

Edit: awful suggestion, please don't read; I truly deserve my -1 :)

I found something like what you're looking for here, on Technical Nuggets.

There's nothing to be done except a very long regex, because I've never heard of a regex symbol for repetition...

It's a complete example; I won't paste it here, but I think it will fully answer your question.

Distinct() will remove all duplicates; however, it obviously will not treat "A" and "a" as the same.

Console.WriteLine(new string("Asdfasdf".Distinct().ToArray()));

Outputs "Asdfa"
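If you do want `Distinct()`-style behaviour but with "A" and "a" treated as the same character, one option (an assumption, not part of the original answer) is to group by a case-folded key and keep the first character of each group; `Enumerable.GroupBy` yields groups in the order their keys first appear. `DistinctIgnoreCase` is a hypothetical helper name, and the sketch assumes C# 9+ top-level statements.

```csharp
using System;
using System.Linq;

// Hypothetical helper: keeps the first occurrence of each character,
// folding case with char.ToUpperInvariant.
static string DistinctIgnoreCase(string s) =>
    new string(s.GroupBy(c => char.ToUpperInvariant(c))
                .Select(g => g.First())
                .ToArray());

Console.WriteLine(DistinctIgnoreCase("Asdfasdf")); // Asdf
// Like Distinct(), it also drops repeats that are not adjacent:
Console.WriteLine(DistinctIgnoreCase("Book"));     // Bok
```

On .NET 6 or later, `s.DistinctBy(c => char.ToUpperInvariant(c))` gives the same result. Either way, removing non-adjacent repeats is usually not what you want for cleaning up comments.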

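using System.Linq;

// Collapses every run of a repeated character down to a single character
var test = "OMMMMMGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGMMM";

test.Distinct().Select(c => c.ToString()).ToList()
        .ForEach(c =>
            {
                while (test.Contains(c + c))
                    test = test.Replace(c + c, c);
            }
        );
// test == "OMGM"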
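The loop above collapses every run to one character by calling `Replace` repeatedly. For comparison, here is a sketch of the same collapse in a single regex pass (assuming C# 9+ top-level statements): `(.)\1+` matches a character followed by one or more copies of itself, and the replacement `$1` puts back a single copy. Like the loop, it also turns legitimate doubles such as "book" into "bok".

```csharp
using System;
using System.Text.RegularExpressions;

var test = "OMMMMMGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGMMM";

// Each run of two or more identical characters is replaced by the captured character.
var collapsed = Regex.Replace(test, @"(.)\1+", "$1");

Console.WriteLine(collapsed); // OMGM
```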

Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.