How to parse a string for capitalized words

I have this string: " Mimi loves Toto and Tata hate Mimi so Toto killed Tata"

I want to write code that prints only the words that begin with a capital letter, without repetition.

The output should look like this:

Mimi
Toto
Tata

I tried to do this, but I'm sure it's wrong even though no errors are shown.

The code I wrote:

static void Main(string[] args)
        {
            string s = "Memi ate Toto and she killed Tata Memi also hate Biso";
            Console.WriteLine((spliter(s)));
        }



        public static string spliter(string s)
        {

            string x = s;
            Regex exp = new Regex(@"[A-Z]");
            MatchCollection M = exp.Matches(s);

            foreach (Match t in M)
            {

                while (x != null)
                {
                    x = t.Value;  
                }

            }
            return x;
        }


    }
}

Idea:

What if I split the string into an array, then apply a regex to check it word by word, and then print the results? I don't know how to do that. Can anyone help me make this code work?

I don't know the C#/.NET regex library at all, but this regex pattern will do it:

\b[A-Z][a-z]+

The \b means the match can only start at the beginning of a word. Change + to * if you want to allow single-letter capitalized words such as "A".
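A minimal sketch of how this pattern could be applied in C#, assuming System.Text.RegularExpressions and LINQ are available. The class and method names (CapitalizedWordsDemo, Extract) are made up for illustration and are not part of the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class CapitalizedWordsDemo
{
    // Hypothetical helper: returns each capitalized word once.
    public static string[] Extract(string text)
    {
        return Regex.Matches(text, @"\b[A-Z][a-z]+")
            .Cast<Match>()            // MatchCollection is non-generic on older frameworks
            .Select(m => m.Value)     // keep only the matched text
            .Distinct()               // drop repeated words
            .ToArray();
    }

    static void Main()
    {
        string s = " Mimi loves Toto and Tata hate Mimi so Toto killed Tata";
        foreach (string word in Extract(s))
            Console.WriteLine(word);
    }
}
```

With the + quantifier a lone capital such as "A" is skipped, because at least one lowercase letter is required after the initial capital.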

Edit: You want to match "McDonald's"?

\b[A-Z][A-Za-z']+

If you don't want the match to end with a trailing ', add a negative lookbehind:

\b[A-Z][A-Za-z']+(?<!')
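To illustrate the lookbehind, here is a small sketch in C#. The sample sentence and the names LookbehindDemo and Extract are invented for this example. The greedy + first consumes a trailing apostrophe, the lookbehind then rejects it, and the regex engine backtracks by one character.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class LookbehindDemo
{
    // Hypothetical helper: capitalized words that may contain apostrophes
    // but must not end with one.
    public static string[] Extract(string text)
    {
        return Regex.Matches(text, @"\b[A-Z][A-Za-z']+(?<!')")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }

    static void Main()
    {
        // "McDonald's" keeps its inner apostrophe; "Burgers'" loses the trailing one.
        foreach (string word in Extract("McDonald's sells Burgers' to Toto"))
            Console.WriteLine(word);
    }
}
```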

I'm not sure why I'm posting this...

            string[] foo = "Mimi loves Toto and Tata hate Mimi so Toto killed Tata".Split(' ');
            HashSet<string> words = new HashSet<string>();
            foreach (string word in foo)
            {
                if (char.IsUpper(word[0]))
                {
                    words.Add(word);
                }
            }

            foreach (string word in words)
            {
                Console.WriteLine(word);
            }

C# 3:

        string z = "Mimi loves Toto and Tata hate Mimi so Toto killed Tata";
        var wordsWithCapital = z.Split(' ').Where(word => char.IsUpper(word[0])).Distinct();
        MessageBox.Show( string.Join(", ", wordsWithCapital.ToArray()) );

C# 2:

        Dictionary<string,int> distinctWords = new Dictionary<string,int>();
        string[] wordsWithInitCaps = z.Split(' ');
        foreach (string wordX in wordsWithInitCaps)
            if (char.IsUpper(wordX[0]))
                if (!distinctWords.ContainsKey(wordX))
                    distinctWords[wordX] = 1;
                else
                    ++distinctWords[wordX];                       


        foreach(string k in distinctWords.Keys)
            MessageBox.Show(k + ": " + distinctWords[k].ToString());

I'd suggest using string.Split to separate the string into words, and then printing only the words for which char.IsUpper(word[0]) is true.

Something like this:
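A minimal sketch of that suggestion, under a few assumptions: empty entries are dropped so that word[0] is always valid (the question's string starts with a space), and a HashSet is used to avoid repetition. The names SplitDemo and CapitalizedWords are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class SplitDemo
{
    // Hypothetical helper: capitalized words in order of first appearance, no repeats.
    public static List<string> CapitalizedWords(string text)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            // HashSet.Add returns false when the word was already recorded.
            if (char.IsUpper(word[0]) && seen.Add(word))
                result.Add(word);
        }
        return result;
    }

    static void Main()
    {
        string s = " Mimi loves Toto and Tata hate Mimi so Toto killed Tata";
        foreach (string word in CapitalizedWords(s))
            Console.WriteLine(word);
    }
}
```

Splitting on spaces alone leaves punctuation attached to words, so "Toto." and "Toto" would count as different entries; the regex answers handle that case.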

Use this regex:

([A-Z][a-z]+)

Explanation:

[A-Z]    [a-z]+
  |        |
Single   Multiple(+)
  |        |
  C      apital   -> Capital

Try out the regex here.

Solution. Notice the use of the built-in string splitter. You could replace the ToUpper check by testing whether the first character is between 'A' and 'Z'. Removing duplicates I leave to you (use a HashSet if you want).

static void Main(string[] args)
    {
        string test = " Mimi loves Toto and Tata hate Mimi so Toto killed Tata";
        foreach (string j in test.Split(' '))
        {
            if (j.Length > 0)
            {
                if (j.ToUpper()[0] == j[0])
                {
                    Console.WriteLine(j);
                }
            }
        }
        Console.ReadKey(); //Press any key to continue;
    }
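The two checks mentioned in this answer are not equivalent, which the following sketch tries to show. The ToUpper comparison is true for any first character that upper-casing leaves unchanged, including digits and punctuation, while the range test accepts only 'A' through 'Z'. The names UpperCheckDemo, ToUpperCheck and RangeCheck are invented for illustration.

```csharp
using System;

static class UpperCheckDemo
{
    // The comparison used in the answer: true whenever upper-casing
    // leaves the first character unchanged.
    public static bool ToUpperCheck(string word)
    {
        return word.Length > 0 && word.ToUpper()[0] == word[0];
    }

    // The suggested replacement: true only for the ASCII letters 'A' to 'Z'.
    public static bool RangeCheck(string word)
    {
        return word.Length > 0 && word[0] >= 'A' && word[0] <= 'Z';
    }

    static void Main()
    {
        // "42" and "(Toto)" pass the ToUpper comparison but fail the range test.
        foreach (string word in new[] { "Mimi", "loves", "42", "(Toto)" })
            Console.WriteLine(word + ": " + ToUpperCheck(word) + " / " + RangeCheck(word));
    }
}
```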

Since others have already posted so much of the answer, I don't feel I'm breaking any homework rules by showing this:

//set up the string to be searched
string source =
"First The The Quick Red fox jumped oveR A Red Lazy BRown DOg";

//new up a Regex object.
Regex myReg = new Regex(@"(\b[A-Z]\w*)");

//Get the matches, turn them into strings, de-dupe them
IEnumerable<string> results =
    myReg.Matches(source)
    .OfType<Match>()
    .Select(m => m.Value)
    .Distinct();

//print out the strings.
foreach (string s in results)
    Console.WriteLine(s);

  • To learn the Regex type, you should start here.
  • To learn the LINQ in-memory query methods, you should start here.

Appropriate regex: \b\p{Lu}\p{L}*

var result = 
    Regex.Matches(input, @"\b\p{Lu}\p{L}*")
    .Cast<Match>().Select(m => m.Value);
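A short sketch of why the Unicode categories matter, assuming input that contains accented capitals. \p{Lu} matches any uppercase letter and \p{L} any letter, so words like "Élodie" are found, whereas [A-Z] would skip them. The sample sentence and the names UnicodeCapsDemo and Extract are made up; Distinct() is added here to meet the question's no-repetition requirement.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class UnicodeCapsDemo
{
    // Hypothetical helper: distinct words that start with any Unicode uppercase letter.
    public static string[] Extract(string text)
    {
        return Regex.Matches(text, @"\b\p{Lu}\p{L}*")
            .Cast<Match>()
            .Select(m => m.Value)
            .Distinct()
            .ToArray();
    }

    static void Main()
    {
        // Escapes are used so the sample does not depend on the source file encoding.
        // The text is: Élodie et Zoë aiment Ñandú
        string input = "\u00C9lodie et Zo\u00EB aiment \u00D1and\u00FA";
        foreach (string word in Extract(input))
            Console.WriteLine(word);
    }
}
```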

// Split on spaces, dropping empty entries so that word[0] is always valid.
string foo = "Mimi loves Toto and Tata hate Mimi so Toto killed Tata";
char[] separators = {' '};
IList<string> capitalizedWords = new List<string>();
string[] words = foo.Split(separators, StringSplitOptions.RemoveEmptyEntries);
foreach (string word in words)
{
    char c = word[0];

    if (char.IsUpper(c))
    {
        capitalizedWords.Add(word);
    }
}

foreach (string s in capitalizedWords)
{
    Console.WriteLine(s);
}

David B's answer is the best one; it takes word boundaries into account. One vote up.

To add something to his answer:

        Func<string,bool,string> CaptureCaps = (source,caseInsensitive) => string.Join(" ", 
                new Regex(@"\b[A-Z]\w*").Matches(source).OfType<Match>().Select(match => match.Value).Distinct(new KeisInsensitiveComparer(caseInsensitive) ).ToArray() );


        MessageBox.Show(CaptureCaps("First The The  Quick Red fox jumped oveR A Red Lazy BRown DOg", false));
        MessageBox.Show(CaptureCaps("Mimi loves Toto. Tata hate Mimi, so Toto killed TaTa. A bad one!", false));


        MessageBox.Show(CaptureCaps("First The The  Quick Red fox jumped oveR A Red Lazy BRown DOg", true));
        MessageBox.Show(CaptureCaps("Mimi loves Toto. Tata hate Mimi, so Toto killed TaTa. A bad one!", true));


class KeisInsensitiveComparer : IEqualityComparer<string>
{
    public KeisInsensitiveComparer() { }

    bool _caseInsensitive;
    public KeisInsensitiveComparer(bool caseInsensitive) { _caseInsensitive = caseInsensitive; }


    // Strings are equal if they match exactly or, when _caseInsensitive is set, if they match ignoring case.
    public bool Equals(string x, string y)
    {

        // Check whether the compared objects reference the same data.
        if (Object.ReferenceEquals(x, y)) return true;

        // Check whether any of the compared objects is null.
        if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
            return false;



        return _caseInsensitive ? x.ToUpper() == y.ToUpper() : x == y;
    }

    // If Equals() returns true for a pair of strings,
    // GetHashCode must return the same value for both.

    public int GetHashCode(string s)
    {
        // A null string hashes to 0.
        if (Object.ReferenceEquals(s, null)) return 0;

        // Hash the upper-cased form when comparing case-insensitively, so that
        // strings Equals() treats as equal share a hash code. (XOR-ing the same
        // hash with itself, as before, always produced 0.)
        return _caseInsensitive ? s.ToUpper().GetHashCode() : s.GetHashCode();
    }

}
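
    // Matches whole words only: one uppercase ASCII letter followed by zero or more lowercase ASCII letters.
    static Regex _capitalizedWordPattern = new Regex(@"\b[A-Z][a-z]*\b", RegexOptions.Compiled | RegexOptions.Multiline);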

    public static IEnumerable<string> GetDistinctOnlyCapitalizedWords(string text)
    {
        return _capitalizedWordPattern.Matches(text).Cast<Match>().Select(m => m.Value).Distinct();
    }
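
// JavaScript version. Assumes $(id) returns the DOM element with that id
// (as in Prototype.js) and that displayResults is defined elsewhere.
function capitalLetters() {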
  var textAreaId = "textAreaId";
  var resultsArray = $(textAreaId).value.match( /\b[A-Z][A-Za-z']+/g );
  displayResults(textAreaId, resultsArray);
}
