How to ignore punctuation in C#

I want to ignore punctuation. I'm trying to write a program that counts the occurrences of every word in my text without taking punctuation marks into consideration. My program is:

 static void Main(string[] args)
    {
        string text = "This my world. World, world,THIS WORLD ! Is this - the world .";
        IDictionary<string, int> wordsCount =
         new SortedDictionary<string, int>();
        text=text.ToLower();
        text = text.replaceAll("[^0-9a-zA-Z\text]", "X");
        string[] words = text.Split(' ',',','-','!','.');
        foreach (string word in words)
        {
            int count = 1;
            if (wordsCount.ContainsKey(word))
                count = wordsCount[word] + 1;
            wordsCount[word] = count;
        }

        var items = from pair in wordsCount
                    orderby pair.Value ascending
                    select pair;

        foreach (var p in items)
        {
            Console.WriteLine("{0} -> {1}", p.Key, p.Value);
        }

    }

The output is:

is -> 1
my -> 1
the -> 1
this -> 3
world -> 5
(here is nothing) -> 8

How can I remove the punctuation here?

   string[] words = text.Split(new char[]{' ',',','-','!','.'}, StringSplitOptions.RemoveEmptyEntries);

You should try specifying StringSplitOptions.RemoveEmptyEntries:

string[] words = text.Split(" ,-!.".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);

Note that instead of manually creating a char[] with all the punctuation characters, you can write them as a string and call ToCharArray() to get the array of characters.

I find this easier to read and to modify later on.
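For reference, here is a minimal sketch of the question's counting loop with that Split call dropped in. The class name WordCounter, the method CountWords, and the Separators constant are made up for illustration; they are not from the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCounter
{
    // Hypothetical constant: the separator characters from the answer above.
    // Extend this string to ignore more punctuation marks.
    private const string Separators = " ,-!.";

    // Counts word occurrences case-insensitively, skipping empty tokens.
    public static IDictionary<string, int> CountWords(string text)
    {
        var counts = new SortedDictionary<string, int>();
        string[] words = text.ToLower()
            .Split(Separators.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in words)
        {
            // current stays 0 when the word has not been seen yet.
            counts.TryGetValue(word, out int current);
            counts[word] = current + 1;
        }
        return counts;
    }

    public static void Main()
    {
        string text = "This my world. World, world,THIS WORLD ! Is this - the world .";
        foreach (var pair in CountWords(text).OrderBy(p => p.Value))
        {
            Console.WriteLine("{0} -> {1}", pair.Key, pair.Value);
        }
    }
}
```

With the sample text this should print the same five words as in the question, without the empty entry at the end.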

It's simple: the first step is to remove the unwanted punctuation with a Replace call, then go on to split the text as needed.
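A rough sketch of that replace-then-split idea follows. It assumes Regex.Replace is an acceptable stand-in for "Replace" here, since the replaceAll call in the question is Java and does not exist on C# strings. The names PunctuationStripper, StripPunctuation and SplitWords are hypothetical. Punctuation is replaced with a space rather than removed outright, otherwise "world,THIS" would be glued into a single word.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PunctuationStripper
{
    // Replaces every character that is not an ASCII letter, digit,
    // or whitespace with a single space.
    public static string StripPunctuation(string text)
    {
        return Regex.Replace(text, @"[^0-9a-zA-Z\s]", " ");
    }

    // Lowercases the text, strips punctuation, then splits on whitespace,
    // dropping the empty tokens left behind by consecutive separators.
    public static string[] SplitWords(string text)
    {
        char[] whitespace = { ' ', '\t', '\r', '\n' };
        return StripPunctuation(text.ToLower())
            .Split(whitespace, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string text = "This my world. World, world,THIS WORLD ! Is this - the world .";
        Console.WriteLine(string.Join("|", SplitWords(text)));
    }
}
```

Note that the character class only keeps ASCII letters and digits, so accented or non-Latin words would be split apart. That is a limitation of this sketch, not of the approach.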

... or you can go with the version that makes people cry ...

"This my world. World, world,THIS WORLD ! Is this - the world ."
    .ToLower()
    .Split(" ,-!.".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
    .GroupBy(i => i)
    .Select(i=>new{Word=i.Key, Count = i.Count()})
    .OrderBy(k => k.Count)
    .ToList()
    .ForEach(Console.WriteLine);

Output:

{ Word = my, Count = 1 }
{ Word = is, Count = 1 }
{ Word = the, Count = 1 }
{ Word = this, Count = 3 }
{ Word = world, Count = 5 }
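If you would rather have the "word -> count" format from the question than the anonymous type's default string, the same chain can be wrapped in a method, roughly like this. LinqWordCount and FormatCounts are made-up names, and the ThenBy tie-break on the word is an addition that is not in the chain above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqWordCount
{
    // Returns one "word -> count" line per distinct word,
    // ordered by count and then alphabetically (ordinal) for ties.
    public static List<string> FormatCounts(string text)
    {
        return text.ToLower()
            .Split(" ,-!.".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
            .GroupBy(word => word)
            .OrderBy(group => group.Count())
            .ThenBy(group => group.Key, StringComparer.Ordinal)
            .Select(group => $"{group.Key} -> {group.Count()}")
            .ToList();
    }

    public static void Main()
    {
        string text = "This my world. World, world,THIS WORLD ! Is this - the world .";
        FormatCounts(text).ForEach(Console.WriteLine);
    }
}
```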
