Removing Specified Punctuation From Strings

I have a string that I need to convert into a String[] containing each word in the string. However, I do not want any whitespace or any punctuation except hyphens and apostrophes that belong to a word.

Example input:

Hello! This is a test and it's a short-er 1. - [ ] { } ___)

Example of the array produced from the input:

[ "Hello", "this", "is", "a", "test", "and", "it's", "a", "short-er", "1" ]

This is the code I have tried so far

(note: the second version causes an error later in the program when string.First() is called):

private string[] ConvertWordsFromFile(String NewFileText)
{
     char[] delimiterChars = { ' ', ',', '.', ':', '/', '|', '<', '>', '/', '@', '#', '$', '%', '^', '&', '*', '"', '(', ')', ';' };
     string[] words = NewFileText.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
     return words;
}

or

private string[] ConvertWordsFromFile(String NewFileText)
{     
    return Regex.Split(NewFileText, @"\W+");
}

The second version crashes in the following code:

private string GroupWordsByFirstLetter(List<String> words)
{
    var groups =
        from w in words
        group w by w.First();
    return FormatGroupsByAlphabet(groups);
}

Specifically, it crashes when w.First() is called.
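A likely explanation, offered as an assumption because the exception message is not shown: Regex.Split includes an empty string in its result when the input begins or ends with a delimiter match, and First() throws an InvalidOperationException on an empty sequence. The sketch below filters those empty entries out before grouping; the class and method names are hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Hypothetical helper: same \W+ split as in the question,
    // but with empty entries dropped before they reach First().
    public static string[] SplitNonEmpty(string text)
    {
        return Regex.Split(text, @"\W+")
                    .Where(w => w.Length > 0)
                    .ToArray();
    }
}
```

For the input "Hello! This is a test.", the raw Regex.Split result ends with an empty string because the text ends with punctuation, while SplitDemo.SplitNonEmpty returns only the five words.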

To remove unwanted characters from a string:

string randomString = "thi$ is h@ving s*me inva!id ch@rs";
string excpList = "$@*!";

LINQ option 1 (returns a single string):

var chRemoved = randomString
                  .Select(ch => excpList.Contains(ch) ? (char?)null : ch);

var Result = string.Concat(chRemoved.ToArray());    

LINQ option 2 (returns an array of words):

// Where keeps repeated letters; Except is a set operation and would drop duplicates.
var Result = randomString.Split().Select(x => x.Where(c => !excpList.Contains(c)))
                                      .Select(c => new string(c.ToArray()))
                                      .ToArray();

Here is a little something I worked up. It splits on whitespace and removes any unwanted characters.

    private string ValidChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'-";
    private IEnumerable<string> SplitRemoveInvalid(string input)
    {
        string tmp = "";
        foreach(char c in input)
        {
            // Whitespace ends the current word.
            if(char.IsWhiteSpace(c))
            {
                if(!String.IsNullOrEmpty(tmp))
                {
                    yield return tmp;
                    tmp = "";
                }
                continue;
            }
            // Keep only allowed characters; everything else is dropped.
            if(ValidChars.Contains(c))
            {
                tmp += c;
            }
        }
        if (!String.IsNullOrEmpty(tmp)) yield return tmp;
    }

Usage could be something like this:

    string[] array = SplitRemoveInvalid("Hello! This is a test and it's a short-er 1. - [ ] { } _)")
                     .ToArray();

I didn't actually test it, but it should work. If it doesn't, it should be easy enough to fix.

Use string.Split(char[]):

string strings = "4,6,8\n9,4";
string[] split = strings.Split(new char[] { ',', '\n' });

Or, if you get any unwanted empty entries, try the String.Split(Char[], StringSplitOptions) overload:

string[] split = strings.Split(new char[] { ',', '\n' },
                               StringSplitOptions.RemoveEmptyEntries);

This can be done quite easily with a regex, by matching words. I am using the following regex, which allows hyphens and apostrophes in the middle of words but strips them out if they occur at a word boundary.

\w(?:[\w'-]*\w)?

See it in action here.

In C# it could look like this:

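private string[] ConvertWordsFromFile(String NewFileText)
{
     // The explicit Match type makes the query call Cast<Match>(), which is
     // needed where MatchCollection only implements the non-generic IEnumerable.
     return (from Match m in new Regex(@"\w(?:[\w'-]*\w)?").Matches(NewFileText)
             select m.Value).ToArray();
}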

I am using LINQ to get an array of words from the MatchCollection returned by Matches.
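One caveat worth checking: \w also matches the underscore, so on the example input from the question the pattern above would likely return "___" as an extra word. Here is a minimal sketch of a stricter variant, assuming that only ASCII letters and digits count as word characters (the class, field, and method names are hypothetical):

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class WordExtractor
{
    // Assumption: a word starts and ends with an ASCII letter or digit and may
    // contain hyphens and apostrophes in between. Underscores are excluded.
    private static readonly Regex WordPattern =
        new Regex(@"[A-Za-z0-9](?:[A-Za-z0-9'-]*[A-Za-z0-9])?");

    public static string[] ExtractWords(string text)
    {
        // Cast<Match>() also covers runtimes where MatchCollection only
        // implements the non-generic IEnumerable.
        return WordPattern.Matches(text)
                          .Cast<Match>()
                          .Select(m => m.Value)
                          .ToArray();
    }
}
```

Called with the example input from the question, WordExtractor.ExtractWords should return Hello, This, is, a, test, and, it's, a, short-er, 1. It keeps the original casing and skips the standalone hyphen, the brackets, and the run of underscores.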
