Removing Specified Punctuation From Strings
I have a string that I need to convert into a string[] containing each word in the string. However, I do not want any whitespace or punctuation, EXCEPT hyphens and apostrophes that belong to a word.
Example input:
Hello! This is a test and it's a short-er 1. - [ ] { } ___)
Example of the array built from that input:
[ "Hello", "this", "is", "a", "test", "and", "it's", "a", "short-er", "1" ]
This is the code I have tried so far (note: the second version causes an error later in the program when string.First() is called):
    private string[] ConvertWordsFromFile(String NewFileText)
    {
        char[] delimiterChars = { ' ', ',', '.', ':', '/', '|', '<', '>', '/', '@', '#', '$', '%', '^', '&', '*', '"', '(', ')', ';' };
        string[] words = NewFileText.Split(delimiterChars, StringSplitOptions.RemoveEmptyEntries);
        return words;
    }
or
    private string[] ConvertWordsFromFile(String NewFileText)
    {
        return Regex.Split(NewFileText, @"\W+");
    }
The second version crashes in the following code:
    private string GroupWordsByFirstLetter(List<String> words)
    {
        var groups =
            from w in words
            group w by w.First();
        return FormatGroupsByAlphabet(groups);
    }
Specifically, the crash happens when w.First() is called.
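The crash most likely comes from Regex.Split itself: when the input starts or ends with non-word characters (here the trailing ")"), Regex.Split returns an empty string at that end, and calling First() on an empty string throws an InvalidOperationException. A minimal sketch, assuming you keep the \W+ split, is to drop the empty entries before grouping. The class and method names below are hypothetical, for illustration only. Note that \W+ still breaks "it's" and "short-er" apart, because ' and - are non-word characters.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class NonWordSplit
{
    // Hypothetical helper: split on runs of non-word characters, then
    // drop the empty strings Regex.Split produces when the input
    // begins or ends with a separator.
    public static string[] SplitOnNonWord(string text)
    {
        return Regex.Split(text, @"\W+")
                    .Where(s => s.Length > 0)
                    .ToArray();
    }
}
```

With the empty entries removed, every word passed to GroupWordsByFirstLetter has at least one character, so First() has something to return.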
To remove unwanted characters from a string:

    string randomString = "thi$ is h@ving s*me inva!id ch@rs";
    string excpList = "$@*!";

LINQ Option 1

    // Replace each unwanted character with null; string.Concat skips the nulls.
    var chRemoved = randomString
        .Select(ch => excpList.Contains(ch) ? (char?)null : ch);
    var Result = string.Concat(chRemoved.ToArray());
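LINQ Option 2

    // Split into words, then keep only the characters not in excpList.
    // Where is used instead of Except, because Except would also drop
    // repeated letters (e.g. "inva!id" would become "invad").
    var Result = randomString.Split()
        .Select(x => x.Where(c => !excpList.Contains(c)))
        .Select(c => new string(c.ToArray()))
        .ToArray();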
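Applied to the question, a sketch along the same LINQ lines could whitelist characters instead of blacklisting them: split on whitespace, keep only letters, digits, apostrophes and hyphens, trim apostrophes and hyphens left dangling at either end of a token, and drop tokens that end up empty. The class and method names are illustrative, and using char.IsLetterOrDigit as the definition of a word character is an assumption (it also accepts non-ASCII letters and digits).

```csharp
using System;
using System.Linq;

public static class WordCleaner
{
    // Characters allowed inside a word in addition to letters and digits.
    private const string InnerChars = "'-";

    public static string[] ToWords(string text)
    {
        return text
            // A null separator means "split on whitespace";
            // RemoveEmptyEntries drops runs of consecutive spaces.
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            // Keep letters, digits, apostrophes and hyphens only.
            .Select(token => new string(token
                .Where(c => char.IsLetterOrDigit(c) || InnerChars.IndexOf(c) >= 0)
                .ToArray()))
            // Strip apostrophes/hyphens at the start or end of a token.
            .Select(word => word.Trim('\'', '-'))
            // Tokens such as "-" or "[" are now empty and are discarded.
            .Where(word => word.Length > 0)
            .ToArray();
    }
}
```

For the sample input from the question this yields Hello, This, is, a, test, and, it's, a, short-er, 1. Case is preserved, so "This" keeps its capital letter.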
Here is a little something I worked up. It splits on \n and removes any unwanted characters.
    private const string ValidChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'-";

    // Yields one cleaned string per line of input.
    private IEnumerable<string> SplitRemoveInvalid(string input)
    {
        string tmp = "";
        foreach (char c in input)
        {
            if (c == '\n')
            {
                // End of a line: emit whatever has been collected so far.
                if (!String.IsNullOrEmpty(tmp))
                {
                    yield return tmp;
                    tmp = "";
                }
                continue;
            }
            // Keep only whitelisted characters.
            if (ValidChars.Contains(c))
            {
                tmp += c;
            }
        }
        if (!String.IsNullOrEmpty(tmp)) yield return tmp;
    }
Usage could be something like this:

    string[] array = SplitRemoveInvalid("Hello! This is a test and it's a short-er 1. - [ ] { } _)")
        .ToArray();
I didn't actually test it, but it should work. If it doesn't, it should be easy enough to fix.
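If the words are separated by spaces rather than only by line breaks, a variant of the same iterator could treat any whitespace as a separator and build each token with a StringBuilder instead of repeated string concatenation. This is a sketch, not the answerer's code, and the class and method names are hypothetical. Like the original, it keeps a lone "-" as a word, because the hyphen is in the valid set.

```csharp
using System.Collections.Generic;
using System.Text;

public static class WordSplitter
{
    // Same whitelist idea as ValidChars above.
    private const string ValidChars =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'-";

    public static IEnumerable<string> SplitOnWhitespace(string input)
    {
        var current = new StringBuilder();
        foreach (char c in input)
        {
            if (char.IsWhiteSpace(c))
            {
                // End of a token: emit it if any valid characters were kept.
                if (current.Length > 0)
                {
                    yield return current.ToString();
                    current.Clear();
                }
                continue;
            }
            // Keep only whitelisted characters.
            if (ValidChars.IndexOf(c) >= 0)
            {
                current.Append(c);
            }
        }
        // Emit the last token if the input did not end with whitespace.
        if (current.Length > 0)
        {
            yield return current.ToString();
        }
    }
}
```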
Use string.Split(char[]):
    string strings = "4,6,8\n9,4";
    string[] split = strings.Split(new char[] { ',', '\n' });
Or
If you get unwanted empty items, try the String.Split Method (Char[], StringSplitOptions):
    string[] split = strings.Split(new char[] { ',', '\n' },
        StringSplitOptions.RemoveEmptyEntries);
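To see why the option matters: when two delimiters are adjacent (for example a comma followed by a newline), the plain overload returns an empty string between them. A small sketch; the delimiter set, which also includes a space, and the class and method names are assumptions for illustration.

```csharp
using System;

public static class SplitDemo
{
    private static readonly char[] Delimiters = { ',', '\n', ' ' };

    // Adjacent delimiters produce empty entries with the plain overload.
    public static string[] SplitKeepEmpty(string text) =>
        text.Split(Delimiters);

    // RemoveEmptyEntries discards those empty entries.
    public static string[] SplitDropEmpty(string text) =>
        text.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
}
```

For the input "4,6,\n9, 4", SplitKeepEmpty returns six entries (two of them empty), while SplitDropEmpty returns only "4", "6", "9", "4".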
This can be done quite easily with a regex, by matching words instead of splitting on separators. I am using the following regex, which allows hyphens and apostrophes in the middle of words but strips them out when they occur at the start or end of a word:
\w(?:[\w'-]*\w)?
In C# it could look like this:
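    private string[] ConvertWordsFromFile(String NewFileText)
    {
        // "Match m" casts each item, so this also compiles on .NET Framework,
        // where MatchCollection only implements the non-generic IEnumerable.
        return (from Match m in new Regex(@"\w(?:[\w'-]*\w)?").Matches(NewFileText)
                select m.Value).ToArray();
    }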
I am using LINQ to get an array of words from the MatchCollection returned by Matches.
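One caveat: in .NET, \w also matches the underscore (and other connector punctuation), so on the question's sample input the pattern above also returns "___" as a word. A variant that, as an assumption, defines a word character as a Unicode letter (\p{L}) or number (\p{N}) avoids that. The class and method names are illustrative.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class WordMatcher
{
    // A word starts and ends with a letter or digit; apostrophes and
    // hyphens are allowed only in between.
    private static readonly Regex WordPattern =
        new Regex(@"[\p{L}\p{N}](?:[\p{L}\p{N}'-]*[\p{L}\p{N}])?");

    public static string[] ConvertWords(string text)
    {
        return WordPattern.Matches(text)
                          .Cast<Match>()   // also works on .NET Framework
                          .Select(m => m.Value)
                          .ToArray();
    }
}
```

On the question's sample input this returns Hello, This, is, a, test, and, it's, a, short-er, 1, with no "___" entry.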