Removing words from text with separators in front (using Regex)
I need to remove words from a text together with the separators next to them. I have already removed the words, but I don't know how to remove the separators at the same time. Any suggestions?
At the moment I have:
static void Main(string[] args)
{
    Program p = new Program();
    string text = "";
    text = p.ReadText("Duomenys.txt", text);
    string[] wordsToDelete = { "Hello", "Thanks", "kinda" };
    char[] separators = { ' ', '.', ',', '!', '?', ':', ';', '(', ')', '\t' };
    p.DeleteWordsFromText(text, wordsToDelete, separators);
}

public string ReadText(string file, string text)
{
    text = File.ReadAllText(file);
    return text;
}

public void DeleteWordsFromText(string text, string[] wordsToDelete, char[] separators)
{
    Console.WriteLine(text);
    for (int i = 0; i < wordsToDelete.Length; i++)
    {
        text = Regex.Replace(text, wordsToDelete[i], String.Empty);
    }
    Console.WriteLine("-------------------------------------------");
    Console.WriteLine(text);
}
The result should be:
how are you?
I am good.
What I get instead:
, how are you?
, I am . good.
Contents of Duomenys.txt:
Hello, how are you?
Thanks, I am kinda. good.
You can build the regex as follows:
var regex = new Regex(@"\b("
+ string.Join("|", wordsToDelete.Select(Regex.Escape)) + ")("
+ string.Join("|", separators.Select(c => Regex.Escape(new string(c, 1)))) + ")?");
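To see what this pattern does on the sample lines, here is a minimal sketch. The class name SeparatorDemo, the Build helper and its quantifier parameter are assumptions made for illustration, not part of the answer. Because the separator group ends in `?`, it consumes at most one separator after each word; passing `*` instead consumes the whole run.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SeparatorDemo
{
    // Builds \b(word1|word2|...)(sep1|sep2|...) followed by the given quantifier.
    // "Build" and "quantifier" are hypothetical names used only in this sketch.
    public static Regex Build(string[] words, char[] separators, string quantifier)
    {
        string wordPart = string.Join("|", words.Select(Regex.Escape));
        string sepPart = string.Join("|", separators.Select(c => Regex.Escape(c.ToString())));
        return new Regex(@"\b(" + wordPart + ")(" + sepPart + ")" + quantifier);
    }

    public static void Main()
    {
        string[] wordsToDelete = { "Hello", "Thanks", "kinda" };
        char[] separators = { ' ', '.', ',', '!', '?', ':', ';', '(', ')', '\t' };

        Regex one = Build(wordsToDelete, separators, "?");  // at most one separator
        Regex many = Build(wordsToDelete, separators, "*"); // any run of separators

        string line = "Hello, how are you?";
        // Brackets make leading spaces visible.
        Console.WriteLine("[" + one.Replace(line, "") + "]");  // [ how are you?]
        Console.WriteLine("[" + many.Replace(line, "") + "]"); // [how are you?]
    }
}
```

With `?`, the comma after "Hello" is removed but the space that follows it stays, so the `*` variant is closer to the output the question asks for.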
Explanation:
You may build a regex like
\b(?:Hello|Thanks|kinda)\b[ .,!?:;() ]*
where \b(?:Hello|Thanks|kinda)\b matches any of the words to delete as whole words, and [ .,!?:;() ]* matches any of your separators 0 or more times after the words to delete (the last character inside the class is the tab).
The C# solution:
char[] separators = { ' ', '.', ',', '!', '?', ':', ';', '(', ')', '\t' };
string[] wordsToDelete = { "Hello", "Thanks", "kinda" };
string SepPattern = new String(separators).Replace(@"\", @"\\").Replace("^", @"\^").Replace("-", @"\-").Replace("]", @"\]");
var pattern = $@"\b(?:{string.Join("|", wordsToDelete.Select(Regex.Escape))})\b[{SepPattern}]*";
// => \b(?:Hello|Thanks|kinda)\b[ .,!?:;() ]*
Regex rx = new Regex(pattern, RegexOptions.Compiled);
// RegexOptions.IgnoreCase can be added to the above flags for case insensitive matching: RegexOptions.IgnoreCase | RegexOptions.Compiled
DeleteWordsFromText("Hello, how are you?", rx);
DeleteWordsFromText("Thanks, I am kinda. good.", rx);
Here is the DeleteWordsFromText method:
public static void DeleteWordsFromText(string text, Regex p)
{
    Console.WriteLine($"---- {text} ----");
    Console.WriteLine(p.Replace(text, ""));
}
Output:
---- Hello, how are you? ----
how are you?
---- Thanks, I am kinda. good. ----
I am good.
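If you want to plug this into the file-based program from the question, the pieces above could be wired together roughly as follows. This is only a sketch: the WordRemover class and the BuildPattern and RemoveWords helpers are names I made up, and the fallback sample text is used only when Duomenys.txt is not found.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordRemover
{
    // Same construction as above: whole-word alternation followed by a
    // character class of separators, repeated 0 or more times.
    public static Regex BuildPattern(string[] wordsToDelete, char[] separators)
    {
        // Only \ ^ - ] need escaping inside a character class.
        string sepPattern = new string(separators)
            .Replace(@"\", @"\\").Replace("^", @"\^").Replace("-", @"\-").Replace("]", @"\]");
        string words = string.Join("|", wordsToDelete.Select(Regex.Escape));
        return new Regex($@"\b(?:{words})\b[{sepPattern}]*");
    }

    // Returns the cleaned text instead of printing it, so the caller decides what to do with it.
    public static string RemoveWords(string text, Regex rx)
    {
        return rx.Replace(text, string.Empty);
    }

    public static void Main(string[] args)
    {
        string[] wordsToDelete = { "Hello", "Thanks", "kinda" };
        char[] separators = { ' ', '.', ',', '!', '?', ':', ';', '(', ')', '\t' };
        Regex rx = BuildPattern(wordsToDelete, separators);

        // Assumption: fall back to the sample text when the file is missing.
        string path = args.Length > 0 ? args[0] : "Duomenys.txt";
        string text = File.Exists(path)
            ? File.ReadAllText(path)
            : "Hello, how are you?\nThanks, I am kinda. good.";

        Console.WriteLine(RemoveWords(text, rx));
    }
}
```

The newline is not in the separator list, so line breaks in the file are kept and each line is cleaned in place.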
Notes:
string SepPattern = new String(separators).Replace(@"\", @"\\").Replace("^", @"\^").Replace("-", @"\-").Replace("]", @"\]"); - this is the separator pattern that will be used inside a character class. Since only the ^, -, \ and ] chars require escaping inside a character class, only these chars are escaped.
$@"\b(?:{string.Join("|", wordsToDelete.Select(Regex.Escape))})\b" - this builds the alternation from the words to delete and only matches them as whole words.
Pattern details:
\b - word boundary
(?: - start of a non-capturing group:
Hello - the word Hello
| - or
Thanks - the word Thanks
| - or
kinda - the word kinda
) - end of the group
\b - word boundary
[ .,!?:;() ]* - any 0+ chars from the character class.
See the regex demo.
I would not use Regex. Three months from now you will no longer understand the regex, and fixing bugs will be hard.
I would use simple loops. Everyone will understand them:
public void DeleteWordsFromText(string text, string[] wordsToDelete, char[] separators)
{
    Console.WriteLine(text);
    foreach (string word in wordsToDelete)
    {
        foreach (char separator in separators)
        {
            text = text.Replace(word + separator, String.Empty);
        }
    }
    Console.WriteLine("-------------------------------------------");
    Console.WriteLine(text);
}
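Note that Replace(word + separator, ...) removes exactly one separator per word and also matches inside longer words. If you want to stay regex-free but still swallow a whole run of separators, a single scan over the text is one possible way to do it. This is a rough sketch rather than a drop-in replacement: LoopRemover and RemoveWords are hypothetical names, and char.IsLetterOrDigit is only an approximation of the regex \b word boundary (it does not treat '_' as a word character).

```csharp
using System;
using System.Text;

public static class LoopRemover
{
    public static string RemoveWords(string text, string[] wordsToDelete, char[] separators)
    {
        var result = new StringBuilder();
        int i = 0;
        while (i < text.Length)
        {
            int matchedLength = 0;
            // A word may only start where the previous character is not a letter or digit.
            if (i == 0 || !char.IsLetterOrDigit(text[i - 1]))
            {
                foreach (string word in wordsToDelete)
                {
                    if (word.Length == 0) continue; // ignore empty entries
                    int end = i + word.Length;
                    bool sameText = string.CompareOrdinal(text, i, word, 0, word.Length) == 0;
                    // The word must also end at a non-letter, non-digit character or at the end of the text.
                    bool endsWord = end >= text.Length || !char.IsLetterOrDigit(text[end]);
                    if (sameText && endsWord)
                    {
                        matchedLength = word.Length;
                        break;
                    }
                }
            }

            if (matchedLength > 0)
            {
                i += matchedLength;
                // Skip every separator that directly follows the removed word.
                while (i < text.Length && Array.IndexOf(separators, text[i]) >= 0) i++;
            }
            else
            {
                result.Append(text[i]);
                i++;
            }
        }
        return result.ToString();
    }

    public static void Main()
    {
        string[] wordsToDelete = { "Hello", "Thanks", "kinda" };
        char[] separators = { ' ', '.', ',', '!', '?', ':', ';', '(', ')', '\t' };
        Console.WriteLine(RemoveWords("Hello, how are you?", wordsToDelete, separators));
        Console.WriteLine(RemoveWords("Thanks, I am kinda. good.", wordsToDelete, separators));
    }
}
```

It is longer than the nested Replace loops, but it keeps the leading space from being left behind and does not touch words that merely contain one of the words to delete.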