Removing a word or character from a string when it is followed or preceded by a space in C#

I have a string:

string name = "AL QADEER UR AL REHMAN AL KHALIL UN";

How would I remove all occurrences of the words AL, UR, UN, and possibly others like them?

My string should look like this:

QADEER REHMAN KHALIL

Currently I am trying to do it like this:

List<string> list = new List<string> { "AL", "UR", "UN" };

foreach (var item in list)
{
    // Plain substring replacement: this also hits "AL" inside "KHALIL"
    name = name.Replace(item, "");
}

This also removes AL from KHALIL. How do I restrict it so that it does not remove those characters when they are part of a longer word?

Update: Adding spaces around the words in the List only removes words that have a space both before and after them, and it concatenates UR with the following word. I am loading the list of words to be removed from a database.

static void TestRegex()
{
    string name = "AL QADEER UR AL REHMAN AL KHALIL UN";
    // Add other strings you want to remove
    string pattern = @"\b(AL|UR|UN)\b";
    name = Regex.Replace(name, pattern, String.Empty);
    // Remove extra spaces
    name = Regex.Replace(name, @"\s{2,}", " ").Trim();
    Console.WriteLine(name);
}

UPDATE

You can generate the pattern this way:

// Generate pattern
var list = new List<string> { "AL", "UR", "UN" };
string pattern = @"\b(" + String.Join("|", list) + @")\b";
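Since the list is loaded from a database, its entries might contain characters that have a special meaning in a regular expression. A minimal sketch of the same idea with each entry escaped first, assuming a hypothetical helper named RemoveWords in a hypothetical WordFilter class (neither is part of the original answer):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordFilter
{
    // Hypothetical helper: removes whole-word occurrences of each entry in `words`.
    public static string RemoveWords(string input, IEnumerable<string> words)
    {
        // Escape each entry so characters such as '.' or '+' are matched literally.
        var escaped = words.Select(w => Regex.Escape(w)).ToList();
        if (escaped.Count == 0)
        {
            return input; // Nothing to remove
        }

        string pattern = @"\b(" + string.Join("|", escaped) + @")\b";
        string result = Regex.Replace(input, pattern, string.Empty);

        // Collapse the gaps left behind by the removed words
        return Regex.Replace(result, @"\s{2,}", " ").Trim();
    }

    public static void Main()
    {
        var list = new List<string> { "AL", "UR", "UN" };
        Console.WriteLine(RemoveWords("AL QADEER UR AL REHMAN AL KHALIL UN", list));
        // prints "QADEER REHMAN KHALIL"
    }
}
```

Because the pattern is rebuilt on every call, a changed list is picked up the next time the helper runs. Note that \b only matches next to letters, digits, or underscores, so entries that begin or end with punctuation may need a different boundary check.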

Please try this:

var name = "AL QADEER UR AL REHMAN AL KHALIL UN";
var list = new List<string> { "AL", "UR", "UN" };
name = string.Join(" ", name.Split(' ').Except(list));

var words = new[] { "AL", "UR", "UN" };
var arr = systemName.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries).Except(words);
systemName = string.Join(" ", arr);
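One thing to keep in mind with Except: it is a set operation, so it also drops repeated occurrences of the words that are kept. If repeated words should survive, a filter with Where is one alternative. This is only a sketch; the class name WordListFilter and the method name RemoveWords are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordListFilter
{
    // Hypothetical helper: drops every word found in `words`, keeps everything else,
    // including duplicates, in the original order.
    public static string RemoveWords(string input, IEnumerable<string> words)
    {
        // HashSet gives fast lookups; the default comparer is case-sensitive,
        // matching the behaviour of the snippets above.
        var blocked = new HashSet<string>(words);

        var kept = input
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(w => !blocked.Contains(w));

        return string.Join(" ", kept);
    }

    public static void Main()
    {
        var words = new[] { "AL", "UR", "UN" };
        Console.WriteLine(RemoveWords("AL QADEER UR AL QADEER UN", words));
        // prints "QADEER QADEER" (Except would yield a single "QADEER")
    }
}
```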

There is no need to use regular expressions. Once you have a list of "prohibited" words, it is enough to iterate over the words in the sentence: if a word is in the list of prohibited words, exclude it; otherwise, include it in the final string.

Try this:

string name = "AL QADEER UR AL REHMAN AL KHALIL UN";
string systemName = "";
List<string> list = new List<string> { "AL", "UR", "UN" };

foreach (var item in name.Split(new char[] { ' ', ',', '.' }, StringSplitOptions.RemoveEmptyEntries))
    systemName += list.Contains(item) ? "" : item + " ";

// Drop the trailing space left by the last appended word
systemName = systemName.TrimEnd();

I am loading that list from a database, and it can change at any time. How do I use this in a regex when changes occur?

Okay then, would the length always be 2?

No, but it would not be greater than 4.

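public static void Main()
{
    var input = "AL QADEER UR AL REHMAN AL KHALIL UN AAA BBB";
    // Removes every word of 1 to 4 word characters, not only the listed ones
    Regex re = new Regex(@"\b\w{1,4}\b");
    var result = re.Replace(input, "");
    // Collapse the leftover spaces so the output is clean
    result = Regex.Replace(result, @"\s{2,}", " ").Trim();
    Console.WriteLine(result);
}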

OUTPUT:

QADEER REHMAN KHALIL

A pure LINQ answer using Except:

string name = "AL QADEER UR AL REHMAN AL KHALIL UN";
var list = new string[] { "AL", "UR", "UN" };

name = name
   .Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries)
   .Except(list)
   .Aggregate((prev, next) => $"{prev} {next}");

OUTPUT: QADEER REHMAN KHALIL
