How to remove noise words from a string and search it using RegEx? (C#)
I am trying to search for a string within a string.
StringToSearch: The quick brown fox jumped over the fence
searchTerm: brown jumped
So when I call StringToSearch.ContainsEx(searchTerm), it returns true. The way I have it working now is: I first remove noise words using string.Remove(), then call string.Split(' ') to get the words, and then perform a Contains search in the text to be searched for every word in that array.
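For reference, the approach described above could look roughly like the sketch below. The body of ContainsEx is not shown in the question, so this is only a guess at its behavior; the class name BaselineSearch and the two-entry noise-word list are assumptions made for illustration. Since string.Remove() deletes characters by index rather than by word, the sketch filters the split words instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BaselineSearch
{
    // Hypothetical noise-word list; the question only names "the" and "of".
    private static readonly HashSet<string> NoiseWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "the", "of" };

    // Returns true when every non-noise word of searchTerm occurs in text.
    public static bool ContainsEx(this string text, string searchTerm)
    {
        var words = searchTerm
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(w => !NoiseWords.Contains(w));

        // Substring match, like string.Contains: "and" also matches inside "land".
        return words.All(w => text.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

Because this is a plain substring test, it accepts partial-word hits; whether that is acceptable depends on the data being searched.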
It works, but I want to make it as performant as I can, so can I use RegEx to do the same kind of search? That is, remove noise words such as the, of, etc., and then check whether all words in searchString are contained within the text to be searched?
I have no idea how to use RegEx in C# at all, so a code sample would be helpful. Thank you, and please suggest any other techniques if you feel they would serve me better than regular expressions.
Try this (if you need more words, add them in the same fashion):
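// In a regular string literal the word boundary must be written as \\b; a lone \b is a backspace.
string sPattern = "(?=.*\\bbrown\\b)(?=.*\\bjumped\\b)";
if (System.Text.RegularExpressions.Regex.IsMatch(mainString, sPattern))
{
    // do something
}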
(?=.*\\bbrown\\b) uses a positive lookahead to check whether the word brown exists in the text. \\b is a word boundary, so the pattern does not pick up a word from inside another word, for example matching and inside the word land.
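If the search terms are not known in advance, the same lookahead pattern can be assembled at run time. The following is a minimal sketch, not a tuned implementation: the names LookaheadSearch, BuildPattern and ContainsAllWords and the noise-word list are assumptions for illustration, and the leading ^ anchor is an addition intended to stop the engine from retrying the lookaheads at every position. It assumes each term is a plain word, since \b behaves differently around terms that begin or end with punctuation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaheadSearch
{
    // Hypothetical noise-word list; extend as needed.
    private static readonly HashSet<string> NoiseWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "the", "of" };

    // Builds one positive lookahead per non-noise word of searchTerm.
    public static string BuildPattern(string searchTerm)
    {
        var lookaheads = searchTerm
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(w => !NoiseWords.Contains(w))
            // Regex.Escape guards against regex metacharacters in a term.
            .Select(w => @"(?=.*\b" + Regex.Escape(w) + @"\b)");

        return "^" + string.Concat(lookaheads);
    }

    // True when every non-noise word of searchTerm appears in text as a whole word.
    public static bool ContainsAllWords(string text, string searchTerm)
    {
        return Regex.IsMatch(
            text,
            BuildPattern(searchTerm),
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

With the question's data, LookaheadSearch.ContainsAllWords(StringToSearch, searchTerm) is expected to return true, while a search for and in a text that only contains land is not.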
Try using LINQ; I think it will work well if both of your strings are long. With a regex, you first have to construct the pattern dynamically (one part for each element of searchTerm), and you would end up with a long regex, which might be slow.
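using System.Collections.Generic;
using System.Linq;
List<string> StringToSearchList = new List<string>(StringToSearch.Split(' '));
List<string> searchTermList = new List<string>(searchTerm.Split(' '));
// Words of StringToSearch that do not appear in searchTerm (set difference; duplicates are dropped).
var query = StringToSearchList.Except(searchTermList);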
You can use string.Join to convert an array to a string.
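To tie this back to the question, here is a hedged sketch that removes noise words with LINQ, rejoins the remainder with string.Join, and then checks that every search word is present. The names WordSetSearch, RemoveNoiseWords and ContainsAllWords and the noise-word list are assumptions for illustration. It filters with Where rather than Except so that repeated words in the text are preserved, and it splits on spaces only, so punctuation attached to a word is not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordSetSearch
{
    // Hypothetical noise-word list; extend as needed.
    private static readonly string[] NoiseWords = { "the", "of" };

    private static string[] SplitWords(string input)
    {
        return input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Drops noise words and joins the remaining words back into one string.
    public static string RemoveNoiseWords(string input)
    {
        var kept = SplitWords(input)
            .Where(w => !NoiseWords.Contains(w, StringComparer.OrdinalIgnoreCase));
        return string.Join(" ", kept);
    }

    // True when every non-noise word of searchTerm is a whole word of text.
    public static bool ContainsAllWords(string text, string searchTerm)
    {
        // A hash set makes each membership test cheap for long texts.
        var textWords = new HashSet<string>(
            SplitWords(RemoveNoiseWords(text)),
            StringComparer.OrdinalIgnoreCase);

        return SplitWords(RemoveNoiseWords(searchTerm))
            .All(w => textWords.Contains(w));
    }
}
```

For the sentence in the question, RemoveNoiseWords yields "quick brown fox jumped over fence", and ContainsAllWords with the search term brown jumped returns true.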