
Remove stop words from a string in ASP.NET C#

I am having trouble creating code which removes stop words from a string. Here is my code:

String Review="The portfolio is fine except for the fact that the last movement of sonata #6 is missing. What should one expect?";

string[] arrStopword = new string[] {"a", "i", "it", "am", "at", "on", "in", "to", "too", "very","of", "from", "here", "even", "the", "but", "and", "is","my","them", "then", "this", "that", "than", "though", "so", "are"};
StringBuilder sbReview = new StringBuilder(Review);
foreach (string word in arrStopword)
{
    sbReview.Replace(word, "");
}
Label1.Text = sbReview.ToString();

When I run it, Label1.Text is "The portfolo s fne except for fct tht lst movement st #6 s mssng. Wht should e expect? "

I expected it to return "portfolio fine except for fact last movement sonata #6 is missing. what should one expect?"

Does anybody know how to solve this?

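The problem is that you are replacing substrings, not words, so "is" is also stripped out of "missing" and "a" out of "sonata". You need to split the original text into words, remove the stop words, and then join the rest back together.

Try this:

List<string> words = Review.Split(' ').ToList();
foreach (string stopWord in arrStopword)
    // Remove() deletes only the first match, so use RemoveAll() to drop every occurrence, ignoring case.
    words.RemoveAll(w => w.Equals(stopWord, StringComparison.OrdinalIgnoreCase));
string result = String.Join(" ", words);

The only issue I can see with this is that it doesn't handle punctuation well (a stop word followed by a comma or full stop is not matched), but you get the general idea.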
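One way to cope with the punctuation problem mentioned above is to match whole words with a regular expression instead of splitting on spaces. The following is only a sketch under that assumption: `StopWordRemover` and `RemoveStopWords` are hypothetical names made up for this example, not part of the question's code or of any library.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class StopWordRemover
{
    // Hypothetical helper: removes every whole-word, case-insensitive
    // occurrence of the given stop words from the text.
    public static string RemoveStopWords(string text, string[] stopWords)
    {
        // Build a pattern such as \b(?:a|i|it|...)\b. The \b anchors make sure
        // that "is" does not match inside "missing" or "on" inside "sonata".
        string alternatives = string.Join("|", stopWords.Select(w => Regex.Escape(w)));
        string pattern = @"\b(?:" + alternatives + @")\b";

        string stripped = Regex.Replace(text, pattern, "", RegexOptions.IgnoreCase);

        // Removing words leaves runs of spaces behind; collapse them.
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }
}
```

Called as `StopWordRemover.RemoveStopWords(Review, arrStopword)` with the string and array from the question, this should give "portfolio fine except for fact last movement sonata #6 missing. What should one expect?". Punctuation attached to a kept word survives, and a stop word followed directly by punctuation is still removed.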

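You can use LINQ to solve this problem. First split the string on " " (space) with Split to get the individual words, then filter out the stop words, and finally rebuild the string with string.Join. Except is the shortest way to do the filtering, but it is a set operation: it also drops repeated words from the result and compares case-sensitively by default, so "The" would be kept. Where with a case-insensitive Contains avoids both problems:

var newString = string.Join(" ", Review.Split(' ').Where(w => !arrStopword.Contains(w, StringComparer.OrdinalIgnoreCase)));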
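For longer stop-word lists, the split, filter and join idea above can be sketched with a HashSet for the lookups. This is an illustrative variant, not the answer's own code: `StopWordFilter` and `Filter` are hypothetical names, and the set of punctuation characters trimmed before the comparison is an assumption you would adjust for your text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordFilter
{
    // Hypothetical helper: keeps every token whose punctuation-trimmed form
    // is not a stop word. Repeated non-stop words are preserved.
    public static string Filter(string text, IEnumerable<string> stopWords)
    {
        // Case-insensitive set, so "The" and "the" are treated alike.
        var stopSet = new HashSet<string>(stopWords, StringComparer.OrdinalIgnoreCase);

        var kept = text
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            // Compare without surrounding punctuation so "is," still counts as "is".
            .Where(token => !stopSet.Contains(token.Trim('.', ',', '?', '!', ';', ':')));

        return string.Join(" ", kept);
    }
}
```

One trade-off of this approach: when a stop word carries punctuation (for example "is,"), the whole token is dropped, punctuation included.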

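You could use " a ", " i ", etc. as the stop words, so that the program only removes them when they appear as separate words (with a space on each side). Replace each match with a single space rather than an empty string to keep the spacing intact. Note that this misses a stop word at the very start or end of the string, or one that is directly followed by punctuation.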

Or you can use the dotnet-stop-words package and simply call its RemoveStopWords method:

(yourString).RemoveStopWords("en");
