
Can anyone help me remove stopwords from my MySQL tweets database?

For a Java project I am collecting tweets from Twitter. I have collected about 30,000 tweets so far and am going to collect more. I want to remove the stopwords from those tweets and store the filtered tweets in a second, mirror database. If I download a list of stopwords and check every tweet against it, I expect it to take too much time. Is there a more efficient way to do this? I also have not found a .txt list of stopwords. Can anyone help me with this? Thanks.

Make a list of stopwords, and read this page from the MySQL manual:

To override the default stopword list, set the ft_stopword_file system variable. (See Section 5.1.4, “Server System Variables”.) The variable value should be the path name of the file containing the stopword list, or the empty string to disable stopword filtering. The server looks for the file in the data directory unless an absolute path name is given to specify a different directory. After changing the value of this variable or the contents of the stopword file, restart the server and rebuild your FULLTEXT indexes.

The 36 words you mentioned are the stopwords built into the MySQL server (mysqld) when it is compiled, and may correspond to the list described on this page in the manual.
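Note that the server-side stopword list only affects FULLTEXT indexing; it does not strip words from the stored tweet text. If the goal is a mirror table with the stopwords removed, one option is to filter in the Java application before inserting. The following is a minimal sketch, not a definitive implementation: the class name `StopwordFilter`, the method `removeStopwords`, and the tiny `SAMPLE_STOPWORDS` list are all hypothetical, and a real list would be loaded from a file. Because each lookup in a `HashSet` takes roughly constant time, checking every word of 30,000 tweets this way should not be expensive.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

class StopwordFilter {
    // Tiny illustrative list (an assumption); load a fuller list from a file in practice.
    static final Set<String> SAMPLE_STOPWORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "is", "of", "to", "in", "this"));

    // Returns the text with every stopword token removed.
    // Tokens consisting only of punctuation are dropped as well.
    static String removeStopwords(String text, Set<String> stopwords) {
        StringJoiner kept = new StringJoiner(" ");
        for (String token : text.trim().split("\\s+")) {
            // Compare case-insensitively, ignoring leading/trailing punctuation
            // such as '#', '@', ',' or '!'. The original token is what gets kept.
            String key = token.toLowerCase(Locale.ROOT)
                    .replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (!key.isEmpty() && !stopwords.contains(key)) {
                kept.add(token);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeStopwords("This is a test of the filter", SAMPLE_STOPWORDS));
    }
}
```

The filtered string can then be written to the mirror table with an ordinary JDBC insert. Splitting on whitespace is a simplification; tweets containing URLs or emoji may need a more careful tokenizer.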


 