I am having trouble creating code which removes stop words from a string input. Currently, here is my code:
stopWords = [ "a", "i", "it", "am", "at", "on", "in", "to", "too", "very", \
    "of", "from", "here", "even", "the", "but", "and", "is", "my", \
    "them", "then", "this", "that", "than", "though", "so", "are" ]
stemEndings = [ "-s", "-es", "-ed", "-er", "-ly", "-ing", "-'s", "-s'" ]
punctuation = [ ".", ",", ":", ";", "!", "?" ]
line = raw_input ("Type in lines, finish with a . at start of line only:")
while line != ".":
    def remove_punctuation(input): #removes punctuation from input
        output = ""
        text= 0
        while text<=(len(input)-1) :
            if input[text] not in punctuation:
                output=output + input[text]
            text+=1
        return output
    newline= remove_punctuation(line)
    newline= newline.lower()
What code could be added to remove stopWords from a string based on the stopWords list above? Thank you in advance.
As I understand your problem, you want to remove punctuation from an input string. My variant of the remove_punctuation function:
def remove_punctuation(input_string):
    for item in punctuation:
        input_string = input_string.replace(item, '')
    return input_string
As greg suggested, you should use a for loop instead of a while loop because it is more Pythonic and makes the code easier to understand. Also, you should declare your function before the while loop that reads the input, so that the Python interpreter does not redefine the function on every iteration.
Also, if you want, you can set punctuation to a string rather than a list (for readability and ease):
stopWords = [ "a", "i", "it", "am", "at", "on", "in", "to", "too", "very", \
    "of", "from", "here", "even", "the", "but", "and", "is", "my", \
    "them", "then", "this", "that", "than", "though", "so", "are" ]
stemEndings = [ "-s", "-es", "-ed", "-er", "-ly", "-ing", "-'s", "-s'" ]
punctuation = ".,:;!?"

def remove_punctuation(input_string):
    # Strip every character listed in punctuation.
    for item in punctuation:
        input_string = input_string.replace(item, '')
    return input_string

line = raw_input ("Type in lines, finish with a . at start of line only:")
while not line == ".":
    newline = remove_punctuation(line)
    newline = newline.lower()
    # Read the next line; without this the loop never ends.
    line = raw_input()
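To get from the cleaned line to the stop-word removal the question asks about, one option is to split the line into words and keep only those that are not in stopWords. This is a minimal sketch rather than the answerer's code: the helper name remove_stop_words is made up for illustration, it assumes words are separated by whitespace, and it avoids raw_input so it runs unchanged on Python 2 or 3.

```python
# Sketch only: remove_stop_words is a hypothetical helper, not from the original post.
stopWords = ["a", "i", "it", "am", "at", "on", "in", "to", "too", "very",
             "of", "from", "here", "even", "the", "but", "and", "is", "my",
             "them", "then", "this", "that", "than", "though", "so", "are"]
punctuation = ".,:;!?"


def remove_punctuation(input_string):
    # Strip every punctuation character listed above.
    for item in punctuation:
        input_string = input_string.replace(item, '')
    return input_string


def remove_stop_words(input_string):
    # Assumes whitespace-separated words; split() also collapses repeated spaces.
    words = input_string.split()
    kept = [word for word in words if word not in stopWords]
    return " ".join(kept)


# Clean and lowercase first, so "This" matches the lowercase entry "this".
cleaned = remove_punctuation("This is my dog, and it is very happy!").lower()
result = remove_stop_words(cleaned)
print(result)  # dog happy
```

Inside the input loop, this would amount to calling remove_stop_words(newline) after the lower() step.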
I found something interesting in another post that can boost your code's performance a lot. Try using a set, as mentioned in the post linked below: Faster way to remove stop words in Python
Credit goes to alko
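One way to read this suggestion: a membership test against a list scans it element by element, while a membership test against a set is a hash lookup. The sketch below assumes the stop-word list from the question. The names STOP_WORD_SET and remove_stop_words_fast are illustrative and are not taken from the linked post.

```python
# Illustrative names; not taken from the linked post.
stopWords = ["a", "i", "it", "am", "at", "on", "in", "to", "too", "very",
             "of", "from", "here", "even", "the", "but", "and", "is", "my",
             "them", "then", "this", "that", "than", "though", "so", "are"]

# Build the set once, outside any loop, so each lookup is a hash probe
# instead of a scan through the whole list.
STOP_WORD_SET = frozenset(stopWords)


def remove_stop_words_fast(input_string):
    # Assumes lowercase, whitespace-separated input with punctuation already removed.
    return " ".join(w for w in input_string.split() if w not in STOP_WORD_SET)


print(remove_stop_words_fast("i am at the station"))  # station
```

With a list this short the gain is likely small; it tends to matter more as the stop-word list or the amount of input grows.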
You can use the NLTK library instead of defining the stop words yourself.
pip install nltk
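After installing, the stop-word corpus still has to be downloaded once before it can be used. The sketch below assumes network access for that download. load_english_stop_words and filter_words are hypothetical helper names, and the NLTK import sits inside the function so that the filtering part can be tried even where NLTK is not installed.

```python
def load_english_stop_words():
    # Imported lazily so the rest of this sketch runs without NLTK installed.
    import nltk
    from nltk.corpus import stopwords
    # One-time corpus download; needs network access the first time.
    nltk.download("stopwords", quiet=True)
    return set(stopwords.words("english"))


def filter_words(input_string, stop_words):
    # Keep only whitespace-separated words that are not stop words.
    return " ".join(w for w in input_string.split() if w not in stop_words)


# Stand-in set so the example runs without NLTK; with NLTK available, use
#     filter_words("this is my dog", load_english_stop_words())
print(filter_words("this is my dog", {"this", "is", "my"}))  # dog
```

NLTK's English list is longer than the one in the question, so it may drop words the hand-written list would keep.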