Remove stop words in a text file without NLTK

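I have 2 files: stopwords.txt and a.txt.
I want to remove the stop words listed in stopwords.txt from a.txt, with the remaining words separated by spaces.

How can I do this? Here is what I tried: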

def remove_stopwords(review_words):
    with open('stopwords.txt') as stopfile:
        stopwords = stopfile.read()
        list = stopwords.split()
        print(list)
        with open('a.txt') as workfile:
            read_data = workfile.read()
            data = read_data.split()
            print(data)
            for word1 in list:
                for word2 in data:
                    if word1 == word2:
                        return data.remove(list)
                        print(remove_Stopwords)

Thanks in advance.

Here is an example:

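stop_words = []
words = []
with open('stopwords.txt', 'r') as f:
    for line in f:
        # split() copes with one word per line as well as several words per line
        stop_words.extend(line.split())

with open('a.txt', 'r') as f_obj:
    for line in f_obj:
        words.extend(line.split())

# keep only the words that are not in the stop word list
filtered = [w for w in words if w not in stop_words]
print(filtered)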

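Iterate over each word in the stop words file and append it to a list, then do the same for each word in the other file. A list comprehension then drops every word that appears in the stop word list.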
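The same idea can be wrapped in a small function. This is only a sketch: the name `remove_stopwords` is borrowed from the question, the two-argument signature is an assumption, and a set replaces the list purely to make the lookups faster.

```python
def remove_stopwords(words, stopwords):
    """Return the words that do not appear in stopwords (case-sensitive)."""
    # A set gives constant-time membership tests on average,
    # whereas a list would be scanned for every word.
    stop_set = set(stopwords)
    return [w for w in words if w not in stop_set]


# Sample data written inline so the sketch runs without any files.
text = "good great bad"
stops = "good bad"
print(' '.join(remove_stopwords(text.split(), stops.split())))  # prints: great
```

The order of the remaining words is preserved because the comprehension walks the input list from start to end.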

a.txt:

good great bad

stopwords.txt:

good bad

Maybe:

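with open('a.txt', 'r') as f, open('stopwords.txt', 'r') as f2:
    words = f.read().split()
    # lowercase the stop words once and keep them in a set for fast lookups
    stop_words = {x.lower() for x in f2.read().split()}
    # case-insensitive filter; the kept words are joined with single spaces
    print(' '.join(w for w in words if w.lower() not in stop_words))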
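If the result should be saved rather than printed, the filtered text can be written to a separate file. A minimal sketch, assuming a hypothetical helper `filter_file` and an output path chosen by the caller (neither appears in the posts above); it filters line by line so the original line breaks survive.

```python
def filter_file(src_path, stop_path, dst_path):
    """Copy src_path to dst_path, dropping stop words case-insensitively."""
    with open(stop_path, 'r') as f:
        stop_words = {w.lower() for w in f.read().split()}
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        for line in src:
            kept = [w for w in line.split() if w.lower() not in stop_words]
            # Words are re-joined with single spaces;
            # each input line produces exactly one output line.
            dst.write(' '.join(kept) + '\n')
```

Calling `filter_file('a.txt', 'stopwords.txt', 'a_filtered.txt')` would leave a.txt untouched; the output file name here is only an example.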
