How to read a CSV file line by line and store it in a new CSV file on new rows?

Question

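I am new to Python. I am trying to read a CSV file, remove the stop words from it, and store the result in a new CSV file. My code removes the stop words, but it copies the first line into every line of the output. (For example, if the file has three lines, it writes the first line three times on the first row.)

As far as I can tell, the problem is in the loop, but I cannot figure it out. My code is attached below.

Code: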

import nltk
import csv
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

def stop_Words(fileName,fileName_out):
    file_out=open(fileName_out,'w')
    with open(fileName,'r') as myfile:
         line=myfile.readline()
         stop_words=set(stopwords.words("english"))
         words=word_tokenize(line)
         filtered_sentence=[" "]
         for w in myfile:
            for n in words:
               if n not in stop_words:
                 filtered_sentence.append(' '+n)
         file_out.writelines(filtered_sentence)
   print "All Done SW"

stop_Words("A_Nehra_updated.csv","A_Nehra_final.csv")
print "all done :)"

Answer 1

You only read the first line of the file: line=myfile.readline(). You want to iterate over every line in the file. One way to do this is

with open(fileName,'r') as myfile:
    for line in myfile:
        # the rest of your code here, i.e.:
        stop_words=set(stopwords.words("english"))
        words=word_tokenize(line)
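
To make the difference concrete, here is a minimal sketch (Python 3 assumed) that uses an in-memory io.StringIO object as a hypothetical stand-in for the input file. It shows that readline() consumes a single line, while a for loop over the file object yields every remaining line.

```python
import io

# Hypothetical in-memory stand-in for the input file (three lines).
sample = io.StringIO("first line\nsecond line\nthird line\n")

# readline() returns exactly one line per call.
first = sample.readline()

# Iterating over the file object yields each remaining line in turn.
remaining = [line.rstrip("\n") for line in sample]

print(repr(first))
print(remaining)
```

This is also why the original loop ran once per remaining line while words still held only the tokens of the first line.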

Also, you have this loop

for w in myfile:
    for n in words:
        if n not in stop_words:
            filtered_sentence.append(' '+n)

But notice that w, which is defined in the outermost loop, is never used inside the loop. You should be able to remove it and write

for n in words:
    if n not in stop_words:
        filtered_sentence.append(' '+n)
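
The same filter can also be written as a list comprehension followed by a join, which avoids the leading space that ' '+n produces. The sketch below is only an illustration: the tiny stop_words set and the pre-tokenized words list are hypothetical stand-ins for the NLTK stop-word set and the output of word_tokenize.

```python
# Hypothetical stand-ins: a tiny stop-word set and an already tokenized line.
stop_words = {"the", "is", "a"}
words = ["the", "cat", "is", "on", "a", "mat"]

# Keep only the tokens that are not stop words.
filtered = [n for n in words if n not in stop_words]

# Join the kept tokens with single spaces to rebuild the line.
filtered_line = " ".join(filtered)

print(filtered_line)
```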

Edit:

import nltk
import csv
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

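def stop_Words(fileName,fileName_out):
    # Build the stop-word set once instead of once per line.
    stop_words=set(stopwords.words("english"))
    # Open both files in one with statement so both are closed on exit.
    with open(fileName,'r') as myfile, open(fileName_out,'w') as file_out:
        for line in myfile:
            words=word_tokenize(line)
            # Keep only the tokens that are not stop words.
            filtered_sentence=[n for n in words if n not in stop_words]
            # Write one output line per input line, tokens separated by spaces.
            file_out.write(" ".join(filtered_sentence)+"\n")
    print("All Done SW")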
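
If the input really is comma-separated, treating each line as plain text also tokenizes the commas. A possible variant, sketched here under assumptions, uses the standard csv module to process one row at a time and filter each cell separately. The function name remove_stop_words is hypothetical, the stop words are passed in as a parameter, and str.split() stands in for word_tokenize so that the sketch runs without NLTK data (Python 3 assumed).

```python
import csv

def remove_stop_words(in_path, out_path, stop_words):
    """Copy a CSV file row by row, dropping stop words from every cell."""
    # newline="" lets the csv module handle line endings itself.
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for row in reader:
            # Filter each cell on whitespace-separated words (an assumption;
            # a real tokenizer could be swapped in here).
            cleaned = [
                " ".join(w for w in cell.split() if w.lower() not in stop_words)
                for cell in row
            ]
            # One output row is written for each input row.
            writer.writerow(cleaned)
```

With NLTK installed, it could be called as remove_stop_words("A_Nehra_updated.csv", "A_Nehra_final.csv", set(stopwords.words("english"))).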

Solution 1
2 Accepted 2016-06-08 16:04:20
