
How to read a CSV file line by line and store it to new CSV file on new row every time?

I am new to Python. I am trying to read a CSV file and, after removing stopwords, store the result in a new CSV file. My code removes the stop words, but it writes the filtered first row repeatedly instead of processing each row (e.g., if the file has three rows, it writes the first row's content three times on a single row).

From what I can tell, the problem is in the loops, but I can't figure it out. My code is attached below.

Code:

import nltk
import csv
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

def stop_Words(fileName,fileName_out):
    file_out=open(fileName_out,'w')
    with open(fileName,'r') as myfile:
         line=myfile.readline()
         stop_words=set(stopwords.words("english"))
         words=word_tokenize(line)
         filtered_sentence=[" "]
         for w in myfile:
            for n in words:
               if n not in stop_words:
                 filtered_sentence.append(' '+n)
         file_out.writelines(filtered_sentence)
    print "All Done SW"

stop_Words("A_Nehra_updated.csv","A_Nehra_final.csv")
print "all done :)"

You're only reading the first line of your file: line=myfile.readline(). You want to iterate over every line in the file instead. One way to do this is:

with open(fileName,'r') as myfile:
    for line in myfile:
        # the rest of your code here, i.e.:
        stop_words=set(stopwords.words("english"))
        words=word_tokenize(line)

Also, you have this loop

for w in myfile:
    for n in words:
        if n not in stop_words:
            filtered_sentence.append(' '+n)

But notice that the w defined by the outer loop is never used inside it, so each pass just re-filters the same words list once per remaining line. You should be able to remove the outer loop and just write

for n in words:
    if n not in stop_words:
        filtered_sentence.append(' '+n)
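That inner loop can also be written as a list comprehension. A minimal sketch (the stop-word set and token list here are small stand-ins for set(stopwords.words("english")) and word_tokenize(line), so the example runs without NLTK data):

```python
stop_words = {"is", "a", "the"}          # stand-in for nltk's English stop words
words = ["this", "is", "a", "test"]      # stand-in for word_tokenize(line)

# Keep only the tokens that are not stop words, with the same leading-space
# formatting the original loop used.
filtered_sentence = [' ' + n for n in words if n not in stop_words]
```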

edit:

import nltk
import csv
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

def stop_Words(fileName,fileName_out):
    file_out=open(fileName_out,'w')
    with open(fileName,'r') as myfile:
        for line in myfile:
            stop_words=set(stopwords.words("english"))
            words=word_tokenize(line)
            filtered_sentence=[]
            for n in words:
                if n not in stop_words:
                    filtered_sentence.append(' '+n)
            file_out.writelines(filtered_sentence+["\n"])
    file_out.close()
    print "All Done SW"
