简体   繁体   中英

Missing out rows with blank spaces when writing to a new CSV file

I'm attempting to write a program that enters a directory full of CSV files (all with the same layout but different data), reads the files, and writes all the data in the specific columns to a new CSV file. I would also like it to miss out the entire row of data is there is a blank space in one of the columns (in this case, if there is a gap in the Name column).

The program works fine in writing in specific columns (in this case Name and Location) from the old CSV files to the new one, however, I am unsure as to how I would miss out a line if there was a blank space.

import nltk
import csv
from nltk.corpus import PlaintextCorpusReader

root = '/Users/bennaylor/local documents/humanotics'
incorpus = root + '/chats/input/'
outcorpus =root + '/chats/output.csv'
doccorpus = PlaintextCorpusReader(incorpus, '.*\.csv')
filelist = doccorpus.fileids()

with open(outcorpus, 'w', newline='') as fw:
    fieldnames = (['Name','Location'])
    writer = csv.DictWriter(fw, fieldnames=fieldnames)
    writer.writeheader()
    print('Writing Headers!!')

    for rawdoc in filelist:
        infile = incorpus + rawdoc

        with open(infile, encoding='utf-8') as fr:
            reader = csv.DictReader(fr)

            for row in reader:
                rowName = row['Name']
                rowLocation = row['Location']

                writer.writerow({'Name': rowName, 'Location': rowLocation})

An example CSV input file would look like this:

Name,Age,Location,Birth Month

Steve,14,London,November

,18,Sydney,April

Matt,12,New York,June

Jeff,20,,December

Jonty,19,Greenland,July

With gaps in the Name column on the third row, and Location column on the fifth. In this case, I would like the program to miss out the third row when writing the data to a new CSV as there is a gap in the Name column

Thanks in advance for any help.

This is easy to do using pandas:

import pandas as pd
import os

# Create an empty data frame
df = pd.DataFrame()

# Add the data from all the files into the data frame
for filename in filelist:
    data = pd.read_csv(os.path.join(incorpus, filename))
    df = df.append(data)

# Drop rows with any empty values
df = df.dropna()

# Keep only the needed columns
df = df.reindex(columns=['Name', 'Location'])

# Write the dataframe to a .csv file
df.to_csv(outcorpus)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM