
Python: How to create a list of specific columns in the csv file then replace some data?

I have a csv file named 2001.csv which looks like:

Year Month Day Departure Destination Airline
2000 05    21    SFO        BWI       NE100 
2001 06    18    LAX        CLE       XC102
2001 07    24    ATL        LAX       SF303
2001 07    11    JFK        ICN       FN102
The data has 150 lines like this.

I have to write a function that turns that csv into a list containing only the selected columns (in this case: 0, 1, 2). Moreover, I only need the first 100 data rows, with "NA" replaced by 0.

def process(flights):
    """
    """
   processed = []

    # read from original data
    with open('2001.csv', 'r') as f:
        reader = csv.reader(f, delimiter = '')
        cols = [0,1,2]

        # select column numbers
        for row in reader:
            flights = list(row[i] for i in cols)

        for index, flight_data in enumerate(flights):
            if flights == 'NA':
                flights[index] = 0 

        # extract 100 data

        processed = flight[0][:100]

    print(processed)

    return processed

The result I am looking for is:

len(newflight) = 100

Year   Month   Day 
2000     05    21
2001     06    18
2001     07    24
2001     07    11

This is what the new data should contain, but it should be a list, not a csv file (like ['Year','Month','Day']). I am looking for 100 data rows, excluding the header.

This function returns the first 3 columns of the first 100 rows of the given csv file. It also replaces 'NA' with 0:

import csv

def process_flights_csv(path_to_flights_csv):
    flights = []

    with open(path_to_flights_csv) as f_in:
        reader = csv.reader(f_in, delimiter = ' ')
        # skip the header row (next(reader) works in Python 2.6+ and Python 3;
        # reader.next() exists only in Python 2)
        header = next(reader)
        for i, row in enumerate(reader):
            # stop after reading 100 rows
            if i == 100:
                break

            # replace 'NA' with 0
            row = [col if col != 'NA' else 0 for col in row[:3]]

            flights.append(row)
    return flights
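
A variant of the same idea, offered as a sketch rather than a drop-in replacement: if the columns in the file are padded with several spaces, as they appear in the question, csv.reader with delimiter = ' ' produces empty strings between them. The version below splits each line on runs of whitespace instead. The name first_columns and its parameters n_cols and limit are made up for this illustration.

```python
from itertools import islice


def first_columns(lines, n_cols=3, limit=100):
    """Return the first n_cols fields of up to `limit` data lines.

    `lines` is any iterable of text lines (an open file works).
    The first line is assumed to be a header and is skipped.
    """
    rows = iter(lines)
    next(rows, None)  # skip the header line, if there is one

    result = []
    for line in islice(rows, limit):
        # split() with no argument treats any run of whitespace as one separator
        fields = line.split()[:n_cols]
        # replace the string 'NA' with the integer 0
        result.append([0 if field == 'NA' else field for field in fields])
    return result


# Hypothetical sample data in the layout shown in the question
sample = [
    "Year Month Day Departure Destination Airline",
    "2000 05    21    SFO        BWI       NE100",
    "2001 NA    18    LAX        CLE       XC102",
]
print(first_columns(sample))
```

To use it on the real file, pass the open file object: with open('2001.csv') as f: flights = first_columns(f).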

In your code:

for row in reader:
    flights = list(row[i] for i in cols)

flights is reassigned on every iteration, so after the loop it only contains the data of the last row in your file. It is also a list, never the string 'NA', so the value of:

flights == 'NA'

will always be False
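
Both effects can be seen in isolation with a few made-up rows (the rows list below is illustrative and not read from 2001.csv):

```python
# Hypothetical rows standing in for what csv.reader would yield
rows = [['2000', '05', '21'], ['2001', 'NA', '18'], ['2001', '07', '24']]

# Assigning inside the loop rebinds the name each time,
# so only the last row survives.
for row in rows:
    flights = list(row)

last_only = flights               # the last row only
list_vs_str = (flights == 'NA')   # a list never equals a string

# Appending keeps every row; the check is done per value, not per list.
collected = []
for row in rows:
    collected.append([0 if value == 'NA' else value for value in row])

print(last_only)
print(list_vs_str)
print(collected)
```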

I've corrected your code below:

import csv

def process():
    """
    Return columns 0-2 of the first 100 data rows of 2001.csv,
    with 'NA' replaced by 0.
    """

    flights = []

    # read from original data
    with open('2001.csv', 'r') as f:
        reader = csv.reader(f, delimiter = ' ')
        cols = [0,1,2]

        # skip the header row
        next(reader)

        # select column numbers, stopping after 100 data rows
        for num, row in enumerate(reader):
            if num == 100: break
            flights.append([row[i] for i in cols])

        # replace 'NA' with 0
        for data in flights:
            for num, val in enumerate(data):
                if val == 'NA': data[num] = 0

        print(flights)
        return flights

if __name__ == "__main__":

    a = process()
