
Running each row of a CSV with a module in Python

Working in Python 3.8.5

Problem: I have a CSV file with 100 rows. Two of its columns contain the data that a module needs to perform a calculation. I would like to run the calculation on each row using those two values, then insert the output into a new column.

Action so far: I found the csv module and can use csv.reader to read each line. I can see how to get the data points out of each row, but not how to pass them to the module I need to run to process the data. I also found subprocess, which I believe is the module that will let me process each line. I'm just finding it difficult to connect the two.

Example data:

DateTime,Date,Time,Wind_Direction,Effective_Fetch,Wind_Speed
01/10/2012 00:00,01/10/2012,00:00:00,228,510,1.976
01/10/2012 00:10,01/10/2012,00:10:00,231,516,1.389
01/10/2012 00:20,01/10/2012,00:20:00,239,532,1.759

The two columns I want to process are Effective_Fetch and Wind_Speed.

The module is as follows:

def Hs(w, Lf):
    gravity=9.81 #ms^-2
    slope=0.0026
    x = (slope)*(gravity**(-0.53))*(w**(1.06)*(Lf**(0.47)))
    return x

w is Wind_Speed, Lf is Effective_Fetch, and x is the value that I would like to insert into a column following Wind_Speed, with the column header "Wave_Height". I've also read that Pandas should be able to do this.

You probably want something like this (it assumes the Hs function from the question is already defined in the same script):

import csv

output_rows = []
with open('mycsv.csv', newline='') as f:
    reader = csv.reader(f)
    # Read the header row separately so the loop below only sees data rows
    headers = next(reader)
    for row in reader:
        # Unpack the two values we are interested in, ignore the others
        *_, effective_fetch, wind_speed = row
        # Values read from CSVs are strings, so convert them to numbers
        # (float also copes with a non-integer fetch value)
        result = Hs(float(wind_speed), float(effective_fetch))
        # Make a new row of the original row and the result of calling Hs
        output_rows.append(row + [result])

# Write out a new csv (if required)
with open('mynewcsv.csv', 'w', newline='') as new_f:
    writer = csv.writer(new_f)
    writer.writerow(headers + ['Wave_Height'])
    writer.writerows(output_rows)

The csv.reader object is an iterator, so using the next function advances it one step. This is less awkward than having a condition within the main for loop to check if we are processing the first row.
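To see the header step in isolation, here is a small sketch that reads from an in-memory string instead of a file. The name `sample` is just an assumption standing in for the real CSV; the rows are copied from the example data in the question.

```python
import csv
import io

# Hypothetical in-memory sample standing in for the real CSV file
sample = (
    "DateTime,Date,Time,Wind_Direction,Effective_Fetch,Wind_Speed\n"
    "01/10/2012 00:00,01/10/2012,00:00:00,228,510,1.976\n"
    "01/10/2012 00:10,01/10/2012,00:10:00,231,516,1.389\n"
)

reader = csv.reader(io.StringIO(sample))
headers = next(reader)    # consumes the header row
data_rows = list(reader)  # only the data rows are left in the iterator

print(headers[-2:])       # ['Effective_Fetch', 'Wind_Speed']
print(len(data_rows))     # 2
```

Because `next` has already consumed the first row, anything that iterates over `reader` afterwards starts at the first data row.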

The Hs function requires two inputs, and luckily they are the final two columns in the csv rows.

*_, effective_fetch, wind_speed = row

tells the interpreter to assign the values of the last two columns to effective_fetch and wind_speed, and assign all the previous columns to a variable named _, which is a common naming convention for a variable that we intend to ignore (you can call it whatever you like, of course).

You could also do this by row index, especially if the columns were less conveniently placed:

 effective_fetch, wind_speed = row[4], row[5]

or by indexing from the end:

 effective_fetch, wind_speed = row[-2], row[-1]

or by list slicing:

 effective_fetch, wind_speed = row[4:]
 effective_fetch, wind_speed = row[-2:]
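If you would rather not depend on column positions at all, one more option is to look the columns up by name with csv.DictReader and csv.DictWriter. This is only a sketch and not part of the approach above: it uses an in-memory `sample` string and an in-memory output buffer `out` (both assumptions for the sake of a self-contained example), and it repeats the Hs function from the question.

```python
import csv
import io

def Hs(w, Lf):
    # Same formula as in the question
    gravity = 9.81  # m s^-2
    slope = 0.0026
    return slope * gravity ** (-0.53) * (w ** 1.06 * Lf ** 0.47)

# Hypothetical in-memory sample standing in for the real CSV file
sample = (
    "DateTime,Date,Time,Wind_Direction,Effective_Fetch,Wind_Speed\n"
    "01/10/2012 00:00,01/10/2012,00:00:00,228,510,1.976\n"
    "01/10/2012 00:10,01/10/2012,00:10:00,231,516,1.389\n"
)

reader = csv.DictReader(io.StringIO(sample))
output_rows = []
for row in reader:
    # Each row is a mapping from header name to string value
    row["Wave_Height"] = Hs(float(row["Wind_Speed"]), float(row["Effective_Fetch"]))
    output_rows.append(row)

# Write to an in-memory buffer; with a real file, use open(..., 'w', newline='')
out = io.StringIO(newline="")
writer = csv.DictWriter(out, fieldnames=reader.fieldnames + ["Wave_Height"])
writer.writeheader()
writer.writerows(output_rows)

print(out.getvalue())
```

The trade-off is a little more setup in exchange for code that keeps working if columns are reordered or new ones are added.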
