
Removing rows from a CSV file with a conditional argument in Python

I am trying to use Python to remove rows from a CSV file based on a condition on a specific column. For example, I want to delete every row whose value in that column falls within any of three intervals: 99 to 1000, 5000 to 6000, and 8000 to 9000.
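The condition itself can be written as a small per-row predicate. This is a minimal sketch that assumes the bounds are inclusive and the column holds numbers. The helper name in_any_interval is hypothetical. Chained comparisons also match non-integer values such as 5500.5, which a test like "x in range(...)" would not.

```python
# Hypothetical helper: inclusive (low, high) bounds taken from the question
INTERVALS = [(99, 1000), (5000, 6000), (8000, 9000)]

def in_any_interval(value, intervals=INTERVALS):
    """Return True if value lies inside any inclusive (low, high) interval."""
    return any(low <= value <= high for low, high in intervals)

print(in_any_interval(99))      # True
print(in_any_interval(5500.5))  # True
print(in_any_interval(1001))    # False
```

A row would then be kept when in_any_interval(float(row_value)) is False.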

You can open the file, filter out the matching rows, and write the results to a new CSV file.

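import csv

intervals = [range(99, 1001), range(5000, 6001), range(8000, 9001)]

with open('data.csv', newline='') as inp, open('output.csv', 'w', newline='') as out:
    reader = csv.DictReader(inp)  # DictReader lets rows be indexed by column name
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # CSV values are strings: convert, then keep rows outside every interval
        if not any(int(row['column_name']) in r for r in intervals):
            writer.writerow(row)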
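If you prefer to keep csv.reader, note that its rows are plain lists, so a column has to be addressed by its position rather than its name. A sketch of that variant: look up the index once from the header row. It runs against in-memory text so it is self-contained; the column name 'value' and the sample rows are made up for illustration.

```python
import csv
import io

intervals = [range(99, 1001), range(5000, 6001), range(8000, 9001)]

# Hypothetical input; in practice this would be open('data.csv', newline='')
data = io.StringIO("id,value\na,50\nb,500\nc,5000\nd,7000\ne,9001\n")
out = io.StringIO()

reader = csv.reader(data)
writer = csv.writer(out, lineterminator="\n")

header = next(reader)        # first row holds the column names
idx = header.index("value")  # position of the column to test
writer.writerow(header)

for row in reader:
    # row[idx] is a string, so convert before the range membership test
    if not any(int(row[idx]) in r for r in intervals):
        writer.writerow(row)

print(out.getvalue(), end="")
```

Rows b (500) and c (5000) are dropped; 9001 is kept because range(8000, 9001) stops at 9000.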
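Alternatively, you can shell out to sed. Be aware that sed addresses such as 5000,6000d are line numbers, so this deletes rows by their position in the file, not by the value in a column:

import subprocess

# sed's N,Md deletes lines N through M; redirect stdout to save the result
cmd = "sed -e '99,1000d;5000,6000d;8000,9000d' file.csv > output.csv"
subprocess.call(cmd, shell=True)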
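The sed approach deletes lines by their number, not by a column value. If that really is the goal, the same effect can be had in Python without spawning a process. This sketch assumes 1-based, inclusive line ranges as in sed, and only covers ranges where start is not greater than end. The function name drop_lines is hypothetical.

```python
# Hypothetical helper mirroring sed's 'N,Md' addresses (1-based, inclusive)
def drop_lines(lines, ranges):
    """Yield the lines whose 1-based line number is outside every (start, end) range."""
    for lineno, line in enumerate(lines, start=1):
        if not any(start <= lineno <= end for start, end in ranges):
            yield line

sample = [f"row{n}\n" for n in range(1, 11)]
print("".join(drop_lines(sample, [(2, 3), (7, 9)])), end="")
```

With a real file, pass the open file object as lines and write the yielded lines to the output file.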

You can generate your intervals using range, and in Python 3 checking whether an integer is in a range is quick:

import csv

# add 1 to the max of the range for an inclusive search
intervals = [range(99, 1001), range(5000, 6001), range(8000, 9001)]

with open('data.csv', newline='') as fh, open('out.csv', 'w', newline='') as output:
    reader = csv.DictReader(fh)
    writer = csv.DictWriter(output, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # do this to avoid the constant lookup of row['column_name']
        # in the interval check; CSV values are strings, so convert first
        col = int(row['column_name'])

        # it's fast to check if an integer is in a range
        # use all to check for each range
        if all(col not in i for i in intervals):
            writer.writerow(row)
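To check the interval boundaries without touching the disk, the same loop can run against in-memory text. This is a sketch with made-up sample data, wrapped in a hypothetical helper filter_rows. It confirms that 99, 1000 and 6000 are removed while 98, 1001 and 6001 are kept.

```python
import csv
import io

intervals = [range(99, 1001), range(5000, 6001), range(8000, 9001)]

def filter_rows(src, dst, column):
    """Copy CSV rows from src to dst, dropping those whose column value is in any interval."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        col = int(row[column])
        if all(col not in i for i in intervals):
            writer.writerow(row)

src = io.StringIO("column_name\n98\n99\n1000\n1001\n6000\n6001\n")
dst = io.StringIO()
filter_rows(src, dst, "column_name")
print(dst.getvalue(), end="")
```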

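To show the difference between checking membership in a range and in a list (in Python 3, a range tests integer membership arithmetically instead of scanning every element):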

range check

python -m timeit -s 'x = 10000; y = range(10002)' 'x in y'
10000000 loops, best of 3: 0.0821 usec per loop

list check

python -m timeit -s 'x = 10000; y = list(range(10002))' 'x in y'
10000 loops, best of 3: 94.4 usec per loop
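The same comparison can be run from inside Python with the timeit module. Absolute numbers depend on the machine and Python version, so only the relative gap is meaningful; the loop count here is an arbitrary choice.

```python
import timeit

setup_range = "x = 10000; y = range(10002)"
setup_list = "x = 10000; y = list(range(10002))"

# Run each membership test the same number of times
n = 1000
t_range = timeit.timeit("x in y", setup=setup_range, number=n)
t_list = timeit.timeit("x in y", setup=setup_list, number=n)

# timeit returns total seconds for n runs; convert to microseconds per run
print(f"range: {t_range / n * 1e6:.4f} usec per loop")
print(f"list:  {t_list / n * 1e6:.4f} usec per loop")
```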
