I am trying to remove rows from a CSV file with Python, based on a condition on a specific column. For example, deleting all rows whose value in that column falls within one of three intervals: 99 to 1000, 5000 to 6000, or 8000 to 9000.
You can open the file, filter out the rows, and write the results to a new CSV.
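import csv
# range() excludes its stop value, so add 1 to each upper bound
intervals = [range(99, 1001), range(5000, 6001), range(8000, 9001)]
with open('data.csv', newline='') as inp, open('output.csv', 'w', newline='') as out:
    reader = csv.DictReader(inp)  # assumes a header row containing "column_name"
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        value = int(row["column_name"])  # csv yields strings, so convert first
        if not any(value in r for r in intervals):
            writer.writerow(row)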
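Alternatively, if the intervals are line numbers rather than values in a column, you can call sed:
import subprocess
# sed deletes by line number here (lines 99-1000 and 5000-6000) and prints the result to stdout
cmd = "sed -e '5000,6000d;99,1000d' file.csv"
subprocess.call(cmd, shell=True)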
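The sed command above deletes lines by their line number, not by the value in a column. A minimal pure-Python sketch of that line-number deletion, without shelling out, might look like the following. The function name `drop_line_ranges` and the sample data are hypothetical; line numbers are treated as 1-based and inclusive, as in sed.

```python
def drop_line_ranges(lines, ranges):
    """Return the lines whose 1-based position is outside every (start, end) pair."""
    return [
        line
        for number, line in enumerate(lines, start=1)
        if not any(start <= number <= end for start, end in ranges)
    ]


# Hypothetical sample: drop lines 2-3 and line 5
sample = ["a", "b", "c", "d", "e", "f"]
kept = drop_line_ranges(sample, [(2, 3), (5, 5)])
print(kept)
```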
You can generate your intervals using range, and checking whether an item is in a range is quite quick:
import csv

# add 1 to the max of the range for an inclusive search
intervals = [range(99, 1001), range(5000, 6001), range(8000, 9001)]

with open('data.csv', newline='') as fh, open('out.csv', 'w', newline='') as output:
    # DictReader allows lookup by header name (assumes the file has a header row)
    reader = csv.DictReader(fh)
    writer = csv.DictWriter(output, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # do this once to avoid the constant lookup of row['column_name']
        # in the interval check; csv yields strings, so convert to int
        col = int(row['column_name'])
        # it's fast to check if an integer is in a range
        # use all to check for each range
        if all(col not in i for i in intervals):
            writer.writerow(row)
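To try the filter without touching files on disk, here is a small sketch that wraps the same idea in a function and runs it on in-memory data. The function name `filter_rows`, the `id` column, and the sample values are all made up for illustration; it assumes the column holds integers and the data has a header row.

```python
import csv
import io

INTERVALS = [range(99, 1001), range(5000, 6001), range(8000, 9001)]


def filter_rows(source, target, column, intervals=INTERVALS):
    """Copy CSV rows from source to target, skipping rows whose integer
    value in `column` falls inside any of the intervals."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        value = int(row[column])
        if not any(value in r for r in intervals):
            writer.writerow(row)


# Hypothetical sample data covering both edges of the first interval
sample = io.StringIO("id,column_name\n1,50\n2,99\n3,1000\n4,1001\n5,5500\n6,9001\n")
result = io.StringIO()
filter_rows(sample, result, "column_name")
print(result.getvalue())
```

Rows 2, 3 and 5 fall inside an interval and are dropped; the boundary values 99 and 1000 are removed because the ranges are built to be inclusive.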
To show the difference between checking membership in a range and in a list:
python -m timeit -s 'x = 10000; y = range(10002)' 'x in y'
10000000 loops, best of 3: 0.0821 usec per loop
python -m timeit -s 'x = 10000; y = list(range(10002))' 'x in y'
10000 loops, best of 3: 94.4 usec per loop
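The gap comes from how Python 3 handles the lookup: for an integer, a range computes membership arithmetically from its start, stop and step, while a list has to be scanned element by element. A rough way to reproduce the comparison from within a script is sketched below; the variable names are arbitrary and the absolute timings will vary by machine.

```python
import timeit

x = 10000
as_range = range(10002)
as_list = list(as_range)

# Both containers give the same answer; only the lookup cost differs.
assert (x in as_range) == (x in as_list)

range_time = timeit.timeit(lambda: x in as_range, number=1000)
list_time = timeit.timeit(lambda: x in as_list, number=1000)
print(f"range: {range_time:.6f}s  list: {list_time:.6f}s")
```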