I have a CSV file from which I need to delete the second and third rows and the 3rd to 18th columns. I was able to get it to work in two steps, which produced an interim file. I think there must be a better, more compact way to do this. Any suggestions would be really appreciated.
Also, if I want to remove multiple ranges of columns, how do I specify that in this code? For example, if I want to remove columns 25 to 29 in addition to columns 3 to 18 already specified, what would I add to the code? Thanks
import csv

remove_from = 2
remove_to = 17
# Step 1: drop the column slice and write an interim file
# ('rb'/'wb' modes are the Python 2 convention for the csv module)
with open('file_a.csv', 'rb') as infile, open('interim.csv', 'wb') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        del row[remove_from : remove_to]
        writer.writerow(row)
# Step 2: keep the first row, skip the next two rows, copy the rest
with open('interim.csv', 'rb') as infile, open('file_b.csv', 'wb') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    writer.writerow(next(reader))
    next(reader)
    next(reader)
    for row in reader:
        writer.writerow(row)
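The two passes above can also be folded into one, which avoids the interim file and makes extra column ranges a matter of adding another pair to a list. The following is only a minimal sketch, not the asker's code: it assumes Python 3 (text mode with `newline=''`), and `filter_csv`, `drop_cols` and `drop_rows` are hypothetical names chosen for illustration. Ranges are given as 1-based, inclusive pairs, so `(3, 18)` corresponds to the slice `row[2:18]`.

```python
import csv


def filter_csv(src, dst, drop_cols, drop_rows):
    """Copy src to dst, skipping the given rows and column ranges.

    drop_cols: iterable of (first, last) pairs, 1-based and inclusive.
    drop_rows: iterable of 1-based row numbers (row 1 is the first line).
    """
    # Expand the ranges into a set of 0-based column indices to drop.
    drop_idx = {i for first, last in drop_cols for i in range(first - 1, last)}
    drop_rows = set(drop_rows)
    # newline='' follows the csv module's recommendation for Python 3.
    with open(src, newline='') as infile, open(dst, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        for row_number, row in enumerate(csv.reader(infile), start=1):
            if row_number in drop_rows:
                continue  # skip unwanted rows entirely
            # Keep only the cells whose index is not in a dropped range.
            writer.writerow([v for i, v in enumerate(row) if i not in drop_idx])
```

Under these assumptions, the case in the question would be called as `filter_csv('file_a.csv', 'file_b.csv', drop_cols=[(3, 18), (25, 29)], drop_rows=[2, 3])`. Building the set of indices once keeps the per-row work to a single list comprehension.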
Here is a pandas approach:
import numpy as np
import pandas as pd
# Create sample CSV-file (100x100)
df = pd.DataFrame(np.arange(10000).reshape(100, 100))
df.to_csv('test.csv', index=False)
# Read only the header row to determine the number of columns
size = pd.read_csv('test.csv', nrows=0).shape[1]
# We want to remove columns 25 to 29 in addition to columns 3 to 18.
# Build the array of column positions to keep by deleting those ranges
# (positions are 0-based and the slice ends are exclusive)
ranges = np.r_[3:19, 25:30]
ar = np.delete(np.arange(size), ranges)
# Now read the dataframe, skipping file lines 2 and 3
# (skiprows is 0-based, and line 0 is the header)
df = pd.read_csv('test.csv', skiprows=[2, 3], usecols=ar)
# And output
df.to_csv('output.csv', index=False)
And the proof:
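One quick way to check this is to inspect the shape, the surviving column labels, and the first column of the result. This is a sketch that assumes the same 100x100 sample file as above. It writes that file to a temporary directory (an assumption made here so the check is self-contained) and uses `keep` as an illustrative name for the array of positions.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Rebuild the 100x100 sample in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'test.csv')
pd.DataFrame(np.arange(10000).reshape(100, 100)).to_csv(path, index=False)

# Same steps as in the answer: find the width, then drop the two ranges.
size = pd.read_csv(path, nrows=0).shape[1]
keep = np.delete(np.arange(size), np.r_[3:19, 25:30])
df = pd.read_csv(path, skiprows=[2, 3], usecols=keep)

# 100 data rows minus 2 skipped; 100 columns minus 16 + 5 dropped.
print(df.shape)
# Labels are read back from the header, so they are strings.
print(list(df.columns[:5]))
# The first column shows which data rows were skipped.
print(df.iloc[:3, 0].tolist())
```

Because `skiprows` counts file lines from 0 with the header on line 0, the rows that disappear here are the second and third data rows, and the column labels jump from `'2'` to `'19'`.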