I have a CSV with 12288+1 columns and want to reduce it to 4096+1 columns.
In those 12288+1 columns, the values repeat in groups of three, and the last value is a bit, 0 or 1.
I need to keep that last value and take just one value from each repeating group of three.
My original CSV has 300 rows (or lines, whatever). I don't know how to process the other rows; my script only handles the first row/line.
From the original CSV: 3,3,3,5,5,5,7,7,7,10,10,10 ... 20,20,20,50,50,50,1
Desired final CSV: 3,5,7,10 ... 20,50,1
import csv

count = 0
num = 0
a = ''
with open('data.csv', newline='') as filecsv:
    reader = csv.reader(filecsv)
    for row in reader:
        # count is never reset, so only the first row is collected
        while count < 12290:
            a = a + str(row[count]) + ','
            count = count + 3
        num = num + 1
        print(num)
print(a)
The prints are just there to get an idea of what is happening.
Thanks for any help.
If you don't mind using a library, Pandas will do this for you nicely.
You can read a CSV with pandas.read_csv. The usecols parameter specifies which columns you want to keep, so you can use it to skip the repeated columns.
import pandas

# One column from each group of three (offsets 1, 4, 7, ...) plus the last bit column.
columns = list(range(1, 12288, 3))
columns.append(12288)

# The file has no header row, so pass header=None; index=False keeps
# to_csv from prepending a row-index column to the output.
data = pandas.read_csv('data.csv', header=None, usecols=columns)
data.to_csv('new_data.csv', header=False, index=False)
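As a quick sanity check of the usecols approach on a toy in-memory file (hypothetical small data, assuming no header row):

```python
import io
import pandas

# Toy stand-in for data.csv: groups of three repeats plus a final bit.
csv_text = "3,3,3,5,5,5,7,7,7,1\n4,4,4,6,6,6,8,8,8,0\n"

# Keep one column per group of three (offsets 0, 3, 6) plus the bit column (9).
columns = list(range(0, 9, 3)) + [9]
data = pandas.read_csv(io.StringIO(csv_text), header=None, usecols=columns)
print(data.values.tolist())  # [[3, 5, 7, 1], [4, 6, 8, 0]]
```

The same column list scaled up to 12288 data columns gives the 4096+1 output the question asks for.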
If they are always groups of three, just throw 2 away.
Group into groups of 3 like so:
>>> row = list(range(9))
>>> [row[i:i+3] for i in range(0, len(row), 3)]
[[0, 1, 2], [3, 4, 5], [6, 7, 8]]
However, this will give you a group of fewer than 3 at the end if the length of row is not a multiple of 3:
>>> row = list(range(11))
>>> [row[i:i+3] for i in range(0, len(row), 3)]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

Note that the last group, [9, 10], has only two elements.
If the number of elements may not be a multiple of 3, use zip instead; it silently drops any incomplete r,g,b group at the end:
>>> row = list(range(11))
>>> list(zip(*[iter(row)]*3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
Then unpack into r,g,b components:
import csv

with open('data.csv', newline='') as filecsv:
    reader = csv.reader(filecsv)
    for row in reader:
        for r, g, b in [row[i:i+3] for i in range(0, len(row), 3)]:
            pass  # use r (or g or b), ignore the other two
If you are getting a ValueError, your data has a number of columns that is not a multiple of 3 (or csv is not parsing the data correctly). Try using zip as stated:
import csv

with open('data.csv', newline='') as filecsv:
    reader = csv.reader(filecsv)
    for row in reader:
        for r, g, b in zip(*[iter(row)]*3):
            pass  # use r (or g or b), ignore the other two
(not tested...)
To remove consecutive duplicates, you could use the itertools.groupby function:
#!/usr/bin/env python
import csv
from itertools import groupby
from operator import itemgetter

with open('data.csv', newline='') as file, \
     open('output.csv', 'w', newline='') as output_file:
    writer = csv.writer(output_file)
    for row in csv.reader(file):
        # groupby collapses each run of equal values; keep one key per run.
        writer.writerow(map(itemgetter(0), groupby(row)))
It reads the input csv file and writes it to the output csv file with consecutive duplicates removed.
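For example, on the sample row from the question (shown as strings, since that is what csv.reader yields):

```python
from itertools import groupby
from operator import itemgetter

row = ['3', '3', '3', '5', '5', '5', '7', '7', '7', '1']
# One key per run of consecutive equal values.
print(list(map(itemgetter(0), groupby(row))))  # ['3', '5', '7', '1']
```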
If there could be an adjacent duplicate 0 or 1 at the very end of the row (the bit matching the data value before it), then remove duplicates only in row[:-1] (all but the last column) and append the last bit row[-1] to the result if you want to preserve it:
from itertools import islice

no_dups = list(map(itemgetter(0), groupby(islice(row, len(row) - 1))))
no_dups.append(row[-1])
writer.writerow(no_dups)
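To see why this matters, take a hypothetical row whose last data value equals the trailing bit: deduplicating the whole row swallows the bit, while the row[:-1] variant keeps it.

```python
from itertools import groupby, islice
from operator import itemgetter

row = ['3', '3', '3', '1', '1', '1', '1']  # final '1' is the bit column

# Naive dedup merges the bit into the preceding run of 1s:
print(list(map(itemgetter(0), groupby(row))))  # ['3', '1']

# Dedup all but the last column, then re-append the bit:
no_dups = list(map(itemgetter(0), groupby(islice(row, len(row) - 1))))
no_dups.append(row[-1])
print(no_dups)  # ['3', '1', '1']
```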