I am trying to sort the table based on conditions that have to hold for several consecutive rows. Here is a simplified version of the table:
Number Time
1 23
2 45
3 67
4 23
5 11
6 45
7 123
8 34
...
I need to check whether the time was < 40 five times in a row, i.e. check rows 1-5, then rows 2-6, etc., and then print and save to a file the first and last time. For example, if the condition is met for rows 2-6, I need to print the time for Number 2 and Number 6. The checking should stop once the condition has been met; there is no need to check the remaining rows. So far I have implemented a counter with two temp variables to check for 3 items in a row, and it works fine. But if I want to check for a condition that happens 30 times in a row, I cannot just create 30 temp variables manually. What is the best way to achieve that? I guess I will need some kind of loop. Thanks!
Here is part of my code:
    reader = csv.reader(open(filename))
    counter, temp1, temp2, numrow = 0, 0, 0, 0
    for row in reader:
        numrow+=1
        if numrow <5:
            col0, col1, col4, col5, col6, col23, col24, col25 = (float(row[0]),
                float(row[1]), float(row[4]), float(row[5]), float(row[6]),
                float(row[23]), float(row[24]), float(row[25]))
            if col1 <= 40:
                list1=(col1, col3, col4, col5, col6, col23, col24, col25)
                counter += 1
                if counter == 3:
                    print("Cell# %s" %filename[-10:-5])
                    print LAYOUT.format(*headers_short)
                    print LAYOUT.format(*temp1)
                    print LAYOUT.format(*temp2)
                    print LAYOUT.format(*list1)
                    print ""
                elif counter == 1:
                    temp1=list1
                elif counter == 2:
                    temp2=list1
            else:
                counter = 0
I implemented the solution suggested by Bakuriu and it seems to be working. But what would be the best way to combine several tests? I need to check for several conditions. Let's say: v
Right now I just open a csv.reader for every test and run the function. I guess it is not the most efficient way, although it works. Sorry, I am just a complete noob.
    import csv
    import glob

    csvfiles = glob.glob('processed_data/*.stat')
    for filename in csvfiles:
        flag=[]
        flag.append(filename[-12:-5])
        reader = csv.reader(open(filename))
        for a, row_group in enumerate(row_grouper(reader,10)):
            if all(float(row[1]) < 40 for row in row_group):
                str1= "Efficiency is less than 40 in cycles "+ str(a+1)+'-'+str(a+10) #a is the index of the first row in the group.
                flag.append(str1)
                break #stop processing other rows.
        reader = csv.reader(open(filename))
        for b, row_group in enumerate(row_grouper(reader,5)):
            if all(float(row[3]) < 40 for row in row_group):
                str1= "Capacity is less than 40 minutes in cycles "+ str(b+1)+'-'+str(b+5)
                flag.append(str1)
                break #stop processing other rows.
        reader = csv.reader(open(filename))
        for c, row_group in enumerate(row_grouper(reader,25)):
            if all(float(row[3]) < 40 for row in row_group):
                str1= "Time is less than 40 in cycles "+ str(c+1)+'-'+str(c+25)
                flag.append(str1)
                break #stop processing other rows.
        if len(flag)>1:
            for i in flag:
                print i
            print '\n'
You don't have to sort the data at all. A simple solution might be:
    import csv

    def row_grouper(reader, size=5):
        iterrows = iter(reader)
        current = [next(iterrows) for _ in range(size)]
        yield current
        for next_row in iterrows:
            current.pop(0)
            current.append(next_row)
            yield current

    reader = csv.reader(open(filename))
    for i, row_group in enumerate(row_grouper(reader)):
        if all(float(row[1]) < 40 for row in row_group):
            print i+1, i+5 #first and last row (1-based); i is the 0-based index of the first row in the group.
            break #stop processing other rows.
The row_grouper function is a generator that yields 5-element lists of consecutive rows. Every time, it removes the first row of the group and adds the new row at the end.
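A minimal Python 3 sketch of the same sliding-window idea, run on the sample table from the question. Two details are assumptions of this sketch rather than part of the answer: the group size is a `size` parameter, and each window is yielded as a copy, so the groups can be collected with `list()` without all of them ending up as the same mutated list:

```python
def row_grouper(rows, size=5):
    """Yield every run of `size` consecutive rows as a separate list."""
    current = []
    for row in rows:
        current.append(row)
        if len(current) > size:
            current.pop(0)  # drop the oldest row so the window keeps `size` rows
        if len(current) == size:
            yield list(current)  # copy, so later pops/appends don't change it

# (Number, Time) pairs from the simplified table in the question.
table = [(1, 23), (2, 45), (3, 67), (4, 23), (5, 11), (6, 45), (7, 123), (8, 34)]
windows = list(row_grouper(table, size=5))
print([(w[0][0], w[-1][0]) for w in windows])  # [(1, 5), (2, 6), (3, 7), (4, 8)]
```

Group index i (counting from 0) therefore covers rows i+1 to i+size when the rows are numbered from 1.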
Instead of a plain list you can use a deque and replace the pop(0) in row_grouper with a popleft() call, which is more efficient, although this doesn't matter much if the list has only 5 elements.
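A sketch of that deque variant, in Python 3 and again with a `size` parameter that is an assumption of this sketch. `collections.deque.popleft()` removes from the left end in constant time, whereas `list.pop(0)` has to shift every remaining element:

```python
from collections import deque

def row_grouper(rows, size=5):
    """Sliding window built on a deque; popleft() drops the oldest row."""
    current = deque()
    for row in rows:
        current.append(row)
        if len(current) > size:
            current.popleft()  # O(1), unlike list.pop(0)
        if len(current) == size:
            yield list(current)

print(list(row_grouper(range(1, 8), size=3)))
# [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]]
```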
Alternatively, you can use martineau's suggestion: use the maxlen keyword argument and avoid popping altogether. This is about twice as fast as using a deque's popleft, which is about twice as fast as using the list's pop(0).
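With `deque(maxlen=...)`, appending to a full deque silently discards the item at the opposite (left) end, so the explicit pop disappears. A minimal sketch, with the same assumed `size` parameter:

```python
from collections import deque

def row_grouper(rows, size=5):
    """Sliding window: a bounded deque discards the oldest row on append."""
    current = deque(maxlen=size)
    for row in rows:
        current.append(row)  # once full, this also drops the leftmost item
        if len(current) == size:
            yield list(current)

print(list(row_grouper("abcde", size=4)))
# [['a', 'b', 'c', 'd'], ['b', 'c', 'd', 'e']]
```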
Edit: To check more than one condition you can use more than one row_grouper and use itertools.tee to obtain copies of the iterables. For example:
    import itertools as it

    def check_condition(group, row_index, limit, found):
        if group is None or found:
            return False
        return all(float(row[row_index]) < limit for row in group)

    f_iter, s_iter, t_iter = it.tee(iter(reader), 3)
    groups = row_grouper(f_iter, 10), row_grouper(s_iter, 5), row_grouper(t_iter, 25)
    found_first = found_second = found_third = False
    for index, (first, second, third) in enumerate(it.izip_longest(*groups)):
        if check_condition(first, 1, 40, found_first):
            #stuff
            found_first = True
        if check_condition(second, 3, 40, found_second):
            #stuff
            found_second = True
        if check_condition(third, 3, 40, found_third):
            # stuff
            found_third = True
        if found_first and found_second and found_third:
            #stop the code if we matched all the conditions once.
            break
The first part simply imports itertools (and assigns it the alias it, to avoid typing itertools every time).
I've defined the check_condition function because the conditions are getting more complicated and you don't want to repeat them over and over. The last line of check_condition is the same as the condition before: it checks whether the current "row group" satisfies the property. Since we plan to iterate over the file only once, and we cannot stop the loop when only one condition is met (we'd miss the other conditions), we need a flag for each condition that records whether that condition (e.g. the one on time) has already been met. As you can see in the for loop, we break out of the loop once all the conditions are met.
Now, the line:
    f_iter, s_iter, t_iter = it.tee(iter(reader), 3)
creates an iterator over the rows of reader and splits it into 3 independent copies. This means that the loop:
    for row in f_iter:
        print(row)
will print all the rows of the file, just like doing for row in reader. Note, however, that itertools.tee allows us to obtain copies of the rows without reading the file more than once.
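A small, self-contained illustration of that claim. The generator `numbered_rows` is a hypothetical stand-in for the CSV reader that logs each row it produces, so you can see that both `tee` copies see every row while the source is consumed only once. Keep in mind that `tee` buffers rows that one copy has consumed and another has not, so memory stays small only while the copies advance at a similar pace, as they roughly do when the groupers are advanced together:

```python
import itertools

reads = []  # records every row actually produced by the source

def numbered_rows(n=3):
    """Stand-in for csv.reader: yields n rows and logs each one it produces."""
    for i in range(1, n + 1):
        reads.append(i)
        yield ["row", i]

first_copy, second_copy = itertools.tee(numbered_rows(), 2)
rows_a = list(first_copy)
rows_b = list(second_copy)
print(rows_a == rows_b)  # True: both copies see every row
print(reads)             # [1, 2, 3]: the source itself was read only once
```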
Afterwards, we must pass these rows to row_grouper in order to verify the conditions:
    groups = row_grouper(f_iter, 10), row_grouper(s_iter, 5), row_grouper(t_iter, 25)
Finally, we have to loop over the "row groups". To do this simultaneously we use itertools.izip_longest (renamed to itertools.zip_longest, without the i, in Python 3). It works just like zip, grouping corresponding elements into tuples (e.g. zip([1, 2, 3], ["a", "b", "c"]) -> [(1, "a"), (2, "b"), (3, "c")]). The difference is that izip_longest pads the shorter iterables with None. This ensures that we check the conditions on all the possible groups (and that's also why check_condition has to check whether group is None).
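The padding is easy to see with two iterables of different lengths (Python 3 names shown; in Python 2 the function is `itertools.izip_longest`). The two lists here are made-up stand-ins for groupers that produce different numbers of groups, as a size-10 and a size-5 grouper do on the same file:

```python
from itertools import zip_longest

big_groups = ["g1", "g2"]          # e.g. a size-10 grouper: fewer groups
small_groups = ["h1", "h2", "h3"]  # e.g. a size-5 grouper: more groups

print(list(zip(big_groups, small_groups)))
# [('g1', 'h1'), ('g2', 'h2')]  (zip stops at the shortest input)
print(list(zip_longest(big_groups, small_groups)))
# [('g1', 'h1'), ('g2', 'h2'), (None, 'h3')]  (padded with None)
```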
To obtain the current row index we wrap everything in enumerate, just like before. Inside the for loop the code is pretty simple: you check each condition using check_condition and, if it is met, you do what you have to do and set the flag for that condition (so that in the following iterations the check for that condition always returns False).
(Note: I must say I did not test the code. I'll test it when I have a bit of time; anyway, I hope I gave you some ideas. And check out the documentation for itertools.)
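For reference, a Python 3 sketch of the same single-pass approach (`izip_longest` became `zip_longest`). Rather than three hard-coded flags it keeps a list of `(column, limit, size)` conditions and a parallel list of results; the data, the condition values and the helper name `find_runs` are made up for illustration, and the CSV text is read through `io.StringIO` so the example runs without a file:

```python
import csv
import io
import itertools

def row_grouper(rows, size):
    """Yield every run of `size` consecutive rows as a separate list."""
    window = []
    for row in rows:
        window.append(row)
        if len(window) > size:
            window.pop(0)
        if len(window) == size:
            yield list(window)

def check_condition(group, row_index, limit, found):
    """True if every row in `group` has row[row_index] < limit (and not yet found)."""
    if group is None or found:  # None: this grouper ran out before the others
        return False
    return all(float(row[row_index]) < limit for row in group)

def find_runs(text, conditions):
    """Return, per condition, (first, last) 1-based row numbers of its first run, or None."""
    reader = csv.reader(io.StringIO(text))
    copies = itertools.tee(reader, len(conditions))  # one copy of the rows per condition
    groups = [row_grouper(rows, size) for rows, (_, _, size) in zip(copies, conditions)]
    found = [None] * len(conditions)
    for index, current in enumerate(itertools.zip_longest(*groups)):
        for k, (group, (column, limit, size)) in enumerate(zip(current, conditions)):
            if check_condition(group, column, limit, found[k] is not None):
                found[k] = (index + 1, index + size)
        if all(result is not None for result in found):
            break  # every condition has matched once, stop reading
    return found

DATA = "1,23,0,50\n2,45,0,30\n3,30,0,20\n4,20,0,10\n5,10,0,45\n6,5,0,30\n"
# column 1 < 40 for 3 rows in a row; column 3 < 40 for 2 rows in a row
print(find_runs(DATA, [(1, 40, 3), (3, 40, 2)]))  # [(3, 5), (2, 3)]
```

A condition that never matches simply stays `None`, and the loop then runs until every grouper is exhausted.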
You don't really need to sort your data; just keep track of whether the condition you're looking for has occurred in the last N rows of data. Fixed-size collections.deque objects are good for this sort of thing.
    import csv
    from collections import deque

    filename = 'table.csv'
    GROUP_SIZE = 5
    THRESHOLD = 40

    cond_deque = deque(maxlen=GROUP_SIZE)

    with open(filename) as datafile:
        reader = csv.reader(datafile) # assume delimiter=','
        next(reader) # skip header row
        for linenum, row in enumerate(reader, start=1): # process rows of file
            col0, col1, col4, col5, col6, col23, col24, col25 = (
                float(row[i]) for i in (0, 1, 4, 5, 6, 23, 24, 25))
            cond_deque.append(col1 < THRESHOLD)
            if cond_deque.count(True) == GROUP_SIZE:
                print('lines {}-{} had {} consecutive rows with col1 < {}'.format(
                    linenum-GROUP_SIZE+1, linenum, GROUP_SIZE, THRESHOLD))
                break # found, so stop looking
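A variation closer to the counter in the question, sketched in Python 3: instead of a deque, keep one running count of consecutive matching rows, reset it to zero on a miss, and remember the row where the current run started. This needs only one stored row however long the run is, and it returns the first and last rows themselves, so their Number and Time can be printed or written to a file. The helper name `first_run` and the CSV version of the sample table are illustrative assumptions:

```python
import csv
import io

def first_run(rows, column, limit, run_length):
    """Return (first_row, last_row) of the first run of `run_length` consecutive
    rows with float(row[column]) < limit, or None if there is no such run."""
    count = 0
    first = None
    for row in rows:
        if float(row[column]) < limit:
            count += 1
            if count == 1:
                first = row  # a new run starts here
            if count == run_length:
                return first, row  # stop at the first complete run
        else:
            count = 0  # run broken, start counting again
    return None

# The simplified table from the question, written as CSV with a header row.
TABLE = "Number,Time\n1,23\n2,45\n3,67\n4,23\n5,11\n6,45\n7,123\n8,34\n"
reader = csv.reader(io.StringIO(TABLE))
next(reader)  # skip header row
print(first_run(reader, 1, 40, 2))  # (['4', '23'], ['5', '11'])
```

For the 30-in-a-row case from the question, the call would be `first_run(reader, 1, 40, 30)`.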