Python 3: Checking next value of an iterator without iterating

For a project, I need to check whether the value in a certain column of the next row of a CSV file equals the value in the same column of the current row. I am using csv.DictReader, i.e. each row the reader yields is a dictionary, so I can access a value in a row by using the column header as the key: row[header] gives value.

A stripped-down version of my current code looks like this:

import csv
import os

with open(os.path.abspath(path_to_file), "r") as f:
    reader = csv.DictReader(f, dialect='excel')

    for row in reader:
        current_row = row
        next_row = reader.__next__()
        if current_row[column] == next_row[column]:
            dosomething()

The problem here is, of course, that I skip an iteration by calling __next__(), i.e.:

(1) I enter the loop; row = row1.
(2) current_row = row1, next_row = row2.
(3) I enter the next iteration of the loop; row = row3 because I called __next__(). Now current_row = row3, next_row = row4.

In this example I would never check row2 == row3.

Is it possible to check the values of the next row without advancing the iterator? Or, alternatively, is there an opposite method to __next__() that makes the iterator go back one step?

Please note: I'm comparing the current value to the next value, rather than the current value to the previous value, because I don't know how long the file I'm reading is. I have to treat the last row of the file differently from the others, so I have to call reader.__next__() anyway to see whether there is a following line at all.

Try the itertools pairwise recipe. A more general solution is to tee your iterator (which is what the pairwise recipe uses). Another possibility is to create a function that has a cur and next variable and yields the values you want (basically what pairwise does, but you could make this yield the fields in your CSV rather than entire rows).

From https://docs.python.org/2/library/itertools.html

from itertools import tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    # In Python 3, the built-in zip is lazy and replaces Python 2's izip.
    return zip(a, b)

I think pairwise does everything you need here, so no fussing with your own generator function or tee .

reader = csv.DictReader(f, dialect='excel')

for current_row, next_row in pairwise(reader):
    if current_row[column] == next_row[column]:
        dosomething()

Note that for an iterable with n items, pairwise yields only n-1 pairs: the last row shows up as next_row, but never as current_row.
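To make the pair count concrete, here is a small self-contained sketch. The in-memory sample data and the column names "city" and "value" are assumptions made up for illustration; they are not from the question.

```python
import csv
import io
from itertools import tee


def pairwise(iterable):
    """s -> (s0, s1), (s1, s2), (s2, s3), ..."""
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)


# Hypothetical sample file with four data rows (assumed for illustration).
sample = io.StringIO("city,value\nOslo,1\nOslo,2\nBergen,3\nBergen,4\n")
reader = csv.DictReader(sample)

rows_compared = []  # which (current, next) rows were looked at
matches = 0         # how many neighbouring rows share the same city

for current_row, next_row in pairwise(reader):
    rows_compared.append((current_row["value"], next_row["value"]))
    if current_row["city"] == next_row["city"]:
        matches += 1

print(rows_compared)
print(matches)
```

Four rows give three pairs, and the row2/row3 comparison that the original loop skipped is included. If you are on Python 3.10 or later, itertools.pairwise is available in the standard library and can replace the recipe.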

Your requirements conflict with the idea of a built-in iterator, which can only move forward. So I suggest you encapsulate the look-ahead inside a custom iterator. The idea is to yield two values from the original iterator at each step, the current row and the next one, with None as the next value for the last row.
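A minimal sketch of that idea, written as a generator. The name with_next, the sample data, and the column names are hypothetical; it also assumes None never occurs as a real row, which holds for csv.DictReader.

```python
import csv
import io


def with_next(iterable):
    """Yield (item, next_item) pairs; next_item is None for the last item."""
    it = iter(iterable)
    try:
        current = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for following in it:
        yield current, following
        current = following
    yield current, None  # the last item has no successor


# Hypothetical sample data (assumed for illustration).
sample = io.StringIO("city,value\nOslo,1\nOslo,2\nBergen,3\n")
reader = csv.DictReader(sample)

results = []
for row, next_row in with_next(reader):
    if next_row is None:
        results.append((row["value"], "last"))       # special-case the last row
    elif row["city"] == next_row["city"]:
        results.append((row["value"], "same"))
    else:
        results.append((row["value"], "different"))

print(results)
```

Unlike pairwise, this yields one pair per row, so the last row can be handled in the same loop.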

I don't know whether this will work here, but it works on Android:

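# Note: both readers wrap the same file object f, so they consume lines
# from the same stream rather than iterating independently.
reader = csv.DictReader(f, dialect='excel')
reader2 = csv.DictReader(f, dialect='excel')

for row in reader:
    current_row = row

    for row2 in reader2:
        next_row = reader2.__next__()
        if current_row[column] == next_row[column]:
            dosomething()
        continue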

Personally, I would look back, instead of looking ahead, assuming the constraints of your scenario allow it:

it = iter(reader)
prev_row = next(it)  # Python 3: use next(it) instead of it.next()
while True:
    try:
        cur_row = next(it)
        if cur_row[column] == prev_row[column]:
            dosomething()
        prev_row = cur_row
    except StopIteration:
        break
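The same look-back idea can also be written with a plain for loop, roughly like this. The function name count_adjacent_matches and the sample rows are made up for illustration. After the loop, prev_row holds the last row, so it can be treated separately there.

```python
def count_adjacent_matches(rows, column):
    """Return (number of neighbouring rows with equal `column`, last row)."""
    it = iter(rows)
    prev_row = next(it, None)  # None if the input is empty
    if prev_row is None:
        return 0, None
    matches = 0
    for cur_row in it:
        if cur_row[column] == prev_row[column]:
            matches += 1
        prev_row = cur_row
    # When the loop ends, prev_row is the last row of the input.
    return matches, prev_row


# Hypothetical rows, shaped like what csv.DictReader yields.
rows = [{"city": "Oslo"}, {"city": "Oslo"}, {"city": "Bergen"}]
matches, last_row = count_adjacent_matches(rows, "city")
print(matches, last_row)
```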

Because dictionaries must have unique keys (so you cannot append rows under the same keys) and a csv.DictReader object is not subscriptable (so you cannot index it by column or row number), consider reading the CSV data into a list and then comparing each row to the row after it:

import csv
import os

with open(os.path.abspath(path_to_file), "r") as f:
    reader = csv.reader(f)
    # READ ALL ROWS INTO A LIST WHILE THE FILE IS STILL OPEN
    csvList = list(reader)

# ITERATE THROUGH LIST, CHECK AGAINST NEXT ROW
for i in range(len(csvList) - 1):
    # FIND THE COLUMN NUMBER (BELOW USES 1)
    if (csvList[i][1] == csvList[i + 1][1]):
        doSomething()

from itertools import tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    # In Python 3, the built-in zip is lazy and replaces Python 2's izip.
    return zip(a, b)

(sy, None) or (sy, "") would be the only logical possibilities for the last tuple, because the values are popped from a queue one at a time until the end of the iterator.

"Once tee() has made a split, the original iterable should not be used anywhere else; otherwise, the iterable could get advanced without the tee objects being informed."
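A short sketch of what that warning means in practice. The list of integers is just stand-in data.

```python
from itertools import tee

source = iter([1, 2, 3, 4])
a, b = tee(source)

# Advancing the original iterator directly, behind the tee objects' backs:
skipped = next(source)

# Neither tee object ever sees the item that was consumed above.
seen_by_a = list(a)
seen_by_b = list(b)

print(skipped, seen_by_a, seen_by_b)
```

Here both a and b start at 2, because the first item was taken from source directly. With a csv reader, this is why you should loop over the result of pairwise(reader) only, and not call next(reader) yourself inside that loop.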
