I'm new to Python and trying to do a nested loop. I have a very large file (1.1 million rows), and I'd like to use it to create a file that has each line along with the next N lines, for example with the next 3 lines:
1 2
1 3
1 4
2 3
2 4
2 5
Right now I'm just trying to get the loops working with rownumbers instead of the strings since it's easier to visualize. I came up with this code, but it's not behaving how I want it to:
with open('C:/working_file.txt', mode='r', encoding = 'utf8') as f:
for i, line in enumerate(f):
line_a = i
lower_bound = i + 1
upper_bound = i + 4
with open('C:/working_file.txt', mode='r', encoding = 'utf8') as g:
for j, line in enumerate(g):
while j >= lower_bound and j <= upper_bound:
line_b = j
j = j+1
print(line_a, line_b)
Instead of the output I want like above, it's giving me this:
990 991
990 992
990 993
990 994
990 992
990 993
990 994
990 993
990 994
990 994
As you can see the inner loop is iterating multiple times for each line in the outer loop. It seems like there should only be one iteration per line in the outer loop. What am I missing?
EDIT: My question was answered below, here is the exact code I ended up using:
from collections import deque
from itertools import cycle
log = open('C:/example.txt', mode='w', encoding = 'utf8')
try:
xrange
except NameError: # python3
xrange = range
def pack(d):
tup = tuple(d)
return zip(cycle(tup[0:1]), tup[1:])
def window(seq, n=2):
it = iter(seq)
d = deque((next(it, None) for _ in range(n)), maxlen=n)
yield pack(d)
for e in it:
d.append(e)
yield pack(d)
for l in window(open('c:/working_file.txt', mode='r', encoding='utf8'),100):
for a, b in l:
print(a.strip() + '\t' + b.strip(), file=log)
Based on window example from old docs you can use something like:
from collections import deque
from itertools import cycle
try:
xrange
except NameError: # python3
xrange = range
def pack(d):
tup = tuple(d)
return zip(cycle(tup[0:1]), tup[1:])
def window(seq, n=2):
it = iter(seq)
d = deque((next(it, None) for _ in xrange(n)), maxlen=n)
yield pack(d)
for e in it:
d.append(e)
yield pack(d)
Demo:
>>> for l in window([1,2,3,4,5], 4):
... for l1, l2 in l:
... print l1, l2
...
1 2
1 3
1 4
2 3
2 4
2 5
So, basically you can pass your file to window to get desired result:
window(open('C:/working_file.txt', mode='r', encoding='utf8'), 4)
You can do this with slices. This is easiest if you read the whole file into a list first:
with open('C:/working_file.txt', mode='r', encoding = 'utf8') as f:
data = f.readlines()
for i, line_a in enumerate(data):
for j, line_b in enumerate(data[i+1:i+5], start=i+1):
print(i, j)
When you change it to printing the lines instead of the line numbers, you can drop the second enumerate
and just do for line_b in data[i+1:i+5]
. Note that the slice includes the item at the start index, but not the item at the end index, so that needs to be one higher than your current upper bound.
Based on alko's answer, I would suggest using the window
recipe unmodified
from itertools import islice
def window(seq, n=2):
"Returns a sliding window (of width n) over data from the iterable"
" s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ... "
it = iter(seq)
result = tuple(islice(it, n))
if len(result) == n:
yield result
for elem in it:
result = result[1:] + (elem,)
yield result
for l in window([1,2,3,4,5], 4):
for item in l[1:]:
print l[0], item
I think the easiest way to solve this problem would be to read your file into a dictionary...
my_data = {}
for i, line in enumerate(f):
my_data[i] = line
After that is done you can do
for x in my_data:
for y in range(1, 4):
print my_data[x], my_data[x + y]
As written you are reading your million line file a million times for each line...
Since this was quite a big file, you might not want to load it all in memory at once. So to avoid reading a line more than once this is what you do.
Make a list with N elements, where N is the amount of next lines to read.
When a item in that list reaches a length N, take it out and append it to the output file. And add a empty item at the end so you still have a list of N items.
This way you only need to read each line once, and you wont have to load the whole file in memory. You only need to hold, at max, N! lines in memory.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.