I have a series of files and I want to extract a specific number from each of them. In each of the files I have this line:
name, registration num
and exactly two lines after that there is the registration number. I would like to extract this number from each file and store it as a value in a dictionary. Does anyone have an idea how to do this?
My current code, which does not work, is below:
matches = []
for root, dirnames, filenames in os.walk('D:/Dataset2'):
    for filename in fnmatch.filter(filenames, '*.txt'):
        matches.append([root, filename])
filenames_list = {}
for root, filename in matches:
    filename_key = (os.path.join(filename).strip()).split('.', 1)[0]
    fullfilename = os.path.join(root, filename)
    f = open(fullfilename, 'r')
    for line in f:
        if "<name, registration num'" in line:
            key = filename_key
            line += 2
            val = line
I usually use next() when I want to skip a single line, usually a header for a file.
with open(file_path) as f:
    next(f)  # skip 1 line
    next(f)  # skip another one.
    for line in f:
        pass  # now you can keep reading as if there was no first or second line.
Note: The built-in next() was added in Python 2.6; in earlier versions you must use f.next()
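Applied to the question, the same next() idea can be used once the marker line has been found. The following is only a minimal sketch: the function name value_after_marker and its offset parameter are made up for illustration, the sample text is invented, and any iterator of lines (such as an open file) can be passed in.

```python
import io

def value_after_marker(lines, marker="name, registration num", offset=2):
    """Return the stripped line `offset` lines (offset >= 1) below the first
    line containing `marker`, or None if the marker or that line is missing."""
    lines = iter(lines)
    for line in lines:
        if marker in line:
            value = None
            for _ in range(offset):
                value = next(lines, None)  # None means the input ended early
                if value is None:
                    return None
            return value.strip()
    return None

# Stand-in for an open file; the contents are invented sample data.
sample = io.StringIO("header\nname, registration num\n----\n12345\nfooter\n")
print(value_after_marker(sample))
```

Inside the loop over the files, the result could then be stored with something like filenames_list[filename_key] = value_after_marker(f).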
One way would be to load the whole file into a list, and then read the line(s) you want from it. Example:
A file called testfile contains the following:
A1
B2
C3
D4
E5
A program test.py:
#!/usr/bin/env python
# Read every line, then keep everything from the third line onward.
with open('testfile') as f:
    lines = f.readlines()[2:]
for line in lines:
    print(line.strip())
Output:
$ ./test.py
C3
D4
E5
EDIT: I read the question again, and noticed you just want a single line. In that case you could remove the : from the slice and use readlines()[2] to get the third line of the file.
Or you could call readline() three times, and just ignore the first two results.
Or you could use a for line in f type loop with an incrementing counter, and just ignore the first two lines.
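Since the question asks for the line two below a marker rather than the third line of the file, the list approach can be combined with a search for the marker's index. This is a rough sketch under that assumption; value_from_lines is a hypothetical helper name, and the sample list stands in for what readlines() would return.

```python
def value_from_lines(lines, marker="name, registration num", offset=2):
    """Find the first line containing `marker` and return the stripped line
    `offset` positions after it, or None if either is missing."""
    for i, line in enumerate(lines):
        if marker in line:
            target = i + offset
            return lines[target].strip() if target < len(lines) else None
    return None

# Invented sample data, shaped like the result of readlines().
lines = ["header\n", "name, registration num\n", "\n", "98765\n"]
print(value_from_lines(lines))
```

This reads the whole file into memory, which is fine for small text files but less suitable for very large ones.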
I suppose something like that would work...
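f = open(fullfilename, 'r')
for line in f:
    if "name, registration num" in line:
        key = filename_key
        break
# next() is used instead of readline() because Python 2 refuses to mix
# file iteration with readline().
next(f, '')                     # skip the line right after the marker
res = next(f, '').rstrip('\n')  # line two below the marker, trailing newline removed
f.close()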
from itertools import islice
with open('data.txt') as f:
    # islice(f, 2, None) skips the first two lines and yields the rest
    for line in islice(f, 2, None):
        print(line)
Generally speaking, if you want to do something to a Python iterator in-loop, like look two ahead, I find a good first place to look is the itertools module and the recipes in its documentation. In your case, you might benefit from their implementation of consume.
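For illustration, here is the consume recipe as it appears in the itertools documentation, used to jump ahead inside a loop. The surrounding loop and the sample lines are invented for this sketch; with a real file you would iterate over the open file object instead of the list.

```python
from collections import deque
from itertools import islice

def consume(iterator, n=None):
    """Advance the iterator n steps ahead. If n is None, consume it entirely."""
    if n is None:
        deque(iterator, maxlen=0)
    else:
        next(islice(iterator, n, n), None)

# Invented sample data standing in for the lines of a file.
lines = iter(["a\n", "name, registration num\n", "skip\n", "555\n", "z\n"])
result = None
for line in lines:
    if "name, registration num" in line:
        consume(lines, 1)                 # skip the line right after the marker
        result = next(lines, "").strip()  # the line two below the marker
        break
print(result)
```

The loop and consume share the same iterator, so advancing it inside the loop body also moves the loop forward.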
It is worth checking whether this issue has already been covered on SO. Edit: Indeed it has, in a thread that includes a good discussion of Python iterators.