So I am trying to extract a link from a text file in Python. The link varies from file to file but always has the same format. I tried using the re library but keep getting errors.
The syntax of the link is:
docs.com/searchres.aspx?docformat=all&docid=[SOME NUMBER] -
The end of the link holds a number in the SOME NUMBER field, and the link is followed by ' - '. How can I search for, find, and save this link from a text file? Thank you, this is my first time posting on SO.
Here's a Python solution that uses memory maps. A caveat: the code reads until it reaches a ']', so if ']' is not in the text file, it will continue reading. (The link in the question is followed by ' - ' rather than ']', so the terminator would need adjusting for that case.) Take a look at the mmap documentation to see how you might modify the code to be more robust.
    import mmap

    # db is the path to the text file and target is the text to search
    # for; both are assumed to be defined already.
    out = bytearray()
    with open(db, 'rb') as match:
        with mmap.mmap(match.fileno(), 0, access=mmap.ACCESS_READ) as search:
            index = search.find(str(target).encode())
            if index != -1:
                # This entry exists. We have its index: seek there and
                # read one byte at a time until the closing ']'.
                search.seek(index)
                read = search.read(1)
                # read() returns b'' at end of file, so stop there as well.
                while read and read != b']':
                    out += read
                    read = search.read(1)
            # Otherwise find() returned -1: the target is not in the file.
    print(out.decode())
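A possible variant, assuming Python 3: the re module can search an mmap object directly with a bytes pattern, so the byte-by-byte loop and the ']' terminator can be replaced by a pattern for the link format given in the question. This is a sketch, not a drop-in replacement; the names find_link_mmap and LINK_RE are hypothetical.

```python
import mmap
import os
import re

# Hypothetical bytes pattern for the link format described in the question:
# docs.com/searchres.aspx?docformat=all&docid=<digits>, followed by " -".
# The lookahead requires " -" to follow without including it in the match.
LINK_RE = re.compile(rb'docs\.com/searchres\.aspx\?docformat=all&docid=\d+(?= -)')


def find_link_mmap(path):
    """Return the first matching link in the file at `path`, or None."""
    # mmap cannot map an empty file, so handle that case up front.
    if os.path.getsize(path) == 0:
        return None
    with open(path, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # re accepts any bytes-like object, including an mmap.
            m = LINK_RE.search(mm)
            return m.group(0).decode('ascii') if m else None
```

For example, find_link_mmap('page.txt') (a hypothetical file name) returns the link as a str, without the trailing ' -'. Because the search stops at the first match, the file is never read past that point.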
This response is simple, but it only suits small files, since it reads the whole file into memory. When you say "save this link", I assume having the URL in a string variable is good enough.
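    import re

    # filename_str holds the path to the text file.
    with open(filename_str, 'r') as f:
        file_content = f.read()

    # Raw string with escaped dots; capture the run of non-space characters
    # that ends right before " -" (a greedy '.*' would run on to the last
    # '-' on the line).
    p = re.compile(r'(docs\.com\S*) -')
    m = p.search(file_content)
    if m is not None:
        link = m.group(1)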
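Since the version above reads the whole file at once, one possible alternative for larger files, or files with several links, is to scan line by line and collect every match together with its docid. This sketch assumes each link sits on a single line and is followed by ' - ' as described in the question; iter_links and LINK_RE are hypothetical names.

```python
import re

# The link format from the question; the digits after docid= are captured
# in a named group so the number can be used on its own.
LINK_RE = re.compile(
    r'(docs\.com/searchres\.aspx\?docformat=all&docid=(?P<docid>\d+)) -'
)


def iter_links(lines):
    """Yield (link, docid) string pairs for every link in an iterable of lines."""
    for line in lines:
        for m in LINK_RE.finditer(line):
            yield m.group(1), m.group('docid')


# With a file, the file object is iterated lazily, one line at a time:
# with open(filename_str) as f:
#     links = list(iter_links(f))
```

The docid is kept as a string so that any leading zeros survive; convert it with int() if a number is needed.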