I am trying to build a script that extracts specific parts (namely each link and its related description) from an HTML file and returns the results one per line.
I'm trying to build it using Python lists, but I'm making a mistake somewhere!
This is what I've done so far, but my values list comes back blank:
import re

def subtext(data, first_link, last_link, first_descr, last_descr):
    values = []
    link = re.search('''"first_link"(.+?)"last_link"''', data)
    values.append(link)
    descr = re.search('''"first_descr"(.+?)"last_descr"''', data)
    values.append(descr)
    while values:
        print(values)

html_file = input("Type filepath: ")
html_code = open(html_file, "r")
html_data = html_code.read()
subtext(html_data, '''11px;"><a href=''', ''' target="_blank" ''', ''' title="Relative document">''', '''</a></td><td style="font-''')
html_code.close()
There is an HTML parser for Python (html.parser in the standard library). But if you want to use your own code, you need to fix these mistakes:
link = re.search('''"first_link"(.+?)"last_link"''', data)
values.append(link)
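First of all, your regex searches for the literal strings "first_link" and "last_link" instead of the values passed in as function arguments. Use .format to build the pattern from the arguments, e.g. '{}(.+?){}'.format(first_link, last_link), and drop the extra double quotes around the placeholders. Also, in the code above, link will be a re.Match object, not a string. Use group() (here group(1), the captured part) to pick the string out of the object, but first make sure the search found something: re.search returns None when there is no match. The same applies to the next re.search.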
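Putting those fixes together, a minimal sketch of subtext could look like the following. It assumes you want the captured text between each pair of delimiters, with None when a pattern is not found. It also passes the delimiters through re.escape, which is not in the original answer but keeps characters such as . or ( in your delimiters from being treated as regex syntax. The sample HTML string is made up for illustration.

```python
import re


def subtext(data, first_link, last_link, first_descr, last_descr):
    """Return [link, description] found between the given delimiters.

    Each entry is None when the corresponding pattern is not found.
    """
    values = []
    # Build the patterns from the argument *values*, not the argument names.
    # re.escape makes regex metacharacters in the delimiters match literally.
    link_pattern = "{}(.+?){}".format(re.escape(first_link), re.escape(last_link))
    descr_pattern = "{}(.+?){}".format(re.escape(first_descr), re.escape(last_descr))

    for pattern in (link_pattern, descr_pattern):
        match = re.search(pattern, data)
        # re.search returns None on no match, so check before calling group()
        values.append(match.group(1) if match else None)
    return values


# Hypothetical sample shaped like the HTML the delimiters expect
sample = ('<td style="font-size:11px;"><a href="doc.pdf" target="_blank" '
          'title="Relative document">Annual report</a></td>'
          '<td style="font-weight:bold">2021</td>')

print(subtext(sample, '11px;"><a href=', ' target="_blank" ',
              ' title="Relative document">', '</a></td><td style="font-'))
# prints ['"doc.pdf"', 'Annual report']
```

Note that the link keeps its surrounding quotes because your first delimiter ends at href=. Also, re.search only returns the first match; if the file contains several links, re.findall with the same pattern returns all captured strings.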
while values:
print(values)
Here you will get into an infinite loop of prints, because values is never emptied inside the loop. Simply call print(values)
without any loop.
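As for the HTML parser mentioned at the start of this answer, here is a rough sketch using html.parser.HTMLParser from the standard library instead of regexes. The class name LinkExtractor, the helper extract_links and its optional title filter are hypothetical names chosen for illustration; the filter assumes the links you want carry title="Relative document", as your delimiters suggest.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for <a> elements that have an href."""

    def __init__(self, title=None):
        super().__init__()
        self.title = title   # if set, only keep <a> elements with this title
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            attributes = dict(attrs)
            if self.title is None or attributes.get("title") == self.title:
                self._href = attributes.get("href")  # None: anchor is skipped
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, title=None):
    parser = LinkExtractor(title)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample input; print one link and description per line
sample = ('<td style="font-size:11px;"><a href="doc.pdf" target="_blank" '
          'title="Relative document">Annual report</a></td>'
          '<td><a href="other.html">Other</a></td>')
for href, text in extract_links(sample, title="Relative document"):
    print(href, text)
```

The parser reads attribute values without their quotes and does not depend on the exact spacing or attribute order in the markup, which makes it less brittle than matching on fragments such as 11px;"><a href=.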