Extract parts of text (html) file based on characters before & after with python

Question

I am trying to build a script that will extract specific parts (namely the link and its related description) out of an HTML file and return the results one per line.

I'm trying to build it using lists in Python, yet I'm making a mistake somewhere!

This is what I've done so far, but my values list comes back blank:


import re

def subtext(data, first_link, last_link, first_descr, last_descr):
    values = []
    
    link = re.search('''"first_link"(.+?)"last_link"''', data)
    values.append(link)
    descr = re.search('''"first_descr"(.+?)"last_descr"''', data)
    values.append(descr)
    while values:
        print(values)


html_file = input("Type filepath: ")
html_code = open(html_file, "r")
html_data = html_code.read()


subtext(html_data, '''11px;"><a href=''', ''' target="_blank"  ''', '''  title="Relative document">''', '''</a></td><td style="font-''')


html_code.close()

Answer 1

There is an HTML parser for Python. But if you want to use your own code, then you need to fix these mistakes:

link = re.search('''"first_link"(.+?)"last_link"''', data)
values.append(link)

First of all, your regex searches for the literal strings "first_link" and "last_link" instead of the values passed in as function arguments. Use .format to build the pattern string from the arguments. Also, in the code above, link will be a re.Match object, not a string. Use group() to get the string from the match object, but first make sure that it actually found something, because re.search returns None when there is no match. The same applies to the next re.search.
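A minimal sketch of that fix, assuming a single match is enough: the pattern is built from the arguments with .format, and the result is checked for None before group(1) pulls out the captured text. The helper name find_between and the sample row are hypothetical, and wrapping the delimiters in re.escape() is an addition the answer does not mention; it makes characters such as ( or . inside a delimiter match literally.

```python
import re


def find_between(data, before, after):
    """Return the text between `before` and `after`, or None if absent.

    Hypothetical helper. re.escape() is an addition so that delimiters
    containing regex metacharacters are matched literally.
    """
    pattern = "{}(.+?){}".format(re.escape(before), re.escape(after))
    match = re.search(pattern, data)
    # re.search returns None when nothing matches, so check before group()
    if match is None:
        return None
    return match.group(1)  # group(1) is only the part captured by (.+?)


# Hypothetical sample row modeled on the delimiters in the question
row = ('<td style="font-size: 11px;"><a href="doc1.pdf" target="_blank"  '
       'title="Relative document">First document</a></td>'
       '<td style="font-size: 11px;"></td>')

print(find_between(row, '11px;"><a href=', ' target="_blank"  '))
print(find_between(row, '  title="Relative document">', '</a></td><td style="font-'))
```

Note that group() with no argument returns the whole match, delimiters included; group(1) returns only what (.+?) captured. Here the link comes back with its surrounding quotes, because the first_link delimiter ends at href=.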

while values:
    print(values)

Here you will get into an infinite loop of prints, because nothing ever empties values. Simply call print(values) without any loop.
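Putting these fixes together, a sketch of the corrected function might look like this. It swaps re.search for re.findall (a change not in the answer) so that every link in the file is collected rather than just the first, and it assumes each link is followed by exactly one description so the two lists line up. The re.escape() calls and the sample markup are likewise assumptions for illustration.

```python
import re


def subtext(data, first_link, last_link, first_descr, last_descr):
    """Sketch of the question's function with the fixes applied.

    Assumption: every link has exactly one matching description, so the
    two findall() lists can be zipped together pairwise.
    """
    link_pattern = "{}(.+?){}".format(re.escape(first_link), re.escape(last_link))
    descr_pattern = "{}(.+?){}".format(re.escape(first_descr), re.escape(last_descr))
    # findall returns the captured strings directly, not Match objects
    links = re.findall(link_pattern, data)
    descrs = re.findall(descr_pattern, data)
    values = list(zip(links, descrs))
    # A plain for loop ends after the last pair, unlike `while values:`
    for link, descr in values:
        print(link, descr)
    return values


# Hypothetical sample markup modeled on the delimiters in the question
sample = (
    '<td style="font-size: 11px;"><a href="doc1.pdf" target="_blank"  '
    'title="Relative document">First document</a></td>'
    '<td style="font-size: 11px;"><a href="doc2.pdf" target="_blank"  '
    'title="Relative document">Second document</a></td>'
    '<td style="font-size: 11px;"></td>'
)

values = subtext(sample, '11px;"><a href=', ' target="_blank"  ',
                 '  title="Relative document">', '</a></td><td style="font-')
```

Each link and its description are printed on their own line, and the list of pairs is also returned for further use.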
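As the answer's opening line notes, an HTML parser avoids hand-written delimiters altogether. A minimal sketch using the standard-library html.parser module follows; this is one possible choice, since the answer does not name a specific parser. The class name LinkExtractor and the sample markup are hypothetical, and real pages may need extra filtering, for example on the title attribute.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.results = []  # list of (href, text) tuples
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text chunks seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; <a> without href is skipped
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.results.append((self._href, "".join(self._text).strip()))
            self._href = None


sample = (
    '<td style="font-size: 11px;"><a href="doc1.pdf" target="_blank" '
    'title="Relative document">First document</a></td>'
    '<td style="font-size: 11px;"><a href="doc2.pdf" target="_blank" '
    'title="Relative document">Second document</a></td>'
)

parser = LinkExtractor()
parser.feed(sample)
parser.close()  # flush any buffered input
for href, text in parser.results:
    print(href, text)
```

Unlike the regex version, the parser returns the href value without its quotes and does not depend on the exact spacing between attributes.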


1 answers

solution1
0 2021-02-12 18:46:53
