import re

# Matches a single <td>...</td> cell on one line and captures its contents.
p = re.compile(r"<td.*?>(.*?)</td>")
tablestart = False
row = ""
with open("test.html", "r") as fr:
    for line in fr:
        if "<table" in line:
            tablestart = True
        elif "</table>" in line and tablestart:
            tablestart = False
        m = p.search(line)
        if m and tablestart:
            row += '"' + m.group(1) + '"' + ","
        if "</tr>" in line and tablestart:
            # End of a table row: print the accumulated cells as one CSV line.
            print(row)
            row = ""
The code creates a CSV file from an HTML table. Is there a better or more efficient way to write it? I'm not looking for an HTML parser.
Maybe something like this:
import re

# "output.csv" is an assumed file name; use whatever output file you need.
with open("test.html") as fr, open("output.csv", "w") as output:
    for line in fr:
        # findall returns the contents of every <td> cell on this line,
        # with an empty string for an empty cell.
        for cell in re.findall(r'<td.*?>(.*?)</td>', line):
            output.write(cell + ',' + '\n')
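Both versions above work line by line, so they miss a second cell on the same line (in the first version) or a cell whose contents wrap onto the next line. If that matters, here is a minimal sketch that reads the whole file as one string, still uses only regular expressions, and lets the standard csv module handle the quoting. The names table_rows and html_to_csv are made up for this example. It assumes there are no nested tables, and treating th cells the same as td cells is my own choice, not something the question asked for.

```python
import csv
import io
import re

# Assumptions: tables are not nested and tags are reasonably well formed.
TABLE_RE = re.compile(r"<table\b.*?>(.*?)</table>", re.DOTALL | re.IGNORECASE)
ROW_RE = re.compile(r"<tr\b.*?>(.*?)</tr>", re.DOTALL | re.IGNORECASE)
CELL_RE = re.compile(r"<t[dh]\b.*?>(.*?)</t[dh]>", re.DOTALL | re.IGNORECASE)


def table_rows(html):
    """Yield each table row as a list of cell strings (hypothetical helper)."""
    for table in TABLE_RE.findall(html):
        for row in ROW_RE.findall(table):
            cells = [cell.strip() for cell in CELL_RE.findall(row)]
            if cells:  # skip rows that contain no cells at all
                yield cells


def html_to_csv(html):
    """Return the CSV text for every table found in the HTML string."""
    buf = io.StringIO()
    # QUOTE_ALL mirrors the original output, where every cell is quoted;
    # the writer also doubles any quote characters inside a cell.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(table_rows(html))
    return buf.getvalue()
```

To use it on the file from the question, read test.html with open("test.html").read(), pass the string to html_to_csv, and print or write the result. Unlike the string concatenation in the question, this leaves no trailing comma at the end of each row.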