Need help with a Python program: how to search for and save IDs from an HTML file
I'm trying to write a program that searches a local HTML file for a tag and the characters in front of that tag (up to a space or newline), but I don't know how. I wrote some code, but it isn't working: it lists all the text in the HTML instead of finding only the PA and its characters.
Here's my code so far:
from bs4 import BeautifulSoup
import re

ecj_data = open('output.html', 'r').read()
soup = BeautifulSoup(ecj_data, 'lxml')
d = 'PA'
soup_strings = [l for l in soup.strings if l.strip() != '']
for s in soup_strings:
    print(s)
Do you mean you want to find the words that contain 'PA'? If so, please try the code below.
# soup.strings is a generator and has no split(), so join all the text first
for i in soup.get_text().split():
    if 'PA' in i:
        print(i)
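If the IDs always begin with 'PA' and run up to the next space or newline, a regular expression may be a tighter fit than a substring test, since `'PA' in i` also matches words that merely contain those letters. The following is a minimal sketch under that assumption; `find_ids` and `save_ids` are hypothetical helper names, not part of the original post, and the sample text is made up for illustration.

```python
import re


def find_ids(text, prefix="PA"):
    """Return every whitespace-delimited token in text that starts with prefix."""
    # (?<!\S) only allows a match at the start of a token;
    # \S* then consumes everything up to the next space or newline.
    pattern = re.compile(r"(?<!\S)" + re.escape(prefix) + r"\S*")
    return pattern.findall(text)


def save_ids(ids, path):
    """Write one ID per line to path (hypothetical output file)."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(ids))


# Made-up sample text; SPA99 is skipped because PA is not at the token start.
sample = "Order PA12345 shipped\nPA-77 pending, see SPA99"
print(find_ids(sample))  # ['PA12345', 'PA-77']
```

With the code from the question, the text to search would be `soup.get_text()`, so something like `save_ids(find_ids(soup.get_text()), 'ids.txt')` should collect the IDs and store them, where `ids.txt` is a placeholder file name.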