
Need help with a Python program: how to search for and save IDs from an HTML file

I'm trying to write a program that searches a local HTML file for a tag and the characters in front of that tag (up to a space or newline), but I don't know how. I wrote some code, but it isn't working: it lists all the text in the HTML instead of finding only the PA and its characters.

Here's my code so far:

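from bs4 import BeautifulSoup
import re

# Read the local HTML file and close it when done
with open('output.html', 'r') as f:
    ecj_data = f.read()
soup = BeautifulSoup(ecj_data, 'lxml')
d = 'PA'  # the tag to search for (not used yet)
# Collect every non-blank text node in the document
soup_strings = [s for s in soup.strings if s.strip() != '']
for s in soup_strings:
    print(s)  # prints all the text, not just the 'PA' entries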

Do you mean you want to find the words that contain 'PA'? If so, try the code below.

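# soup.strings is a generator of text nodes, so split each string, not the generator
for text in soup.strings:
    for word in text.split():
        if 'PA' in word:
            print(word)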
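
If the goal is to keep the IDs rather than just print them, a regular expression may be a tidier option. The sketch below is only one possible approach, under assumptions the question does not confirm: that an ID is 'PA' followed by any non-whitespace characters, and that collecting the matches in a list is enough. The names find_pa_ids, PA_PATTERN and sample_strings are hypothetical, and the sample text stands in for the strings of output.html. If the characters you want come before 'PA' instead, a pattern such as r'\S*PA\b' might fit better.

```python
import re

# Assumption: an ID is 'PA' at the start of a word, followed by any
# non-whitespace characters (so it ends at a space or a newline).
PA_PATTERN = re.compile(r'\bPA\S*')

def find_pa_ids(strings, pattern=PA_PATTERN):
    """Return every match of `pattern` found in an iterable of text strings."""
    ids = []
    for text in strings:
        ids.extend(pattern.findall(text))
    return ids

# Hypothetical sample text standing in for the text nodes of output.html
sample_strings = ['Order PA12345 shipped', '\n', 'See PA-77 and PAX9\nnext line']
pa_ids = find_pa_ids(sample_strings)
print(pa_ids)
```

With the soup object from the question, calling find_pa_ids(soup.strings) should return the matches as a list, which can then be written to a file or processed further.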


 