Python extract html webpage content using keywords

Question

Using Python, I would like to extract the content that follows a keyword (for example, "Companies:") from an HTML page.

Here is my Python script:

import requests
from bs4 import BeautifulSoup
import re
html = """ <pre>
      Companies:
       Telstra VI Huawei
      Countries:
       JPN CHN MLY
   </pre>
   <pre>
   Data center:
    US UK
   </pre>"""
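# html already holds the markup, so parse it directly; requests.get() expects a URL.
# For a live page (url is a placeholder): r = requests.get(url); soup = BeautifulSoup(r.content, "html.parser")
soup = BeautifulSoup(html, "html.parser")
# Case-insensitive search, because the page says "Companies:" with a capital C
k = soup.find(string=re.compile("companies:", re.IGNORECASE)).parent.text
print(k)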

Expected output:

Companies:
       Telstra VI Huawei
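A quick check of what the lookup actually returns helps explain why the script does not print only the Companies part. This is a minimal sketch that assumes the HTML string is parsed directly (requests.get() expects a URL, not markup) and uses only the first <pre> from the question. The regex has to match the page's capitalization ("Companies:") or be compiled with re.IGNORECASE.

```python
import re

from bs4 import BeautifulSoup

html = """<pre>
      Companies:
       Telstra VI Huawei
      Countries:
       JPN CHN MLY
   </pre>"""

# Parse the markup string directly instead of passing it to requests.get()
soup = BeautifulSoup(html, "html.parser")

# Case-sensitive by default, so the pattern matches "Companies:" exactly as written
node = soup.find(string=re.compile("Companies:"))

# The matched text node is the whole body of the <pre>, so its parent is the <pre>
block = node.parent
print(block.name)                  # pre
print("Countries:" in block.text)  # True: the Countries section comes along too
```

So even when the keyword is found, `.parent.text` returns the entire block, and everything from the next heading onward still has to be cut off separately.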

Answer 1

Try this, using the simplified_scrapy library:

from simplified_scrapy import SimplifiedDoc

html = """ <pre>
      Companies:
       Telstra VI Huawei
      Countries:
       JPN CHN MLY
   </pre>
   <pre>
   Data center:
    US UK
   </pre>"""
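doc = SimplifiedDoc(html)
# First element whose content matches the regex 'Companies:'
pre = doc.getElementByReg('Companies:')
print(pre.text)
print('-' * 50)
# Drop everything from 'Countries:' to the end (raw string avoids invalid-escape warnings for \s, \S)
print(pre.replaceReg(r'Countries:[\s\S]*', '').strip())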

Result:

Companies: Telstra VI Huawei Countries: JPN CHN MLY
--------------------------------------------------
Companies:
       Telstra VI Huawei
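The replaceReg call above hard-codes the heading that follows ("Countries:"). A library-free sketch that instead stops at whichever heading comes next is shown below. It assumes, beyond what the question states, that every heading is a line whose text ends with a colon; `extract_section` is a hypothetical helper name, not part of simplified_scrapy or BeautifulSoup.

```python
def extract_section(text: str, keyword: str) -> str:
    """Return the heading line matching `keyword` plus the lines under it.

    Assumption: a heading is any line whose stripped text ends with ':'.
    Capturing stops at the next heading, so no following heading is hard-coded.
    """
    captured = []
    capturing = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.endswith(":"):
            if capturing:
                break  # reached the next section's heading
            # Case-insensitive comparison, so "companies:" also matches
            capturing = stripped.lower() == keyword.lower()
        if capturing:
            captured.append(line)
    return "\n".join(captured).strip()


# Text of the first <pre> from the question
pre_text = """
      Companies:
       Telstra VI Huawei
      Countries:
       JPN CHN MLY
   """

print(extract_section(pre_text, "Companies:"))
```

This prints the "Companies:" line followed by the "Telstra VI Huawei" line, matching the expected output. With BeautifulSoup, the text can come from each block via `pre.get_text()` while looping over `soup.find_all("pre")`.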

solution1
0 2020-09-30 00:56:08
