How to group <p> tags under their preceding <h2> tags using Beautiful Soup in Python and build a dictionary from them

Question

<h2>Summary</h2>
<p>This is summary one.</p>
<p>contains details of summary1.</p>

<h2>Software/OS</h2>
<p>windows xp</p>

<h2>HARDWARE</h2>
<p>Intel core i5</p>
<p>8 GB RAM</p>

I want to create a dictionary from the HTML above, where the keys are the texts of the <h2> header tags and the values are the texts of the <p> tags that follow each header.

I want the output in this format:

{"Summary": ["This is summary one.", "contains details of summary1."], "Software/OS": "windows xp", "HARDWARE": ["Intel core i5", "8 GB RAM"]}

Can anyone help me with this? Thanks in advance.

Answer 1

You can use this script to build a dictionary where the keys are the text of each <h2> and the values are lists of the <p> texts that follow it:

from bs4 import BeautifulSoup


txt = '''<h2>Summary</h2>
<p>This is summary one.</p>
<p>contains details of summary1.</p>

<h2>Software/OS</h2>
<p>windows xp</p>

<h2>HARDWARE</h2>
<p>Intel core i5</p>
<p>8 GB RAM</p>'''

soup = BeautifulSoup(txt, 'html.parser')

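# Map each <h2> text to the list of <p> texts that follow it.
out = {}
for p in soup.select('p'):
    # find_previous('h2') returns the closest <h2> that appears before this <p>.
    out.setdefault(p.find_previous('h2').text, []).append(p.text)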

print(out)

Prints:

{'Summary': ['This is summary one.', 'contains details of summary1.'], 'Software/OS': ['windows xp'], 'HARDWARE': ['Intel core i5', '8 GB RAM']}
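One caveat with looking up the previous <h2> from each <p>: a <p> that appears before any <h2> makes find_previous('h2') return None, so .text raises an AttributeError, and a header with no paragraphs never shows up as a key. A possible variant, sketched below, walks forward from each <h2> instead. It assumes the headers and paragraphs are siblings, as in the question's HTML. The function name group_paragraphs_by_heading, the leading heading-less <p>, and the empty "Notes" header are made up for illustration.

```python
from bs4 import BeautifulSoup


def group_paragraphs_by_heading(html):
    """Map each <h2> text to the list of <p> texts that follow it."""
    soup = BeautifulSoup(html, "html.parser")
    grouped = {}
    for heading in soup.find_all("h2"):
        # Headers without paragraphs still get an (empty) entry.
        paragraphs = grouped.setdefault(heading.get_text(strip=True), [])
        # Walk the tags after this heading and stop at the next <h2>.
        for sibling in heading.find_next_siblings():
            if sibling.name == "h2":
                break
            if sibling.name == "p":
                paragraphs.append(sibling.get_text(strip=True))
    return grouped


# Hypothetical input: the question's HTML plus two edge cases.
sample = """<p>intro with no heading</p>
<h2>Summary</h2>
<p>This is summary one.</p>
<p>contains details of summary1.</p>
<h2>Notes</h2>
<h2>HARDWARE</h2>
<p>Intel core i5</p>"""

result = group_paragraphs_by_heading(sample)
print(result)
```

In this sketch the heading-less <p> is skipped and "Notes" maps to an empty list. Whether that is the behaviour you want depends on your data.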

If you don't want single-item lists in out, you can additionally unwrap them:

for k in out:
    if len(out[k]) == 1:
        out[k] = out[k][0]

print(out)

Prints:

{'Summary': ['This is summary one.', 'contains details of summary1.'], 'Software/OS': 'windows xp', 'HARDWARE': ['Intel core i5', '8 GB RAM']}
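If you would rather keep the original dictionary untouched, the same unwrapping can be sketched as a small helper that returns a new dictionary. The name collapse_singletons is made up for illustration, and the input below is the dictionary printed by the first script.

```python
def collapse_singletons(grouped):
    """Return a new dict in which one-element lists are replaced by their only item."""
    return {
        key: values[0] if len(values) == 1 else values
        for key, values in grouped.items()
    }


out = {
    'Summary': ['This is summary one.', 'contains details of summary1.'],
    'Software/OS': ['windows xp'],
    'HARDWARE': ['Intel core i5', '8 GB RAM'],
}
collapsed = collapse_singletons(out)
print(collapsed)
```

Note that mixing strings and lists as values means any code reading the dictionary has to check the type of each value. Keeping every value as a list is often easier to work with.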
