<h2>Summary</h2>
<p>This is summary one.</p>
<p>contains details of summary1.</p>
<h2>Software/OS</h2>
<p>windows xp</p>
<h2>HARDWARE</h2>
<p>Intel core i5</p>
<p>8 GB RAM</p>
I want to create a dictionary from the HTML above, where the keys are the texts of the header tags and the values are the texts of the paragraph tags that follow each header.
I want the output in this format:
{"summary":["This is summary one.","contains details of summary1."], "Software/OS": "windows xp", "HARDWARE": ["Intel core i5","8 GB RAM"]}
Can anyone help me with this? Thanks in advance.
You can use this script to make a dictionary where the keys are the texts of the <h2> tags and the values are lists of the <p> texts:
from bs4 import BeautifulSoup
txt = '''<h2>Summary</h2>
<p>This is summary one.</p>
<p>contains details of summary1.</p>
<h2>Software/OS</h2>
<p>windows xp</p>
<h2>HARDWARE</h2>
<p>Intel core i5</p>
<p>8 GB RAM</p>'''
soup = BeautifulSoup(txt, 'html.parser')
out = {}
for p in soup.select('p'):
    # group each paragraph under the nearest preceding <h2>
    out.setdefault(p.find_previous('h2').text, []).append(p.text)
print(out)
Prints:
{'Summary': ['This is summary one.', 'contains details of summary1.'], 'Software/OS': ['windows xp'], 'HARDWARE': ['Intel core i5', '8 GB RAM']}
If you don't want lists of length 1, you can additionally unwrap them:
for k in out:
    # replace a single-item list with the item itself
    if len(out[k]) == 1:
        out[k] = out[k][0]
print(out)
Prints:
{'Summary': ['This is summary one.', 'contains details of summary1.'], 'Software/OS': 'windows xp', 'HARDWARE': ['Intel core i5', '8 GB RAM']}
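As a possible variation, the same grouping can be sketched by starting from each header and walking forward through its siblings until the next header. The function name `sections_to_dict` and its `header`/`para` parameters are made up for this illustration and are not part of BeautifulSoup. This sketch assumes the headers and paragraphs are siblings in the markup, as in the sample above. Unlike the `find_previous` approach, it does not fail on a `<p>` that has no preceding `<h2>` (such paragraphs are simply skipped), and a header with no paragraphs gets an empty list.

```python
from bs4 import BeautifulSoup


def sections_to_dict(html, header="h2", para="p"):
    """Map each header's text to the list of paragraph texts that follow it.

    Hypothetical helper for illustration; assumes headers and paragraphs
    are siblings under the same parent.
    """
    soup = BeautifulSoup(html, "html.parser")
    out = {}
    for h in soup.find_all(header):
        texts = []
        # Walk the tags after this header until the next header appears.
        for sib in h.find_next_siblings():
            if sib.name == header:
                break
            if sib.name == para:
                texts.append(sib.get_text(strip=True))
        # setdefault + extend merges sections that share the same header text.
        out.setdefault(h.get_text(strip=True), []).extend(texts)
    return out


html = '''<h2>Summary</h2>
<p>This is summary one.</p>
<p>contains details of summary1.</p>
<h2>Software/OS</h2>
<p>windows xp</p>
<h2>HARDWARE</h2>
<p>Intel core i5</p>
<p>8 GB RAM</p>'''

result = sections_to_dict(html)
print(result)
```

If single-item lists are not wanted, the unwrapping loop shown above can be applied to `result` in the same way.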