I have the following HTML structure:
...
<div cust-attrib-id="root">
<div cust-attrib-id="root-title"></div>
<div cust-attrib-id="country">
<div cust-attrib-id="country-title"></div>
<div cust-attrib-id="region">
<div cust-attrib-id="region-title">
<a href="xx">Frankfurt</a>
</div>
<div cust-attrib-id="region-title">
<a href="xx">Braunschweig</a>
</div>
<div cust-attrib-id="region-title">
<a href="xx">Hamm</a>
</div>
...
</div>
</div>
</div>
...
What is the easiest way to get the <a> tags with the list of regions in Python when using BeautifulSoup? Each <a> tag belongs to a div with the custom attribute cust-attrib-id and the value region-title.
I am at the div with the custom attribute value root, and I would like to iterate over all the nested <a> tags within the divs that have the custom attribute cust-attrib-id with the value 'region-title'.
I am selecting the root element via
soup = BeautifulSoup(source, "html.parser")
rootCategories = soup.select('div[cust-attrib-id="root"]')
Now I could find country, then find all the region elements and iterate over the result via for ... in .... But I am looking for a "shortcut" to query these items directly.
So the desired result would be an output like
Frankfurt Braunschweig Hamm
and a single cascaded query along the lines of
cities = soup.select('div[cust-attrib-id="root"]\\div[cust-attrib-id="country"]\\div[cust-attrib-id="region-title"]')
I think cascading the query makes it safer, because the attribute value region-title is not unique on the page.
Note: Good answers require good questions, please help make your problem comprehensible to all by improving your question. In general, the existing code and the expected result should be presented as text. Please always provide an mcve in your questions.
You can use CSS selectors to select all the <a> tags in your HTML.
You wrote: "I think cascading the query makes it safer, because the attribute value region-title is not unique on the page."
Making your selection as specific as possible is a very good train of thought. Just chain the attribute and tag selectors to get all the <a> tags you need:
soup.select('div[cust-attrib-id="root"] [cust-attrib-id="region-title"] a')
To get a list of all the city names, you can use your selection in a list comprehension:
cities = [t.text for t in soup.select('div[cust-attrib-id="root"] [cust-attrib-id="region-title"] a')]
Example:
from bs4 import BeautifulSoup
html = '''<div cust-attrib-id="root">
<div cust-attrib-id="root-title"></div>
<div cust-attrib-id="country">
<div cust-attrib-id="country-title"></div>
<div cust-attrib-id="region">
<div cust-attrib-id="region-title">
<a href="xx">Frankfurt</a>
</div>
<div cust-attrib-id="region-title">
<a href="xx">Braunschweig</a>
</div>
<div cust-attrib-id="region-title">
<a href="xx">Hamm</a>
</div>
...
</div>
</div>
</div>'''
soup = BeautifulSoup(html, "lxml")
cities = [t.text for t in soup.select('div[cust-attrib-id="root"] [cust-attrib-id="region-title"] a')]
print(cities)
Output:
['Frankfurt', 'Braunschweig', 'Hamm']
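The `\\` separators in the question's query are not CSS syntax. If a strict parent-to-child cascade is wanted, the child combinator `>` is the closest equivalent. The following is a minimal sketch, assuming the nesting shown in the question; the shortened HTML and the extra "Elsewhere" entry outside the root div are made up for illustration. It also shows the same query run relative to an already selected root element.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample: one region-title div sits outside the root
html = '''
<div cust-attrib-id="root">
  <div cust-attrib-id="country">
    <div cust-attrib-id="region">
      <div cust-attrib-id="region-title"><a href="xx">Frankfurt</a></div>
      <div cust-attrib-id="region-title"><a href="xx">Hamm</a></div>
    </div>
  </div>
</div>
<div cust-attrib-id="region-title"><a href="xx">Elsewhere</a></div>
'''

soup = BeautifulSoup(html, "html.parser")

# Strict cascade: every step must be a direct child of the previous one
strict = soup.select(
    'div[cust-attrib-id="root"] > div[cust-attrib-id="country"] > '
    'div[cust-attrib-id="region"] > div[cust-attrib-id="region-title"] > a'
)
strict_cities = [a.get_text(strip=True) for a in strict]

# Relative query: start from the root element that was already selected
root = soup.select_one('div[cust-attrib-id="root"]')
relative_cities = [
    a.get_text(strip=True)
    for a in root.select('[cust-attrib-id="region-title"] a')
]

print(strict_cities)
print(relative_cities)
```

The strict form breaks as soon as an extra wrapper element is introduced between two levels, so the descendant form (a plain space between selectors) is usually the more forgiving choice.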
In your example, "region-title" is what you want, so just get every "region-title" element:
for x in soup.find_all(attrs={"cust-attrib-id": 'region-title'}):
    # strip=True drops the newlines around the link text inside each div
    print(x.get_text(strip=True))
Output:
Frankfurt
Braunschweig
Hamm
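Since the question mentions that region-title is not unique on the page, a page-wide find_all may return more than intended. One way to narrow it, sketched here under the assumption that there is exactly one root div, is to call find_all on the root element instead of on the whole soup. The sample HTML and the "Elsewhere" entry are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample with a region-title div outside the root
html = '''
<div cust-attrib-id="root">
  <div cust-attrib-id="country">
    <div cust-attrib-id="region">
      <div cust-attrib-id="region-title"><a href="xx">Frankfurt</a></div>
      <div cust-attrib-id="region-title"><a href="xx">Hamm</a></div>
    </div>
  </div>
</div>
<div cust-attrib-id="region-title"><a href="xx">Elsewhere</a></div>
'''

soup = BeautifulSoup(html, "html.parser")

# Limit the search to descendants of the root div
root = soup.find("div", attrs={"cust-attrib-id": "root"})

scoped_cities = []
for title in root.find_all("div", attrs={"cust-attrib-id": "region-title"}):
    link = title.find("a")
    if link is not None:  # skip title divs that contain no link
        scoped_cities.append(link.get_text(strip=True))

print(scoped_cities)
```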