This is what part of the HTML looks like:
<div class="field-wrapper field field-node--field-test-synonyms field-name-field-test-synonyms field-type-string field-label-inline clearfix">
<div class="field-label">Also Known As</div>
<div class="field-items">
<div class="field-item">17-OHP</div>
<div class="field-item">17-OH Progesterone </div>
</div>
</div>
What I am trying to do is extract the two words 17-OHP and 17-OH Progesterone.
My code:
import requests
from bs4 import BeautifulSoup

sub_url = "https://labtestsonline.org/tests/17-hydroxyprogesterone"
response = requests.get(sub_url)
soup = BeautifulSoup(response.content, 'lxml')
other_names = []
# Matches every div with class "field-items" on the page
table = soup.findAll('div', attrs={"class": "field-items"})
for x in table:
    print(x.text)
    other_names.append(x.text)
But the problem is that the class field-items is used in many places on the web page, so I get lots of unexpected words. Please help me find a unique tag in this case. The output I expect is other_names = ['17-OHP', '17-OH Progesterone']
Thank You.
You can search for the element with class field-label, and then call .next:
import requests
from bs4 import BeautifulSoup
sub_url = "https://labtestsonline.org/tests/17-hydroxyprogesterone"
response = requests.get(sub_url)
soup = BeautifulSoup(response.content, 'html.parser')
other_names = [
tag.next.next.get_text(strip=True, separator='|').split('|')
for tag in soup.find('div', class_='field-label')
]
print(*other_names)
Output:
['17-OHP', '17-OH Progesterone']
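As an alternative sketch (not part of the original answer), the wrapper div in the question carries a class that looks specific to this field, field-name-field-test-synonyms, so a CSS selector can scope the search to it. The example below runs against the HTML snippet from the question rather than the live page, and the second "unrelated" wrapper is a made-up stand-in for the other field-items blocks on the page. It assumes the live page uses the same class names as the snippet.

```python
from bs4 import BeautifulSoup

# Markup copied from the question. The second wrapper is hypothetical and only
# stands in for the other "field-items" blocks that appear on the real page.
html = """
<div class="field-wrapper field field-node--field-test-synonyms field-name-field-test-synonyms field-type-string field-label-inline clearfix">
<div class="field-label">Also Known As</div>
<div class="field-items">
<div class="field-item">17-OHP</div>
<div class="field-item">17-OH Progesterone </div>
</div>
</div>
<div class="field-wrapper field field-name-field-other">
<div class="field-items"><div class="field-item">unrelated</div></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Restrict the search to the wrapper whose class is specific to this field,
# then collect only the field-item divs nested inside it.
items = soup.select("div.field-name-field-test-synonyms div.field-item")
other_names = [item.get_text(strip=True) for item in items]

print(other_names)
```

With the live page, passing response.content to BeautifulSoup instead of the html string should work the same way, provided the class names match. Selecting by the field-specific class avoids depending on which field-label happens to come first in the document.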