
Use BeautifulSoup to get a value after a specific tag

I'm having a very hard time getting BeautifulSoup to scrape some data for me. What's the best way to access the date (the actual number, 2008) from this code sample? It's my first time using BeautifulSoup. I've figured out how to scrape URLs off the page, but I can't narrow it down to select only the word Date and then return only the numeric date that follows it (inside the dd tags). Is what I'm asking even possible?

<div class='dl_item_container clearfix detail_date'>
    <dt>Date</dt>
    <dd>
        2008
    </dd>
</div>

Find the dt tag by its text, then find its next dd sibling:

soup.find('div', class_='detail_date').find('dt', string='Date').find_next_sibling('dd').text.strip()
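Each find call in that chain returns None when nothing matches, so on a page where the container div, the Date label, or the dd is missing, the next call in the chain raises AttributeError. A minimal sketch that checks each step and returns None instead, assuming the same markup as in the question (get_field is a hypothetical helper name, not part of BeautifulSoup):

```python
from bs4 import BeautifulSoup


def get_field(soup, label):
    """Return the stripped text of the <dd> after <dt>label</dt>, or None.

    get_field is a hypothetical helper, not a BeautifulSoup API.
    """
    container = soup.find('div', class_='detail_date')
    if container is None:
        return None
    # string= requires the <dt> text to equal the label exactly.
    dt = container.find('dt', string=label)
    if dt is None:
        return None
    dd = dt.find_next_sibling('dd')
    if dd is None:
        return None
    # strip=True removes the newlines and indentation around "2008".
    return dd.get_text(strip=True)


html = """
<div class='dl_item_container clearfix detail_date'>
    <dt>Date</dt>
    <dd>
        2008
    </dd>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
print(get_field(soup, 'Date'))    # 2008
print(get_field(soup, 'Author'))  # None (no such <dt> in this markup)
```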

The complete code:

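from bs4 import BeautifulSoup

data = """
<div class='dl_item_container clearfix detail_date'>
    <dt>Date</dt>
    <dd>
    2008
    </dd>
</div>
"""

# Name the parser explicitly; omitting it makes BeautifulSoup warn that it guessed one.
soup = BeautifulSoup(data, 'html.parser')

# Find the container div by one of its classes, then the <dt> whose text is exactly "Date".
date_field = soup.find('div', class_='detail_date').find('dt', string='Date')

# The value is in the next <dd> sibling; strip the surrounding newlines and spaces.
print(date_field.find_next_sibling('dd').text.strip())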

Prints 2008.
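If a page has several dt/dd pairs and you want more than one value, you can collect every label and its value into a dictionary in one pass. A minimal sketch, assuming each dt is followed by its own dd (a dt with no dd before the next dt is skipped). Matching on the stripped dt text also tolerates whitespace around the label, which string='Date' would not:

```python
from bs4 import BeautifulSoup

html = """
<div class='dl_item_container clearfix detail_date'>
    <dt>Date</dt>
    <dd>
        2008
    </dd>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# Map each <dt> label to the text of the <dd> that directly follows it.
fields = {}
for dt in soup.find_all('dt'):
    # Look for the next <dt> or <dd>; pair only if a <dd> comes first.
    nxt = dt.find_next_sibling(['dt', 'dd'])
    if nxt is not None and nxt.name == 'dd':
        fields[dt.get_text(strip=True)] = nxt.get_text(strip=True)

print(fields)               # {'Date': '2008'}
print(int(fields['Date']))  # 2008
```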
