
Put all dates from HTML tag <span> into a list using BeautifulSoup

Here is my HTML file:

[<small class="breadcrumb x-normal">
<span><i data-icon="clock"></i>Today 10:52</span>
</small>]
[<small class="breadcrumb x-normal">
<span><i data-icon="clock"></i>April 11</span>
</small>]
[<small class="breadcrumb x-normal">
<span><i data-icon="clock"></i>April 5</span>
</small>]
[<small class="breadcrumb x-normal">
<span><i data-icon="clock"></i>February 29</span>
</small>]

How do I put all these dates into a list?

Here is my code:

from bs4 import BeautifulSoup
import lxml

def get_dates(html):
    soup = BeautifulSoup(html, 'lxml')
    dates = ...  # not sure what to call on soup here
    print(dates)

get_dates(html.text)

Example

from bs4 import BeautifulSoup

html = '<small class="breadcrumb x-normal"><span><i data-icon="clock"></i>Today 10:52</span></small>' \
       '<small class="breadcrumb x-normal"><span><i data-icon="clock"></i>April 11</span></small>' \
       '<small class="breadcrumb x-normal"><span><i data-icon="clock"></i>April 5</span></small>' \
       '<small class="breadcrumb x-normal"><span><i data-icon="clock"></i>February 29</span></small>'

soup = BeautifulSoup(html, features="lxml")
date_list = []
dates = soup.find_all('small', {'class':'breadcrumb x-normal'})

for date in dates:
    print(date.text)
    date_list.append(date.text)

print(date_list)

from bs4 import BeautifulSoup

html = '<small class="breadcrumb x-normal"><span><i data-icon="clock"></i>Today 10:52</span></small>' \
       '<small class="breadcrumb x-normal"><span><i data-icon="clock"></i>April 11</span></small>' \
       '<small class="breadcrumb x-normal"><span><i data-icon="clock"></i>April 5</span></small>' \
       '<small class="breadcrumb x-normal"><span><i data-icon="clock"></i>February 29</span></small>'

soup = BeautifulSoup(html, 'html.parser')

data = [item.next_element for item in soup.find_all(
    "i", {'data-icon': 'clock'})]

print(data)

Output:

['Today 10:52', 'April 11', 'April 5', 'February 29']
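
One thing to watch for: in the HTML from the question, each <span> sits on its own line inside <small>, so calling .text on the <small> tag would also return the surrounding newlines. Below is a minimal sketch that wraps the same idea in the get_dates function from the question and strips that whitespace. The CSS selector small.breadcrumb span and the use of html.parser (instead of lxml) are my assumptions, not something the question requires.

```python
from bs4 import BeautifulSoup


def get_dates(html):
    """Return the text of every <span> inside a <small class="breadcrumb"> tag."""
    soup = BeautifulSoup(html, "html.parser")
    # select() takes a CSS selector; get_text(strip=True) drops the
    # leading/trailing whitespace around each text fragment.
    return [span.get_text(strip=True)
            for span in soup.select("small.breadcrumb span")]


# Sample input laid out like the question's HTML (newlines inside <small>).
html = """
<small class="breadcrumb x-normal">
<span><i data-icon="clock"></i>Today 10:52</span>
</small>
<small class="breadcrumb x-normal">
<span><i data-icon="clock"></i>April 11</span>
</small>
"""

print(get_dates(html))
```

Returning the list rather than printing it inside the function makes it easier to reuse the result, for example get_dates(response.text) if the page comes from requests.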

