
Python: How can I get text from a tag like this in BeautifulSoup?

I need to get the date and time from this link: 'https://www.pagina12.com.ar/225378-murio-cacho-castana-simbolo-del-macho-porteno', or from any article on the site 'https://www.pagina12.com.ar/'.

The structure looks like this:

<div class="article-info"><div class="breadcrumb"><div class="suplement"><a href="https://www.pagina12.com.ar/suplementos/cultura-y-espectaculos/notas">Cultura y Espectáculos</a></div><div class="topic"></div></div><div class="time"><span datetime="2019-10-15" pubdate="pubdate">15 de octubre de 2019</span><span> · </span><span>Actualizado hace <span class="article-time" data-time="1571156914">3 hs</span></span></div></div>

This is what I did:

import requests
from bs4 import BeautifulSoup

cosa = requests.get('https://www.pagina12.com.ar/225378-murio-cacho-castana-simbolo-del-macho-porteno').text
parse = BeautifulSoup(cosa, 'html5lib')
info = parse.findAll('div', {'class':'article-info'})

Then I tried to get the text that shows "3 hs", but I could not access it and I don't know how to do it. Does anyone have an idea?

Thanks!

You can calculate it from the data-time attribute:

from bs4 import BeautifulSoup as bs
import requests, datetime
import dateutil.relativedelta

r = requests.get('https://www.pagina12.com.ar/225378-murio-cacho-castana-simbolo-del-macho-porteno')
soup = bs(r.content, 'lxml')
# data-time holds the last-update time as a Unix timestamp (seconds)
dt1 = datetime.datetime.fromtimestamp(float(soup.select_one('[data-time]')['data-time']))
dt2 = datetime.datetime.now()
diff = dateutil.relativedelta.relativedelta(dt2, dt1)
# diff.hours is only the hours component; days, months and years are separate fields
print(diff.hours)
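
Note that relativedelta splits the difference into years, months, days and hours, so diff.hours wraps around once the article is more than a day old. If the total number of elapsed hours is what you want, a minimal sketch using only the standard library might look like the following. The function name hours_since and the reference_now value are made up for illustration; the timestamp is the data-time value from the HTML in the question.

```python
from datetime import datetime, timezone


def hours_since(epoch_seconds, now=None):
    """Return the whole hours elapsed between a Unix timestamp and `now`."""
    published = datetime.fromtimestamp(float(epoch_seconds), tz=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    elapsed = now - published  # a timedelta, not split into days/hours fields
    return int(elapsed.total_seconds() // 3600)


# data-time value copied from the HTML shown in the question
published_epoch = "1571156914"

# Hypothetical "current" time, three and a half hours after the update,
# fixed here only so that the example is reproducible.
reference_now = datetime.fromtimestamp(1571156914 + 3 * 3600 + 1800, tz=timezone.utc)

label = f"{hours_since(published_epoch, reference_now)} hs"
print(label)  # 3 hs
```

In real use you would pass the string read from soup.select_one('[data-time]')['data-time'] and leave now unset, so the current time is used.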

