How to use bs4 to extract the data that follows tags on a website

Question

<html>

<head>
  <title>Index of /pub/opera/desktop/</title>
</head>

<body>
  <h1>Index of /pub/opera/desktop/</h1>
  <hr>
  <pre><a href="../">../</a>
<a href="15.0.1147.130/">15.0.1147.130/</a>                                     01-Jul-2013 15:18                   -
<a href="15.0.1147.132/">15.0.1147.132/</a>                                     01-Jul-2013 15:18                   -
<a href="15.0.1147.138/">15.0.1147.138/</a>                                     09-Jul-2013 12:11

I need to extract the version (15.0.1147.130) and the date (01-Jul-2013 15:18), but my code only gives me the version.

import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get('https://get.geo.opera.com/pub/opera/desktop/').text, 'html.parser')
for item in soup.find('pre').find_all('a')[1:]:
    print(item)  # prints each <a> tag, which holds only the version

What am I missing to get the date text?

Answer 1

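You are selecting the <a> tags, and they do not contain the date. The dates are plain text inside <pre>, after each link, so read the text of <pre> instead.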

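    soup = BeautifulSoup(requests.get('https://get.geo.opera.com/pub/opera/desktop/').text, 'html.parser')
    for item in soup.find_all('pre'):
        # The text of <pre> contains the link text and the date that follows each link.
        # Only the slashes are stripped; removing '-' as well would turn 01-Jul-2013 into 01Jul2013.
        print(item.get_text().replace('/', ''))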
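Another way to read the text that follows each link is the link's next_sibling. The sketch below is only an illustration, not the answerer's code: it runs on a sample copied from the listing in the question instead of the live page, and the names SAMPLE_HTML and extract_entries are made up for this example.

```python
from bs4 import BeautifulSoup, NavigableString

# Sample copied from the listing in the question, so no network access is needed.
SAMPLE_HTML = """<pre><a href="../">../</a>
<a href="15.0.1147.130/">15.0.1147.130/</a>                                     01-Jul-2013 15:18                   -
<a href="15.0.1147.132/">15.0.1147.132/</a>                                     01-Jul-2013 15:18                   -
</pre>"""


def extract_entries(html):
    """Return (version, date, time) tuples for the links inside <pre>."""
    soup = BeautifulSoup(html, 'html.parser')
    entries = []
    for link in soup.find('pre').find_all('a'):
        # The date is plain text right after </a>, i.e. the link's next sibling.
        trailing = link.next_sibling
        if not isinstance(trailing, NavigableString):
            continue
        fields = str(trailing).split()
        if len(fields) < 2:
            continue  # the "../" link has no date after it
        version = link.get_text().rstrip('/')
        entries.append((version, fields[0], fields[1]))
    return entries


for version, date, time in extract_entries(SAMPLE_HTML):
    print(version, date, time)
```

To run it against the real page, pass requests.get(url).text to extract_entries in place of SAMPLE_HTML.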

Update

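import requests
from bs4 import BeautifulSoup


soup = BeautifulSoup(requests.get('https://get.geo.opera.com/pub/opera/desktop/').text, 'html.parser')
# Each line of the <pre> text is one listing entry: name, date, time, size.
lines = soup.find('pre').get_text().splitlines()

for line in lines:
    my_data = line.split()  # split on any run of whitespace
    if len(my_data) < 3:
        continue  # skip the "../" entry and blank lines
    version = my_data[0].rstrip('/')  # drop the trailing slash of the directory name
    date = my_data[1]  # e.g. 01-Jul-2013 (hyphens are kept)
    time = my_data[2]  # e.g. 15:18
    print(version, date, time)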
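If the dates need to be sorted or compared rather than just printed, each listing line can be turned into a datetime. This is a minimal sketch using only the standard library; parse_listing_line is a hypothetical helper, and the format string assumes the English month abbreviations shown in the question (the %b directive depends on the locale).

```python
from datetime import datetime


def parse_listing_line(line):
    """Split one listing line into (version, datetime), or return None."""
    fields = line.split()
    if len(fields) < 3:
        return None  # "../" or a blank line
    version = fields[0].rstrip('/')
    # Assumed format: day-month abbreviation-year, then hours:minutes.
    modified = datetime.strptime(fields[1] + ' ' + fields[2], '%d-%b-%Y %H:%M')
    return version, modified


sample = '15.0.1147.138/                                     09-Jul-2013 12:11                   -'
print(parse_listing_line(sample))
```

A line whose second field is not a date in this format raises ValueError, so wrap the call in try/except if the listing may contain other kinds of rows.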


Answer 1: accepted, 2021-12-24 09:36:21
