Python: BeautifulSoup finding attributes

Question

I want to scrape a website to get some percentages. This is the code so far:

import requests
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError
lista=[]

site = 'https://es.investing.com/indices/indices-futures'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0'}
request = Request(site, headers=headers)
page = urlopen(request)
soup = BeautifulSoup(page, 'html.parser')
#print(soup)

cotizacion = soup.find_all('td',{"class": "datatable_cell__3gwri datatable_cell--align-end__Wua8C datatable_cell--" + "down__2CL8n" +" datatable_cell--bold__3e0BR table-browser_col-chg-pct__9p1T3"})
for datos in cotizacion:
    indices = datos.get_text()
    lista.append(indices)
print(lista)

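With this I get a bunch of percentages in the list. But my problem is that the class filter only picks up values when the percentage is negative, because the class name is the one for down ("down__2CL8n"). When the value goes up, the class name is identical except for that part ("up__2984w"). I want to get both, the positive and the negative ones. So I tried the following: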

soup.find_all('td',{"class": "datatable_cell__3gwri datatable_cell--align-end__Wua8C datatable_cell--" + "down__2CL8n" or "up__2984w" +" datatable_cell--bold__3e0BR table-browser_col-chg-pct__9p1T3"})

But this doesn't work. How should I write the string so that one part of it can vary?
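The second attempt fails because of operator precedence: + binds more tightly than or, so Python reads the expression as (prefix + "down__2CL8n") or ("up__2984w" + suffix). The left operand is a non-empty string, which is truthy, so or returns it unchanged and the last two classes are dropped entirely. A minimal sketch reproducing this with plain strings (the names prefix, suffix, attempt and wanted are illustrative only, not from the question):

```python
# Class fragments copied from the question's find_all() call
prefix = "datatable_cell__3gwri datatable_cell--align-end__Wua8C datatable_cell--"
suffix = " datatable_cell--bold__3e0BR table-browser_col-chg-pct__9p1T3"

# The question's expression: `+` binds tighter than `or`, so this is
# (prefix + "down__2CL8n") or ("up__2984w" + suffix)
attempt = prefix + "down__2CL8n" or "up__2984w" + suffix

# The left operand is a non-empty string (truthy), so `or` returns it
# and the right-hand operand is never evaluated
print(attempt == prefix + "down__2CL8n")             # True
print("table-browser_col-chg-pct__9p1T3" in attempt)  # False

# What was intended: two complete class strings, one per direction
wanted = [prefix + direction + suffix for direction in ("down__2CL8n", "up__2984w")]
for value in wanted:
    print(value)
```

Even with both full strings built correctly, matching the entire class attribute is brittle, because any change to one generated class breaks the match. The answers below sidestep this by matching a single class or the column position instead.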

Answer 1

The values you want are in cells with the class table-browser_col-chg-pct__9p1T3, which positive and negative cells share. To select only the cells in the first table, you can use the CSS selector .mb-6 td.table-browser_col-chg-pct__9p1T3.

import requests
from bs4 import BeautifulSoup


URL = "https://es.investing.com/indices/indices-futures"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0"
}

soup = BeautifulSoup(requests.get(URL, headers=headers).content, "html.parser")

print([tag.text for tag in soup.select(".mb-6 td.table-browser_col-chg-pct__9p1T3")])

Output:

['+0,12%', '+0,73%', '+1,97%', '+0,95%', '+1,13%', '+0,03%', '-0,15%', '-0,73%', '-0,05%', '+0,22%', '-0,65%', '-0,16%', '-0,37%', '-0,21%', '+0,11%', '-0,41%', '-0,40%', '-0,15%', '-0,38%', '+0,69%', '-0,89%', '-1,13%', '+0,23%', '-0,89%', '-0,75%', '-1,51%', '-0,22%', '+0,43%', '-1,27%', '+0,92%']
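The same idea can be checked offline. The sketch below uses a small hypothetical snippet modelled on the class names from the question (the live page's markup may differ). It relies on documented BeautifulSoup behaviour: class_ matches a tag if any one of its classes equals the given value, so the up/down modifier no longer matters, and the .mb-6 prefix in the selector then restricts results to that container.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the class names in the question
html = """
<div class="mb-6">
  <table>
    <tr>
      <td class="datatable_cell__3gwri datatable_cell--up__2984w table-browser_col-chg-pct__9p1T3">+0,12%</td>
      <td class="datatable_cell__3gwri datatable_cell--down__2CL8n table-browser_col-chg-pct__9p1T3">-0,15%</td>
    </tr>
  </table>
</div>
<div class="other">
  <table>
    <tr><td class="table-browser_col-chg-pct__9p1T3">+9,99%</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ matches a <td> if ANY of its classes equals the value,
# so both "up" and "down" cells are found
all_pct = [td.get_text() for td in soup.find_all("td", class_="table-browser_col-chg-pct__9p1T3")]
print(all_pct)  # ['+0,12%', '-0,15%', '+9,99%']

# The answer's CSS selector additionally restricts matches to the .mb-6 container
first_table = [td.get_text() for td in soup.select(".mb-6 td.table-browser_col-chg-pct__9p1T3")]
print(first_table)  # ['+0,12%', '-0,15%']
```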

Answer 2

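I would avoid the class values, which are quite possibly generated dynamically, and instead work out which column the desired values are in; then use :nth-of-type to slice that column out of the table. To get the table, I would use an attribute = value selector to grab the parent element with data-test=price-table, then a descendant combinator to move down to the child table element. The aim is to end up with something that stays robust over time. Of course, this does introduce a dependency on the header string ('% Var.').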

import requests
from bs4 import BeautifulSoup

URL = "https://es.investing.com/indices/indices-futures"
headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0"}
soup = BeautifulSoup(requests.get(URL, headers=headers).content, "html.parser")
index = [i.text for i in soup.select('[data-test=price-table] table th')].index('% Var.') + 1
print([i.text for i in soup.select(f"[data-test=price-table] table td:nth-of-type({index})")])
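To see the column-position technique work without hitting the live site, here is a sketch against a hypothetical table. Only the data-test=price-table attribute and the '% Var.' header text come from the answer; the other headers and rows are made up for illustration. Note that :nth-of-type is 1-based, hence the + 1 after list.index().

```python
from bs4 import BeautifulSoup

# Hypothetical table: only data-test="price-table" and the '% Var.'
# header text are taken from the answer
html = """
<div data-test="price-table">
  <table>
    <tr><th>Nombre</th><th>Último</th><th>% Var.</th></tr>
    <tr><td>Dow Jones</td><td>33.000</td><td>+0,12%</td></tr>
    <tr><td>S&amp;P 500</td><td>3.970</td><td>-0,15%</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the column by its header text; list.index() is 0-based,
# :nth-of-type() is 1-based
headers = [th.get_text() for th in soup.select("[data-test=price-table] table th")]
index = headers.index("% Var.") + 1

# Take the <td> at that position in every row
pct = [td.get_text() for td in soup.select(f"[data-test=price-table] table td:nth-of-type({index})")]
print(index, pct)  # 3 ['+0,12%', '-0,15%']
```

Because :nth-of-type counts a cell's position among its sibling td elements, this assumes every data row has the same cell layout as the header row (no colspan or extra leading cells).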

You could also just use pandas read_html:

import pandas as pd

table = pd.read_html('https://es.investing.com/indices/indices-futures')[0]
table['% Var.']

Answer 3

You can add the positive values in a second step (assuming order doesn't matter):

cotizacion += soup.find_all('td',{"class": "datatable_cell__3gwri datatable_cell--align-end__Wua8C datatable_cell--" + "up__2984w" +" datatable_cell--bold__3e0BR table-browser_col-chg-pct__9p1T3"})

Edit: As the comments point out, order does matter; in that case you can refer to this answer: https://stackoverflow.com/a/14257743/8651239
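One way to keep document order is to search once with a filter that accepts either class. This sketch is not necessarily what the linked answer does: it passes a function to find_all(), which BeautifulSoup calls on every tag and keeps the tag when the function returns True. The markup and the helper name is_pct_cell are hypothetical; only the up/down class names come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the up/down modifier classes come from the question
html = """
<table>
  <tr><td class="datatable_cell--up__2984w">+0,12%</td></tr>
  <tr><td class="datatable_cell--down__2CL8n">-0,15%</td></tr>
  <tr><td class="datatable_cell--up__2984w">+0,73%</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
WANTED = {"datatable_cell--up__2984w", "datatable_cell--down__2CL8n"}

def is_pct_cell(tag):
    # True for a <td> whose class list contains either the "up" or the "down" class
    return tag.name == "td" and not WANTED.isdisjoint(tag.get("class", []))

# A single find_all() walks the tree once, so results stay in document order
in_order = [td.get_text() for td in soup.find_all(is_pct_cell)]
print(in_order)  # ['+0,12%', '-0,15%', '+0,73%']

# Concatenating two searches, as in this answer, puts every "down" cell first
concatenated = [td.get_text() for td in soup.find_all("td", class_="datatable_cell--down__2CL8n")]
concatenated += [td.get_text() for td in soup.find_all("td", class_="datatable_cell--up__2984w")]
print(concatenated)  # ['-0,15%', '+0,12%', '+0,73%']
```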

3 answers

Answer 1
2 votes, accepted, 2021-03-31 16:07:09

Answer 2
1 vote, 2021-03-31 17:43:15

Answer 3
0 votes, 2021-03-31 15:42:46
