Extracting data from a span with BeautifulSoup

Question

I'm trying to extract data from a span with BeautifulSoup in two different ways:

import requests
import bs4

url = 'https://www.futbin.com/19/player/477/Jordan%20Henderson/'
page = requests.get(url).content
soup = bs4.BeautifulSoup(page, 'lxml')



price1 = soup.find("div", {"class": "bin_price lbin"}).span.contents
price2 = soup.select('#ps-lowest-1')


print(price1)
print(price2)

It gave me two results:

[u'\n', <span id="ps-lowest-1">-</span>, u'\n']
[<span id="ps-lowest-1">-</span>]
[Finished in 1.0s]

Now I would like to extract the data (the price) from this span, but I can't. Thank you for your help.

Answer 1

The actual prices are not present in the HTML you get in the page variable. Prices are loaded dynamically via a separate request made by your browser.

You could simulate that request in your code as well:

from pprint import pprint
import requests

url = 'https://www.futbin.com/19/playerPrices?player=183711'
page = requests.get(url).json()

pprint(page)

Would print:

{u'183711': {u'prices': {u'pc': {u'LCPrice': u'1,500',
                                 u'LCPrice2': u'1,500',
                                 u'LCPrice3': u'1,500',
                                 u'LCPrice4': u'1,500',
                                 u'LCPrice5': u'1,500',
                                 u'MaxPrice': u'10,000',
                                 u'MinPrice': u'700',
                                 u'PRP': u'8',
                                 u'updated': u'49 mins ago'},
                         u'ps': {u'LCPrice': u'1,300',
                                 u'LCPrice2': u'1,300',
                                 u'LCPrice3': u'1,300',
                                 u'LCPrice4': u'1,300',
                                 u'LCPrice5': u'1,300',
                                 u'MaxPrice': u'10,000',
                                 u'MinPrice': u'700',
                                 u'PRP': u'6',
                                 u'updated': u'25 mins ago'},
                         u'xbox': {u'LCPrice': u'1,500',
                                   u'LCPrice2': u'1,500',
                                   u'LCPrice3': u'1,600',
                                   u'LCPrice4': u'1,600',
                                   u'LCPrice5': u'1,600',
                                   u'MaxPrice': u'10,000',
                                   u'MinPrice': u'700',
                                   u'PRP': u'8',
                                   u'updated': u'30 mins ago'}}}}
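
The price values in this response are strings with thousands separators, so they need converting before any numeric comparison. A minimal sketch, assuming the response keeps the shape shown above; `parse_price` and `lowest_listing` are hypothetical helper names, and `sample` is a trimmed copy of the printed output rather than a live response:

```python
def parse_price(text):
    """Convert a price string such as '1,300' to an int; return None if it is not numeric."""
    digits = text.replace(',', '').strip()
    return int(digits) if digits.isdigit() else None


def lowest_listing(response, player_id):
    """Map each platform to its LCPrice as an int (assumed here to be the lowest listing)."""
    prices = response[player_id]['prices']
    return {platform: parse_price(entry['LCPrice']) for platform, entry in prices.items()}


# Trimmed copy of the output printed above, not a live response.
sample = {
    '183711': {
        'prices': {
            'pc': {'LCPrice': '1,500', 'MinPrice': '700'},
            'ps': {'LCPrice': '1,300', 'MinPrice': '700'},
            'xbox': {'LCPrice': '1,500', 'MinPrice': '700'},
        }
    }
}

print(lowest_listing(sample, '183711'))
```

Returning None for non-numeric input keeps a placeholder such as `-` from raising an exception during conversion.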

Answer 2

The data you want comes from an XHR (Ajax) request. First extract the player ID, then use it to fetch the JSON content.

import requests
from bs4 import BeautifulSoup

url = 'https://www.futbin.com/19/player/477/Jordan%20Henderson/'
page = requests.get(url).text
soup = BeautifulSoup(page, 'html.parser')

playerId = soup.find(id="page-info")['data-baseid'] # 183711

# Build the price URL from the extracted ID and fetch it as JSON.
jsonURL = 'https://www.futbin.com/19/playerPrices?player=' + playerId
jsonObj = requests.get(jsonURL).json()
# print(jsonObj)

psLowestPrice = jsonObj[playerId]['prices']['ps']['LCPrice']
print(psLowestPrice)
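
The lookup `soup.find(id="page-info")['data-baseid']` raises a TypeError if the element is missing, because `find` returns None in that case. A hedged sketch of a more defensive version, run against an inline HTML string so it needs no network access; the `<div>` tag in `sample_html` and the helper names `extract_player_id` and `prices_url` are assumptions for illustration, not taken from the real page:

```python
from bs4 import BeautifulSoup

PRICES_ENDPOINT = 'https://www.futbin.com/19/playerPrices?player='


def extract_player_id(html):
    """Return the data-baseid of the #page-info element, or None if it is absent."""
    soup = BeautifulSoup(html, 'html.parser')
    info = soup.find(id='page-info')
    if info is None:
        return None
    return info.get('data-baseid')


def prices_url(player_id):
    """Build the price URL used in the answer above for a given player ID."""
    return PRICES_ENDPOINT + player_id


# Hypothetical stand-in for the real page markup.
sample_html = '<div id="page-info" data-baseid="183711"></div>'

player_id = extract_player_id(sample_html)
print(player_id)
print(prices_url(player_id))
```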

Answer 3

bs4's select gives you a list of matched tags.
Following your example, what about doing:

price1 = soup.find("div", {"class": "bin_price lbin"}).span.contents
price2 = soup.select('#ps-lowest-1')

Access the text inside the first element in the list:

print(price2[0].text)

Or check all:

for elem in price2:
    print(elem.text)
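
On the HTML shown in the question, this prints only the `-` placeholder, since the span has not been filled in with a price. A small sketch using the span from the question's output; `select_one` returns the first match or None, and `get_text(strip=True)` trims surrounding whitespace. The helper name `span_text` is hypothetical:

```python
from bs4 import BeautifulSoup


def span_text(html, css_selector):
    """Return the stripped text of the first element matching css_selector, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    element = soup.select_one(css_selector)
    return element.get_text(strip=True) if element is not None else None


# Markup reconstructed from the output printed in the question.
html = '<span>\n<span id="ps-lowest-1">-</span>\n</span>'

print(span_text(html, '#ps-lowest-1'))
```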

3 answers

Answer 1
1 2018-12-13 14:00:39

Answer 2
1 accepted 2018-12-13 14:14:44

Answer 3
0 2018-12-13 14:07:48
