Python: Getting data from data-bound elements using BeautifulSoup

This is an extension of a question that I posted a week ago (getting text from html using beatifulsoup). It seems that most of the data I want to extract is data-bound and is not 'stored' when I use soup.findAll. For example, taking this link: kaggle/user/results, I am trying to get the names of all the competitions the user participated in. I am using the following code:

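import requests
from bs4 import BeautifulSoup

url = 'https://www.kaggle.com/titericz/results'
sourceCode = requests.get(url)
plainText = sourceCode.text
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(plainText, 'html.parser')
# Print every table row present in the static HTML
for link in soup.find_all('tr'):
    print(link)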

So I take the first competition, but in link the values for the competition name, the position in the competition, the total number of competitors, etc. are missing, even though they are present in the HTML. I tried to follow the same procedure as in the answer to the question linked above (using re.compile and pattern.search), but I could not get it to work. Is there a way to accomplish this using BeautifulSoup? I couldn't find any similar issue on the web.
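The symptom described above can be reproduced offline. What follows is a minimal sketch, assuming a made-up HTML fragment (not Kaggle's actual markup) in which the cells carry data-bind attributes but no text, as is typical when a client-side script fills in the values after the page loads. BeautifulSoup only sees the static markup, so the binding expressions are visible but the values are not.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment for illustration only; not copied from the real page.
SAMPLE_HTML = """
<table>
  <tr>
    <td data-bind="text: competition.title"></td>
    <td data-bind="text: rank"></td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Select every cell that has a data-bind attribute, whatever its value.
cells = soup.find_all("td", attrs={"data-bind": True})

# The binding expressions are in the markup, but the cell text is empty.
bindings = [cell["data-bind"] for cell in cells]
texts = [cell.get_text(strip=True) for cell in cells]

print(bindings)
print(texts)
```

If the real page behaves like this fragment, no amount of searching the parsed tree will recover the values, because they were never in the downloaded HTML.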

You can parse the underlying GET request, which returns a JSON string.

Here's a small script which will get you started.

import requests
import json

jsonResponse = requests.get("https://www.kaggle.com/knockout/profiles/54836/results")
data = json.loads(jsonResponse.text)
print(data)

for eachData in data:
    print("competition name:", eachData["competition"]["title"])
    print("Rank:", eachData["rank"])
    print("competitors count:", eachData["teamCount"])

The output will be of the format:

 competition name: Digit Recognizer 
 Rank: None
 competitors count: 933
 competition name: The Allen AI Science Challenge 
 Rank: 110 
 competitors count: 486
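Building on the script above, the sketch below shows one way to turn each record into a one-line summary while coping with a missing rank (the None in the sample output). The SAMPLE_JSON payload is invented to mirror only the three fields the script reads (competition.title, rank, teamCount); the real response may contain other fields or may have changed shape. summarize is a hypothetical helper name, not part of any library.

```python
import json

# Made-up payload mirroring only the fields used in the script above.
SAMPLE_JSON = """
[
  {"competition": {"title": "Digit Recognizer"}, "rank": null, "teamCount": 933},
  {"competition": {"title": "The Allen AI Science Challenge"}, "rank": 110, "teamCount": 486}
]
"""


def summarize(record):
    """Return a one-line summary for a single result record (hypothetical helper)."""
    title = record["competition"]["title"].strip()
    rank = record.get("rank")
    # JSON null becomes None in Python, so handle it explicitly.
    rank_text = "not ranked" if rank is None else f"rank {rank}"
    return f"{title}: {rank_text} ({record['teamCount']} teams)"


summaries = [summarize(record) for record in json.loads(SAMPLE_JSON)]
for line in summaries:
    print(line)
```

With live data, json.loads(SAMPLE_JSON) would be replaced by the parsed response from requests, assuming the endpoint still returns a list of records with these fields.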
