In Python, how can I request specific data from a dynamically loaded website?

I want to load pages from PeoplePerHour.com into Python to run some data analysis, but I keep getting data from a page I didn't ask for. I think the site goes to the main page first and then somehow refreshes into the page I requested.

For example, I want to pull the prices for all users at http://www.peopleperhour.com/freelance/data+analyst , and the data spans multiple pages.

Say I want to request page 2, http://www.peopleperhour.com/freelance/data+analyst#page=2 . If I open this in a browser, it works fine and shows page 2, but I think it loads page 1 first and then "refreshes" into page 2. If I access it from Python, it loads the HTML of the first page and never sees page 2.

Here's my code:

import requests
from pattern import web
import re
import pandas as pd

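def list_of_prices(url):
    # Download the page and parse it into a DOM tree
    html = requests.get(url).text
    dom = web.DOM(html)
    prices = []  # renamed from `list` to avoid shadowing the built-in
    # The code reads the currency from <sup> and the amount from <span>
    for person in dom('.freelancer-list-item .medium.price-tag'):
        currency = person('sup')
        amount = person('span')
        prices.append([currency[0].content if currency else 'na',
                       amount[0].content if amount else 'na'])
    return prices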

list_of_prices('http://www.peopleperhour.com/freelance/data+analyst#page=2')

No matter what, this returns the prices from page 1.

What is going on that I'm not seeing?

If I understand correctly, you want to iterate through the pages. If that's the case, I believe the problem is with your URL.

Here's the URL you gave: http://www.peopleperhour.com/freelance/data+analyst#page=2

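The problem is that "#page=2" is a fragment identifier, not a request parameter. A fragment normally tells the browser to jump to a bookmark (anchor) named "page=2" within the same page, and it is never sent to the server. The browser ends up on page 2 presumably because the site's JavaScript reads the fragment after page 1 has loaded; requests does not run JavaScript, so it only ever gets page 1.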
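To see which part of the URL is the fragment, here is a minimal sketch using only the standard library's urllib.parse. It makes no network request; it just splits the URL from the question and shows that the query string is empty while "page=2" sits in the fragment.

```python
from urllib.parse import urlsplit, urldefrag

url = "http://www.peopleperhour.com/freelance/data+analyst#page=2"

parts = urlsplit(url)
print(repr(parts.query))     # the query string is empty
print(repr(parts.fragment))  # "page=2" lives in the fragment

# Everything after "#" is handled client-side; this is the URL without it
request_url, fragment = urldefrag(url)
print(request_url)
```

The last line prints the URL without the fragment, which is the address of page 1.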

Here's the URL behind the Next button on that site: http://www.peopleperhour.com/freelance/data+analyst?sort=most-relevant&page=2

You'll see it says "&page=2", which means something else. Here "page" is a variable passed via the URL (a query parameter) with a value of 2. The "&" separates the variables when there is more than one. Your URL is also missing the "?" symbol: to pass variables via the URL, you put a "?" after the path, followed by the name=value pairs.
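As a rough illustration of that structure, the standard library can take the Next-button URL apart and build the same query string back up. Nothing here contacts the site; the parameter names are the ones visible in the URL above.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

next_url = ("http://www.peopleperhour.com/freelance/data+analyst"
            "?sort=most-relevant&page=2")

# Everything after "?" is the query string: name=value pairs joined by "&"
params = parse_qs(urlsplit(next_url).query)
print(params)  # each name maps to a list of its values

# Going the other way: build a query string from name/value pairs
query = urlencode({"sort": "most-relevant", "page": 2})
print(query)
```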

So, easy fix: change your URL to this:

http://www.peopleperhour.com/freelance/data+analyst?page=2

Compare that with your old URL:

http://www.peopleperhour.com/freelance/data+analyst#page=2

As a quick test, paste the corrected URL into your web browser. You will see that it now opens page 2.
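Since the goal is to go through several pages, one way to generate the corrected URLs is sketched below. The helper name page_urls and the page range 1 to 3 are assumptions made for illustration; the real number of result pages has to be read from the site.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.peopleperhour.com/freelance/data+analyst"

def page_urls(base_url, first_page, last_page):
    """Yield one URL per results page, with the page number in the query string."""
    for page in range(first_page, last_page + 1):
        yield base_url + "?" + urlencode({"page": page})

# Hypothetical range: pages 1 to 3
urls = list(page_urls(BASE_URL, 1, 3))
for u in urls:
    print(u)
```

Each of these URLs can then be passed to the list_of_prices function from the question, and the per-page results concatenated.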

Getting dynamic content (content generated by client-side code) is always tricky. There is no easy solution, but if you really want to dig into it, I recommend PyV8, a Python wrapper for Google's V8 JavaScript engine.

Error in pattern when using pattern3 in Python 3.6

What is the alternative for running the same code under Python 3.6? pattern does not support Python 3.6, so I had to install pattern3, and that gives an error. Thanks!

