In Python, how do I request specific data from a dynamically loaded website?

Question

I want to load pages from PeoplePerHour.com into Python to run some data analysis, but I keep getting data from a page I didn't ask for. I think the site must go to the main page first and then somehow refresh into the page I asked for.

For example, I want to pull the prices for all users at http://www.peopleperhour.com/freelance/data+analyst, and the data spans multiple pages.

Say I want to request page 2, http://www.peopleperhour.com/freelance/data+analyst#page=2. If I go there in a browser, it works fine and pulls up page 2, but I think it loads page 1 first and then "refreshes" into page 2. If I access the same URL in Python, it loads the HTML from the first page and never sees page 2.

Here's my code:

import requests
from pattern import web
import re
import pandas as pd

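def list_of_prices(url):
    # Download the page and parse it into a DOM tree
    html = requests.get(url).text
    dom = web.DOM(html)
    prices = []  # renamed from `list` to avoid shadowing the built-in
    # Each match is one freelancer's price tag
    for person in dom('.freelancer-list-item .medium.price-tag'):
        currency = person('sup')
        amount = person('span')
        prices.append([currency[0].content if currency else 'na', amount[0].content if amount else 'na'])
    return prices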

list_of_prices('http://www.peopleperhour.com/freelance/data+analyst#page=2')

No matter what, this returns the prices from page 1.

What is going on that I'm just not seeing?

Answer 1

If I understand correctly, you want to iterate through the pages. If that's the case, I believe the problem is with your URL.

Here's the URL you gave: http://www.peopleperhour.com/freelance/data+analyst#page=2

The problem is that "page" is not a bookmark on that page. When you use #page=2, it tells the browser to stay on the same page and jump to a bookmark (anchor) called "page=2". The part after the # is handled by the browser and is not sent to the server, so from Python you are still requesting page 1.

Here's the URL for the Next button on that site: http://www.peopleperhour.com/freelance/data+analyst?sort=most-relevant&page=2

You'll see it says "&page=2", which means something else. In their code, "page" is a variable passed via the URL, with a value of 2. You use "&" to separate the variables when there is more than one. Also, you are missing the "?" symbol. If you're passing variables via the URL, you have to put a "?" followed by the name=value pairs for your variables.
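To see the difference concretely, here is a small sketch that splits both URLs with the standard library's urllib.parse (which is not part of the original code; the helper name describe is made up for illustration). It only parses the strings and makes no network request.

```python
from urllib.parse import urlsplit, parse_qs

def describe(url):
    """Hypothetical helper: return (path, query variables, fragment) for a URL."""
    parts = urlsplit(url)
    return parts.path, parse_qs(parts.query), parts.fragment

old = describe('http://www.peopleperhour.com/freelance/data+analyst#page=2')
new = describe('http://www.peopleperhour.com/freelance/data+analyst?sort=most-relevant&page=2')

print(old)  # the query is empty; 'page=2' is only a fragment
print(new)  # the query holds 'sort' and 'page'; the fragment is empty
```

In the first URL the page number ends up in the fragment, which stays on the client. In the second it is a query variable, which is part of what gets requested.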

So, easy fix: change your URL to this:

http://www.peopleperhour.com/freelance/data+analyst?page=2

That's in comparison to your old URL:

http://www.peopleperhour.com/freelance/data+analyst#page=2

As a quick test, copy and paste the corrected URL into your web browser. You will see that it is now on page 2.
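If the goal is to walk through several pages, a minimal sketch could build each URL with the page number in the query string. The helper name page_url and the range of pages are assumptions made for illustration; how many pages the site actually has is not known here.

```python
from urllib.parse import urlencode

BASE_URL = 'http://www.peopleperhour.com/freelance/data+analyst'

def page_url(page, base=BASE_URL):
    """Hypothetical helper: put the page number in the query string, not the fragment."""
    return base + '?' + urlencode({'page': page})

# Pages 1-3 are an arbitrary range chosen for illustration.
urls = [page_url(n) for n in range(1, 4)]
for url in urls:
    print(url)
    # prices = list_of_prices(url)  # the function from the question
```

Each generated URL can then be passed to list_of_prices from the question, one page at a time.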

Answer 2

Getting dynamic content (content generated by client-side code) is always very tricky. There is no easy solution to this, but if you really want to dig into it, I recommend PyV8, a Python wrapper for the V8 JavaScript engine.

Answer 3

Error in pattern when using pattern3 in Python 3.6

Please click on the hyperlink above to open the image. What is the alternative for running the same code in a Python 3.6 environment? pattern does not support Python 3.6, so I had to install pattern3. Thanks!

Answer 1: 3, accepted, 2014-07-18 03:33:46

Answer 2: 1, 2014-07-18 03:34:30

Answer 3: 1, 2017-12-11 13:16:43
