Scraping a Crunchbase data web page with Python

Question

Code:

import requests

response = requests.get("https://www.crunchbase.com/search/people/field/organizations/num_employees_enum/anheuser-busch")

response.raise_for_status()

# Write the body in chunks; the with-block closes the file once, after the loop
with open('myFile.txt', 'wb') as webFile:
    for chunk in response.iter_content(10000):
        webFile.write(chunk)

I get the following error:

requests.exceptions.HTTPError: 416 Client Error: Requested Range Not Satisfiable for url: https://www.crunchbase.com/search/people/field/organizations/num_employees_enum/anheuser-busch

Answer 1

If you remove the line response.raise_for_status(), you will receive the following output from Crunchbase:

Pardon Our Interruption...

As you were browsing www.crunchbase.com something about your browser made us think you were a bot. There are a few reasons this might happen:

You're a power user moving through this website with super-human speed.
You've disabled JavaScript in your web browser.
A third-party browser plugin, such as Ghostery or NoScript, is preventing JavaScript from running.

Additional information is available in this support article.

In fact, you are a bot, so instead of scraping the site with Python requests you should try using their own API.
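If you still want to see what the site returns, a minimal sketch like the one below can flag the block page instead of letting raise_for_status() hide it. The marker string and the 416 status come from the output quoted above; the function names looks_blocked and fetch_and_check are hypothetical, and the site may change its block page at any time.

```python
# Marker text taken from the block page quoted above.
BLOCK_MARKER = "Pardon Our Interruption"


def looks_blocked(status_code, body):
    """Return True when a response resembles the bot-detection page.

    Assumption: the block is signalled either by the 416 status seen in
    the question or by the marker text appearing in the response body.
    """
    return status_code == 416 or BLOCK_MARKER in body


def fetch_and_check(url):
    """Fetch url and report whether the reply looks like the block page."""
    import requests  # imported here so the helper above has no dependency

    response = requests.get(url)
    return looks_blocked(response.status_code, response.text)
```

Calling fetch_and_check() with the URL from the question would be expected to return True for as long as the site keeps answering scripted requests this way.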

EDIT

To use the Crunchbase API, you need to register here: https://about.crunchbase.com/solutions/. According to the documentation, the free basic access licence should be enough to access organizations.

Once you have registered you will have a user API key, and you can then make your requests as follows:

https://api.crunchbase.com/v3.1/organizations?user_key=[user_key]
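As a rough illustration of that URL pattern, the sketch below builds the request URL from a path and a key using only the standard library. The base URL and the user_key parameter come from the line above; the helper name build_url and the constant API_BASE are hypothetical.

```python
from urllib.parse import urlencode

# Base URL taken from the request pattern shown above.
API_BASE = "https://api.crunchbase.com/v3.1"


def build_url(path, user_key, **extra):
    """Join the base URL, an endpoint path, and a URL-encoded query string."""
    query = urlencode({"user_key": user_key, **extra})
    return f"{API_BASE}/{path.lstrip('/')}?{query}"


print(build_url("organizations", "your_key"))
```

With requests you would not normally build the string by hand, since passing params= to requests.get does the same encoding for you.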

The equivalent of the query you made, using the API, would be something like this:

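import requests

url = "https://api.crunchbase.com/v3.1/organizations/anheuser-busch"

params = {"user_key": "your_key"}  # replace with your own API key

resp = requests.get(url, params=params)
resp.raise_for_status()
data = resp.json()

# Assumption: a single-organization response keeps its fields under
# data -> properties; check the organization docs if the layout differs.
properties = data["data"]["properties"]

with open('myFile.txt', 'w') as webFile:
    # write() needs a string, and the value may be a number or missing
    webFile.write(str(properties.get("num_employees_max")))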

I haven't tested it myself, but it should get you going.
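Since the snippet above is untested, it may help to pull the employee figures out of the parsed JSON defensively. The sketch below assumes the organization's fields sit under data -> properties in the response; that layout, the helper name employee_range, the num_employees_min field, and the sample values are all assumptions for illustration, so check them against the organization documentation.

```python
def employee_range(payload):
    """Return (num_employees_min, num_employees_max) from a parsed response.

    Assumption: the fields live under payload["data"]["properties"].
    Missing keys yield None instead of raising KeyError.
    """
    properties = payload.get("data", {}).get("properties", {})
    return (
        properties.get("num_employees_min"),
        properties.get("num_employees_max"),
    )


# Hypothetical payload with made-up numbers, only to show the expected shape.
sample = {
    "data": {
        "properties": {"num_employees_min": 1001, "num_employees_max": 5000}
    }
}

low, high = employee_range(sample)
print(f"{low} to {high} employees")
```

Returning None for absent fields keeps the file-writing step from failing when a record has no employee count.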

Here is all the data available for organizations: https://data.crunchbase.com/docs/organization

And here is the reference to get started with the API: https://data.crunchbase.com/docs/using-the-api

Solution 1
6 2017-11-06 10:30:35
