
Create CKAN dataset using CKAN API and Python Requests library

I am using CKAN version 2.2 and am trying to automate dataset creation and resource upload. I seem to be unable to create a dataset using the Python requests library; I receive a 400 error code. Code:

import requests, json

dataset_dict = {
    'name': 'testdataset',
    'notes': 'A long description of my dataset',
}

d_url = 'https://mywebsite.ca/api/action/package_create'
auth = {'Authorization': 'myKeyHere'}
f = [('upload', file('PathToMyFile'))]

r = requests.post(d_url, data=dataset_dict, headers=auth)

Strangely, I am able to create a new resource and upload a file using the Python requests library. The code is based on this documentation. Code:

import requests, json

res_dict = {
    'package_id':'testpackage',
    'name': 'testresource',
    'description': 'A long description of my resource!',
    'format':'CSV'
}

res_url = 'https://mywebsite.ca/api/action/resource_create'
auth = {'Authorization': 'myKey'}
f = [('upload', file('pathToMyFile'))]

r = requests.post(res_url, data=res_dict, headers=auth, files=f)

I am also able to create a dataset with the method shown in the CKAN documentation, which uses only built-in Python libraries. Documentation: CKAN 2.2

Code:

#!/usr/bin/env python
import urllib2
import urllib
import json
import pprint

# Put the details of the dataset we're going to create into a dict.
dataset_dict = {
    'name': 'test1',
    'notes': 'A long description of my dataset',
}

# Use the json module to dump the dictionary to a string for posting.
data_string = urllib.quote(json.dumps(dataset_dict))

# We'll use the package_create function to create a new dataset.
request = urllib2.Request('https://myserver.ca/api/action/package_create')

# Creating a dataset requires an authorization header.
request.add_header('Authorization', 'myKey')

# Make the HTTP request.
response = urllib2.urlopen(request, data_string)
assert response.code == 200

# Use the json module to load CKAN's response into a dictionary.
response_dict = json.loads(response.read())
assert response_dict['success'] is True

# package_create returns the created package as its result.
created_package = response_dict['result']
pprint.pprint(created_package)
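
The listing above is Python 2 (urllib2 no longer exists in Python 3). For readers on Python 3, a rough equivalent can be sketched with urllib.request. This is only a sketch: the helper name build_package_create_request is made up for illustration, the server URL and key are the same placeholders as above, and the request is built but not sent.

```python
import json
import urllib.parse
import urllib.request


def build_package_create_request(base_url, api_key, dataset_dict):
    """Build (but do not send) a package_create POST request.

    Hypothetical helper, not part of CKAN or its documentation.
    """
    # Dump the dict to a JSON string, URL-quote it, and encode it to bytes,
    # because urllib.request requires a bytes body in Python 3.
    data = urllib.parse.quote(json.dumps(dataset_dict)).encode('ascii')
    request = urllib.request.Request(
        base_url + '/api/action/package_create', data=data)
    # Creating a dataset requires an authorization header.
    request.add_header('Authorization', api_key)
    return request


request = build_package_create_request(
    'https://myserver.ca',  # placeholder server, as in the question
    'myKey',                # placeholder API key
    {'name': 'test1', 'notes': 'A long description of my dataset'},
)
print(request.get_method(), request.full_url)
```

Sending it would then be a matter of calling urllib.request.urlopen(request) and reading the JSON response, as in the original listing.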

I am not really sure why my method of creating the dataset is not working. The documentation for the package_create and resource_create functions is very similar, so I would expect the same technique to work for both. I would prefer to use the requests package for all my dealings with CKAN. Has anyone been able to create a dataset with the requests library successfully?

Any help is greatly appreciated.

I finally came back to this and figured it out. Alice's suggestion to check the encoding was very close. While requests does do the encoding for you, it also decides on its own which type of encoding is appropriate, depending on the inputs. If a file is passed in along with a dictionary, requests automatically uses multipart/form-data encoding, which CKAN accepts, so the request succeeds.

However, if we pass only a dictionary, requests form-encodes it into key=value pairs. CKAN needs requests without files to carry the JSON string itself, URL encoded (application/x-www-form-urlencoded). To prevent requests from doing any encoding, we can pass our parameters in as a string; requests will then simply POST that string as-is. This means we have to URL encode it ourselves.
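
To make the difference concrete, here is a small sketch using only the standard library (Python 3). It approximates what requests does to a dictionary with urllib.parse.urlencode, which is an assumption about its internals made for illustration, and compares that with a raw JSON string and a URL-quoted JSON string.

```python
import json
from urllib.parse import quote, unquote, urlencode

dataset_dict = {
    'name': 'testdataset',
    'notes': 'A long description of my dataset',
}

# Form encoding of the dictionary: key=value pairs, no JSON anywhere.
form_body = urlencode(dataset_dict)

# The dictionary dumped to a JSON string, not yet URL encoded.
json_body = json.dumps(dataset_dict)

# The JSON string URL encoded by hand, as in the urllib2 example.
quoted_body = quote(json_body)

print(form_body)
print(json_body)
print(quoted_body)

# Unquoting the last body gives back the JSON string unchanged.
print(unquote(quoted_body) == json_body)
```

Only the last body is a single URL-encoded JSON document, which is the shape the urllib2 example from the documentation sends.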

Therefore, if I specify the content type, convert the parameters to a JSON string, URL encode that string with urllib, and then pass the result to requests:

head['Content-Type'] = 'application/x-www-form-urlencoded'
in_dict = urllib.quote(json.dumps(in_dict))
r = requests.post(url, data=in_dict, headers=head)

then the request is successful.
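
For anyone on Python 3, where urllib.quote has moved to urllib.parse.quote, the same idea could look roughly like the sketch below. The function names build_ckan_post and create_dataset are invented for illustration, and the URL and key are the placeholders from the question; treat it as a starting point under those assumptions rather than a tested CKAN client.

```python
import json
from urllib.parse import quote


def build_ckan_post(api_key, params):
    """Return (headers, body) for a CKAN action call without a file.

    Hypothetical helper: it only prepares the pieces handed to requests.
    """
    headers = {
        'Authorization': api_key,
        'Content-Type': 'application/x-www-form-urlencoded',
    }
    # A string body (rather than a dict) is sent by requests unchanged,
    # so the JSON has to be URL encoded here.
    body = quote(json.dumps(params))
    return headers, body


def create_dataset(url, api_key, dataset_dict):
    """POST the dataset to CKAN and return the 'result' part of the reply."""
    # Third-party library, imported here so the helper above stays stdlib-only.
    import requests

    headers, body = build_ckan_post(api_key, dataset_dict)
    response = requests.post(url, data=body, headers=headers)
    response.raise_for_status()  # raises on a 400 and other HTTP errors
    return response.json()['result']
```

A call would then look like create_dataset('https://mywebsite.ca/api/action/package_create', 'myKeyHere', dataset_dict), with dataset_dict as defined in the question.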

The data you send must be JSON encoded.

From the documentation (the page you linked to):

To call the CKAN API, post a JSON dictionary in an HTTP POST request to one of CKAN's API URLs.

In the urllib example this is performed by the following line of code:

data_string = urllib.quote(json.dumps(dataset_dict))

I think (though you should check) that the requests library will do the quoting for you, so you just need to convert your dict to JSON. Something like this should work:

r = requests.post(d_url, data=json.dumps(dataset_dict), headers=auth)
