
POST request (python) - invalid request

I'm trying to use the API of a media ID registry, the EIDR, to download TV show information. I'd like to be able to query many shows automatically. I'm not experienced with APIs, and the documentation for this specific one is very opaque. I'm using Python 3 (requests library) on Ubuntu 16.04.

I tried sending a request for a specific TV show. I took the headers and parameters from the browser: I ran the query there (I looked up 'anderson cooper 360' from this page) and then copied the values shown in the "Network" tab of the browser's page inspector. I used the following code:

import requests

url = 'https://resolve.eidr.org/EIDR/query/'

# Headers copied from the browser's network tab
headers = {
    'Host': 'ui.eidr.org',
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:58.0) '
                  'Gecko/20100101 Firefox/58.0',
    'Accept': '*/*',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate, br',
    'Referer': 'https://ui.eidr.org/search/results/19c70c63d73790b86f3fb385f2a9b3f4',
    'Cookie': 'ci_session=f4tnbi8qm7oaq30agjtn8p69j91s4li4; '
              '_ga=GA1.2.1738620664.1519337357; _gid=GA1.2.1368695940.1519337357; _gat=1',
    'Connection': 'keep-alive',
}

# Form fields copied from the browser's search request
params = {
    'search_page_size': 25,
    'CustomAsciiSearch[_v]': 1,
    'search_type': 'content',
    'ResourceName[_v]': 'anderson cooper 360',
    'AlternateResourceNameAddition[_v]': 1,
    'AssociatedOrgAlternateNameAddition[_v]': 1,
    'Status[_v]': 'valid',
}

r = requests.post(url, data=params, headers=headers)
print(r.text)

I get this response that basically says it's an invalid request:

<?xml version="1.0" encoding="UTF-8"?>
<Response xmlns="http://www.eidr.org/schema" version="2.1.0">
  <Status><Code>3</Code><Type>invalid request</Type></Status>
</Response>

I then read in an answer to this Stack Overflow question that I should somehow use a session object. The code suggested in the answer by Padraic Cunningham was this:

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:46.0) '
                  'Gecko/20100101 Firefox/46.0',
    'X-Requested-With': 'XMLHttpRequest',
    'referer': 'https://www.hackerearth.com/challenges/',
}

with requests.Session() as s:
    s.get("https://www.hackerearth.com")
    headers["X-CSRFToken"] = s.cookies["csrftoken"]
    r = s.post("https://www.hackerearth.com/AJAX/filter-challenges/?modern=true",
               headers=headers, files={'submit': (None, 'True')})
    print(r.json())

So I understand that I should somehow use this, but I don't fully understand why or how.

So my question(s) would be:

1) What does 'invalid request' mean in this case?

2) Do you have any suggestions for how to write the request in a way that I can iterate it many times for different items I want to look up?

3) Do you know what I should do to properly use a session object here?

Thank you!

You probably need this documentation.

1) from the documentation:

invalid request : An API (URI) that does not exist including missing a required parameter. May also include an incorrect HTTP operation on a valid URI (such as a GET on a registration). Could also be POST multipart data that is syntactically invalid such as missing required headers or if the end-of-line characters are not CR-LF.
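To check the status programmatically instead of reading the raw XML, something like the following minimal sketch may help. It parses the exact response shown in the question; the helper name parse_status is hypothetical, and the sketch assumes every response carries its status in a Status element with Code and Type children, as that one sample does.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the response in the question
NS = {'eidr': 'http://www.eidr.org/schema'}


def parse_status(xml_bytes):
    """Return (code, type) from a response, or None if it has no Status element.

    Hypothetical helper: the element names come from the sample response only.
    """
    root = ET.fromstring(xml_bytes)
    status = root.find('.//eidr:Status', NS)
    if status is None:
        return None
    code = status.findtext('eidr:Code', namespaces=NS)
    kind = status.findtext('eidr:Type', namespaces=NS)
    return (int(code) if code is not None else None, kind)


# The response from the question, as bytes
sample = (b'<?xml version="1.0" encoding="UTF-8"?>'
          b'<Response xmlns="http://www.eidr.org/schema" version="2.1.0">'
          b'<Status><Code>3</Code><Type>invalid request</Type></Status></Response>')

print(parse_status(sample))  # (3, 'invalid request')
```

With requests, the same helper could be called as parse_status(r.content) right after the POST.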

2) As far as I understand, this API accepts XML requests. To see the request the web UI sends, click 'View XML' on the results page (https://ui.eidr.org/search/results). For the 'anderson cooper 360' query, you can use that XML in Python like this:

import requests
import xml.etree.ElementTree as ET

url = 'https://resolve.eidr.org/EIDR/query/'

headers =  {'Content-Type': 'text/xml',
    'Authorization': 'Eidr 10.5238/webui:10.5237/D4C9-7E59:9kDMO4+lpsZGUIl8doWMdw==',
    'EIDR-Version': '2.1'}
xml_query = """<Request xmlns="http://www.eidr.org/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Operation>
            <Query>
                    <Expression><![CDATA[ASCII(((/FullMetadata/BaseObjectData/ResourceName "anderson" AND /FullMetadata/BaseObjectData/ResourceName "cooper" AND /FullMetadata/BaseObjectData/ResourceName "360") OR (/FullMetadata/BaseObjectData/AlternateResourceName "anderson" AND /FullMetadata/BaseObjectData/AlternateResourceName "cooper" AND /FullMetadata/BaseObjectData/AlternateResourceName "360")) AND /FullMetadata/BaseObjectData/Status "valid")]]></Expression>
                    <PageNumber>1</PageNumber>
                    <PageSize>25</PageSize>
            </Query>
    </Operation>
</Request>"""
r = requests.post(url, data=xml_query, headers=headers)
root = ET.fromstring(r.content)
ns = '{http://www.eidr.org/schema}'
for sm in root.findall('.//' + ns + 'SimpleMetadata'):
    # Element.getchildren() was removed in Python 3.9; iterate over the element directly
    print({ch.tag.replace(ns, ''): ch.text for ch in sm})
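To repeat the query for many titles, one option is to generate the expression from each title instead of writing it by hand. The following is only a sketch: build_expression, build_request, parse_records and query_titles are hypothetical names, and it assumes the web UI builds its expression by AND-ing every word of the title against ResourceName and AlternateResourceName, which is inferred from the single 'anderson cooper 360' example. It also assumes titles contain no double quotes and no "]]>" sequence, since the escaping rules of the query language are not covered here.

```python
import xml.etree.ElementTree as ET

URL = 'https://resolve.eidr.org/EIDR/query/'
NS = '{http://www.eidr.org/schema}'
BASE = '/FullMetadata/BaseObjectData/'

REQUEST_TEMPLATE = """<Request xmlns="http://www.eidr.org/schema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Operation>
        <Query>
            <Expression><![CDATA[{expression}]]></Expression>
            <PageNumber>{page_number}</PageNumber>
            <PageSize>{page_size}</PageSize>
        </Query>
    </Operation>
</Request>"""


def build_expression(title):
    """Mirror the hand-written expression: every word must match the name or the alternate name."""
    words = title.split()
    if not words:
        raise ValueError('title must contain at least one word')
    name = ' AND '.join('{}ResourceName "{}"'.format(BASE, w) for w in words)
    alt = ' AND '.join('{}AlternateResourceName "{}"'.format(BASE, w) for w in words)
    return 'ASCII((({}) OR ({})) AND {}Status "valid")'.format(name, alt, BASE)


def build_request(title, page_number=1, page_size=25):
    """Fill the XML request template for one title."""
    return REQUEST_TEMPLATE.format(expression=build_expression(title),
                                   page_number=page_number, page_size=page_size)


def parse_records(xml_bytes):
    """Turn each SimpleMetadata element into a dict of tag name -> text."""
    root = ET.fromstring(xml_bytes)
    return [{ch.tag.replace(NS, ''): ch.text for ch in sm}
            for sm in root.iter(NS + 'SimpleMetadata')]


def query_titles(titles, session, headers):
    """Yield (title, records) for each title.

    `session` is expected to be a requests.Session (or anything with a
    compatible post() method); `headers` are the request headers to send.
    """
    for title in titles:
        r = session.post(URL, data=build_request(title).encode('utf-8'),
                         headers=headers)
        r.raise_for_status()
        yield title, parse_records(r.content)
```

A call might look like: for title, records in query_titles(['anderson cooper 360', 'another show'], requests.Session(), headers), with headers being the dictionary from the example above. Passing the session in keeps the helper independent of how the HTTP client is set up, and reusing one session lets requests keep the connection open between queries. Only the first page of results is fetched; PageNumber would have to be increased to read further pages.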

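3) I don't think you need the session object. The XML request above sends its credentials in the Authorization header and does not rely on cookies from an earlier page load, which is what the session was used for in the answer you quoted.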


 