POST request including file using Python

I am a beginner in web programming, so sorry if this is a very basic question, but I couldn't find anything on Stack Overflow as specific as my problem. I have a lot of text files (about 10k) that I need to upload to this website https://rostlab.org/services/nlsdb/ and then click "Evaluate NES/NLS". This triggers an SQL query and returns some information in table form. I then need to click the "CSV" button to download the results to my computer. Of course I don't want to upload each file manually, so I tried to generate the requests in Python, but couldn't get it to work. I haven't even managed to get the table response from the initial page, so downloading the CSV is a challenge I haven't reached yet:

import requests

url = 'https://rostlab.org/services/nlsdb/query'
files = {'file-upload': ('some.txt', open('C:\\some.txt', 'rb'), 'text/plain')}
data = {'_token':'', 'input-data':'', 'query-sig2':''}

r = requests.post(url, files=files, data=data)

As a response, I get a large amount of HTML that boils down to an HTTP 500 error page, so I am definitely doing something wrong, but I can't see what. When I submit the file through the website, the POST request looks something like this:

General
  Request URL:https://rostlab.org/services/nlsdb/query
  Request Method:POST
  Status Code:200 OK
  Remote Address:131.159.28.73:443
  Referrer Policy:no-referrer-when-downgrade

Response Headers
  Cache-Control:no-cache, private
  Connection:Keep-Alive
  Content-Encoding:gzip
  Content-Length:2231
  Content-Type:text/html; charset=UTF-8
  Date:Thu, 08 Feb 2018 12:39:30 GMT
  Keep-Alive:timeout=5, max=100
  Server:Apache
  Set-Cookie:nlsdb_session=eyJpdiI6IjZMRk03ZjRCNjBmU1JcL3Y0Vko4ZHFRPT0iLCJ2YWx1ZSI6Ikh2bHcyZHBuN25nNmx1QnRoOFlPMWhWU0RYdUpEdnAwbGtySWgwbDlDVElHZmRyNlBMeEdXT3ROSERcLzRRNDB2ZnVUQ2oyTDlmOVRHa3JNUUZJTnBkUT09IiwibWFjIjoiZWM3ZjFjYmQ2ZThkNmRlM2JmOTY5OWZiYWMxOTA4ZmZiZjcxZjU1ODJjNjU1ODgzYjczMmUxMGY1NGMwMjNlMCJ9; expires=Thu, 08-Feb-2018 14:39:30 GMT; Max-Age=7200; path=/; httponly
  Set-Cookie:XSRF-TOKEN=eyJpdiI6IjExMjBaRHNmWHVLZTBzSURYZFwvUmF3PT0iLCJ2YWx1ZSI6InQyWUE5QzZEd2xmZU5rMjlyekV1Z2JcL3lGNkNvbHl1TnBHMVh5eWtLeWtNb3JHcTJJSFpyR0lDVkxNV2h2cGsrTUhYMGl3ZDBET0hucHdpNzV0YkRpdz09IiwibWFjIjoiNzcxODBhYjIzYjEzNDU1OTNhNGRhNjI3OTAxNWY1MjFkYjI5MWQ5NjgwNGE4ZjVmMzQzZThkNWUzZWE0YTgwYSJ9; expires=Thu, 08-Feb-2018 14:39:30 GMT; Max-Age=7200; path=/
  Vary:Accept-Encoding

Request Headers
  Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
  Accept-Encoding:gzip, deflate, br
  Accept-Language:en-US,en;q=0.9
  Cache-Control:max-age=0
  Connection:keep-alive
  Content-Length:1943
  Content-Type:multipart/form-data; boundary=----WebKitFormBoundary1tOuJdyWl1bn7H4X
  Cookie:XSRF-TOKEN=eyJpdiI6IjZaWHdTa3FPYmNHbkxsNVpoUlE3T0E9PSIsInZhbHVlIjoiQWMraGlLekd1akkrc0RDTzNMRGNIcVFkVGdBNjZFa2h4XC8xcUI0VmtIVG9CTnVPNW1IUW55NU9iNGlGY0NCWkFkd0hDZnJOaXBaT3J0VHZTSXl6b1FBPT0iLCJtYWMiOiJmMjE3N2JkZDIyMjRkNTY3ZGE4MDhlNGY5OWJiMDAwYjNiNzYyNGJjMTc2YzA4NTQwODcxZTM3YjI0YjQ5MWUyIn0%3D; nlsdb_session=eyJpdiI6IjByb2dtS0Q1ekFBU1F0WURJUk8rWnc9PSIsInZhbHVlIjoiM3lMNFU5Y2hBXC9BVU0xT0RUNnhVaUJ0ckJ0RnB5QlJqbk15alNSNkM4MjhNTGd6TFwvR0dwd0ZpWE9pU3piekhWb3ZzQjNZYVQ4ODdHeUxUMVJWM0pwUT09IiwibWFjIjoiYTE1Y2Q2NmRlN2M4Yjc1MzEyZTQxYjcwMzVmYjNiNjA1YjdiNjU4ODkxZWJhM2JmYTAwYTk1MWNhZWNkNTczMiJ9
  DNT:1
  Host:rostlab.org
  Origin:https://rostlab.org
  Referer:https://rostlab.org/services/nlsdb/
  Upgrade-Insecure-Requests:1
  User-Agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36

Request Payload  
  ------WebKitFormBoundary1tOuJdyWl1bn7H4X
  Content-Disposition: form-data; name="_token"

  GnjGT2Ejrrpo4Nlf2EbwtmLtY29GNFnoTJpl5z5o
  ------WebKitFormBoundary1tOuJdyWl1bn7H4X
  Content-Disposition: form-data; name="input-data"


  ------WebKitFormBoundary1tOuJdyWl1bn7H4X
  Content-Disposition: form-data; name="file-upload"; filename="some.txt"
  Content-Type: text/plain


  ------WebKitFormBoundary1tOuJdyWl1bn7H4X
  Content-Disposition: form-data; name="query-sig2"

  sF4MZkIaMc1K9TPZ6uYJuQ
  ------WebKitFormBoundary1tOuJdyWl1bn7H4X--
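
To compare what requests would send with the browser payload above without contacting the server, a request can be prepared but not sent. This is a minimal sketch; the token values and the file contents are placeholders, not real tokens from the site:

```python
import io

import requests

url = 'https://rostlab.org/services/nlsdb/query'
# Placeholder values; real ones have to come from the form page
data = {'_token': 'PLACEHOLDER', 'input-data': '', 'query-sig2': 'PLACEHOLDER'}
files = {'file-upload': ('some.txt', io.BytesIO(b'example'), 'text/plain')}

# Build the multipart body the same way requests.post() would, but don't send it
prepared = requests.Request('POST', url, data=data, files=files).prepare()

print(prepared.headers['Content-Type'])  # multipart/form-data; boundary=...
print(prepared.body.decode('utf-8'))
```

The printed body contains the same named parts as the browser's payload (in a different order), which suggests the multipart encoding itself is not what differs; the empty token fields are.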

I believe the data object is not correct, but I wasn't able to get it right, and omitting it doesn't seem to work either. Any suggestions on how to retrieve the data correctly and then download the corresponding CSV file?

The site uses cross-site request forgery (CSRF) tokens to protect against a common class of attacks. Furthermore, it also puts a generated token in the value of its submit buttons.
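
Both kinds of token are ordinary attributes in the form's HTML, so they can be read without a browser. A minimal sketch using only the standard library's html.parser (an alternative to BeautifulSoup, not what the site requires); TokenCollector is a hypothetical helper and the sample markup is an assumption modelled on the field names in the question, not the real page:

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Hypothetical helper: record hidden <input> fields and named <button> values."""

    def __init__(self):
        super().__init__()
        self.hidden = {}   # name -> value of each <input type="hidden">
        self.buttons = {}  # id -> (name, value) of each named <button>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'input' and attrs.get('type') == 'hidden' and attrs.get('name'):
            self.hidden[attrs['name']] = attrs.get('value') or ''
        elif tag == 'button' and attrs.get('id') and attrs.get('name'):
            self.buttons[attrs['id']] = (attrs['name'], attrs.get('value') or '')


# Assumed markup; the real form may use different attributes
sample_html = '''
<form id="input-form" action="https://rostlab.org/services/nlsdb/query" method="post">
  <input type="hidden" name="_token" value="abc123">
  <textarea name="input-data"></textarea>
  <input type="file" name="file-upload">
  <button type="submit" id="submit-sig2" name="query-sig2" value="xyz789">Evaluate NES/NLS</button>
</form>
'''

collector = TokenCollector()
collector.feed(sample_html)
print(collector.hidden)   # {'_token': 'abc123'}
print(collector.buttons)  # {'submit-sig2': ('query-sig2', 'xyz789')}
```

Keying buttons by id lets the caller send only the button that would have been clicked, as a browser does.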

To be able to post anything, you need to:

  • Store and return cookies. This is easiest with a requests.Session object.
  • Load the form page and read out the CSRF token and submit button values.
  • Use the extracted tokens in your POST request.
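
The first point is what a requests.Session handles for you: cookies received from one response are sent back on later requests to the same site. A minimal offline sketch, where the cookie is set by hand with a placeholder value to stand in for the server's Set-Cookie header:

```python
import requests

form_url = 'https://rostlab.org/services/nlsdb/'

with requests.Session() as sess:
    # Stand-in for the XSRF-TOKEN cookie the server sets on the first GET
    sess.cookies.set('XSRF-TOKEN', 'placeholder', domain='rostlab.org', path='/')

    # prepare_request() merges the session's cookies into the outgoing request
    prepared = sess.prepare_request(requests.Request('POST', form_url + 'query'))
    print(prepared.headers.get('Cookie'))  # XSRF-TOKEN=placeholder
```

With a real session no manual set() is needed; the GET of the form page fills the cookie jar.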

I'd use BeautifulSoup to parse the form page and extract the tokens:

from bs4 import BeautifulSoup
import requests

form_url = 'https://rostlab.org/services/nlsdb/'

# The session keeps the XSRF-TOKEN and nlsdb_session cookies from the GET
# and sends them back with the POST
with requests.Session() as sess:
    response = sess.get(form_url)
    soup = BeautifulSoup(response.content, 'html.parser')

    # CSRF token (hidden input), generated submit-button value and form target
    csrf_token = soup.find('input', {'name': '_token'})['value']
    submit_token = soup.find('button', id='submit-sig2')['value']
    action_url = soup.find('form', id='input-form')['action']

    data = {'_token': csrf_token, 'query-sig2': submit_token, 'input-data': ''}

    # The file is closed again as soon as the upload has been sent
    with open('C:\\some.txt', 'rb') as some_text:
        files = {'file-upload': ('some.txt', some_text, 'text/plain')}
        response = sess.post(action_url, data=data, files=files)

Note that I also extract the action attribute of the form tag; it's best to stick to what the server tells us to use.
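
If the action attribute were relative rather than a full URL, passing it straight to sess.post() would fail with a missing-scheme error. Resolving it against the page URL with urllib.parse.urljoin handles both cases. A small sketch; the three action values are hypothetical:

```python
from urllib.parse import urljoin

form_url = 'https://rostlab.org/services/nlsdb/'

# Hypothetical action values: absolute, relative to the page, root-relative
actions = ['https://rostlab.org/services/nlsdb/query', 'query', '/services/nlsdb/query']
resolved = [urljoin(form_url, action) for action in actions]
print(resolved)  # the same absolute query URL three times
```

In the code above this would read action_url = urljoin(form_url, soup.find('form', id='input-form')['action']); an absolute action passes through unchanged.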

The above code produces a 200 OK response with an HTML page listing the matching results in a table.
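
The question also asks for the CSV file. How the site's "CSV" button builds its download isn't shown here, so one hedged alternative is to convert the results table in response.text to CSV locally. A minimal sketch using only the standard library, assuming the results sit in an ordinary <table> (the real markup is not shown above and may need adjusting); the sample table and its column names are hypothetical:

```python
import csv
import io
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect the text of <th>/<td> cells, row by row, from every <table>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._close_row()
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._close_cell()
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th'):
            self._close_cell()
        elif tag in ('tr', 'table'):
            self._close_row()

    def _close_cell(self):
        if self._cell is not None:
            # Collapse runs of whitespace and newlines inside the cell
            self._row.append(' '.join(''.join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None


def html_table_to_csv(html):
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()


# Hypothetical results table
sample_html = ('<table><tr><th>Signal</th><th>Type</th></tr>'
               '<tr><td>PKKKRKV</td><td>NLS</td></tr></table>')
print(html_table_to_csv(sample_html), end='')
```

To save a result, write the returned string to a file opened with newline='', e.g. open('some.csv', 'w', newline='').write(html_table_to_csv(response.text)).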

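Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.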