
Python POST request returns 404 status code but GET request returns 200

I am trying to log in to this website. When I submit a simple GET request, I get a 200 response as expected.

import requests

login_url = 'https://urs.earthdata.nasa.gov/'

sess = requests.Session()
sess.headers = {'User-Agent': 'Mozilla/5.0'}
sess.verify = False
r1 = sess.get(login_url)
print(r1.status_code)
# 200

However, when I try to actually log in using a POST request, I get a 404 error. (The payload for the request was assembled from the page's HTML and from the page inspector in Firefox.)

import requests
import re

product_url = 'https://datapool.asf.alaska.edu/L1.5/A3/ALPSRP171431190-L1.5.zip'
login_url = 'https://urs.earthdata.nasa.gov/'
username = 'username'
password = 'password'

sess = requests.Session()
sess.headers = {'User-Agent': 'Mozilla/5.0'}
sess.verify = False

# Fetch the product URL; the hidden form fields are read from this response
r1 = sess.get(product_url)

tkn_ptn = '<meta name="csrf-token" content="(.*==)" />'
tkn = re.search(tkn_ptn, r1.text).group(1)
print('CSRF Token: {}'.format(tkn))
# CSRF Token: CDOX5tOhBtX2vvZn/c/MLRaYJtW7hzeQLm/eEVn09cHosnlsR/5P8a+k4YEaAzYQZRxCgNf9evDqyhWiZiefmQ==

cli_ptn = '<input type="hidden" name="client_id" id="client_id" value="(.*)" />'
cli = re.search(cli_ptn, r1.text).group(1)
print('Client ID: {}'.format(cli))
# Client ID: BO_n7nTIlMljdvU6kRRB3g

redir_ptn = '<input type="hidden" name="redirect_uri" id="redirect_uri" value="(.*?)" />'
redir = re.search(redir_ptn, r1.text).group(1)
print('Redirect URL: {}'.format(redir))
# Redirect URL: https://auth.asf.alaska.edu/login

payload = {'username': username,
           'password': password,
           'authenticity_token': tkn,
           'client_id': cli,
           'redirect_uri': redir,
           'response_type': 'code',
           'stay_in': '1',
           'commit': 'Log in'}

# This is the request that comes back with 404
r2 = sess.post(login_url, data=payload)
print(r2.status_code)
# 404

Why won't the page accept my payload and let me log in?

The payload data comes from the login page itself, https://urs.earthdata.nasa.gov/, and you can see it in the network tab of your browser.
I entered a random username and password and, in the network tab, saw a POST being made to https://urs.earthdata.nasa.gov/login, not to the root URL that your code posts to, which would explain the 404. The payload has this format:

utf8: ✓
authenticity_token: ...token base64...
username: 123
password: 123
client_id: 
redirect_uri: 
commit: Log in
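
To make the difference from the question's code concrete, here is a minimal sketch of how the POST target can be derived from the page URL and the form's action attribute. It assumes the action value is "/login", as seen in the login page source; the variable names are only illustrative.

```python
from urllib.parse import urljoin

# URL of the page that serves the login form
page_url = "https://urs.earthdata.nasa.gov/"
# Value of the form's action attribute, copied from the page source
form_action = "/login"

# Resolve the action against the page URL to get the real POST target
post_url = urljoin(page_url, form_action)
print(post_url)  # https://urs.earthdata.nasa.gov/login
```

Posting to page_url instead of post_url is what the code in the question does.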

So we just need to extract the authenticity_token from the source. Looking at the source of the login page, we see this:

<form id="login" action="/login" accept-charset="UTF-8" method="post"><input name="utf8" type="hidden" value="&#x2713;" /><input type="hidden" name="authenticity_token" value="...token base64..." />

A regex is enough to extract it, since that is quicker for single-use scripts like this (you could use any other method or regex you prefer):

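webpage = requests.get("https://urs.earthdata.nasa.gov/", headers={'User-Agent': 'Mozilla/5.0'})  # assumed fetch of the login page
token = re.search(r'authenticity_token".*?"(.*?)"', webpage.text).group(1)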
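
As one example of "any other method", the hidden fields could also be collected with the standard library's html.parser instead of a regex. This is only a sketch: HiddenInputParser is a hypothetical helper, and sample_html is a shortened stand-in for the real page with a placeholder token.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("type") == "hidden" and "name" in attrs:
            # A missing value attribute is stored as an empty string
            self.fields[attrs["name"]] = attrs.get("value") or ""


# Stand-in for the login page source; the token is a placeholder
sample_html = (
    '<form id="login" action="/login" accept-charset="UTF-8" method="post">'
    '<input name="utf8" type="hidden" value="&#x2713;" />'
    '<input type="hidden" name="authenticity_token" value="PLACEHOLDER_TOKEN==" />'
    '</form>'
)

parser = HiddenInputParser()
parser.feed(sample_html)
print(parser.fields["authenticity_token"])
```

With a real response you would feed the response text to the parser instead of sample_html.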

Finally, create the data and pass it to the POST method:

data = {
    "utf8": "✓",
    "authenticity_token": token,
    "username": username,
    "password": password,
    "client_id": "",
    "redirect_uri": "",
    "commit": "Log in",
}
login = requests.post(
    "https://urs.earthdata.nasa.gov/login",
    headers={'User-Agent': 'Mozilla/5.0'},
    data=data,
)
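
One thing that may matter in practice: sites that issue an authenticity token often tie it to a session cookie, so the GET that fetches the token and the POST that submits it may need to share a requests.Session. Whether this site requires that is an assumption, not something confirmed here. A sketch of the whole flow under that assumption (extract_token, build_payload and login are hypothetical helper names):

```python
import re

import requests

LOGIN_PAGE = "https://urs.earthdata.nasa.gov/"
LOGIN_POST = "https://urs.earthdata.nasa.gov/login"


def extract_token(html):
    """Pull the authenticity_token value out of the login page source."""
    match = re.search(r'authenticity_token".*?"(.*?)"', html)
    if match is None:
        raise ValueError("authenticity_token not found in page source")
    return match.group(1)


def build_payload(token, username, password):
    """Assemble the form fields seen in the browser's network tab."""
    return {
        "utf8": "✓",
        "authenticity_token": token,
        "username": username,
        "password": password,
        "client_id": "",
        "redirect_uri": "",
        "commit": "Log in",
    }


def login(username, password):
    """GET the login page and POST the form within the same session."""
    with requests.Session() as sess:
        sess.headers.update({"User-Agent": "Mozilla/5.0"})
        page = sess.get(LOGIN_PAGE)
        page.raise_for_status()
        payload = build_payload(extract_token(page.text), username, password)
        return sess.post(LOGIN_POST, data=payload)
```

Calling login('username', 'password') returns the response of the POST, so you can inspect its status_code and url to see whether the login went through.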
