Python web crawling: requests.post does not return any content in a server environment
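I am trying to scrape a CSV file from the web. Downloading the file from the web page in a browser works, and this code also runs fine in my local environment (Windows). However, when I run it in a server environment (Ubuntu), the response has no content, as shown below. How can I fix this? I cannot figure out what the problem is.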
import requests

def get_otp(bld, date):
    # Request a one-time password (OTP) that authorizes the file download.
    url = "http://marketdata.krx.co.kr/contents/COM/GenerateOTP.jspx"
    header = {'Referer': 'http://marketdata.krx.co.kr/mdi', "User-Agent": "Mozilla/5.0",
              "X-Requested-With": "XMLHttpRequest"}
    param = {"name": "fileDown", "filetype": "csv", "url": bld, "market_gubun": 'ALL',
             "sect_tp_cd": "ALL", "schdate": date,
             "pagePath": "/contents/MKD/13/1302/13020101/MKD13020101.jsp"}
    return requests.get(url, headers=header, params=param).text

def get_file(otp):
    # Exchange the OTP for the CSV file.
    url = "http://file.krx.co.kr/download.jspx?"
    header = {"Origin": "http://marketdata.krx.co.kr",
              'Referer': 'http://marketdata.krx.co.kr/mdi',
              "Upgrade-Insecure-Requests": "1",
              "Host": "file.krx.co.kr",
              "User-Agent": "Mozilla/5.0"}
    param = {'code': otp}
    response = requests.post(url, headers=header, data=param)
    # df = pd.read_csv(BytesIO(response.content))
    return response  # the Response object, so the headers can be inspected

bld = "MKD/13/1302/13020101/mkd13020101"
otp = get_otp(bld, "20201116")
ret = get_file(otp)
ret.headers
Result:
{'Date': 'Thu, 19 Nov 2020 08:37:01 GMT',
'Set-Cookie': 'SCOUTER=z368kb97coovj; Expires=Tue, 07-Dec-2088 11:51:08 GMT, JSESSIONID=9AC74CC81C757D3CD656EA4FD0D3A05D.102tomcat4; Path=/; HttpOnly',
'Content-Length': '0',
'Content-Type': 'text/html;charset=UTF-8'}
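The headers above already show the symptom: `Content-Length` is `0`, so the server replied without a body. A minimal sketch of a check that makes this failure explicit instead of silently passing an empty result along; `body_is_empty` is a hypothetical helper name, not part of `requests`:

```python
def body_is_empty(headers, content):
    """Return True when a response carries no body.

    `headers` is any mapping of header names to values (for example
    `response.headers`); `content` is the raw body as bytes.
    """
    # Header names are case-insensitive, so normalise before looking up.
    lowered = {str(key).lower(): value for key, value in headers.items()}
    declared = str(lowered.get("content-length", "")).strip()
    if declared.isdigit() and int(declared) == 0:
        return True
    # Fall back to the actual body when the header is missing or non-zero.
    return len(content) == 0
```

With the code from the question, this could be called as `body_is_empty(ret.headers, ret.content)` right after `get_file(otp)`.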
I got it working after changing bld and pagePath to the values that apply to my example, and changing the base URL from http://marketdata.krx.co.kr to https://global.krx.co.kr.
import requests

def get_otp(bld, page_path, date):
    # Request a one-time password (OTP) that authorizes the file download.
    url = "https://global.krx.co.kr/contents/COM/GenerateOTP.jspx"
    header = {'Referer': 'https://global.krx.co.kr/mdi', "User-Agent": "Mozilla/5.0",
              "X-Requested-With": "XMLHttpRequest"}
    param = {"name": "fileDown", "filetype": "csv", "url": bld, "market_gubun": 'ALL',
             "sect_tp_cd": "ALL", "schdate": date, "pagePath": page_path}
    return requests.get(url, headers=header, params=param).text

def get_file(otp):
    # Exchange the OTP for the CSV file and return its contents as text.
    url = "https://file.krx.co.kr/download.jspx?"
    header = {"Origin": "https://global.krx.co.kr",
              'Referer': 'https://global.krx.co.kr/mdi',
              "Upgrade-Insecure-Requests": "1",
              "Host": "file.krx.co.kr",
              "User-Agent": "Mozilla/5.0"}
    param = {'code': otp}
    data = requests.post(url, headers=header, data=param).text
    return data

bld = "GLB/05/0503/0503050600/glb0503050600"
page_path = "/contents/GLB/05/0503/0503050600/GLB0503050600.jsp"
date = "20210714"
otp = get_otp(bld, page_path, date)
data = get_file(otp)
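Since `get_file` returns the CSV as text, it can be parsed with the standard library. A minimal sketch, assuming the text is comma-separated with a header row; `parse_csv_text` is a hypothetical helper, and the sample string is placeholder data made up for illustration, not actual KRX output:

```python
import csv
import io

def parse_csv_text(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    # An empty body is the failure described in the question, so surface it
    # here instead of returning an empty list.
    if not text.strip():
        raise ValueError("empty response body")
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)

# Placeholder data standing in for the downloaded text.
sample = "col_a,col_b\n1,2\n"
rows = parse_csv_text(sample)
```

With the code above, this would be called as `parse_csv_text(data)`. All values come back as strings, so numeric columns still need converting.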