Python web crawling: requests.post returns nothing in a server environment

Question

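I am trying to scrape a CSV file from the web. Downloading the file from the web page works, and this code also runs fine in my local environment (Windows). However, when I run it in a server environment (Ubuntu), the response comes back with an empty body, as shown below. How can I fix this? I cannot figure out what the problem is.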

import requests

def get_otp(bld, date):
    # Ask the GenerateOTP endpoint for the code that authorizes the download.
    url = "http://marketdata.krx.co.kr/contents/COM/GenerateOTP.jspx"
    header = {'Referer': 'http://marketdata.krx.co.kr/mdi', "User-Agent": "Mozilla/5.0",
              "X-Requested-With": "XMLHttpRequest"}
    param = {"name": "fileDown", "filetype": "csv", "url": bld, "market_gubun": 'ALL',
             "sect_tp_cd": "ALL", "schdate": date,
             "pagePath": "/contents/MKD/13/1302/13020101/MKD13020101.jsp"}
    return requests.get(url, headers=header, params=param).text

def get_file(otp):
    # Post the OTP code to the download endpoint to fetch the CSV file.
    url = "http://file.krx.co.kr/download.jspx?"
    header = {"Origin": "http://marketdata.krx.co.kr",
              'Referer': 'http://marketdata.krx.co.kr/mdi',
              "Upgrade-Insecure-Requests": "1",
              "Host": "file.krx.co.kr",
              "User-Agent": "Mozilla/5.0"}
    param = {'code': otp}
    response = requests.post(url, headers=header, data=param)
    # df = pd.read_csv(BytesIO(response.content))
    return response

bld = "MKD/13/1302/13020101/mkd13020101"
otp = get_otp(bld, "20201116")
ret = get_file(otp)
ret.headers

Result

{'Date': 'Thu, 19 Nov 2020 08:37:01 GMT', 
'Set-Cookie': 'SCOUTER=z368kb97coovj; Expires=Tue, 07-Dec-2088 11:51:08 GMT, JSESSIONID=9AC74CC81C757D3CD656EA4FD0D3A05D.102tomcat4; Path=/; HttpOnly',
 'Content-Length': '0', 
'Content-Type': 'text/html;charset=UTF-8'}
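The headers above already show the symptom: `Content-Length` is `0` and the content type is HTML rather than CSV. As a minimal sketch, a small helper can make that check explicit. `describe_response` is a hypothetical name, not part of `requests`; it takes plain values so it can be fed `ret.status_code`, `ret.headers`, and `ret.content`. The status code of 200 used in the example is an assumption, since the question does not show it.

```python
def describe_response(status_code, headers, body):
    """Summarize a response so that an empty download is easy to spot."""
    declared = headers.get("Content-Length")
    return {
        "status": status_code,
        "declared_length": int(declared) if declared is not None else None,
        "actual_length": len(body),
        "content_type": headers.get("Content-Type"),
        "is_empty": len(body) == 0,
    }

# Header values copied from the result above; the 200 status is an assumption.
summary = describe_response(
    200,
    {"Content-Length": "0", "Content-Type": "text/html;charset=UTF-8"},
    b"",
)
print(summary)
```

Running the same check on the Windows machine and on the Ubuntu server should show whether the two environments really receive different responses for the same request.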

Answer 1

I got it working after changing bld and pagePath to values that work for my example and changing the base URL from http://marketdata.krx.co.kr to https://global.krx.co.kr.

import requests

def get_otp(bld, page_path, date):
    url="https://global.krx.co.kr/contents/COM/GenerateOTP.jspx"
    header={'Referer': 'https://global.krx.co.kr/mdi', "User-Agent": "Mozilla/5.0",
            "X-Requested-With": "XMLHttpRequest"}    
    param={"name": "fileDown", "filetype" : "csv", "url": bld, "market_gubun": 'ALL', 
           "sect_tp_cd": "ALL","schdate": date, "pagePath": page_path}
    return requests.get(url, headers=header, params=param).text

def get_file(otp):
    url="https://file.krx.co.kr/download.jspx?"
    header={"Origin": "https://global.krx.co.kr",
            'Referer': 'https://global.krx.co.kr/mdi',
            "Upgrade-Insecure-Requests": "1",
            "Host": "file.krx.co.kr",
            "User-Agent": "Mozilla/5.0"}
    param={'code':otp}
    data = requests.post(url, headers=header, data=param).text
    return data

bld = "GLB/05/0503/0503050600/glb0503050600"
page_path = "/contents/GLB/05/0503/0503050600/GLB0503050600.jsp"
date = "20210714"

otp = get_otp(bld, page_path, date)
data = get_file(otp)
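Since `get_file` here returns the response text, a possible next step is to turn that text into rows while failing loudly on an empty body, which is the symptom described in the question. The following is a sketch using only the standard library. `parse_csv_text` is a hypothetical helper, and the sample string is made-up data, not real KRX output.

```python
import csv
import io

def parse_csv_text(text):
    """Parse CSV text into a list of rows, rejecting an empty body."""
    if not text.strip():
        # An empty body means no file was delivered for this request.
        raise ValueError("empty response body: no CSV data was returned")
    return list(csv.reader(io.StringIO(text)))

# Made-up sample standing in for the downloaded text.
sample = "Name,Close\nAAA,100\nBBB,250\n"
rows = parse_csv_text(sample)
print(rows[0])   # header row
print(rows[1:])  # data rows
```

With the code above, this would be called as `parse_csv_text(data)`.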

Answer 1, 2021-07-14 07:20:48
