Python web crawling: requests.post does not return any content in the server environment

I'm trying to crawl a CSV file from the web. Downloading the file from the web page works, and this code also works well in my local environment (Windows). However, when I run the code in the server environment (Ubuntu), it returns no content, as the result below shows. How can I solve this? I can't figure out what the problem is.

import requests


def get_otp(bld, date):
    url="http://marketdata.krx.co.kr/contents/COM/GenerateOTP.jspx"
    header={'Referer': 'http://marketdata.krx.co.kr/mdi', "User-Agent": "Mozilla/5.0",
            "X-Requested-With": "XMLHttpRequest"}
    param={"name": "fileDown", "filetype" : "csv", "url": bld, "market_gubun": 'ALL', 
           "sect_tp_cd": "ALL","schdate": date, 
           "pagePath": "/contents/MKD/13/1302/13020101/MKD13020101.jsp"}
    return requests.get(url, headers=header, params=param).text

def get_file(otp):
    url="http://file.krx.co.kr/download.jspx?"
    header={"Origin": "http://marketdata.krx.co.kr",
            'Referer': 'http://marketdata.krx.co.kr/mdi',
            "Upgrade-Insecure-Requests": "1",
            "Host": "file.krx.co.kr",
            "User-Agent": "Mozilla/5.0"}
    param={'code':otp}
    byte_data = requests.post(url, headers=header, data=param)
    data=byte_data
    # df = pd.read_csv(BytesIO(byte_data.content))  # read_csv needs the bytes, not the Response object
    return data
bld= "MKD/13/1302/13020101/mkd13020101"
otp=get_otp(bld,"20201116")
ret=get_file(otp)
ret.headers

Result:

{'Date': 'Thu, 19 Nov 2020 08:37:01 GMT', 
'Set-Cookie': 'SCOUTER=z368kb97coovj; Expires=Tue, 07-Dec-2088 11:51:08 GMT, JSESSIONID=9AC74CC81C757D3CD656EA4FD0D3A05D.102tomcat4; Path=/; HttpOnly',
 'Content-Length': '0', 
'Content-Type': 'text/html;charset=UTF-8'}
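The headers above carry Content-Length: 0, so requests.post completes without raising and simply hands back an empty body, which is why the failure is silent. A minimal sketch of a guard that makes it loud instead; check_otp, check_download and EmptyDownloadError are hypothetical helper names (not part of requests), and check_download only assumes an object with raise_for_status(), .content and .headers, which a requests.Response provides.

```python
class EmptyDownloadError(RuntimeError):
    """Hypothetical error type: the download endpoint sent back no bytes."""


def check_otp(otp):
    """Return the OTP without surrounding whitespace, or raise if it is empty."""
    otp = otp.strip()
    if not otp:
        raise ValueError("GenerateOTP returned an empty OTP; check headers and params")
    return otp


def check_download(resp):
    """Return the body of a requests.Response-like object, or raise.

    raise_for_status() turns a 4xx/5xx status into requests.HTTPError;
    an empty body (as in the result above) raises EmptyDownloadError.
    """
    resp.raise_for_status()
    if not resp.content:
        length = resp.headers.get("Content-Length", "missing")
        raise EmptyDownloadError(f"empty body (Content-Length: {length})")
    return resp.content
```

With the question's functions (whose get_file returns the Response itself), check_download(get_file(check_otp(get_otp(bld, "20201116")))) raises at the step that actually came back empty instead of returning nothing.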

I got it to work after changing bld and pagePath to the values that apply to my example, and changing the base URL from http://marketdata.krx.co.kr to https://global.krx.co.kr.

import requests


def get_otp(bld, page_path, date):
    url="https://global.krx.co.kr/contents/COM/GenerateOTP.jspx"
    header={'Referer': 'https://global.krx.co.kr/mdi', "User-Agent": "Mozilla/5.0",
            "X-Requested-With": "XMLHttpRequest"}    
    param={"name": "fileDown", "filetype" : "csv", "url": bld, "market_gubun": 'ALL', 
           "sect_tp_cd": "ALL","schdate": date, "pagePath": page_path}
    return requests.get(url, headers=header, params=param).text

def get_file(otp):
    url="https://file.krx.co.kr/download.jspx?"
    header={"Origin": "https://global.krx.co.kr",
            'Referer': 'https://global.krx.co.kr/mdi',
            "Upgrade-Insecure-Requests": "1",
            "Host": "file.krx.co.kr",
            "User-Agent": "Mozilla/5.0"}
    param={'code':otp}
    data = requests.post(url, headers=header, data=param).text
    return data

bld = "GLB/05/0503/0503050600/glb0503050600"
page_path = "/contents/GLB/05/0503/0503050600/GLB0503050600.jsp"
date = "20210714"

otp = get_otp(bld, page_path, date)
data = get_file(otp)
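Here get_file returns the CSV as a string (.text), while the question's commented-out pd.read_csv line shows the goal is tabular data. A minimal stdlib sketch for turning that download into rows, assuming the first line of the file is a header row; parse_csv_rows is a hypothetical helper, and the utf-8-sig default (which drops a leading byte-order mark if present) is an assumption about the file's encoding. If the text comes out garbled, pass resp.content bytes with an explicit codec such as cp949 instead of relying on the encoding requests guessed for .text.

```python
import csv
import io


def parse_csv_rows(data, encoding="utf-8-sig"):
    """Parse downloaded CSV into a list of dicts keyed by the header row.

    `data` may be the str that get_file returns or raw bytes such as
    resp.content; bytes are decoded with `encoding` (an assumption).
    """
    if isinstance(data, bytes):
        data = data.decode(encoding)
    data = data.lstrip("\ufeff")  # drop a BOM that survived str decoding
    if not data.strip():
        raise ValueError("no CSV content: the download returned an empty body")
    return list(csv.DictReader(io.StringIO(data)))
```

Usage: rows = parse_csv_rows(data). If pandas is available, pd.read_csv(io.StringIO(data)) gives a DataFrame from the same string.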
