

beautifulsoup and request.post

I am practicing scraping a site and have run into a puzzling situation.

import requests
from bs4 import BeautifulSoup
import json

class n_auction(object):
    def __init__(self):
        self.search_request = {

        self.headers = {'User-Agent':'Mozilla/5.0',

    def scrape(self, max_pages):

        addr = []

        pageno = 0
        self.search_request['start'] = pageno
        while pageno < max_pages:
            payload = json.dumps(self.search_request)
            r = requests.post('http://goodauction.land.naver.com/auction/ax_list.php', data=payload ,headers=self.headers)

            s = BeautifulSoup(r.text)

if __name__ == '__main__':
    scraper = n_auction()

When I print(r.text), I get the full text, as in the picture below. [image]

But after passing it through BeautifulSoup, some values are lost, as in the picture below. [image]

This is very confusing. Please help.

Switching the parser from the default, lxml, to html.parser worked for me.

Try: s = BeautifulSoup(r.text, 'html.parser')
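To check whether the parser is what drops the values, it may help to run the same markup through each installed parser and compare how many elements each one finds. The sketch below is only an illustration: count_tags and sample_html are hypothetical names, and sample_html is a made-up stand-in for r.text, since the real response is not reproduced here.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_tags(html, tag, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: number of <tag> elements found} for each installed parser."""
    counts = {}
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # This parser is not installed in the current environment; skip it.
            continue
        counts[parser] = len(soup.find_all(tag))
    return counts


# Hypothetical stand-in for r.text (assumption: the real page is not shown here).
sample_html = "<table><tr><td>1</td><td>2</td></tr></table>"
counts = count_tags(sample_html, "td")
print(counts)
```

If the counts differ between parsers on the real response, the markup is probably malformed in a way the parsers repair differently, and naming the parser explicitly, as in the line above, keeps the result consistent across machines.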
