How can I parse all the HTML code with Python?

I need to parse part of the HTML from a site that requires authorization. But when I try to do it, my script can't find all the tags in this part:

<tbody>              
    <td class="ng-binding">name</td>
    <td class="ng-binding">name</td>
    <td class="ng-binding">name</td>
    <td class="ng-binding">name</td>
    <td></td>
</tr><!-- end ngIf: bsks -->
<!-- ngIf: (bsks | size)>0 --><tr class="bsstr ng-scope" ng-if="(bsks | size)>0">
    <td></td>
    <td></td>
    <td></td>
    <td><b class="ng-binding">сумма</b></td>
    <td></td>
</tr><!-- end ngIf: (bsks | size)>0 -->
<!-- ngIf: (bsks | size) === 0 -->
<!-- ngRepeat: item in bsks | orderBy: date --><!-- ngIf: (bsks | size) > 0 --><tr class="bsstr ng-scope" ng-repeat="item in bsks | orderBy: date" ng-if="(bsks | size) > 0">
    <td>

I am a beginner, so please help me parse this part of the site. How can I get all the tags that I need? The site has a separate page for authorization ( url = self.BASE_URL + 'api/v1/login/auth?info=1' ):

import urllib.request

import requests
from bs4 import BeautifulSoup


class Auth:
    BASE_URL = 'http.............'

    def auth(self):
        params = {
            'user': u'g1625719',
            'pass': u'472001',
            'from_site': 1,
            'dev': u'16e753be3dc097354e3328e47c3701a9'
        }
        session = requests.Session()
        url = self.BASE_URL + 'api/v1/login/auth?info=1'
        r = session.post(url, params)
        print(r.text)

    def get_url(self):
        url = self.BASE_URL + '#!/line/cart/checklist/'
        print(url)
        response = urllib.request.urlopen(url)
        return response.read()

    def parse(self):
        soup = BeautifulSoup(self.get_url(), 'html.parser')
        table = soup.body.find('div', {'class': 'example-animate-container'})
        print(table)

It does not work correctly.
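
One thing that may be worth checking here, offered as an assumption because the real site is not shown: everything after # in a URL is a fragment, and HTTP clients do not send the fragment to the server. If that applies, get_url would receive only the base page, and the ng- attributes in the markup above suggest the table rows are filled in by JavaScript in the browser. A minimal sketch using a hypothetical base URL (the real one is elided in the question):

```python
from urllib.parse import urlsplit

# Hypothetical base URL; the real one is elided in the question.
BASE_URL = "http://example.com/"
url = BASE_URL + "#!/line/cart/checklist/"

parts = urlsplit(url)

# The path is what the server is actually asked for.
print(parts.path)

# The fragment stays on the client and is not part of the HTTP request.
print(parts.fragment)
```

If the path printed here is just the site root, the HTML returned to the script is probably the application shell rather than the rendered table.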

Try using find_all ( https://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-the-tree ):

import urllib.request

import requests
from bs4 import BeautifulSoup


class Auth:
    BASE_URL = 'http.............'

    def auth(self):
        params = {
            'user': u'g1625719',
            'pass': u'472001',
            'from_site': 1,
            'dev': u'16e753be3dc097354e3328e47c3701a9'
        }
        session = requests.Session()
        url = self.BASE_URL + 'api/v1/login/auth?info=1'
        r = session.post(url, params)
        print(r.text)

    def get_url(self):
        url = self.BASE_URL + '#!/line/cart/checklist/'
        print(url)
        response = urllib.request.urlopen(url)
        return response.read()

    def parse(self):
        soup = BeautifulSoup(self.get_url(), 'html.parser')
        table = soup.body.find_all('div', {'class': 'example-animate-container'})
        print(table)
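
To make the difference between find and find_all concrete, here is a small sketch on static markup. The HTML string is hypothetical and only loosely modelled on the rows in the question; it assumes the tags are already present in the text handed to BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical static markup, loosely modelled on the rows in the question.
html = """
<table>
  <tr class="bsstr"><td class="ng-binding">first</td><td class="ng-binding">second</td></tr>
  <tr class="bsstr"><td class="ng-binding">third</td><td></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching tag, or None if nothing matches.
first_cell = soup.find("td", {"class": "ng-binding"})

# find_all() returns a list-like ResultSet containing every matching tag.
all_cells = soup.find_all("td", {"class": "ng-binding"})
cell_texts = [cell.get_text() for cell in all_cells]

# The same call works for rows; class_ is the keyword form of the class filter.
rows = soup.find_all("tr", class_="bsstr")

print(first_cell.get_text())
print(cell_texts)
print(len(rows))
```

Note that find_all returns an empty list when the tag is missing from the downloaded text, so an empty result means the markup never reached the parser rather than that the search was written incorrectly.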


 