How can I parse all HTML code with Python?

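I need to parse part of the HTML of a website that requires authorization. However, when I try to do this, my script does not find all the tags in this part: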

<tbody>              
    <td class="ng-binding">name</td>
    <td class="ng-binding">name</td>
    <td class="ng-binding">name</td>
    <td class="ng-binding">name</td>
    <td></td>
</tr><!-- end ngIf: bsks -->
<!-- ngIf: (bsks | size)>0 --><tr class="bsstr ng-scope" ng-if="(bsks | size)>0">
    <td></td>
    <td></td>
    <td></td>
    <td><b class="ng-binding">сумма</b></td>
    <td></td>
</tr><!-- end ngIf: (bsks | size)>0 -->
<!-- ngIf: (bsks | size) === 0 -->
<!-- ngRepeat: item in bsks | orderBy: date --><!-- ngIf: (bsks | size) > 0 --><tr class="bsstr ng-scope" ng-repeat="item in bsks | orderBy: date" ng-if="(bsks | size) > 0">
    <td>
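The ng-if, ng-repeat, ng-binding and ng-scope markers in this fragment, along with the `<!-- ngIf: ... -->` comments, come from AngularJS at runtime. This suggests the fragment was copied from the browser's rendered DOM rather than from the HTML the server sends. When the markup actually is in the HTML you download, collecting every matching tag is straightforward. The sketch below is stdlib-only. It uses `html.parser.HTMLParser`, the same parser BeautifulSoup uses with `'html.parser'`, on a shortened, hypothetical sample modelled on the fragment above. It returns the text of every `<td>` whose class list contains `ng-binding`, roughly what `soup.find_all('td', class_='ng-binding')` would give. `CellCollector` and `find_all_cells` are illustrative names.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> whose class attribute contains wanted_class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.cells = []        # text of each matching <td>, in document order
        self._in_cell = False  # True while inside a matching <td>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # attrs is a list of (name, value) pairs; value can be None
            classes = (dict(attrs).get("class") or "").split()
            self._in_cell = self.wanted_class in classes
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self.cells.append("".join(self._buffer).strip())
            self._in_cell = False

    def handle_data(self, data):
        # Text inside a matching <td>, including nested tags like <b>, is kept
        if self._in_cell:
            self._buffer.append(data)


def find_all_cells(html, wanted_class="ng-binding"):
    parser = CellCollector(wanted_class)
    parser.feed(html)
    parser.close()
    return parser.cells


# Shortened, hypothetical sample modelled on the fragment in the question
SAMPLE = """
<table><tbody>
<tr>
    <td class="ng-binding">name</td>
    <td class="ng-binding">name</td>
    <td></td>
</tr>
<tr class="bsstr ng-scope">
    <td></td>
    <td><b class="ng-binding">сумма</b></td>
</tr>
</tbody></table>
"""

print(find_all_cells(SAMPLE))  # ['name', 'name']
```

The `<b class="ng-binding">` inside the unclassed `<td>` is skipped because only the `<td>`'s own class is checked, which is also how `find_all('td', class_='ng-binding')` behaves.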

I am a beginner. Please help me parse this part of the site: how do I get all the tags I need? The site also has a separate authorization page (url = self.BASE_URL + 'api/v1/login/auth?info=1').
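Logging in through that authorization page only helps if the cookies it sets are sent with every later request. Here is a stdlib-only illustration, an alternative to keeping a single `requests.Session`. It builds a urllib opener backed by an `http.cookiejar.CookieJar`, so cookies set by the login response are stored and resent automatically through the same opener. It assumes the site authenticates with a session cookie, which the question does not confirm. `make_cookie_opener` and `login_and_fetch` are hypothetical helper names.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_cookie_opener():
    """Build a urllib opener that stores cookies and resends them automatically."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_and_fetch(opener, base_url, credentials, page_path=""):
    """Hypothetical helper: POST the login form, then GET a page with the same opener.

    Assumes the auth endpoint answers with a session cookie (not confirmed).
    """
    login_url = base_url + "api/v1/login/auth?info=1"
    # Encode the dict as an application/x-www-form-urlencoded request body
    body = urllib.parse.urlencode(credentials).encode("utf-8")
    with opener.open(login_url, data=body) as resp:
        resp.read()  # any Set-Cookie headers from this response are now in the jar
    # The same opener sends the stored cookies with this request
    with opener.open(base_url + page_path) as resp:
        return resp.read()
```

Usage would be `opener, jar = make_cookie_opener()` followed by `login_and_fetch(opener, Auth.BASE_URL, params)` with the question's `params` dict. With requests, the same effect comes from creating one `requests.Session` and calling both `post()` and `get()` on it.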

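import urllib.request

import requests
from bs4 import BeautifulSoup


class Auth:
    BASE_URL = 'http.............'

    def auth(self):
        # Log in by POSTing the credentials as form data to the auth endpoint
        params = {
            'user': 'g1625719',
            'pass': '472001',
            'from_site': 1,
            'dev': '16e753be3dc097354e3328e47c3701a9'
        }
        # This session is local to auth(), so its cookies are discarded on return
        session = requests.Session()
        url = self.BASE_URL + 'api/v1/login/auth?info=1'
        r = session.post(url, data=params)
        print(r.text)

    def get_url(self):
        # urllib does not share cookies with the requests session above,
        # and everything after '#' is not sent to the server
        url = self.BASE_URL + '#!/line/cart/checklist/'
        print(url)
        response = urllib.request.urlopen(url)
        return response.read()

    def parse(self):
        soup = BeautifulSoup(self.get_url(), 'html.parser')
        # find() returns only the first matching element (or None)
        table = soup.body.find('div', {'class': 'example-animate-container'})
        print(table)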

It does not work correctly.
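One reason `get_url` cannot return the checklist is that everything after `#` in a URL is a fragment. The client keeps the fragment and never sends it to the server. A `#!/...` route is handled by the page's JavaScript (AngularJS, judging by the ng-* attributes). `urllib.request.Request` exposes this split: `selector` is the path that goes on the HTTP request line, and `fragment` is what stays behind. The host `example.com` below is a hypothetical stand-in for the elided `BASE_URL`.

```python
from urllib.parse import urlsplit
from urllib.request import Request

# Hypothetical stand-in for the elided BASE_URL in the question
BASE_URL = "https://example.com/"
url = BASE_URL + "#!/line/cart/checklist/"

req = Request(url)
print(req.selector)   # prints: /   (the only path sent to the server)
print(req.fragment)   # prints: !/line/cart/checklist/   (stays on the client)

# urlsplit shows the same division without building a request
parts = urlsplit(url)
print(parts.path, parts.fragment)
```

So the server only ever sees a request for `/`, and returns the same page shell no matter which `#!/` route is appended.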

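Try using find_all (https://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-the-tree). find() returns only the first matching element, while find_all() returns a list of every match: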

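import requests
from bs4 import BeautifulSoup


class Auth:
    BASE_URL = 'http.............'

    def __init__(self):
        # Keep one session so the login cookies are reused by later requests
        self.session = requests.Session()

    def auth(self):
        params = {
            'user': 'g1625719',
            'pass': '472001',
            'from_site': 1,
            'dev': '16e753be3dc097354e3328e47c3701a9'
        }
        url = self.BASE_URL + 'api/v1/login/auth?info=1'
        r = self.session.post(url, data=params)
        print(r.text)

    def get_url(self):
        # The '#!/...' fragment is handled by the browser and is not sent to the server
        url = self.BASE_URL + '#!/line/cart/checklist/'
        print(url)
        # Use the logged-in session instead of urllib, which has no cookies
        response = self.session.get(url)
        response.raise_for_status()
        return response.text

    def parse(self):
        soup = BeautifulSoup(self.get_url(), 'html.parser')
        # find_all() returns a list of every matching div, not just the first.
        # Content that AngularJS renders in the browser will not be in this HTML.
        tables = soup.body.find_all('div', {'class': 'example-animate-container'})
        print(tables)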
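Even with find_all, rows that AngularJS builds in the browser, such as the `ng-repeat="item in bsks | orderBy: date"` rows, are not in the HTML the server returns. A common alternative is to request the JSON the page itself loads (visible in the browser's developer tools, Network tab) and build the rows in Python. The other option is to render the page in a real browser with a tool such as Selenium. The sketch below assumes a hypothetical JSON payload with a `bsks` list whose items have `name`, `date` and `sum` fields. The real endpoint and field names are not given in the question. Sorting by `date` follows the template's `orderBy: date` hint and is also an assumption.

```python
import json

# Hypothetical response body; the real endpoint and field names are unknown
PAYLOAD = """
{"bsks": [
    {"name": "second", "date": "2020-03-02", "sum": 15},
    {"name": "first",  "date": "2020-03-01", "sum": 10}
]}
"""


def basket_rows(payload_text):
    """Return (name, date, sum) tuples sorted by the hypothetical 'date' field."""
    data = json.loads(payload_text)
    # A missing or empty list simply yields no rows
    bsks = data.get("bsks", [])
    # ISO-8601 date strings sort correctly as plain strings
    ordered = sorted(bsks, key=lambda item: item["date"])
    return [(item["name"], item["date"], item["sum"]) for item in ordered]


for row in basket_rows(PAYLOAD):
    print(row)
```

Working from JSON avoids HTML parsing entirely, but it depends on finding the request the page makes. Check the Network tab for it before relying on any particular URL.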


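Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.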

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM