How can I parse all the HTML code with Python?
I need to parse part of the HTML from a site that requires authorization. But when I try to do it, my script can't find all the tags in this part:
<tbody>
<td class="ng-binding">name</td>
<td class="ng-binding">name</td>
<td class="ng-binding">name</td>
<td class="ng-binding">name</td>
<td></td>
</tr><!-- end ngIf: bsks -->
<!-- ngIf: (bsks | size)>0 --><tr class="bsstr ng-scope" ng-if="(bsks | size)>0">
<td></td>
<td></td>
<td></td>
<td><b class="ng-binding">сумма</b></td>
<td></td>
</tr><!-- end ngIf: (bsks | size)>0 -->
<!-- ngIf: (bsks | size) === 0 -->
<!-- ngRepeat: item in bsks | orderBy: date --><!-- ngIf: (bsks | size) > 0 --><tr class="bsstr ng-scope" ng-repeat="item in bsks | orderBy: date" ng-if="(bsks | size) > 0">
<td>
I am a beginner, so please help me parse this part of the site. How can I get all the tags that I need? The site has a separate page for authorization (url = self.BASE_URL + 'api/v1/login/auth?info=1').
import urllib.request

import requests
from bs4 import BeautifulSoup


class Auth:
    BASE_URL = 'http.............'

    def auth(self):
        # Log in by posting the credentials to the authorization endpoint.
        params = {
            'user': u'g1625719',
            'pass': u'472001',
            'from_site': 1,
            'dev': u'16e753be3dc097354e3328e47c3701a9'
        }
        session = requests.Session()
        url = self.BASE_URL + 'api/v1/login/auth?info=1'
        r = session.post(url, params)
        print(r.text)

    def get_url(self):
        # Download the page that should contain the table.
        url = self.BASE_URL + '#!/line/cart/checklist/'
        print(url)
        response = urllib.request.urlopen(url)
        return response.read()

    def parse(self):
        soup = BeautifulSoup(self.get_url(), 'html.parser')
        table = soup.body.find('div', {'class': 'example-animate-container'})
        print(table)
It works incorrectly.
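One thing worth checking in get_url: everything after `#` in a URL is a fragment, which the client keeps for itself and does not send in the HTTP request. A minimal sketch with the standard library, using a hypothetical placeholder host because the real BASE_URL is elided in the code:

```python
from urllib.parse import urldefrag

# Hypothetical placeholder; the real BASE_URL is elided in the question.
BASE_URL = "http://example.invalid/"

url = BASE_URL + "#!/line/cart/checklist/"

# urldefrag splits the URL into the part a server receives and the fragment.
sent_to_server, fragment = urldefrag(url)

print("requested from the server:", sent_to_server)
print("kept on the client side:", fragment)
```

If that applies here, urlopen downloads only the base page, so rows that AngularJS renders in the browser (the ng-binding and ng-repeat markup) may not be present in the response at all. Note also that urllib.request.urlopen does not reuse the requests session created in auth, so the login cookies are not sent with that request.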
Try using find_all (https://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-the-tree):
import urllib.request

import requests
from bs4 import BeautifulSoup


class Auth:
    BASE_URL = 'http.............'

    def auth(self):
        params = {
            'user': u'g1625719',
            'pass': u'472001',
            'from_site': 1,
            'dev': u'16e753be3dc097354e3328e47c3701a9'
        }
        session = requests.Session()
        url = self.BASE_URL + 'api/v1/login/auth?info=1'
        r = session.post(url, params)
        print(r.text)

    def get_url(self):
        url = self.BASE_URL + '#!/line/cart/checklist/'
        print(url)
        response = urllib.request.urlopen(url)
        return response.read()

    def parse(self):
        soup = BeautifulSoup(self.get_url(), 'html.parser')
        # find_all returns a list of every match; find returns only the first.
        table = soup.body.find_all('div', {'class': 'example-animate-container'})
        print(table)
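To illustrate the difference between taking only the first match and collecting every match, here is a rough sketch. It deliberately uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages; the CellCollector class and the SAMPLE_HTML string are hypothetical and only loosely modelled on the table in the question.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td class="ng-binding"> cell."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td" and "ng-binding" in classes:
            self._in_cell = True
            self.cells.append("")

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False


# Hypothetical sample data, not the real page.
SAMPLE_HTML = """
<table><tbody>
<tr><td class="ng-binding">alpha</td><td class="ng-binding">beta</td><td></td></tr>
<!-- ngIf: (bsks | size)>0 -->
<tr class="bsstr ng-scope"><td class="ng-binding">gamma</td><td></td></tr>
</tbody></table>
"""

collector = CellCollector()
collector.feed(SAMPLE_HTML)
collector.close()

first_only = collector.cells[:1]  # comparable to what find() gives you
all_cells = collector.cells       # comparable to what find_all() gives you

print(first_only)
print(all_cells)
```

With BeautifulSoup itself, find returns the first matching element (or None), while find_all returns a list of all matching elements, so the result of find_all has to be iterated over rather than used as a single tag.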