![](/img/trans.png)
[英]How to extract content inside a label using beautiful soup in Python?
[英]How can I extract the value inside the curly brackets in python beautiful soup?
到目前為止,這是我的代碼:
s = BS(r.content, 'lxml')
findDiv = s.find('div', {'id':'BodyContentPlaceholder_T00DF23F8005_Col00'})
findTable = findDiv.findAll('table', {'style':'width: 100%; border-collapse: collapse;'})
for table in findTable:
rows = table.findAll('tr')
rows = rows[1:]
for row in rows:
cells = row.findAll('td')
extension = cells[3].button.input.text
print('docs page ' + str(extension))
#extention = re.compile(pattern, text)
docpage_url_ending = cells[3].find('value')
print('url ' + str(docpage_url_ending))
我正在嘗試從中獲取navigationUrl文本
<input id="ctl00_BodyContentPlaceholder_C013_RadListView1_ctrl1_rlb1_ClientState" name="ctl00_BodyContentPlaceholder_C013_RadListView1_ctrl1_rlb1_ClientState" type="hidden" autocomplete="off" value="{"text":"Documents","value":"","target":"","navigateUrl":"/procurement/procurement-bids/constructionprocurementdetail?Title=Lakeland Adult Daycare Center Roof Replacement 18-785","primary":false}">
import json
import bs4
import urllib
data = '<input id="ctl00_BodyContentPlaceholder_C013_RadListView1_ctrl1_rlb1_ClientState" name="ctl00_BodyContentPlaceholder_C013_RadListView1_ctrl1_rlb1_ClientState" type="hidden" autocomplete="off" value="{"text":"Documents","value":"","target":"","navigateUrl":"/procurement/procurement-bids/constructionprocurementdetail?Title=Lakeland Adult Daycare Center Roof Replacement 18-785","primary":false}">'
# cook up some soup
soup = bs4.BeautifulSoup(data)
# extract the relevant attribute
vals_as_string = soup.html.body.input.attrs['value']
# it's urlencoded, so decode it
unquoted_vals_as_string = urllib.parse.unquote(vals_as_string)
# turns out, it's json
vals_as_json = json.loads(unquoted_vals_as_string)
# well, json converts to dict, so there's our target
navigateUrl = vals_as_json['navigateUrl']
這就是您進入navigateUrl
:
import json
from bs4 import BeautifulSoup as BS
s = BS(r.content, 'lxml')
findDiv = s.find('div', {'id':'BodyContentPlaceholder_T00DF23F8005_Col00'})
findTable = findDiv.findAll('table', {'style':'width: 100%; border-collapse: collapse;'})
for table in findTable:
rows = table.findAll('tr')
rows = rows[1:]
for row in rows:
cells = row.findAll('td')
extension = cells[3].button.input.text
print('docs page ' + str(extension))
#extention = re.compile(pattern, text)
docpage_url_ending = json.loads(cells[3].button.input.attr['value'])['navigateUrl']
print('url ' + docpage_url_ending)
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.