How do I extract a value inside curly braces with Python Beautiful Soup?

Question

This is my code so far:

s = BS(r.content, 'lxml')
findDiv = s.find('div', {'id':'BodyContentPlaceholder_T00DF23F8005_Col00'})
findTable = findDiv.findAll('table', {'style':'width: 100%; border-collapse: collapse;'})

for table in findTable:
        rows = table.findAll('tr')
        rows = rows[1:]
        for row in rows:
            cells = row.findAll('td')
            extension = cells[3].button.input.text
            print('docs page ' + str(extension))
            #extention = re.compile(pattern, text)
            docpage_url_ending = cells[3].find('value')
            print('url ' + str(docpage_url_ending))

I am trying to get the navigateUrl text out of this:

<input id="ctl00_BodyContentPlaceholder_C013_RadListView1_ctrl1_rlb1_ClientState" name="ctl00_BodyContentPlaceholder_C013_RadListView1_ctrl1_rlb1_ClientState" type="hidden" autocomplete="off" value="{&quot;text&quot;:&quot;Documents&quot;,&quot;value&quot;:&quot;&quot;,&quot;target&quot;:&quot;&quot;,&quot;navigateUrl&quot;:&quot;/procurement/procurement-bids/constructionprocurementdetail?Title=Lakeland Adult Daycare Center Roof Replacement 18-785&quot;,&quot;primary&quot;:false}">

Answer 1

import json
import bs4

data = '<input id="ctl00_BodyContentPlaceholder_C013_RadListView1_ctrl1_rlb1_ClientState" name="ctl00_BodyContentPlaceholder_C013_RadListView1_ctrl1_rlb1_ClientState" type="hidden" autocomplete="off" value="{&quot;text&quot;:&quot;Documents&quot;,&quot;value&quot;:&quot;&quot;,&quot;target&quot;:&quot;&quot;,&quot;navigateUrl&quot;:&quot;/procurement/procurement-bids/constructionprocurementdetail?Title=Lakeland Adult Daycare Center Roof Replacement 18-785&quot;,&quot;primary&quot;:false}">'

# cook up some soup; name the parser explicitly so bs4 does not have to guess
soup = bs4.BeautifulSoup(data, 'html.parser')
# extract the relevant attribute; html.parser adds no <html>/<body> wrappers,
# so go to the input tag directly
vals_as_string = soup.input.attrs['value']
# the &quot; entities are HTML escapes, not URL encoding, and BeautifulSoup
# has already decoded them, so no urllib step is needed
# turns out, it's json
vals_as_json = json.loads(vals_as_string)
# well, json converts to dict, so there's our target
navigateUrl = vals_as_json['navigateUrl']

Answer 2

This is how you get to navigateUrl:

import json
from bs4 import BeautifulSoup as BS

s = BS(r.content, 'lxml')
findDiv = s.find('div', {'id':'BodyContentPlaceholder_T00DF23F8005_Col00'})
findTable = findDiv.findAll('table', {'style':'width: 100%; border-collapse: collapse;'})

for table in findTable:
    rows = table.findAll('tr')
    rows = rows[1:]
    for row in rows:
        cells = row.findAll('td')
        extension = cells[3].button.input.text
        print('docs page ' + str(extension))
        #extention = re.compile(pattern, text)
        # a tag's attributes live in .attrs (not .attr); the value is a JSON string
        docpage_url_ending = json.loads(cells[3].button.input.attrs['value'])['navigateUrl']
        print('url ' + docpage_url_ending)
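
The loop above depends on the live page (r.content), so it cannot be run on its own. Below is a minimal, self-contained sketch of the same idea. The sample markup, the shortened URL in it, and the helper name extract_navigate_urls are made up for illustration and are not taken from the real site. It assumes each hidden input carries a JSON object in its value attribute, and it skips any input that does not.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical sample markup, shaped like the input element in the question.
html = '''
<table>
  <tr><th>Documents</th></tr>
  <tr><td><input type="hidden" value="{&quot;text&quot;:&quot;Documents&quot;,&quot;navigateUrl&quot;:&quot;/procurement/detail?Title=Roof Replacement 18-785&quot;,&quot;primary&quot;:false}"></td></tr>
  <tr><td><input type="hidden" value="not json"></td></tr>
  <tr><td>no input in this row</td></tr>
</table>
'''

def extract_navigate_urls(markup):
    """Collect navigateUrl from every hidden input whose value is a JSON object."""
    soup = BeautifulSoup(markup, 'html.parser')
    urls = []
    for tag in soup.find_all('input', attrs={'type': 'hidden'}):
        raw = tag.get('value')  # None when the attribute is missing
        if not raw:
            continue
        try:
            # BeautifulSoup has already turned &quot; back into double quotes
            state = json.loads(raw)
        except json.JSONDecodeError:
            continue  # the value is not JSON, so skip this input
        if isinstance(state, dict) and 'navigateUrl' in state:
            urls.append(state['navigateUrl'])
    return urls

urls = extract_navigate_urls(html)
print(urls)
```

Using tag.get('value') and catching json.JSONDecodeError keeps one malformed row from stopping the whole scrape, which the chained cells[3].button.input lookup would do if a row had no button.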
