
I am trying to extract the text inside a span with an id, but I get blank output using Python BeautifulSoup

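I am trying to extract the text inside a span tag that has an id, but the output is blank.

I also tried reading the text of the parent div element, but that extraction failed as well. Could someone please help me? My code is below.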

 import requests
 from bs4 import BeautifulSoup

 r = requests.get('https://www.paperplatemakingmachines.com/')
 soup = BeautifulSoup(r.text,'lxml')
 mob = soup.find('span',{"id":"tollfree"})
 print(mob.text)

I want the text inside that span, which is a mobile number.

You have to use Selenium, because the text is not present in the initial response, or at least not unless you search through the <script> tags.

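from bs4 import BeautifulSoup
from selenium import webdriver
import time

# Raw string so the backslashes in the Windows path are not read as escape sequences.
# Newer Selenium 4 releases expect the driver path via a Service object instead of positionally.
driver = webdriver.Chrome(r'C:\chromedriver_win32\chromedriver.exe')

url = 'https://www.paperplatemakingmachines.com/'
driver.get(url)

# It's better to use Selenium's WebDriverWait, but I'm still learning how to use that correctly
time.sleep(5)

# Parse the DOM after the page's JavaScript has had time to fill in the span
soup = BeautifulSoup(driver.page_source, 'html.parser')
# quit() ends the browser session and the chromedriver process
driver.quit()

mob = soup.find('span', {"id": "tollfree"})
print(mob.text)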
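
Before reaching for a browser, it can help to confirm that the span really is empty in the raw response and that the value lives in a script instead. Below is a minimal offline sketch of that check, using only the standard library rather than BeautifulSoup. The names `SpanTextFinder`, `diagnose` and `SAMPLE_HTML` are hypothetical, and the sample markup and the number in it are made-up stand-ins for `r.text`; the real page's markup may differ.

```python
from html.parser import HTMLParser


class SpanTextFinder(HTMLParser):
    """Collects the text inside the first <span> with a given id.

    Deliberately simple: it does not handle spans nested inside the target.
    """

    def __init__(self, span_id):
        super().__init__()
        self.span_id = span_id
        self.inside = False  # True between the target <span> and the next </span>
        self.found = False   # True once the target span has been seen
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and not self.found and dict(attrs).get("id") == self.span_id:
            self.inside = True
            self.found = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)


def diagnose(html, span_id="tollfree", js_name="pns_no"):
    """Report whether the span exists, what text it holds, and whether
    the JavaScript variable name appears anywhere in the source."""
    finder = SpanTextFinder(span_id)
    finder.feed(html)
    return {
        "span_found": finder.found,
        "span_text": "".join(finder.chunks).strip(),
        "js_name_in_source": js_name in html,
    }


# Hypothetical stand-in for r.text; the number is made up.
SAMPLE_HTML = """
<html><head>
<script>var pns_no = "1234567890";</script>
</head><body>
<div><span id="tollfree"></span></div>
</body></html>
"""

report = diagnose(SAMPLE_HTML)
print(report)
```

If the span is found but its text is empty while the variable name does appear in the source, the value is being filled in by JavaScript, which is the situation both kinds of answers here deal with.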

The data is actually filled in dynamically by a script. What you need to do is parse the data out of that script:

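import requests
import re
from bs4 import BeautifulSoup

r = requests.get('https://www.paperplatemakingmachines.com/')
soup = BeautifulSoup(r.text, 'lxml')
# find() returns the first <script> tag, which is assumed to hold the pns_no assignment
script = soup.find('script')
# The lookbehind and lookahead keep only the value between the quotes;
# .*? stops at the first closing '";' instead of the last one on the line
mob = re.search(r'(?<=pns_no = ")(.*?)(?=";)', script.text).group()
print(mob)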
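
With a lookaround pattern like this one, it matters whether the quantifier is greedy once several quoted strings share a line. A small sketch of the difference follows; `script_text` is a hypothetical one-line script, not the real page's content.

```python
import re

# Hypothetical script content; the real page's script may look different.
script_text = 'var pns_no = "1234567890"; var city = "Delhi";'

# Greedy: .* runs to the LAST '";' on the line
greedy = re.search(r'(?<=pns_no = ")(.*)(?=";)', script_text).group()

# Lazy: .*? stops at the FIRST '";' after the opening quote
lazy = re.search(r'(?<=pns_no = ")(.*?)(?=";)', script_text).group()

print(repr(greedy))
print(repr(lazy))
```

On this sample the greedy form swallows everything up to the second assignment, while the lazy form returns just the digits.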

Another way to find the number using a regular expression:

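import requests
import re
from bs4 import BeautifulSoup as bs

response = requests.get('https://www.paperplatemakingmachines.com/')
soup = bs(response.content, 'lxml')
# Compiled once, then used both to pick the right <script> and to pull out the digits
pattern = re.compile(r'var pns_no = "(\d+)"')
# string= (named text= in older Beautiful Soup releases) selects the <script> whose content matches
data = soup.find('script', string=pattern).text
number = pattern.findall(data)[0]
print('+91-' + number)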
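
Both regex approaches raise an exception if the page changes and `pns_no` is no longer there, because `re.search` and `soup.find` return `None` when nothing matches. A minimal sketch of a more defensive variant follows. It searches the raw response text directly and skips BeautifulSoup, which is a swapped-in simplification and not what either answer does. `extract_pns_no`, `PNS_PATTERN` and the sample strings are hypothetical, and the pattern assumes the value is a run of digits, as the second regex above does.

```python
import re
from typing import Optional

# Tolerates extra whitespace around '='; assumes the value is digits only.
PNS_PATTERN = re.compile(r'var\s+pns_no\s*=\s*"(\d+)"')


def extract_pns_no(html: str) -> Optional[str]:
    """Return the digits assigned to pns_no, or None if no assignment is found."""
    match = PNS_PATTERN.search(html)
    return match.group(1) if match else None


# Hypothetical stand-ins for r.text; the number is made up.
page_with_number = '<script>var pns_no = "1234567890";</script><span id="tollfree"></span>'
page_without_number = '<script>var other = "x";</script><span id="tollfree"></span>'

number = extract_pns_no(page_with_number)
missing = extract_pns_no(page_without_number)

# The '+91-' prefix follows the answer above.
print('+91-' + number if number else 'pns_no not found')
print('+91-' + missing if missing else 'pns_no not found')
```

Against the live site you would pass `r.text` from `requests.get(...)` in place of the sample strings.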

