Python BeautifulSoup: printing the desired element
After I find the desired element I have this:
[<div class="statsValue">$1,615,422</div>, <div class="statsValue">1</div>, <div class="statsValue">2</div>]
I would like to get just the number $1,615,422. How do I do this? I could not find anything useful online.
Here is my code:
from selenium import webdriver
from selenium.webdriver.remote import webelement
import pandas as pd
import time
from bs4 import BeautifulSoup
driver = webdriver.Chrome('chromedriver.exe')
driver.get('https://www.redfin.com/')
search_box = driver.find_element_by_name('searchInputBox')
search_box.send_keys('693 Bluebird Canyon Drive, Laguna Beach, CA 92651')
search_box.submit()
time.sleep(2)
def get_address_url(address):
    url_list = []
    search_box = driver.find_element_by_name('searchInputBox')
    search_box.send_keys('693 Bluebird Canyon Drive, Laguna Beach, CA 92651')
    search_box.submit()
    time.sleep(2)
    url_list.append(driver.current_url)
# element = driver.find_elements_by_class_name('statsValue')
# print(element[0].get_attribute('innerHTML'))
soup = BeautifulSoup(driver.page_source, 'html.parser')
data = soup.find_all(lambda tag: tag.name == 'div' and tag.get('class') == ['statsValue'])
print(data)
print(len(data))
print(type(data))
driver.quit()
You want the text attribute.
data = soup.find_all(lambda tag: tag.name == 'div' and tag.get('class') == ['statsValue'])
for element in data:
    print(element.text)
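As a minimal, self-contained sketch of this answer: the snippet below parses the three div elements from the question out of a literal string rather than driver.page_source (an assumption, made so that it runs without a browser) and takes the text of the first match. The names html, values and price are illustrative only.

```python
from bs4 import BeautifulSoup

# The markup from the question, hard-coded so the example runs without Selenium.
html = (
    '<div class="statsValue">$1,615,422</div>'
    '<div class="statsValue">1</div>'
    '<div class="statsValue">2</div>'
)

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like ResultSet; each item is a Tag with a .text attribute.
values = [div.text for div in soup.find_all("div", class_="statsValue")]

# In the question's output the price is the first match.
price = values[0]
print(price)
```

Indexing with [0] assumes the price is always the first statsValue element on the page, as it is in the output shown in the question.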
You can use a regular expression with re.sub to remove the non-digit characters:
import re
price = re.sub("[^0-9]", "", "$1,615,422")
print(price)
Result:
1615422
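If the goal is a number rather than a string of digits, the same substitution can be wrapped in a small helper. This is only a sketch: price_to_int is a hypothetical name, not something from the answer, and it assumes whole-dollar prices.

```python
import re


def price_to_int(text):
    """Strip every non-digit character and return the remaining digits as an int."""
    digits = re.sub(r"[^0-9]", "", text)
    if not digits:
        # Nothing numeric was left, e.g. for an empty or placeholder string.
        raise ValueError(f"no digits found in {text!r}")
    return int(digits)


print(price_to_int("$1,615,422"))
```

Because every non-digit is dropped, a value with a decimal point such as "$1.50" would become 150, so this approach only suits prices without cents.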
soup = BeautifulSoup(driver.page_source, 'html.parser')
stats = soup.select(".statsValue")
for s in stats:
    print(s.text)
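When only the first value is needed, select_one can replace the loop. The sketch below again uses the markup from the question as a literal string (an assumption, so that it runs without a browser) and guards against the selector matching nothing; the names html, first and price are illustrative.

```python
from bs4 import BeautifulSoup

# The markup from the question, hard-coded so the example runs without Selenium.
html = (
    '<div class="statsValue">$1,615,422</div>'
    '<div class="statsValue">1</div>'
    '<div class="statsValue">2</div>'
)

soup = BeautifulSoup(html, "html.parser")

# select_one returns the first element matching the CSS selector, or None.
first = soup.select_one("div.statsValue")

# Guard against a missing element instead of raising AttributeError on None.
price = first.get_text(strip=True) if first is not None else None
print(price)
```

The None check matters on a live page, where the class name may change or the element may not have been rendered yet when the page source is read.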
If you only want to get the number $1,615,422, I think requests is enough.
Hope this code helps:
import requests
from bs4 import BeautifulSoup as Soup
url = 'https://www.redfin.com/CA/Laguna-Beach/693-Bluebird-Canyon-Dr-92651/home/4894466'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36'
}
r = requests.get(url, headers=headers)
soup = Soup(r.text, 'html5lib')
data = soup.find('div', {'class': 'avm'}).div.text
print(data) # $1,615,422