I am trying to get information from this website https://www.realtypro.co.za/property_detail.php?ref=1736
I have this table from which I want to take the number of bedrooms
<div class="panel panel-primary">
<div class="panel-heading">Property Details</div>
<div class="panel-body">
<table width="100%" cellpadding="0" cellspacing="0" border="0" class="table table-striped table-condensed table-tweak">
<tbody><tr>
<td class="xh-highlight">3</td><td style="width: 140px" class="">Bedrooms</td>
</tr>
<tr>
<td>Bathrooms</td>
<td>3</td>
</tr>
I am using this xpath expression:
bedrooms = response.xpath("//div[@class='panel panel-primary']/div[@class='panel-body']/table[@class='table table-striped table-condensed table-tweak']/tbody/tr[1]/td[2]/text()").extract_first()
However, I only get 'None' as output.
I have tried several combinations and I only get None as output. Any suggestions on what I am doing wrong?
Thanks in advance!
I would use bs4 4.7.1. where you can search with :contains
for the td
cell having the text "Bedrooms"
then take the adjacent sibling td
. You can add a test for is None
for error handling. Less fragile than long xpath.
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://www.realtypro.co.za/property_detail.php?ref=1736')
soup = bs(r.content, 'lxml')
print(int(soup.select_one('td:contains(Bedrooms) + td').text)
If position was fixed you could use
.table-tweak td + td
Try this and let me know if it works:
import lxml.html
response = [your code above]
beds = lxml.html.fromstring(response)
bedrooms = beds.xpath("//div[@class='panel panel-primary']/div[@class='panel-body']/table[@class='table table-striped table-condensed table-tweak']/tbody/tr[1]/td[2]//preceding-sibling::*/text()")
bedrooms
Output:
['3']
EDIT:
Or possibly:
for bed in beds:
num_rooms = bed.xpath("//div[@class='panel panel-primary']/div[@class='panel-body']/table[@class='table table-striped table-condensed table-tweak']/tbody/tr[1]/td[2]//preceding-sibling::*/text()")
print(num_rooms)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.