
I want to scrape the text inside an h4 tag that contains both an img tag and ordinary text, using BeautifulSoup

Here is the part I want to scrape (just the address):

<h4>
            <img alt="icon" src="/assets/assets/img/agent-result/8fd82ccea302741620de4526126aa8d1-map-marker.png" title="icon"/>
            Plot 9, Olawale Onitiri Cole street,lekki phase 1, Lagos
           </h4>

Here is my code:

response = requests.get('https://www.propertypro.ng/agents',headers={"user-agent":"mozilla/5.0(x11; ubuntu; Linux x86_64;rv:61.0)Gecko/20100101 Firefox/61.0"})
data= BeautifulSoup(response.content,'html.parser')
page_one = data.find_all('div',class_='agent-rp-area')
for item in page_one:
    for link in item.find("h4"):
         page_one_.append(link.string)

The problem I'm having is that the result includes the image tag as well as the text.

The first problem is that you are using find() the wrong way.

find() returns only one element (or None), so you shouldn't put it in a for loop. Assign its result directly instead:

 link = item.find("h4")

If you use a for loop, it doesn't iterate over results from find() but over the children of the tag that find() returned (find().children). One of those children is the img tag, which has no text, so .string gives None for it.
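To make this concrete, here is a minimal sketch that parses the h4 snippet from the question as a local string (no request is sent) and lists what a for loop over the tag actually visits. The variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Markup copied from the question; parsed locally, no network access.
html = """
<h4>
    <img alt="icon" src="/assets/assets/img/agent-result/8fd82ccea302741620de4526126aa8d1-map-marker.png" title="icon"/>
    Plot 9, Olawale Onitiri Cole street,lekki phase 1, Lagos
</h4>
"""

h4 = BeautifulSoup(html, "html.parser").find("h4")

# Iterating over a Tag walks its direct children, not a list of matches.
children = list(h4)
for child in children:
    # Prints the node type and what .string returns for it.
    print(type(child).__name__, repr(child.string))
```

With html.parser the loop should visit three nodes: a whitespace string, the img tag (whose .string is None), and the address text.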


The second problem is that you should use .text or .get_text() / .get_text(strip=True) rather than .string. On a tag with more than one child, as with this h4 (the img plus the text), .string returns None.

 print(link.text)
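As a small illustration of the difference, the sketch below compares .string with .get_text(strip=True) on the same h4 from the question, again parsed from a local string rather than fetched from the site.

```python
from bs4 import BeautifulSoup

# Markup copied from the question; parsed locally, no network access.
html = """<h4>
    <img alt="icon" src="/assets/assets/img/agent-result/8fd82ccea302741620de4526126aa8d1-map-marker.png" title="icon"/>
    Plot 9, Olawale Onitiri Cole street,lekki phase 1, Lagos
</h4>"""

h4 = BeautifulSoup(html, "html.parser").find("h4")

# .string is None because <h4> has several children (whitespace, <img>, text).
as_string = h4.string

# get_text(strip=True) collects every text node and strips each one.
as_text = h4.get_text(strip=True)

print(as_string)
print(as_text)
```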

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "mozilla/5.0(x11; ubuntu; Linux x86_64;rv:61.0)Gecko/20100101 Firefox/61.0"
}

url = 'https://www.propertypro.ng/agents'

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')

page_one = soup.find_all('div', class_='agent-rp-area')

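for item in page_one:
    # find() returns a single Tag or None, so no inner loop is needed
    link = item.find("h4")
    if link:
        # .text joins all text inside <h4>; strip() trims surrounding whitespace
        text = link.text.strip()
        print('text:', text)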

Result:

text: NO. 62, KUDIRAT ABIOLA ROAD, IKEJA, Lagos
text: Plot 9, Olawale Onitiri Cole street,lekki phase 1, Lagos
text: 
text: 1 towry close off idejo street victoria island, Lagos
text: 13 Akpomudje Street, Ago Palace Way, Okota, Lagos
text: C193 Ikota Shopping Complex VGC, Lagos
text: Lagos
text: 34, AJOSE STREET OFF JUBRIL MARTIN'S STREET LAWANSON. , Lagos
text: 
text: Obafemi Awolowo Way, Lagos
text: 
text: Suite 230, Block C, Road 2, Ikota Shopping Complex, VGC, Lekki, Lagos, Lagos
text: 46 liasu road , Lagos
text: 
text: 23 Furo Ezimora Street,Lekki Phase 1, Lagos
text: 
text: 120/12 Bosun Adekoya Street, Lekki Phase 1 (Oceanside), Lagos, Lagos
text: 
text: 15, TAIWO KOYA STREET, OFF SURA MOGAJI, Lagos
text: Lekki Expressway, Lagos
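Some lines in the result are empty because those agents have an h4 with no address text. If you want a list of addresses, as the question's append() call suggests, one possible sketch is below. The helper name extract_addresses and the sample markup are assumptions made for illustration; only the agent-rp-area class and the h4 tag come from the code above.

```python
from bs4 import BeautifulSoup


def extract_addresses(html):
    """Return the non-empty <h4> texts found inside agent-rp-area blocks."""
    soup = BeautifulSoup(html, "html.parser")
    addresses = []
    for item in soup.find_all("div", class_="agent-rp-area"):
        h4 = item.find("h4")
        if h4 is None:
            continue  # this block has no <h4> at all
        text = h4.get_text(strip=True)
        if text:  # skip headings that hold only the icon
            addresses.append(text)
    return addresses


# Hypothetical markup that imitates the page structure, for an offline demo.
sample = """
<div class="agent-rp-area"><h4><img alt="icon" src="marker.png"/> 46 liasu road , Lagos</h4></div>
<div class="agent-rp-area"><h4><img alt="icon" src="marker.png"/> </h4></div>
<div class="agent-rp-area"><p>no heading here</p></div>
"""

addresses = extract_addresses(sample)
print(addresses)
```

With the real page you would pass response.content to the helper instead of the sample string.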
