
Cannot find HTML elements with BeautifulSoup Python

I found some really nice code on https://towardsdatascience.com/ for web scraping, and I'm trying to adapt it for my own use.

https://ingatlan.com/lista/elado+lakas+ii-ker?page=1 is a Hungarian real-estate website. First, I just want to grab the prices of the listings, but when I run my code I don't get any results: the number of items found is 0.

import urllib.request, sys, time
from bs4 import BeautifulSoup
import requests
import pandas as pd

pagesToGet = 1

upperframe = []
for page in range(1, pagesToGet + 1):
    print('processing page :', page)
    url = 'https://ingatlan.com/lista/elado+lakas+ii-ker?page=' + str(page)
    print(url)

    try:
        page = requests.get(url)
    except Exception as e:
        error_type, error_obj, error_info = sys.exc_info()
        print('ERROR FOR LINK:', url)
        print(error_type, 'Line:', error_info.tb_lineno)
        continue
    time.sleep(2)
    soup = BeautifulSoup(page.text, 'html.parser')
    frame = []
    links = soup.find_all('div', attrs={'class': 'listing js-listing '})
    print(len(links))
    filename = "NEWS.csv"
    f = open(filename, "w", encoding='utf-8')
    headers = "Price\n"
    f.write(headers)

    for j in links:
        Price = j.find("div", attrs={'class': 'price'})
        frame.append(Price)
    upperframe.extend(frame)
f.close()
data = pd.DataFrame(upperframe, columns=['Price'])
data.head()

What am I doing wrong? There have been sites where this works, such as Myprotein, but there are sites where it does not.

Here only the price has been scraped, since that is all you asked for.

Without a User-Agent header, the request returns a 403 Forbidden error.

import requests
from bs4 import BeautifulSoup
import pandas as pd

start_url="https://ingatlan.com/lista/elado+lakas+ii-ker?page=1"
page_data=requests.get(start_url, headers={'User-Agent': 'XYZ/3.0'})
soup=BeautifulSoup(page_data.content,"html.parser")

#for i in soup:  # I was first just checking the HTTP status here;
    #print(i)    # without a User-Agent I got 403 as the response
    #print()
    
Price=[]

for job_tag in soup.find_all("div",class_="resultspage__content"):
    for job_tag2 in job_tag.find_all("div",class_="listing js-listing"):
        for job_tag3 in job_tag2.find_all("div",class_="price__container js-has-sqm-price-info-tooltip"): 


            price=job_tag3.find("div",class_="price")
            Price.append(price.text.strip())
            #print(Price)

data=pd.DataFrame(Price,columns=["price"])
print(data)

output of pandas DataFrame

         price
0    31.5 M Ft
1    77.9 M Ft
2      62 M Ft
3   129.5 M Ft
4     125 M Ft
5    95.9 M Ft
6    46.9 M Ft
7    45.9 M Ft
8    59.9 M Ft
9     109 M Ft
10     48 M Ft
11     87 M Ft
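The three nested `find_all` loops in the answer can be collapsed into a single CSS selector with `soup.select`. A minimal sketch against a trimmed-down HTML fragment (class names copied from the answer above; the fragment itself is made up for illustration):

```python
from bs4 import BeautifulSoup

# Made-up HTML fragment mimicking the listing markup,
# using the class names referenced in the answer above.
html = """
<div class="resultspage__content">
  <div class="listing js-listing">
    <div class="price__container js-has-sqm-price-info-tooltip">
      <div class="price">31.5 M Ft</div>
    </div>
  </div>
  <div class="listing js-listing">
    <div class="price__container js-has-sqm-price-info-tooltip">
      <div class="price">77.9 M Ft</div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# One descendant selector replaces the three nested find_all loops
prices = [tag.text.strip()
          for tag in soup.select("div.resultspage__content div.listing.js-listing div.price")]
print(prices)
# ['31.5 M Ft', '77.9 M Ft']
```

Note that `select` matches classes exactly, so the stray trailing space in the question's `'listing js-listing '` lookup would not be a problem here.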

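The scraped prices are strings like "31.5 M Ft" (millions of Hungarian forint), so they can't be sorted or averaged numerically as-is. A small sketch of converting them to numbers, using a hypothetical helper name (`price_to_huf` is not part of the answer's code):

```python
# Hypothetical helper: convert a price string such as "31.5 M Ft"
# (millions of Hungarian forint) into a numeric value in forint.
def price_to_huf(text: str) -> float:
    number = text.replace("M Ft", "").strip().replace(",", ".")
    return float(number) * 1_000_000

prices = ["31.5 M Ft", "77.9 M Ft", "62 M Ft"]
print([price_to_huf(p) for p in prices])
# [31500000.0, 77900000.0, 62000000.0]
```

With pandas, the same conversion could be applied to the whole column via `data["price"].map(price_to_huf)`.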