
bs4 unable to find div with specific class using id


To improve my scraping skills, I have been trying to download the document at https://ikeacatalogues.ikea.com/sv-1950/page/1. However, whenever I try to get a div, with or without an id, all I get is <div id="fakescroll"></div>. What I want is the direct link to the document, which is in an anchor tag.


I am not able to access that link either. I tried to find all the links present in the webpage, and that returns an empty list.

Please help. Here is my code, which returns empty output:

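from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

url = "https://ikeacatalogues.ikea.com/sv-1950/page/1"
# Selenium 4 takes the driver path through a Service object;
# the old executable_path argument was deprecated and later removed.
browser = webdriver.Chrome(service=Service("/path/to/chromedriver.exe"))
browser.get(url)
soup = BeautifulSoup(browser.page_source, "html.parser")
browser.quit()
# find_all() takes a tag name plus an attribute dict (select() expects a CSS selector)
items = soup.find_all("div", {"id": "main_menu"})
print(items)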

Here is my code for getting all the href values. Its output is also empty.

import httplib2
from bs4 import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('https://ikeacatalogues.ikea.com/sv-1950/page/1')

# Parse only the <a> tags and print each href
for link in BeautifulSoup(response, "html.parser", parse_only=SoupStrainer('a')):
    if link.has_attr('href'):
        print(link['href'])
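One way to see why a loop like this prints nothing is to check what the raw HTML actually contains before any JavaScript runs. The sketch below uses only the standard library's html.parser; TagCounter is a hypothetical helper name, and SAMPLE_HTML is a simplified, made-up stand-in for a script-rendered page, not the catalogue's real markup.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Collects <a href> values and the text inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)
        elif tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies arrive here as raw text; everything else is ignored
        if self._in_script:
            self.scripts[-1] += data


# Hypothetical page: the body only holds an empty placeholder div,
# while the real content is a JSON object assigned inside a script.
SAMPLE_HTML = """
<html><body>
  <div id="fakescroll"></div>
  <script>var data = {"spreads": [{"pages": [{"text": "Page 1"}]}]};</script>
</body></html>
"""

parser = TagCounter()
parser.feed(SAMPLE_HTML)
print(parser.hrefs)  # no <a href> exists in the static HTML
print(any("var data" in s for s in parser.scripts))  # but the data is in a script
```

Feeding the real response instead (for example `parser.feed(response.decode("utf-8"))`, assuming the page is UTF-8) should show the same pattern if the page is rendered from a script: few or no anchors, and the content sitting inside script text.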

The images and texts are not present as regular HTML elements; they are embedded in the page as a JSON object inside a <script> tag, so BeautifulSoup doesn't see them. You can use the re and json modules to extract and decode that object. For example:

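import re
import json
import requests


url = "https://ikeacatalogues.ikea.com/sv-1950/page/1"

text = requests.get(url).text
# The page content is assigned to a JavaScript variable: var data = {...};
match = re.search(r"var data = (\{.*\});", text)
if match is None:
    raise SystemExit("Could not find 'var data' in the page source")
data = json.loads(match.group(1))

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

# Each spread holds one or more pages; print each page's text and full-size image URL
for s in data["spreads"]:
    for p in s["pages"]:
        print(p["text"])
        print("https://ikeacatalogues.ikea.com" + p["images"]["at2400"])
        print("#" * 80)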

Prints:


...

Samma fåtölj med lös plymå
Nr 11 B Samma som ovanstående, men försedd med lös plymå, vilket är mycket popu­
lärt och samtidigt lätt att rengöra. Plymån är av bästa resårkvalitet med avsydda kan­
ter. Samma pris som nr 11.
Nr 11/1 Samma som nr 11, men u ta n • nackkudde och i något mindre utförande. Ty g ­
åtgång 1,6 meter.
Pris pr styck komplett med tyg
.................................................................................... 8 4 . 5 0
Pris pr styck utan tyg
..................................................................................................... 71.50
AB Tryckericentralen i lore*


https://ikeacatalogues.ikea.com/77436/1101577/pages/bbe136e8-1317-474e-ba07-c48a4ded045e-at2400.jpg
################################################################################
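If the goal is to save every page of the catalogue, the loop above can be split into two small, testable functions: one that pulls the `var data` object out of the page source, and one that turns each page's image path into an absolute URL. This is a sketch: extract_data and page_images are hypothetical names, the "at2400" key and the spreads/pages/images layout come from the answer's code, and the sample string is invented data with the same shape. urljoin is used instead of plain string concatenation.

```python
import json
import re
from urllib.parse import urljoin

BASE_URL = "https://ikeacatalogues.ikea.com"


def extract_data(html: str) -> dict:
    """Return the JSON object assigned to `var data` in the page source."""
    match = re.search(r"var data = (\{.*\});", html)
    if match is None:
        raise ValueError("no 'var data = {...};' assignment found in the page")
    return json.loads(match.group(1))


def page_images(data: dict, size: str = "at2400") -> list:
    """Return absolute image URLs for every page, skipping pages without that size."""
    urls = []
    for spread in data.get("spreads", []):
        for page in spread.get("pages", []):
            path = page.get("images", {}).get(size)
            if path:
                urls.append(urljoin(BASE_URL, path))
    return urls


# Hypothetical, trimmed-down page source with the shape the answer relies on.
sample = (
    '<script>var data = {"spreads": [{"pages": ['
    '{"text": "p1", "images": {"at2400": "/1/2/pages/a-at2400.jpg"}},'
    '{"text": "p2", "images": {}}'
    ']}]};</script>'
)

urls = page_images(extract_data(sample))
print(urls)
```

With the real page, `extract_data(requests.get(url).text)` would take the place of the sample, and each resulting URL could then be fetched with `requests.get(image_url).content` and written to a file opened in binary mode.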
