I am trying to scrape this site:
https://www.lanebryant.com/perfect-sleeve-swing-tunic-top/prd-356831#color/0000009320
I want to get the type of clothing, i.e. its category. There is a script on the page that contains it:
How can I collect this text and get the category of the clothing which I have highlighted in the image? I have tried the following code but it returns nothing.
type = d.find_element_by_xpath("//script[@type='text/javascript']").text
print("hiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii"+type)
Here, d is the driver.
Here you go...
1. Get the innerHTML of the script tag.
2. Convert it into JSON format.
3. Use the parameter and then get the value tops.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import json
driver = webdriver.Chrome()
driver.get('https://www.lanebryant.com/perfect-sleeve-swing-tunic-top/prd-356831')
item = WebDriverWait(driver, 10).until(EC.presence_of_element_located(
(By.XPATH, "//script[@type='text/javascript'][contains(.,'window.lanebryantDLLite')]"))).get_attribute('innerHTML')
itemtext = item.split("=", 1)[1].split(";")[0]  # Text between the first "=" and the first ";", still a string
itemjson = json.loads(itemtext.strip())  # Parse the string into a Python dict
itemtop = itemjson['page']['pageName']  # Use the parameter to get the page name
print(itemtop.split(':')[1].strip())  # Keep only the part after the colon, i.e. the value tops
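The parsing steps can also be tried without a browser. The following is only a sketch: SAMPLE_SCRIPT is a made-up script body (the real page's content may differ), and extract_category is a hypothetical helper name. Instead of splitting on "=" and ";", it uses json.JSONDecoder().raw_decode, which stops at the end of the first JSON object, so a ";" or "=" inside a value should not break it.

```python
import json

# Hypothetical script body; the real page's content may differ.
SAMPLE_SCRIPT = (
    'window.lanebryantDLLite = '
    '{"page": {"pageName": "Clothing : Tops", "note": "a;b=c"}};'
)


def extract_category(script_text, marker="window.lanebryantDLLite"):
    """Return the part of page.pageName that follows the first colon."""
    # Find the first "{" after the variable name.
    start = script_text.index("{", script_text.index(marker))
    # raw_decode parses one JSON value starting at `start` and ignores the rest.
    data, _end = json.JSONDecoder().raw_decode(script_text, start)
    page_name = data["page"]["pageName"]
    return page_name.split(":", 1)[1].strip()


print(extract_category(SAMPLE_SCRIPT))  # Tops
```

With Selenium, the string passed to extract_category would be the innerHTML obtained above (the item variable).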
Hope this helps.
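Try something like this:
script_text = d.find_element_by_xpath('//script[@type="text/javascript"]').get_attribute('innerHTML')  # .text is empty for script tags because they are not rendered
Also count the script tags in the page source: find_element_by_xpath returns only the first match, which may not be the script you need.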
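Counting the script tags can be sketched with the standard library alone. This is only an illustration: SAMPLE_HTML is a made-up page and ScriptCollector is a hypothetical helper; with Selenium the HTML would come from d.page_source.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


# Made-up page source for illustration only.
SAMPLE_HTML = """
<html><head>
<script type="text/javascript">var a = 1;</script>
<script type="text/javascript">window.lanebryantDLLite = {"page": {"pageName": "Clothing : Tops"}};</script>
</head><body><p>hello</p></body></html>
"""

collector = ScriptCollector()
collector.feed(SAMPLE_HTML)

# How many script tags are there, and which ones hold the data we want?
script_count = len(collector.scripts)
matching = [i for i, s in enumerate(collector.scripts)
            if "window.lanebryantDLLite" in s]
print(script_count, matching)  # 2 [1]
```

If the count is greater than one, an XPath that matches every script tag will most likely return the wrong one, which is why the selector needs an extra condition on the script's content.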
One problem with your current approach is that the XPath matches every script on the page; you need to narrow it down.
This finds the correct script and then collects the category with the help of regex:
from lxml import html
import requests
import re
# create the regex
category_regex = re.compile(r'(?<="category": ").*(?=", "CategoryID")')
page = requests.get('https://www.lanebryant.com/perfect-sleeve-swing-tunic-top/prd-356831#color/0000009320')
tree = html.fromstring(page.content)
information = tree.xpath("//script[contains(text(), '\"page\": { \"pageName\": \"Clothing :')]/text()")
print(category_regex.findall(str(information)))
Output: ['Tops']
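The regex step can be checked on its own against a string. The sketch below is an assumption-laden variant, not the code above: SAMPLE_TEXT is made up, find_category is a hypothetical helper, and the pattern uses a capture group with a non-greedy match so that it stops at the first "CategoryID" key rather than the last.

```python
import re

# Capture the value of "category" when it is directly followed by "CategoryID".
CATEGORY_RE = re.compile(r'"category":\s*"(.*?)",\s*"CategoryID"')

# Made-up fragment of the script text; the real page may be laid out differently.
SAMPLE_TEXT = '"page": { "pageName": "Clothing : Tops", "category": "Tops", "CategoryID": "x" }'


def find_category(text):
    """Return the captured category, or None if the pattern is absent."""
    match = CATEGORY_RE.search(text)
    return match.group(1) if match else None


print(find_category(SAMPLE_TEXT))  # Tops
```

Returning None on a miss makes it easier to notice when the page layout changes and the pattern no longer applies.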