
How to get text inside script tag in HTML

I am trying to scrape this site:

https://www.lanebryant.com/perfect-sleeve-swing-tunic-top/prd-356831#color/0000009320

I want to get the type of clothing, i.e. its category. There is a script on the page: [image: enter image description here]

How can I collect this text and get the category of the clothing which I have highlighted in the image? I have tried the following code but it returns nothing.

type = d.find_element_by_xpath("//script[@type='text/javascript']").text
print("hiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii"+type)

Here, d is the driver.

Here you go:

1. Get the innerHTML of the script tag.

2. Convert it into JSON.

3. Use the key to read the value and extract the category (Tops).

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import json

driver = webdriver.Chrome()
driver.get('https://www.lanebryant.com/perfect-sleeve-swing-tunic-top/prd-356831')
item = WebDriverWait(driver, 10).until(EC.presence_of_element_located(
    (By.XPATH, "//script[@type='text/javascript'][contains(.,'window.lanebryantDLLite')]"))).get_attribute('innerHTML')
# Take the text after the first "=" up to the next ";" (still a string).
# This assumes neither character appears inside the JSON itself.
itemtext = item.split("=")[1].split(";")[0]

itemjson = json.loads(itemtext.strip())  # Parse the string into a Python dict

itemtop = itemjson['page']['pageName']  # Look up the page name by key

print(itemtop.split(':')[1].strip())  # Keep only the part after the first ':' (the category)
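
If the JSON ever contains "=" or ";" inside a string value, the two split calls above would cut it short. A minimal sketch of an alternative, assuming the script has the form `window.lanebryantDLLite = {...};`: let the standard library's `json.JSONDecoder.raw_decode` find where the object ends. The sample string and the `extract_json_object` helper are hypothetical stand-ins, not taken from the real page.

```python
import json

# Hypothetical stand-in for the script's innerHTML; the real page content may differ.
sample_script = (
    'window.lanebryantDLLite = '
    '{"page": {"pageName": "Clothing : Tops", "note": "a=b; c"}};'
)


def extract_json_object(script_text):
    """Decode the first JSON object that follows the first '=' in script_text."""
    # Find the opening brace after the assignment operator.
    start = script_text.index("{", script_text.index("="))
    # raw_decode parses one JSON value starting at `start` and ignores what follows.
    obj, _end = json.JSONDecoder().raw_decode(script_text, start)
    return obj


data = extract_json_object(sample_script)
category = data["page"]["pageName"].split(":")[1].strip()
print(category)
```

With Selenium, the string passed to `extract_json_object` would be the `item` value obtained from `get_attribute('innerHTML')`.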

Hope this helps.

Try something like this:

type = d.find_element_by_xpath('//script[@type="text/javascript"]').text

Also count the script tags in the page source.
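
To show why the count matters, here is a minimal offline sketch that uses only the standard library's `html.parser`. The `sample_html` string and the `ScriptCollector` class are hypothetical. The real page will have many more script tags, which is why a generic selector is unlikely to land on the one you want.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies arrive here as raw text.
        if self._in_script:
            self.scripts[-1] += data


# Hypothetical page source with two script tags of the same type.
sample_html = """
<html><head>
<script type="text/javascript">var tracking = 1;</script>
<script type="text/javascript">window.lanebryantDLLite = {"page": {"pageName": "Clothing : Tops"}};</script>
</head><body></body></html>
"""

collector = ScriptCollector()
collector.feed(sample_html)

# Narrow the candidates down by a marker that only the wanted script contains.
matching = [s for s in collector.scripts if "window.lanebryantDLLite" in s]
print(len(collector.scripts), len(matching))
```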

One problem with your current approach is that the XPath matches every script on the page, so you need to narrow it down.

This finds the correct script and then collects the category with the help of regex:

from lxml import html
import requests
import re

# Regex that captures the text between '"category": "' and '", "CategoryID"'
category_regex = re.compile(r'(?<="category": ").*(?=", "CategoryID")')
page = requests.get('https://www.lanebryant.com/perfect-sleeve-swing-tunic-top/prd-356831#color/0000009320')
tree = html.fromstring(page.content)
# Select only the script whose text contains the page-name marker
information = tree.xpath("//script[contains(text(), '\"page\": {    \"pageName\": \"Clothing :')]/text()")
# xpath() returns a list, so convert it to a string before searching
print(category_regex.findall(str(information)))

Output: ['Tops']
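
One caveat: `.*` is greedy, so if the script text contained more than one "category"/"CategoryID" pair, the pattern above would capture everything from the first pair to the last. A minimal sketch of a stricter variant, assuming the category value never contains a double quote; the `sample` string is hypothetical and only illustrates the difference.

```python
import re

# Greedy pattern from the answer above.
greedy_regex = re.compile(r'(?<="category": ").*(?=", "CategoryID")')
# Stricter variant: [^"]* cannot run past the closing quote of the value.
strict_regex = re.compile(r'(?<="category": ")[^"]*(?=", "CategoryID")')

# Hypothetical text with two category entries on one line.
sample = (
    '{"category": "Tops", "CategoryID": "1", '
    '"other": {"category": "Sale", "CategoryID": "2"}}'
)

greedy_matches = greedy_regex.findall(sample)
strict_matches = strict_regex.findall(sample)

print(len(greedy_matches))  # a single match spanning both entries
print(strict_matches)       # ['Tops', 'Sale']
```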

