
Get data from a script tag with Scrapy XPath and use it as CSV

I've been trying to extract data from a script tag using Scrapy (XPath). My main issue is identifying the correct div and script tags. I'm new to XPath and would be thankful for any kind of help!

<script>    
var COUNTRY_SHOP_STATUS = "buy";
var COUNTRY_SHOP_URL = "";
try {
digitalData.page.pathIndicator.depth_2 = "mobile";
digitalData.page.pathIndicator.depth_3 = "mobile";
digitalData.page.pathIndicator.depth_4 = "smartphones";
digitalData.page.pathIndicator.depth_5 = "galaxy-s8";    
digitalData.product.pvi_type_name = "Mobile";
digitalData.product.pvi_subtype_name = "Smartphone";
digitalData.product.model_name = "SM-G950F";
digitalData.product.category = digitalData.page.pathIndicator.depth_3;
} catch(e) {}
</script>

In the end I would like to populate my CSV file with the values of model_name and depth_3, depth_4 and depth_5. I've tried the solutions from similar questions, but they don't seem to work...

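You can use a regex to extract the required values:

import re

# Text of the <script> element that mentions COUNTRY_SHOP_STATUS
source = response.xpath("//script[contains(., 'COUNTRY_SHOP_STATUS')]/text()").extract()[0]

def get_values(parameter, script):
    # Escape the parameter name so "." matches literally, then capture
    # the double-quoted value assigned to it (first match wins)
    return re.findall(r'%s\s*=\s*"([^"]*)"' % re.escape(parameter), script)[0]

print(get_values("pathIndicator.depth_5", source))
print(get_values("pvi_subtype_name", source))
print(get_values("model_name", source))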
...
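
To get from those values to a CSV row, something along these lines might work. It is only a sketch: the SCRIPT string is a shortened copy of the script body from the question, standing in for what the XPath above returns, and the FIELDS mapping, extract_fields and rows_to_csv are names made up for this example rather than anything provided by Scrapy.

```python
import csv
import io
import re

# Shortened copy of the <script> body from the question (stand-in for `source`).
SCRIPT = '''
var COUNTRY_SHOP_STATUS = "buy";
try {
digitalData.page.pathIndicator.depth_3 = "mobile";
digitalData.page.pathIndicator.depth_4 = "smartphones";
digitalData.page.pathIndicator.depth_5 = "galaxy-s8";
digitalData.product.model_name = "SM-G950F";
} catch(e) {}
'''

# Hypothetical CSV column names mapped to the JavaScript property suffixes.
FIELDS = {
    "model_name": "product.model_name",
    "depth_3": "pathIndicator.depth_3",
    "depth_4": "pathIndicator.depth_4",
    "depth_5": "pathIndicator.depth_5",
}

def extract_fields(script):
    """Return {column: quoted value}, with an empty string for missing fields."""
    row = {}
    for column, prop in FIELDS.items():
        match = re.search(r'%s\s*=\s*"([^"]*)"' % re.escape(prop), script)
        row[column] = match.group(1) if match else ""
    return row

def rows_to_csv(rows):
    """Serialize a list of row dicts to CSV text, header line first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(FIELDS), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

row = extract_fields(SCRIPT)
csv_text = rows_to_csv([row])
print(csv_text)
```

Inside a spider you would probably not need rows_to_csv at all: yielding the dict from the parse callback and running scrapy crawl with the -o option and a .csv file name lets Scrapy's feed export write the file for you.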

