
Get data from a script tag with Scrapy XPath and use it as CSV

I've been trying to extract data from a script tag using Scrapy (XPath). My main issue is identifying the correct div and script tags. I'm new to XPath and would be thankful for any kind of help!

<script>    
var COUNTRY_SHOP_STATUS = "buy";
var COUNTRY_SHOP_URL = "";
try {
digitalData.page.pathIndicator.depth_2 = "mobile";
digitalData.page.pathIndicator.depth_3 = "mobile";
digitalData.page.pathIndicator.depth_4 = "smartphones";
digitalData.page.pathIndicator.depth_5 = "galaxy-s8";    
digitalData.product.pvi_type_name = "Mobile";
digitalData.product.pvi_subtype_name = "Smartphone";
digitalData.product.model_name = "SM-G950F";
digitalData.product.category = digitalData.page.pathIndicator.depth_3;
} catch(e) {}
</script>

Ultimately, I would like to populate my CSV file with the values of model_name and depth_3, depth_4 and depth_5. I've tried the solutions from similar questions, but they don't seem to work...
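For the CSV part of the question, here is a minimal standard-library sketch. It pulls the four requested fields out of the script text and writes them as one CSV row. It makes some assumptions. `SCRIPT` is a hardcoded excerpt of the question's `<script>` body; in a spider it would come from the page. `FIELDS`, `extract_row` and `to_csv` are hypothetical helper names. The CSV is written to an in-memory buffer so the example doesn't touch disk.

```python
import csv
import io
import re

# Hypothetical input: an excerpt of the question's <script> text. In a spider this
# would be the text selected from the response, not a hardcoded string.
SCRIPT = '''
digitalData.page.pathIndicator.depth_3 = "mobile";
digitalData.page.pathIndicator.depth_4 = "smartphones";
digitalData.page.pathIndicator.depth_5 = "galaxy-s8";
digitalData.product.model_name = "SM-G950F";
digitalData.product.category = digitalData.page.pathIndicator.depth_3;
'''

# The columns the question asks for, named after the JavaScript properties.
FIELDS = ["model_name", "depth_3", "depth_4", "depth_5"]


def extract_row(script):
    """Return {field: value} for each name in FIELDS assigned a string literal."""
    row = {}
    for field in FIELDS:
        # Matches e.g. `.depth_3 = "mobile"`; the `category = ...depth_3;` line has
        # no quoted value, so it is not matched.
        match = re.search(r'\.%s = "([^"]*)"' % re.escape(field), script)
        row[field] = match.group(1) if match else ""
    return row


def to_csv(rows):
    """Serialize a list of row dicts to CSV text, header line first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv([extract_row(SCRIPT)]))
```

To write a real file instead, you could pass `open("output.csv", "w", newline="")` to `csv.DictWriter` in place of the `StringIO` buffer. The `newline=""` argument is what the `csv` module documentation recommends for files it writes.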

You can use a regular expression to extract the required values from the script's text:

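import re

# Inside a Scrapy callback, `response` is the downloaded page. Select the <script>
# whose text mentions COUNTRY_SHOP_STATUS and take its text (None if not found).
source = response.xpath("//script[contains(., 'COUNTRY_SHOP_STATUS')]/text()").get()

def get_values(parameter, script):
    # re.escape() keeps the dot in names like "pathIndicator.depth_5" literal,
    # and [^"]* stops the capture at the closing quote.
    return re.findall(r'%s = "([^"]*)"' % re.escape(parameter), script)[0]

print(get_values("pathIndicator.depth_5", source))
print(get_values("pvi_subtype_name", source))
print(get_values("model_name", source))
# ...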
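Instead of running one regex per field, you could collect every `name = "value";` assignment in a single pass and then pick the keys you need. The sketch below shows that alternative. It uses a hardcoded excerpt of the script as `SCRIPT`, and `ASSIGNMENT` and `parse_assignments` are hypothetical names. It only picks up assignments whose right-hand side is a double-quoted string literal.

```python
import re

# Hypothetical input: an excerpt of the question's <script> text. In a spider it
# would come from the XPath selection of the script tag.
SCRIPT = '''
var COUNTRY_SHOP_STATUS = "buy";
digitalData.page.pathIndicator.depth_5 = "galaxy-s8";
digitalData.product.pvi_subtype_name = "Smartphone";
digitalData.product.model_name = "SM-G950F";
digitalData.product.category = digitalData.page.pathIndicator.depth_3;
'''

# Captures the last identifier before " = " and the double-quoted value after it.
# Lines whose right-hand side is not a string literal (like `category`) are skipped.
ASSIGNMENT = re.compile(r'([A-Za-z_]\w*) = "([^"]*)";')


def parse_assignments(script):
    """Map each assigned variable or property name to its string value."""
    return dict(ASSIGNMENT.findall(script))


values = parse_assignments(SCRIPT)
print(values["depth_5"], values["pvi_subtype_name"], values["model_name"])
```

Keys are only the last name segment, so if two objects assign the same property name, the later one wins. In a Scrapy spider you could `yield` a dict of the keys you want from `parse()`. Running `scrapy crawl <spider_name> -o items.csv` then lets Scrapy's feed exports write the CSV (`<spider_name>` is a placeholder).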
