Using Scrapy XPath to extract data from a script tag and save it as CSV

Question

I've been trying to extract data from a script tag using Scrapy (XPath). My main issue is identifying the correct div and script tags. I'm new to XPath and would be thankful for any help!

<script>    
var COUNTRY_SHOP_STATUS = "buy";
var COUNTRY_SHOP_URL = "";
try {
digitalData.page.pathIndicator.depth_2 = "mobile";
digitalData.page.pathIndicator.depth_3 = "mobile";
digitalData.page.pathIndicator.depth_4 = "smartphones";
digitalData.page.pathIndicator.depth_5 = "galaxy-s8";    
digitalData.product.pvi_type_name = "Mobile";
digitalData.product.pvi_subtype_name = "Smartphone";
digitalData.product.model_name = "SM-G950F";
digitalData.product.category = digitalData.page.pathIndicator.depth_3;
} catch(e) {}
</script>

Ultimately, I would like to populate my CSV file with the values of model_name and depth_3, depth_4 and depth_5. I've tried the solutions from similar questions, but they don't seem to work...

Answer 1

You can use a regular expression to extract the required values:

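import re

# Text of the <script> element that contains the COUNTRY_SHOP_STATUS variable
source = response.xpath("//script[contains(., 'COUNTRY_SHOP_STATUS')]/text()").get()

def get_values(parameter, script):
    # re.escape() makes the "." in names like "pathIndicator.depth_5" literal;
    # [^"]* stops at the closing quote. Returns None if the name is not found.
    match = re.search(r'%s = "([^"]*)"' % re.escape(parameter), script)
    return match.group(1) if match else None

print(get_values("pathIndicator.depth_5", source))
print(get_values("pvi_subtype_name", source))
print(get_values("model_name", source))
...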
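To get from the extracted values to the CSV the question asks for, here is a minimal sketch using only the standard library. It assumes the script text has already been pulled out with the XPath above; the PATTERNS mapping, the column names and the helpers extract_row and rows_to_csv are hypothetical names chosen for illustration, and SCRIPT is a trimmed copy of the script from the question.

```python
import csv
import io
import re

# Hypothetical mapping: CSV column -> JavaScript property whose string value we want
PATTERNS = {
    "model_name": "digitalData.product.model_name",
    "depth_3": "digitalData.page.pathIndicator.depth_3",
    "depth_4": "digitalData.page.pathIndicator.depth_4",
    "depth_5": "digitalData.page.pathIndicator.depth_5",
}


def extract_row(script):
    """Return a dict with one entry per column, parsed from the script text."""
    row = {}
    for column, prop in PATTERNS.items():
        # Literal property name, optional spaces around "=", then a quoted value
        match = re.search(re.escape(prop) + r'\s*=\s*"([^"]*)"', script)
        # Empty string when the assignment is missing, so the CSV stays rectangular
        row[column] = match.group(1) if match else ""
    return row


def rows_to_csv(rows):
    """Serialize a list of row dicts to CSV text, header line first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(PATTERNS))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Trimmed copy of the <script> body from the question
SCRIPT = '''
var COUNTRY_SHOP_STATUS = "buy";
try {
digitalData.page.pathIndicator.depth_3 = "mobile";
digitalData.page.pathIndicator.depth_4 = "smartphones";
digitalData.page.pathIndicator.depth_5 = "galaxy-s8";
digitalData.product.model_name = "SM-G950F";
digitalData.product.category = digitalData.page.pathIndicator.depth_3;
} catch(e) {}
'''

print(rows_to_csv([extract_row(SCRIPT)]))
```

The category line does not interfere with depth_3 because it is followed by ";" rather than '= "'. Inside a real spider you would more likely yield the dict from the parse callback and let Scrapy's feed export write the file, e.g. scrapy crawl <your_spider> -o output.csv (the spider name is a placeholder).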

Answer 1 (score 2, accepted), 2018-08-25 20:08:56