How to extract a phone number from a script with XPath (Scrapy)
I need help with the following code:
def parse_item(self, response):
    ml_item = MercadoItem()
    # product info
    ml_item['nombre'] = response.xpath('//h1[@class="title"]/text()').extract()
    ml_item['web'] = response.xpath('/html/body/div[1]/div/div/div[1]/main/div/div[1]/div[2]/div[1]/div/div[4]/a/@href').extract()
    # string(...) needs its closing parenthesis inside the XPath expression
    ml_item['datos'] = response.xpath('string(/html/head/script[3]/text())').extract()
    ml_item['direccion'] = response.xpath('/html/body/div[1]/div/div/div[1]/main/div/div[1]/div[2]/div[1]/div/span[2]/text()').extract()
    self.item_count += 1
    if self.item_count > 5:
        raise CloseSpider('item_exceeded')
    yield ml_item
ml_item['datos'] holds the script that contains the phone number. I need to extract only the phone number. I tried both a regex and XPath, but I couldn't get either to work. The script contains a lot of information, but I only need the phone number, and I need to extract it with a regular expression because the phone number changes on the next page.
The script is:
{"@context":"http://schema.org","@type":"LocalBusiness","name":"Clínica Dental Castellana 23","description":".TU CLÍNICA DENTAL DE REFERENCIA EN MADRID","telephone":"+34912298837","address":{"@type":"PostalAddress","streetAddress":"Castellana 23","addressLocality":"MADRID","addressRegion":"Madrid","postalCode":"28003"}}
The data in your script tag is stored in JSON format. It can be converted into a Python data structure with Python's built-in json module:
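import json
.....
def parse_item(self, response):
    ....
    # extract_first() returns a single string; extract() would return a list,
    # which json.loads() cannot parse
    script_data = response.xpath('string(/html/head/script[3]/text())').extract_first()
    # Decode the JSON text into a Python dict
    decoded_data = json.loads(script_data)
    ml_item['datos'] = decoded_data["telephone"]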
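Since the question asked for a regex, here is a minimal standalone sketch that tries json.loads() first and only falls back to a regular expression when the script body is not valid JSON. The sample string is a shortened copy of the script from the question; extract_phone and TELEPHONE_RE are hypothetical names made up for this illustration, and the pattern assumes the phone number is stored under a "telephone" key as in that sample.

```python
import json
import re

# Shortened sample of the script body from the question. In a spider this
# string would come from response.xpath(...).extract_first().
SCRIPT_TEXT = (
    '{"@context":"http://schema.org","@type":"LocalBusiness",'
    '"name":"Clínica Dental Castellana 23",'
    '"telephone":"+34912298837",'
    '"address":{"@type":"PostalAddress","postalCode":"28003"}}'
)

# Hypothetical fallback pattern: captures the quoted value of a "telephone" key.
TELEPHONE_RE = re.compile(r'"telephone"\s*:\s*"([^"]*)"')


def extract_phone(script_text):
    """Return the telephone value from a script body, or None if not found."""
    if not script_text:
        return None
    try:
        data = json.loads(script_text)
    except ValueError:
        # Not valid JSON (for example, JSON wrapped in JavaScript):
        # fall back to the regex.
        match = TELEPHONE_RE.search(script_text)
        return match.group(1) if match else None
    if isinstance(data, dict):
        return data.get("telephone")
    return None


print(extract_phone(SCRIPT_TEXT))  # +34912298837
```

Inside parse_item this could be used as ml_item['datos'] = extract_phone(script_data). Parsing the JSON is usually the more robust option; the regex is only a safety net for pages where the script is not clean JSON.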