How to extract a phone number from a script using XPath

Question

(Scrapy) I need help with the following code:

def parse_item(self, response):

        ml_item = MercadoItem()
        # product info
        ml_item['nombre'] = response.xpath('//h1[@class="title"]/text()').extract()
        ml_item['web'] = response.xpath('/html/body/div[1]/div/div/div[1]/main/div/div[1]/div[2]/div[1]/div/div[4]/a/@href').extract()
        ml_item['datos'] = response.xpath('string(/html/head/script[3]/text())').extract()
        ml_item['direccion'] = response.xpath('/html/body/div[1]/div/div/div[1]/main/div/div[1]/div[2]/div[1]/div/span[2]/text()').extract()


        self.item_count += 1
        if self.item_count > 5:
            raise CloseSpider('item_exceeded')
        yield ml_item

ml_item['datos'] is the script that contains the phone number. I need to extract only the phone number; I tried both a regex and XPath but could not get it to work. The script contains a lot of information, but I only need the phone number, and I need to extract it with a regular expression because the phone number changes on the next page. The script is:

{"@context":"http://schema.org","@type":"LocalBusiness","name":"Clínica Dental Castellana 23","description":".TU CLÍNICA DENTAL DE REFERENCIA EN MADRID","telephone":"+34912298837","address":{"@type":"PostalAddress","streetAddress":"Castellana 23","addressLocality":"MADRID","addressRegion":"Madrid","postalCode":"28003"}}

Answer 1

The data in your script tag is stored in JSON format. It can be converted into a Python data structure with Python's built-in json module:

import json
.....


def parse_item(self, response):
    ....
    # .get() returns a single string; .extract() returns a list, which json.loads() rejects
    script_data = response.xpath('string(/html/head/script[3]/text())').get()
    decoded_data = json.loads(script_data)
    ml_item['datos'] = decoded_data["telephone"]
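
Since the question asks for a regex and the answer uses JSON, here is a minimal sketch of both approaches run against the sample script text from the question, with no Scrapy needed. The names SCRIPT_TEXT, TELEPHONE_RE, phone_from_json and phone_from_regex are hypothetical helpers introduced for illustration, and the pattern assumes the number always appears under a "telephone" key as it does in the sample.

```python
import json
import re

# Sample from the question: the text content of the <script> tag.
SCRIPT_TEXT = (
    '{"@context":"http://schema.org","@type":"LocalBusiness",'
    '"name":"Clínica Dental Castellana 23",'
    '"description":".TU CLÍNICA DENTAL DE REFERENCIA EN MADRID",'
    '"telephone":"+34912298837",'
    '"address":{"@type":"PostalAddress","streetAddress":"Castellana 23",'
    '"addressLocality":"MADRID","addressRegion":"Madrid","postalCode":"28003"}}'
)

# Hypothetical pattern: matches "telephone":"<value>" and captures the value.
TELEPHONE_RE = re.compile(r'"telephone"\s*:\s*"([^"]*)"')


def phone_from_json(script_text):
    """Parse the script text as JSON and return its "telephone" value, or None."""
    if not script_text:
        return None
    try:
        data = json.loads(script_text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if isinstance(data, dict):
        return data.get("telephone")
    return None


def phone_from_regex(script_text):
    """Return the first "telephone" value found by the regex, or None."""
    match = TELEPHONE_RE.search(script_text or "")
    return match.group(1) if match else None


if __name__ == "__main__":
    print(phone_from_json(SCRIPT_TEXT))
    print(phone_from_regex(SCRIPT_TEXT))
```

In the spider, either helper could be applied to the string returned by the XPath query, for example ml_item['datos'] = phone_from_json(script_data). Parsing the JSON is usually the sturdier choice because it does not depend on key order or spacing; the regex is a fallback for pages where the script text is not valid JSON. Scrapy selectors also provide .re_first(), which applies a pattern like the one above directly to the selected text.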

Solution 1
0 2019-10-27 17:28:13
