
How to extract a phone number from a script with XPath

(Scrapy) I need help with the following code:

def parse_item(self, response):

        ml_item = MercadoItem()
        #info de producto
        ml_item['nombre'] = response.xpath('//h1[@class="title"]/text()').extract()
        ml_item['web'] = response.xpath('/html/body/div[1]/div/div/div[1]/main/div/div[1]/div[2]/div[1]/div/div[4]/a/@href').extract()
        ml_item['datos'] = response.xpath('string(/html/head/script[3]/text())').extract()
        ml_item['direccion'] = response.xpath('/html/body/div[1]/div/div/div[1]/main/div/div[1]/div[2]/div[1]/div/span[2]/text()').extract()


        self.item_count += 1
        if self.item_count > 5:
            raise CloseSpider('item_exceeded')
        yield ml_item

ml_item['datos'] holds the script that contains the phone number, and I only need the phone number. I tried to extract it with a regex and with XPath, but I could not get it to work. The script contains a lot of information; I want to extract the number with a regular expression because the phone number changes on each page. The script is:

{"@context":"http://schema.org","@type":"LocalBusiness","name":"Clínica Dental Castellana 23","description":".TU CLÍNICA DENTAL DE REFERENCIA EN MADRID","telephone":"+34912298837","address":{"@type":"PostalAddress","streetAddress":"Castellana 23","addressLocality":"MADRID","addressRegion":"Madrid","postalCode":"28003"}}

The data in your script tag is stored in JSON format. It can be converted into a Python data structure with Python's built-in json module:

import json
.....


def parse_item(self, response):
    ....
    # .get() returns a single string; .extract() returns a list, which json.loads() rejects.
    script_data = response.xpath('string(/html/head/script[3]/text())').get()
    decoded_data = json.loads(script_data)
    ml_item['datos'] = decoded_data["telephone"]
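Since the question also asked for a regex, here is a rough standalone sketch of both approaches that runs without Scrapy. The sample string is a shortened copy of the script from the question; the helper names `phone_from_json` and `phone_from_regex` and the pattern `PHONE_RE` are made up for illustration and are not part of Scrapy or of the original code. It assumes the script body is a single JSON object with a top-level "telephone" key.

```python
import json
import re

# Shortened copy of the script body shown in the question.
SCRIPT_TEXT = (
    '{"@context":"http://schema.org","@type":"LocalBusiness",'
    '"name":"Clínica Dental Castellana 23",'
    '"telephone":"+34912298837",'
    '"address":{"@type":"PostalAddress","streetAddress":"Castellana 23"}}'
)

# Hypothetical pattern: finds the "telephone" key and captures its string value.
PHONE_RE = re.compile(r'"telephone"\s*:\s*"([^"]*)"')


def phone_from_json(script_text):
    """Parse the script body as JSON and return its telephone field, or None."""
    try:
        data = json.loads(script_text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(data, dict):
        return None
    return data.get("telephone")


def phone_from_regex(script_text):
    """Return the telephone value using only a regex, or None if it is absent."""
    match = PHONE_RE.search(script_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(phone_from_json(SCRIPT_TEXT))
    print(phone_from_regex(SCRIPT_TEXT))
```

In the spider you would pass the string returned by the XPath query to either helper. The JSON version is probably the safer choice when the script is valid JSON; the regex version may help if the script contains extra JavaScript around the object.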
