
Is there any way to translate the page language, or to translate the scraped data, while scraping with Scrapy?

I am scraping the dintex.net website, which is not in English, and I can't find any way to convert the scraped data into English. I also tried googletrans, but it raises an error. Is there any other way to translate the page, or the data, into English?

import scrapy
from googletrans import Translator


class DtSpider(scrapy.Spider):
    name = 'dt'
    start_urls = ['http://www.dintex.net']

    def parse(self, response):
        urls = response.xpath('//*[@class="listing-btn btn btn-primary btn-block w-100"]/@href').extract()
        for url in urls:
            yield scrapy.Request(url=response.urljoin(url), callback=self.parse_details)

        # Follow pagination; extract_first() returns None on the last page
        np = response.xpath('//*[@class="page-item"]/a[@rel="next"]/@href').extract_first()
        if np:
            yield scrapy.Request(url=response.urljoin(np), callback=self.parse)

    def parse_details(self, response):
        Title = response.xpath('//*[@class="show-info__title"]/text()').extract_first()
        Location = response.xpath('//*[@class="show-info__location"]/p/text()').extract_first()
        Contact = response.xpath('//*[@class="show-info__contact-details__phone-link"]/text()').extract_first()
        if Contact:
            Contact = Contact.replace('Whatsapp ', '')
        Description = response.xpath('//*[@class="show-info__section-text"]/p/text()').extract_first()
        Manufacture = response.xpath('//td[contains(text(),"Fabricante")]/following-sibling::td/text()').extract_first()
        Model = response.xpath('//td[contains(text(),"Modelo")]/following-sibling::td/text()').extract_first()
        Year = response.xpath('//td[contains(text(),"Año")]/following-sibling::td/text()').extract_first()
        Condition = response.xpath('//td[contains(text(),"Condición")]/following-sibling::td/text()').extract_first()
        img = response.xpath('//*[@class="gallery__item"]/img/@src').extract_first()
        thumbs = response.xpath('//img/@lazy-src').extract()

        # t = Translator()
        # Title = t.translate(Title).text
        # Location = t.translate(Location).text
        # Contact = t.translate(Contact).text
        # Description = t.translate(Description).text
        # Manufacture = t.translate(Manufacture).text
        # Model = t.translate(Model).text
        # Year = t.translate(Year).text
        # Condition = t.translate(Condition).text

        yield {
            'Title': Title,
            'Location': Location,
            'Contact': Contact,
            'Description': Description,
            'Manufacture': Manufacture,
            'Model': Model,
            'Year': Year,
            'Condition': Condition,
            'Img': img,
            'Thumbs': thumbs,
        }
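Since googletrans raises on None fields and on transient network failures, one option is a small wrapper that lets empty values pass through untouched and falls back to the original text if the translator fails. This is a sketch, not part of the question's code; `translate_fn` is a stand-in for whatever translator you plug in (with googletrans it would be something like `lambda s: Translator().translate(s, src='es', dest='en').text`), and the dictionary-based "translator" below exists only to make the example self-contained:

```python
def safe_translate(value, translate_fn):
    """Translate value, returning it unchanged if it is None/empty
    or if the translator raises (e.g. a network hiccup)."""
    if not value:
        return value
    try:
        return translate_fn(value)
    except Exception:
        return value


# Demo with a stand-in translator (a plain dict lookup); in the spider
# you would pass a real googletrans-backed callable instead.
fields = {'Title': 'Camión usado', 'Location': None, 'Year': '2015'}
fake = {'Camión usado': 'Used truck', '2015': '2015'}
translated = {k: safe_translate(v, fake.get) for k, v in fields.items()}
# translated == {'Title': 'Used truck', 'Location': None, 'Year': '2015'}
```

Calling `safe_translate` on every field before the `yield` keeps the spider from crashing on items where a selector matched nothing.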

I think you should send this cookie with your requests:

googtrans=/es/en

since the page localises itself depending on the selected language/region.

You would need to do something like the following; see the `cookies` parameter of `scrapy.Request` in the Scrapy docs.

The request you are yielding might need to change to something like this (not tested):

scrapy.Request(url=url, cookies={'googtrans': '/es/en'}, callback=self.parse_details)
