
Google Docs importXML TripAdvisor

I am trying to scrape http://www.tripadvisor.in/Hotels-g297615-c2-Gurgaon_Haryana-Hotels.html with ImportXML. For each hotel I need the following fields: hotel name, URL, address, city, PIN code, number of reviews (just the number), percentage, and amenities. I want all of these in a single row per hotel. Can anyone help me out?

Google Doc link: https://docs.google.com/spreadsheets/d/1D6X9c9uX7AltxWQ3ln0Pqqzq_CIroCkDxPYr6lv-47k/edit#gid=1666841843

I am unable to get all of the above fields into the Google Doc. I am stuck on scraping the address, which sits just beside the URL.

Responding to your comment: you can limit the data returned by the XPath to a single result by using an index ([1]), for example (formatted for readability):

=IMPORTXML(concatenate("http://www.tripadvisor.in",D2),
            "(//span[@class='street-address'])[1]")

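To see why the parentheses in that XPath matter, here is a minimal Python sketch (it is not part of the spreadsheet solution). The sample markup is invented for illustration and is not TripAdvisor's real page structure. Without parentheses, [1] is evaluated per parent element, so it can still return several nodes; wrapping the whole expression in parentheses picks the first match in the document. Python's standard-library ElementTree implements only a subset of XPath and, as far as I know, does not accept the parenthesized form, so list indexing stands in for it here.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two blocks, each with its own street-address span.
SAMPLE = """
<listing>
  <div class="hotel"><span class="street-address">12 Example Road</span></div>
  <div class="hotel"><span class="street-address">34 Sample Street</span></div>
</listing>
"""

root = ET.fromstring(SAMPLE)

# Like //span[1]: the index applies within each parent,
# so every div contributes its first span.
per_parent = [span.text for span in root.findall(".//span[1]")]

# Like (//span[@class='street-address'])[1]: collect all matches in
# document order, then keep only the first one.
all_matches = root.findall(".//span[@class='street-address']")
first_only = all_matches[0].text

print(per_parent)
print(first_only)
```

With this sample, the per-parent query yields both addresses, while the "first in document" lookup yields only one, which is the behaviour you want when a cell should hold a single value.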