Extracting full URL from href tag in scrapy
I'm trying to use Scrapy to scrape the URLs of the offers listed on this site.
This is the code I tried:
url = response.css('a[data-tracking="click_body"]::attr(href)').extract()
But my code returns something very different from a URL. Here is the HTML of the div I'm interested in:
<div class="offer-item-details">
<header class="offer-item-header">
<h3>
<a href="https://www.otodom.pl/oferta/gdansk-pod-inwestycje-cicha-lokalizacja-ID46DXu.html#ab04badaa0" data-tracking="click_body" data-tracking-data='{"touch_point_button":"title"}' data-featured-name="promo_top_ads">
<strong class="visible-xs-block">42 m²</strong>
<span class="text-nowrap">
<span class="offer-item-title">Gdańsk/ Pod Inwestycje/ Cicha Lokalizacja</span>
</span>
</a>
</h3>
<p class="text-nowrap"><span class="hidden-xs">Mieszkanie na sprzedaż: </span>Gdańsk, Ujeścisko-Łostowice, Łostowice</p>
<div class="vas-list-no-offer">
<a class="button-observed observe-link favourites-button observed-text svg-heart add-to-favourites" data-statkey="ad.observed.list" rel="nofollow" data-id="60688916" href="#" title="Obserwuj">
<div class="observed-text-container" style="display: flex;">
<span class="icon observed-60688916"></span>
<i class="icon-heart-filled"></i>
<div class="observed-label">Dodaj do ulubionych</div>
</div>
</a>
</div>
</header>
<ul class="params" data-tracking="click_body" data-tracking-data='{"touch_point_button":"body"}'>
<li class="offer-item-rooms hidden-xs">2 pokoje</li>
<li class="offer-item-price">
346 000 zł </li>
<li class="hidden-xs offer-item-area">42 m²</li>
<li class="hidden-xs offer-item-price-per-m">8 238 zł/m²</li>
</ul>
</div>
Copied selector of that tag:
#offer-item-ad_id45Wog > div.offer-item-details > header > h3 > a
Copied XPath:
//*[@id="offer-item-ad_id45Wog"]/div[1]/header/h3/a
Copied full XPath:
/html/body/div[3]/main/section[2]/div/div/div[1]/div/article[1]/div[1]/header/h3/a
Your code gives you a list of URLs: in this case extract() returns a list, with one entry per matching tag. For Scrapy to output the data, loop over that list and yield one item per URL.
url = response.css('a[data-tracking="click_body"]::attr(href)').extract()
for a in url:
    # Yield a dict (key: value); {'url', a} with a comma would build a set instead.
    yield {'url': a}
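One detail worth checking in a loop like this is the shape of the yielded item. As far as I know, Scrapy expects a callback to yield item objects such as dicts (or requests), so the braces need a colon rather than a comma. The sketch below uses plain Python with no Scrapy import; the helper name iter_items and the hard-coded hrefs list are hypothetical stand-ins for the spider callback and for the result of extract().

```python
def iter_items(urls):
    """Yield one dict per URL, mimicking what a spider callback would yield."""
    for href in urls:
        yield {"url": href}  # colon: a dict with the key 'url'


# Hypothetical stand-in for the list returned by extract().
hrefs = [
    "https://www.otodom.pl/oferta/gdansk-pod-inwestycje-cicha-lokalizacja-ID46DXu.html#ab04badaa0",
]

items = list(iter_items(hrefs))

# Same braces, different punctuation, different type.
as_set = {"url", hrefs[0]}   # comma: a set with two string members
as_dict = {"url": hrefs[0]}  # colon: a dict mapping 'url' to the link

print(type(as_set).__name__, type(as_dict).__name__)  # prints "set dict"
print(items[0]["url"])
```

A set has no keys, so there would be no 'url' field to export; the dict form keeps the link under a named field.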