
soup.find_all doesn't return any data

I am trying to get the locations of the houses, but I get no data, just "[]". I am new to Python and new to web scraping. Here is my code:

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = 'https://www.inmuebles24.com/casas-en-venta-en-tijuana.html'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')

location = soup.find_all(class_='posting-location go-to-posting')
print(location)

On closer inspection, your code should work as expected. Below are some alternative ways to select elements with multiple CSS classes using find_all:


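# A list matches spans that carry EITHER class, not necessarily both
location = soup.find_all('span', class_=['posting-location', 'go-to-posting'])

# or

# A space-separated string matches the exact value of the class attribute
location = soup.find_all(class_='posting-location go-to-posting')

# or

location = soup.find_all('span', {'class': 'posting-location go-to-posting'})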

The above was tested against the page's source HTML after copying it manually. I got 20 items.
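To make the difference between these variants concrete, here is a minimal offline sketch. The HTML string is hypothetical (only the class names come from the question), so it illustrates how find_all treats class filters rather than reproducing the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class names are taken from the question.
html = """
<div>
  <span class="posting-location go-to-posting">Zona Centro, Tijuana</span>
  <span class="posting-location go-to-posting">Playas de Tijuana, Tijuana</span>
  <span class="posting-location">Otay, Tijuana</span>
  <a class="go-to-posting">Ver anuncio</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Space-separated string: matches the exact value of the class attribute.
exact = soup.find_all(class_="posting-location go-to-posting")

# List: matches <span> tags that carry at least one of the listed classes.
any_of = soup.find_all("span", class_=["posting-location", "go-to-posting"])

# Attribute dict: equivalent to the exact-string form, restricted to <span>.
by_dict = soup.find_all("span", {"class": "posting-location go-to-posting"})

locations = [tag.get_text(strip=True) for tag in exact]
print(locations)
print(len(exact), len(any_of), len(by_dict))
```

Note that the exact-string form depends on the order of the classes in the attribute, so it is the more fragile of the two if the site ever reorders them.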

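Problem

Your real problem lies with the website you are trying to scrape. The site has taken measures to deter bots and people who might try to scrape its content, by blocking them with a CAPTCHA.

You can see this by inspecting the response to your request, as follows: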

print(page.text)

I copied a snippet for you to read:

![CDATA[\n    var a = function() {try{return !!window.addEventListener} catch(e) {return !1} },\n      b = function(b, c) {a() ? document.addEventListener("DOMContentLoaded", b, c) : document.attachEvent("onreadystatechange", b)};\n      b(function(){\n        var cookiesEnabled=(navigator.cookieEnabled)? true : false;\n        if(!cookiesEnabled){\n          var q = document.getElementById(\'no-cookie-warning\');q.style.display = \'block\';\n        }\n      });\n  //]]>\n  </script>\n  <div id="trk_captcha_js" style="background-image:url(\'/cdn-cgi/images/trace/captcha/nojs/h/transparent.gif?ray=5d997e89698b1414\')"></div>\n</form>\n\n              </div>\n            </div>\n\n            <div class="cf-column">\n              <div class="cf-screenshot-container">\n              \n                <span class="cf-no-screenshot"></span>\n              \n              </div>\n            </div>\n          </div><!-- /.columns -->\n        </div>\n      </div><!-- /.captcha-container -->\n\n      <div class="cf-section cf-wrapper">\n        <div class="cf-columns two">\n          <div class="cf-column">\n            <h2 data-translate="why_captcha_headline">Why do I have to complete a CAPTCHA?</h2>\n            \n            <p data-translate="why_captcha_detail">Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.</p>\n          </div>\n\n          <div class="cf-column">\n            <h2 data-translate="resolve_captcha_headline">What can I do to prevent this in the future?</h2>\n            \n\n            <p data-translate="resolve_captcha_antivirus">If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.</p>\n\n            <p data-translate="resolve_captcha_network">If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.</p>\n    
        \n              \n            \n          </div>\n        </div>\n      </div><!-- /.section -->\n      \n\n      <div class="cf-error-footer cf-wrapper w-240 lg:w-full py-10 sm:py-4 sm:px-8 mx-auto text-center sm:text-left border-solid border-0 border-t border-gray-300">\n  <p c
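Rather than reading the whole response by eye, a small check can flag a challenge page before parsing. The sketch below is only a heuristic: the helper name looks_like_captcha is hypothetical, the marker strings are taken from the snippet above, and the status codes are an assumption about how such blocks are commonly served.

```python
def looks_like_captcha(status_code, body):
    """Heuristically decide whether a response is a CAPTCHA/challenge page."""
    # Marker strings that appear in the snippet above (an assumption that
    # they do not occur on a normal listing page).
    markers = ("captcha", "cf-wrapper", "cf-error")
    # Status codes often used for blocked requests (an assumption).
    blocked_statuses = (403, 429, 503)
    lowered = body.lower()
    return status_code in blocked_statuses or any(m in lowered for m in markers)


# Usage with the question's variables (not run here, since it needs network):
#   page = requests.get(url)
#   if looks_like_captcha(page.status_code, page.text):
#       print("Blocked by a challenge page; find_all will return []")

sample = '<div class="cf-section cf-wrapper"><h2>Why do I have to complete a CAPTCHA?</h2></div>'
print(looks_like_captcha(200, sample))
```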

Suggestion

You might consider looking for an API, or using a method that the website's owner permits and approves.

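Try this:

span_tags = soup.find_all('span')
for span in span_tags:
  # span.get('class') returns a list of class names, or None if the tag has no class attribute
  if span.get('class') == ['posting-location', 'go-to-posting']:
    print(span.text)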
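One detail worth checking when filtering by hand: Beautiful Soup exposes the class attribute as a list of names, and tags without a class have none at all. The following sketch, on a hypothetical two-span HTML string, shows that and uses a CSS selector as an order-independent way to require both classes:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the classes are deliberately in reverse order.
html = (
    '<span class="go-to-posting posting-location">Otay</span>'
    '<span>no class here</span>'
)
soup = BeautifulSoup(html, "html.parser")

spans = soup.find_all("span")
first_classes = spans[0]["class"]      # a list of class names, not one string
second_classes = spans[1].get("class")  # None, because the attribute is missing

# CSS selector: requires both classes, regardless of their order.
both = soup.select("span.posting-location.go-to-posting")
texts = [tag.get_text(strip=True) for tag in both]

print(first_classes, second_classes, texts)
```

This only helps once the response actually contains the listing markup; against a CAPTCHA page every variant still returns an empty list.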


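Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you republish, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.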

 