How to scrape hidden phone number from website using Beautiful Soup 4

I'm trying to scrape some data about house listings on the website http://immobilienscout.de. So far, I have managed to scrape all the needed data except one thing: the phone number of the listing agent.

The problem is that I can't work out the path to reach the text.

Let's say, for example, that I want to find the price. My code for finding it would be the following:

HTML code:

<div class="is24-phone palm-hide" data-is24-phone-number-block="" data-ng-show="!showPhoneNumbers" data-position="top">
            <div class="is24-show-phone-button print-hide hide">
              <span class="fa fa-phone font-lightgray"></span>
              <a href="javascript:void(0);" class="internal-link"><font><font>Show phone number</font></font></a>
            </div>
            <div class="is24-phone-number">
              <p>
                  <span><font><font>Mobil:</font></font></span><font><font> 0162 2056442</font></font></p>
              <p>
                  <span><font><font>Phone:</font></font></span><font><font> 030 72021143</font></font></p>
              </div>
          </div>

My code looks like this:

from urllib.request import urlopen
from bs4 import BeautifulSoup

link = "https://www.immobilienscout24.de/expose/96068611"
html = urlopen(link)
soup = BeautifulSoup(html, "html.parser")

# Look up the block that holds the phone numbers
findMobile = soup.find('div', attrs={'class': 'is24-phone-number'})
print(findMobile.text.strip())

The output is None. Instead, I need the output to be: 0162 2056442.

Any help?

If you open the page, e.g. in Chrome, you should be able to right-click what you want to scrape and hit "Inspect Element". Then, in the view of the DOM that pops up, right-click the element again and select Copy > Copy selector. That should give you a CSS selector that looks something like

#sidebar > div.module.community-bulletin > div > div:nth-child(10) > div.bulletin-item-content > a

Then, you should be able to select that element by just doing

soup.select("#sidebar > div.module.community-bulletin > div > div:nth-child(10) > div.bulletin-item-content > a")
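
Applied to the markup from the question, a hand-written selector is often shorter than the copied one. The following is a minimal sketch, assuming the HTML the server returns matches the snippet in the question; the `html` string is a trimmed copy of that snippet and `phones` is just an illustrative name:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup shown in the question
# (assumption: this is what the server actually returns).
html = """
<div class="is24-phone palm-hide">
  <div class="is24-phone-number">
    <p><span>Mobil:</span> 0162 2056442</p>
    <p><span>Phone:</span> 030 72021143</p>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

phones = {}
for p in soup.select("div.is24-phone-number > p"):
    label_text = p.span.get_text(strip=True)      # e.g. "Mobil:"
    full_text = p.get_text(strip=True)            # label followed by the number
    # Drop the label from the front to keep only the number.
    phones[label_text.rstrip(":")] = full_text[len(label_text):].strip()

print(phones["Mobil"])  # prints: 0162 2056442
```

If the live page does not contain this block in its downloaded HTML, the loop simply finds nothing and `phones` stays empty.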

Edit: Here's the documentation for .select(): https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors

Here's an example:

>>> from bs4 import BeautifulSoup
>>> import requests
>>> r = requests.get("https://stackoverflow.com/questions/45224417/how-to-scrape-hidden-phone-number-from-website-using-beautiful-soup-4/45224481#45224481")
>>> soup = BeautifulSoup(r.text, 'html.parser')
>>> soup.select("#comment-77415832 > td.comment-text > div > span.comment-copy")
[<span class="comment-copy">I tried to use your code for the element I am interested but the output is an empty list. Any ideas how to solve this?</span>]
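
As the comment in that output notes, `select()` can come back as an empty list. That happens whenever nothing in the parsed HTML matches: the selector may be wrong, or the element may not be in the downloaded HTML at all (for example if the browser adds it later with JavaScript, which is an assumption here and not confirmed for this site). A small sketch for guarding against the empty case; `first_text` is a hypothetical helper name, not part of Beautiful Soup:

```python
from bs4 import BeautifulSoup

def first_text(html, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(selector)  # select_one gives None when there is no match
    return node.get_text(strip=True) if node is not None else None

# Made-up sample markup, used only to exercise both branches.
sample = '<div class="is24-phone-number"><p>0162 2056442</p></div>'

print(first_text(sample, "div.is24-phone-number p"))  # prints: 0162 2056442
print(first_text(sample, "div.does-not-exist"))       # prints: None
```

Checking for `None` before calling `.text` avoids an `AttributeError` when the element is missing from the response.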
