Python - Scrapy unable to fetch data

I am just starting out with Python/Scrapy.

I have written a spider that crawls a website and fetches information, but I am stuck in two places.

  1. I am trying to retrieve the telephone numbers from a page, and they are marked up like this:

     <span class="mrgn_right5">(+001) 44 42676000,</span> <span class="mrgn_right5">(+011) 44 42144100</span> 

The code I have is:

getdata = soup.find(attrs={"class":"mrgn_right5"})
if getdata:
   aditem['Phone']=getdata.get_text().strip()
   #print phone

But it fetches only the first number, not the second one. How can I fix this?

  2. On the same page there is another piece of information I want to extract, the price range.

I am using this code:

    getdata = soup.find(attrs={"itemprop":"pricerange"})
    if getdata:
        #print getdata
        aditem['Pricerange']=getdata.get_text().strip()
        #print pricerange

But it does not fetch anything.

Any help fixing these two issues would be great.

According to the Beautiful Soup documentation, find() returns only the first matching element (or None if nothing matches). If multiple results are expected, use find_all() instead. Because find_all() returns a list of every matching element (two here), you need to combine them, for example by joining their text, before storing the result in the Phone field of your AdItem.

getdata = soup.find_all(attrs={"class": "mrgn_right5"})
if getdata:
    # Strip each number (and its trailing comma), then join them into one string
    aditem['Phone'] = ', '.join(x.get_text().strip().rstrip(',') for x in getdata)
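To see the difference concretely, here is a small standalone sketch that runs the question's phone markup through both methods. It assumes Beautiful Soup 4 (the bs4 package) with Python's built-in "html.parser" backend; the ", " separator and the rstrip(",") clean-up are illustrative choices, not something the page requires.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (the two phone-number spans).
html = (
    '<span class="mrgn_right5">(+001) 44 42676000,</span> '
    '<span class="mrgn_right5">(+011) 44 42144100</span>'
)
soup = BeautifulSoup(html, "html.parser")

# find() stops at the first match and returns a single Tag (or None).
first = soup.find(attrs={"class": "mrgn_right5"})
print(first.get_text().strip())
# → (+001) 44 42676000,

# find_all() returns every match as a list-like ResultSet.
spans = soup.find_all(attrs={"class": "mrgn_right5"})
print(len(spans))
# → 2

# Strip whitespace and the trailing comma from each number, then join
# them with a separator before storing the result in the item.
phones = ", ".join(s.get_text().strip().rstrip(",") for s in spans)
print(phones)
# → (+001) 44 42676000, (+011) 44 42144100
```

Inside the spider, the joined string is what would go into aditem['Phone']; if the item should hold the numbers separately, the list comprehension alone (without the join) would do.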

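For the second issue, you need to read an attribute of the returned tag rather than its text. The price range is most likely stored in a content attribute (as on a microdata <meta> tag), and such a tag has no text, so get_text() returns an empty string. Try the following: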

getdata = soup.find(attrs={"itemprop":"pricerange"})
if getdata:
    aditem['Pricerange'] = getdata.attrs['content']
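The empty result from get_text() can be reproduced in isolation. The question does not show the price-range HTML, so this sketch assumes a microdata <meta> tag with a made-up content value ("$$"); only the pattern, not the markup, is meant to carry over to the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page is not shown, so this assumes a
# microdata <meta> tag that carries its value in the `content` attribute.
html = '<div><meta itemprop="pricerange" content="$$"/></div>'
soup = BeautifulSoup(html, "html.parser")

tag = soup.find(attrs={"itemprop": "pricerange"})

# A <meta> element has no child text, so get_text() gives an empty string.
print(repr(tag.get_text().strip()))
# → ''

# The value lives in the attribute. Tag.get() returns None instead of
# raising KeyError when the attribute is missing.
print(tag.get("content"))
# → $$
```

Using getdata.get('content') instead of getdata.attrs['content'] is a small design choice: it avoids a KeyError on pages where the tag exists but lacks the attribute.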

For the address information, the following code works, but it is very hacky and could no doubt be improved by someone who understands Beautiful Soup better than I do.

getdata = soup.find(attrs={"itemprop": "address"})
if getdata:
    # Text of the first <span> inside the address block
    address = getdata.span.get_text()
    # Assumes the first <meta> inside the block is the addressLocality one
    addressLocality = getdata.meta.attrs['content']
    addressRegion = getdata.find(attrs={"itemprop": "addressRegion"}).attrs['content']
    postalCode = getdata.find(attrs={"itemprop": "postalCode"}).attrs['content']
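One way to make the address lookup less dependent on tag order is to find every field by its itemprop and to tolerate missing tags, rather than taking whichever <meta> comes first. This is only a sketch: the markup and values below are invented to match the code above (the real page is not shown), and itemprop_content is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the code above; tags and values are placeholders.
html = """
<div itemprop="address">
  <span>12 Example Street</span>
  <meta itemprop="addressLocality" content="Springfield"/>
  <meta itemprop="addressRegion" content="Example Region"/>
  <meta itemprop="postalCode" content="00000"/>
</div>
"""
soup = BeautifulSoup(html, "html.parser")


def itemprop_content(parent, name):
    """Hypothetical helper: return the `content` attribute of the first
    descendant with itemprop=name, or None if that tag is absent."""
    tag = parent.find(attrs={"itemprop": name})
    return tag.get("content") if tag is not None else None


address_block = soup.find(attrs={"itemprop": "address"})
address = {}
if address_block is not None:
    span = address_block.find("span")
    address["address"] = span.get_text().strip() if span is not None else None
    # Look each field up by its itemprop instead of relying on tag order.
    for field in ("addressLocality", "addressRegion", "postalCode"):
        address[field] = itemprop_content(address_block, field)

print(address)
```

A missing field then shows up as None in the result instead of raising AttributeError, which is usually easier to handle in a spider that visits many slightly different pages.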

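Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.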

 