Python scraping (Beautiful Soup) to obtain data from this HTML

 <ul>
  <li>
    <div class="c_logo_box">
     <a href="money-transfer-companies/ria-money-transfer/"><img src="http://www.compareremit.com/uploads/ria-logo11.png" style="height:57px;width:147px;" alt="RIA Money Transfer"></a>
     <span class="rs"> <span class="txt13">&#8377;</span> 61.24</span>
       </div>
  </li>
 ...

I wish to scrape the name from the alt attribute (alt="RIA Money Transfer") and the rate, 61.24, from the span.

So far I have this Python code:

#!/usr/bin/python

import requests
import re
from bs4 import BeautifulSoup

r = requests.get('http://www.compareremit.com')
data = r.text

soup = BeautifulSoup(data)
for rate in soup.find_all('li', re.compile('money')):
    print rate.text

It gives me nothing. Could someone tell me what I am missing? Also, I'm having trouble visualizing which element I'm supposed to look for in the for loop search. Could you clarify, in general, how to know what to specify as a condition in the for loop in such cases?

There are multiple ways to reach the element. One option is to rely on the a tag whose href contains the ria-money-transfer part, then get the following span element containing the rate:

import re

from bs4 import BeautifulSoup
import requests

response = requests.get('http://www.compareremit.com')
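# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(response.content, 'html.parser')

# Locate the link whose href mentions the provider, then read the logo's alt text
link = soup.find('div', class_='c_logo_box').find('a', href=re.compile(r'ria-money-transfer'))
print(link.img.get('alt'))

# The rate lives in the span right after the link; drop the currency symbol
rate = link.find_next_sibling('span').text.split(' ')[-1]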
print(rate)

Prints:

RIA Money Transfer
61.24
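As for why the original loop printed nothing: as far as I understand Beautiful Soup's behaviour, a second positional argument to find_all that is not a dict is matched against the tag's class attribute, so find_all('li', re.compile('money')) looks for li tags whose class contains "money", and the li here has no class at all. The sketch below runs offline against the HTML fragment from the question (wrapped in a closing </ul> so it is complete) and compares that call with a filter on the a tag's href, which is where "money" actually appears.

```python
import re

from bs4 import BeautifulSoup

# HTML fragment copied from the question, closed off so it parses cleanly
HTML = """
<ul>
  <li>
    <div class="c_logo_box">
      <a href="money-transfer-companies/ria-money-transfer/"><img src="http://www.compareremit.com/uploads/ria-logo11.png" style="height:57px;width:147px;" alt="RIA Money Transfer"></a>
      <span class="rs"> <span class="txt13">&#8377;</span> 61.24</span>
    </div>
  </li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

# The original condition: the regex is tested against the class of each <li>
no_match = soup.find_all("li", re.compile("money"))

# Filtering on the attribute that really contains "money"
links = soup.find_all("a", href=re.compile("money"))

print(len(no_match))
print([link.img["alt"] for link in links])
```

A practical way to pick the condition is to open the page source, find the value you want, and walk outwards to the nearest tag with a distinguishing attribute (here the href of the a tag, or the class of the enclosing div).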

Your code is not logically correct. There are multiple ways to do this; try this code:

#!/usr/bin/python

import requests
import re
from bs4 import BeautifulSoup

r = requests.get('http://www.compareremit.com')
data = r.text

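soup = BeautifulSoup(data, 'html.parser')
# Each provider sits in its own <div class="c_logo_box">
for rate in soup.find_all('div', {"class": "c_logo_box"}):
    print(rate.a.img['alt'])  # provider name from the logo's alt text
    print(rate.span.text)     # rate text, including the currency symbol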
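If the rates are needed as numbers rather than printed text, the same loop can collect them into a dictionary. This is only a sketch under some assumptions: the helper name parse_rates is made up for illustration, it runs on the fragment from the question instead of the live site, and it assumes every c_logo_box div has the same a/img/span layout.

```python
from bs4 import BeautifulSoup

# HTML fragment copied from the question, closed off so it parses cleanly
HTML = """
<ul>
  <li>
    <div class="c_logo_box">
      <a href="money-transfer-companies/ria-money-transfer/"><img src="http://www.compareremit.com/uploads/ria-logo11.png" style="height:57px;width:147px;" alt="RIA Money Transfer"></a>
      <span class="rs"> <span class="txt13">&#8377;</span> 61.24</span>
    </div>
  </li>
</ul>
"""


def parse_rates(html):
    """Hypothetical helper: map each provider name to its rate as a float."""
    soup = BeautifulSoup(html, "html.parser")
    rates = {}
    for box in soup.find_all("div", class_="c_logo_box"):
        name = box.a.img["alt"]
        # The span text is the currency symbol followed by the number;
        # split() on whitespace and keep the last token.
        number = box.find("span", class_="rs").get_text().split()[-1]
        rates[name] = float(number)
    return rates


rates = parse_rates(HTML)
print(rates)
```

Splitting on whitespace and taking the last token drops the rupee sign without hard-coding the symbol.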
