Reading a URL on a Raspberry Pi

Question

I want to read data contained in a URL. For example, suppose I have this URL:

http://robolab.in/home-automation.html#ON

I want to read the status 'ON' and ignore the rest of the URL. How can this be done?

Answer 1

What you are trying to do is called web scraping. In Python you can do it with the urllib library (urllib or urllib2 in Python 2, urllib.request in Python 3):

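import urllib.request  # Python 3; in Python 2 use urllib.urlopen or urllib2.urlopen

url = 'http://robolab.in/home-automation.html#ON'
try:
    html = urllib.request.urlopen(url)
    # Decode using the charset the server declares, falling back to UTF-8
    htmltext = html.read().decode(html.headers.get_content_charset() or 'utf-8')
except OSError as e:  # URLError and HTTPError are subclasses of OSError
    print('error opening link:', e)
    raise SystemExit(1)  # stop here so htmltext is never used undefined

print(htmltext)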

This prints the page's HTML source, the same markup your browser renders. It is just a string, so you can manipulate it however you like. If you have BeautifulSoup installed, you can extract only the visible text instead:

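from bs4 import BeautifulSoup

# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(htmltext, 'html.parser')
# Drop <script> and <style> elements so only visible text remains
for script in soup(["script", "style"]):
    script.extract()
text = soup.get_text()
print(text)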
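BeautifulSoup is a third-party package and may not be installed on a fresh Raspberry Pi image. As a rough stand-in (a swapped-in technique, not what this answer uses), the Python 3 standard library's `html.parser` module can collect the visible text while skipping `<script>` and `<style>` content. `VisibleTextParser` and `visible_text` are hypothetical names, and the sample markup is invented to mirror the output shown in this answer; it is not the real robolab.in page source. Whitespace handling differs from `soup.get_text()`, so treat this as a sketch.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Hypothetical helper: collects text that is outside <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_text(html):
    """Return the visible text of an HTML string as a list of stripped, non-empty lines."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    return [line.strip() for line in text.splitlines() if line.strip()]


# Invented sample markup, for illustration only
sample = """<html><head><title>Robolab Technologies</title>
<style>body { color: red; }</style></head>
<body><h1>Home Automation</h1>
<script>var x = 1;</script>
<p>OFF</p></body></html>"""

print(visible_text(sample))  # ['Robolab Technologies', 'Home Automation', 'OFF']
```

With real page content you would pass the decoded `htmltext` instead of `sample` and read the status from the returned list.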

Running the urllib and BeautifulSoup code above against your URL, I got:

Robolab Technologies
Home Automation

OFF

From there you can easily pick out the status:

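# Split the text into non-empty lines; the status is the last one ("OFF" above).
# Iterating over the string directly would loop over characters, not lines.
lines = [line.strip() for line in text.splitlines() if line.strip()]
status = lines[-1] if lines else ''
if status == 'ON':
    print("it's on")
else:
    print("it's off")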
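Note that the `#ON` part of the URL is a fragment. HTTP clients such as urllib do not send the fragment to the server, so the downloaded page cannot reflect it, which is consistent with the page reporting OFF here. If what you actually need is the `ON` in the URL string itself rather than the page content, a minimal Python 3 sketch is to split the URL with `urllib.parse.urlsplit` and read its `fragment` attribute. The helper name `status_from_url` is hypothetical.

```python
from urllib.parse import urlsplit


def status_from_url(url):
    """Hypothetical helper: return the fragment (text after '#'), or '' if there is none."""
    return urlsplit(url).fragment


status = status_from_url('http://robolab.in/home-automation.html#ON')
print(status)  # ON
```

This involves no network access at all, so it only works if the status really is encoded in the URL you already have.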

Answer 1 (score: 2, accepted), 2015-12-07 08:13:18