
Read a URL in Raspberry Pi

I want to read data present in a URL. For example, if I had this URL:

http://robolab.in/home-automation.html#ON

I want to read the status 'ON' and ignore the rest of the URL. How can this be done?

What you are trying to do is called web scraping. In Python you can do this with the urllib/urllib2 library.

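try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

htmltext = ''
try:
    html = urlopen('http://robolab.in/home-automation.html#ON')
    htmltext = html.read()
except IOError:
    # Network and HTTP failures end up here; htmltext stays empty
    print('error opening link')

print(htmltext)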

This prints the HTML source of the page that your browser renders. It is just a string, so you can manipulate it any way you want. If you have BeautifulSoup installed, you can write something like this:

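from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(htmltext, 'html.parser')
# Drop <script> and <style> elements so only the visible text remains
for script in soup(["script", "style"]):
    script.extract()
text = soup.get_text()
print(text)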

Using this code with your URL, I got this:

Robolab Technologies
Home Automation

OFF

From there you can easily proceed:

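status = ''
# Walk the text line by line and keep the last non-empty line,
# which is where the status appears in the output above
for line in text.strip().splitlines():
    if line.strip():
        status = line.strip()
if 'ON' in status:
    print("it's on")
else:
    print("it's off")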
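
If the status you need is the part after `#` in the URL string itself, rather than text shown on the page, it can be read without downloading anything. Here is a minimal sketch under that assumption, using the standard library's `urlparse`. The helper name `status_from_url` is made up for illustration and is not part of the original answer:

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def status_from_url(url):
    """Return the text after '#' in the URL, or '' if there is none."""
    return urlparse(url).fragment


print(status_from_url('http://robolab.in/home-automation.html#ON'))
```

This only inspects the string you pass in, so it reports whatever follows `#` in that URL and says nothing about what the page currently displays.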
