Python HTML source code

I would like to write a script that picks out one specific value from a page's HTML source code and prints it.

import urllib.request                           

Webseite = "http://myip.is/"                    
html_code = urllib.request.urlopen(Webseite)

print(html_code.read().decode('ISO-8859-1'))

This is my current code. I would like to print only the IP address that the website shows. In the page source, it is the text of the element with title="copy ip address".

You could use jsonip, which returns a JSON object that you can easily parse with the Python standard library:

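import json
from urllib.request import urlopen  # Python 3; on Python 2 this was urllib2.urlopen

# jsonip.com responds with a JSON object that contains an 'ip' key
my_ip = json.load(urlopen('http://jsonip.com'))['ip']
print(my_ip)

Alternatively, with requests and BeautifulSoup:

import requests
from bs4 import BeautifulSoup

s = requests.Session()
r = s.get('http://myip.is/')

# the "html5lib" parser requires the html5lib package to be installed
soup = BeautifulSoup(r.text, "html5lib")
# find the <a> tag whose title attribute is 'copy ip address'
myIP = soup.find('a', {'title': 'copy ip address'}).text
print(myIP)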

This uses the requests library (generally the most convenient choice for HTTP requests) to pull the page, feeds the content to BeautifulSoup, a very nice HTML parser, asks BeautifulSoup to find a single <a> tag whose title attribute is set to 'copy ip address', and then saves the text of that tag as myIP.
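If installing third-party packages is not an option, the same lookup can be approximated with the standard library's html.parser module instead of BeautifulSoup. The sketch below is only an illustration: the class name TitledLinkParser, the helper extract_ip, and the sample markup (including the 203.0.113.7 address) are made up, and the only detail taken from the question is the title="copy ip address" attribute.

```python
from html.parser import HTMLParser


class TitledLinkParser(HTMLParser):
    """Collect the text of the first <a> tag whose title attribute matches."""

    def __init__(self, title):
        super().__init__()
        self.title = title
        self.inside = False  # True while we are inside the matching <a> tag
        self.done = False    # True once the first matching tag has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a" and not self.done and dict(attrs).get("title") == self.title:
            self.inside = True

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.inside:
            self.inside = False
            self.done = True


def extract_ip(html):
    """Return the text of the first matching link, or None if there is none."""
    parser = TitledLinkParser("copy ip address")
    parser.feed(html)
    text = "".join(parser.parts).strip()
    return text or None


# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<html><body>
  <p>Your IP address is
    <a href="#" title="copy ip address">203.0.113.7</a>
  </p>
</body></html>
"""

print(extract_ip(SAMPLE_HTML))
```

Returning None when no matching tag is found avoids the AttributeError that calling .text on a failed lookup would raise if the page layout changes.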

You can use a regular expression to find the IP addresses:

import urllib.request
import re

Webseite = "http://myip.is/"
html_code = urllib.request.urlopen(Webseite)

content = html_code.read().decode('ISO-8859-1')
ip_regex = r'(?:[0-9]{1,3}\.){3}[0-9]{1,3}'

ips_found = re.findall(ip_regex, content)
print(ips_found[0])
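One caveat with this pattern is that it also matches dotted numbers that are not valid addresses (for example 999.1.1.1), and indexing ips_found[0] fails when nothing matches. A possible refinement, sketched here with the standard ipaddress module, is to validate each candidate and return None when nothing qualifies. The function name first_valid_ipv4 and the sample strings are hypothetical.

```python
import ipaddress
import re

# Same pattern as above: four groups of 1-3 digits separated by dots.
IP_REGEX = re.compile(r'(?:[0-9]{1,3}\.){3}[0-9]{1,3}')


def first_valid_ipv4(text):
    """Return the first regex match that is a valid IPv4 address, else None."""
    for candidate in IP_REGEX.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # raises ValueError if out of range
        except ValueError:
            continue
        return candidate
    return None


# Made-up sample text; the first match is rejected because 999 > 255.
print(first_valid_ipv4("build 999.1.1.1, your address: 198.51.100.23"))
```

With the code from the answer, this would be called as first_valid_ipv4(content).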
