How to extract an address from an HTML file

I am new to the community. I am working on a project to extract an address from an HTML file. The specific string that I am trying to process is:

<address class="list-card-addr">1867 Central Ave, Augusta, GA 30904</address>

I have tried processing it with manual tools, but I'd like to use Python to process the entire HTML file. Can someone explain how to do this in Python? Thank you in advance.

Use a regular expression to find the addresses:

import re
# html must already hold the page source as a string; findall returns every match
r1 = re.findall(r"<address class=\"?list-card-addr\"?>([^<]+)", html)
print(r1)
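As a self-contained illustration of the regex approach, here is a minimal sketch that runs against an inline sample instead of a real page. The second address in the sample, the helper name extract_addresses, and the tolerance for single quotes and extra whitespace are assumptions added for the example, not part of the original answer.

```python
import re

# Sample markup. The first address comes from the question; the second
# one is invented purely to show that every match is returned.
html = """
<address class="list-card-addr">1867 Central Ave, Augusta, GA 30904</address>
<address class='list-card-addr'>12 Example St, Augusta, GA 30901</address>
"""

# Accept double quotes, single quotes, or no quotes around the class value,
# then capture everything up to the next "<".
ADDRESS_RE = re.compile(
    r"""<address\s+class=["']?list-card-addr["']?\s*>([^<]+)""",
    re.IGNORECASE,
)


def extract_addresses(html_text):
    """Return every captured address with surrounding whitespace removed."""
    return [match.strip() for match in ADDRESS_RE.findall(html_text)]


print(extract_addresses(html))
```

A pattern like this only matches when class is the sole attribute and its value is exactly list-card-addr, so it is best suited to markup whose shape you already know.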

You can extract the address using BeautifulSoup, which is very handy for accessing elements in HTML and XML documents.

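from bs4 import BeautifulSoup
import requests

# url must be set to the address of the page you want to scrape
r = requests.get(url)
r.raise_for_status()  # stop early on an HTTP error
html = r.text
soup = BeautifulSoup(html, "html.parser")
# find() returns the first matching element, or None if there is none
addr = soup.find("address", class_="list-card-addr")
if addr is not None:
    print(addr.get_text(strip=True))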
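If the goal is to collect every address in the file rather than the first one, and installing third-party packages is not an option, the same idea can be sketched with the standard library's html.parser module. This swaps BeautifulSoup for HTMLParser; the class name AddressCollector, the helper extract_addresses_from_html, and the sample markup (including its second address) are hypothetical names and data chosen for this example.

```python
from html.parser import HTMLParser


class AddressCollector(HTMLParser):
    """Collect the text of every <address> element with class list-card-addr."""

    def __init__(self):
        super().__init__()
        self.addresses = []
        self._buffer = None  # holds text chunks while inside a matching tag

    def handle_starttag(self, tag, attrs):
        # The class attribute may list several classes separated by spaces.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "address" and "list-card-addr" in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "address" and self._buffer is not None:
            self.addresses.append("".join(self._buffer).strip())
            self._buffer = None


def extract_addresses_from_html(html_text):
    """Parse html_text and return all matching addresses in document order."""
    parser = AddressCollector()
    parser.feed(html_text)
    parser.close()
    return parser.addresses


# Sample markup; the second address is invented for illustration.
sample = """
<div><address class="list-card-addr">1867 Central Ave, Augusta, GA 30904</address></div>
<div><address class="list-card-addr">12 Example St, Augusta, GA 30901</address></div>
<address class="other">ignored</address>
"""

print(extract_addresses_from_html(sample))
```

For a local file, the text could be read with something like open("page.html", encoding="utf-8").read() and passed to the helper, where page.html is a placeholder filename.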
