
Extract text from HTML tag

<li itemprop="foundingLocation" itemscope="" itemtype="https://schema.org/Place"><i class="icon-location"></i><span itemprop="address" itemscope="" itemtype="https://schema.org/PostalAddress">India, Delhi, Delhi</span></li>
<li><i class="icon-phone text-success"></i><a class="link visit-website-tracking" data-container="body" data-content="+91 9643861253" data-info="phone:triazine-software-pvt-ltd" data-placement="top" data-toggle="popover" rel="nofollow" role="button">Show phone number</a></li>
<li><i class="icon-office"></i>Founded: <span itemprop="foundingDate">2015</span></li>
<li itemprop="numberOfEmployees" itemscope="" itemtype="https://schema.org/QuantitativeValue"><i class="icon-users"></i><span itemprop="value">50-100</span> employees</li>
<li><i class="icon-budget"></i>Avg. budget: $10k-$15k (USD)</li>
<li><i class="icon-hourly"></i>Hourly fee: $25/h (USD)</li>

I need to extract:

India, Delhi, Delhi
+91 9643861253
2015
50-100
Avg. budget: $10k-$15k (USD)
Hourly fee: $25/h (USD)

How can I go further and complete this task?

My code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://www.appfutura.com/developers/triazine-software-pvt-ltd"
html = urlopen(url).read()
soup = BeautifulSoup(html,"lxml")
class_list = ["developer-description"] # can add any other classes to this list.
Title = soup.find('h1',{"class":"big-title no-mar-top no-mar-bot strong"})
Info = soup.find('ul',{"class":"list-inline no-mar"})
for i in range(len(Info)):
    print(Info.contents[i])
    soup = BeautifulSoup(Info.contents[i],"lxml")
    Title = soup.find('i',{"class":"icon-budget"})
    print(Title.contents)

Try this:

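from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("https://www.appfutura.com/developers/triazine-software-pvt-ltd").read()
# Every <li> in the profile's info list
items = BeautifulSoup(html, "lxml").select_one('.profile .list-inline').find_all("li")

# Visible text of each item, skipping the "Show phone number" placeholder link
info = [li.get_text() for li in items if not li.get_text().startswith("Show")]
# The phone number itself is stored in the link's data-content attribute
phone = "".join(li.find("a")["data-content"] for li in items if li.find("a"))

# Put the phone number back in its original (second) position
info.insert(1, phone)
print("\n".join(info))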
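To try the same idea without fetching the live page, here is a minimal sketch that runs against the HTML from the question. It assumes the live page wraps these items in something matching `.profile .list-inline`; the `<div class="profile">` wrapper, the trimmed attributes, and the helper name `extract_lines` are illustrative assumptions, not part of the original answer. It uses the `html.parser` backend that ships with BeautifulSoup, and reads the phone number in place instead of inserting it afterwards.

```python
from bs4 import BeautifulSoup

# The <li> items from the question (tracking/itemtype attributes trimmed), wrapped in a
# hypothetical container that mirrors the '.profile .list-inline' structure.
SNIPPET = """
<div class="profile"><ul class="list-inline no-mar">
<li itemprop="foundingLocation"><i class="icon-location"></i><span itemprop="address">India, Delhi, Delhi</span></li>
<li><i class="icon-phone text-success"></i><a class="link" data-content="+91 9643861253" role="button">Show phone number</a></li>
<li><i class="icon-office"></i>Founded: <span itemprop="foundingDate">2015</span></li>
<li itemprop="numberOfEmployees"><i class="icon-users"></i><span itemprop="value">50-100</span> employees</li>
<li><i class="icon-budget"></i>Avg. budget: $10k-$15k (USD)</li>
<li><i class="icon-hourly"></i>Hourly fee: $25/h (USD)</li>
</ul></div>
"""


def extract_lines(html: str) -> list[str]:
    """Return one line of text per <li>, reading the phone from data-content."""
    items = BeautifulSoup(html, "html.parser").select_one(".profile .list-inline").find_all("li")
    lines = []
    for li in items:
        link = li.find("a", attrs={"data-content": True})
        if link is not None:
            # The visible link text is only "Show phone number"; the number
            # itself sits in the popover's data-content attribute.
            lines.append(link["data-content"])
        else:
            # Join text fragments with a space so "Founded: " + "2015" stays readable.
            lines.append(li.get_text(" ", strip=True))
    return lines


print("\n".join(extract_lines(SNIPPET)))
```

Passing `" "` as the separator together with `strip=True` matters here: with the default empty separator, "Founded: " and "2015" would be glued into "Founded:2015".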

Output:

India, Delhi, Delhi
+91 9643861253
Founded: 2015
50-100 employees
Avg. budget: $10k-$15k (USD)
Hourly fee: $25/h (USD)
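That output still contains "Founded: 2015" and "50-100 employees", while the question asks for the bare "2015" and "50-100". One possible approach, sketched below under the assumption that the live page uses the same markup as the snippet in the question, is to target the schema.org `itemprop` attributes for those values and fall back to the icon classes for the items that have no `itemprop`. The dictionary keys and the helper names (`extract_fields`, `text_of`, `item_text`) are hypothetical.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's HTML (tracking and itemtype attributes dropped).
SNIPPET = """
<ul class="list-inline no-mar">
<li itemprop="foundingLocation"><i class="icon-location"></i><span itemprop="address">India, Delhi, Delhi</span></li>
<li><i class="icon-phone text-success"></i><a class="link" data-content="+91 9643861253" role="button">Show phone number</a></li>
<li><i class="icon-office"></i>Founded: <span itemprop="foundingDate">2015</span></li>
<li itemprop="numberOfEmployees"><i class="icon-users"></i><span itemprop="value">50-100</span> employees</li>
<li><i class="icon-budget"></i>Avg. budget: $10k-$15k (USD)</li>
<li><i class="icon-hourly"></i>Hourly fee: $25/h (USD)</li>
</ul>
"""


def extract_fields(html: str) -> dict:
    """Pick out exactly the values the question lists, keyed by hypothetical names."""
    soup = BeautifulSoup(html, "html.parser")

    def text_of(selector):
        # Stripped text of the first CSS match, or None if nothing matches.
        tag = soup.select_one(selector)
        return tag.get_text(strip=True) if tag is not None else None

    def item_text(icon_class):
        # For <li> items without an itemprop, find the icon and read its parent <li>.
        icon = soup.find("i", class_=icon_class)
        return icon.parent.get_text(strip=True) if icon is not None else None

    phone_link = soup.select_one("a[data-content]")
    return {
        "location": text_of('[itemprop="address"]'),
        "phone": phone_link["data-content"] if phone_link is not None else None,
        "founded": text_of('[itemprop="foundingDate"]'),
        "employees": text_of('[itemprop="numberOfEmployees"] [itemprop="value"]'),
        "budget": item_text("icon-budget"),
        "hourly_fee": item_text("icon-hourly"),
    }


for value in extract_fields(SNIPPET).values():
    print(value)
```

Returning `None` for a missing field, rather than raising, keeps one absent item from breaking the whole extraction if the page layout differs from the snippet.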


