
Python: replace html img src after extraction with xpath

I extracted some HTML from this site, and I can now display everything I scraped except the images, because their src attributes are not correct.

#!C:/Python27/python
from lxml import etree
import requests

q = "http://www.dlib.org/dlib/november14/giannakopoulos/11giannakopoulos.html"
page = requests.get(q)
tree = etree.HTML(page.text)
element = tree.xpath('./body/form/table[3]/tr/td/table[5]')
content = etree.tostring(element[0])
print "Content-type: text\n\n"
print content.strip()

Now I read the correct img src values (the ones I want) and put them into an array:

pic=[]
link = q.rsplit("/",1)
images = tree.xpath("//img/@src")
for i in images:
    if i.find('.gif') == -1:
        pic.append(link[0]+"/"+i)

How can I replace the scraped src values with the ones from the array?

I'm fairly sure this is what you're looking for.

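# Base URL of the page: everything before the last "/"
link = q.rsplit("/", 1)
images = tree.xpath("//img")

# Prefix the base URL to every non-GIF src; the elements are updated in place
for image in images:
    if '.gif' not in image.attrib['src']:
        image.attrib['src'] = link[0] + '/' + image.attrib['src']

# Print the resulting src of every image
for image in images:
    print(image.attrib['src'])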

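It iterates over every selected image and, if '.gif' is not in the image's src attribute, updates that attribute in place by prefixing the page's base URL, which gives the PNG/JPG paths you specified. The GIF images keep their original relative src.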
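As a possible alternative to concatenating strings, the standard library's `urljoin` can resolve a relative src against the page URL. The sketch below is not part of the original answer; the helper name `absolutize` and the constant `PAGE_URL` are assumptions made for illustration, and the two sample src values are taken from the page discussed here.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

# The page URL from the question (named PAGE_URL here for illustration).
PAGE_URL = "http://www.dlib.org/dlib/november14/giannakopoulos/11giannakopoulos.html"


def absolutize(srcs, base_url=PAGE_URL):
    """Hypothetical helper: resolve each src against base_url."""
    return [urljoin(base_url, src) for src in srcs]


# One src relative to the article directory, one that climbs with "../".
resolved = absolutize(["giann-fig1-sm.png", "../../../img2/space.gif"])
for url in resolved:
    print(url)
```

With this approach the `.gif` check becomes optional, since `urljoin` also collapses the `../` segments: the second sample resolves to `http://www.dlib.org/img2/space.gif`.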

Output

../../../img2/space.gif
../../../img2/search2.gif
../../../img2/space.gif
../../../img2/D-Lib-blocks.gif
../../../img2/transparent.gif
../../../img2/magazine.gif
../../../img2/transparent.gif
../../../img2/transparent.gif
../../../img2/space.gif
../../../img2/space.gif
http://www.dlib.org/dlib/november14/giannakopoulos/giann-formula1.png
http://www.dlib.org/dlib/november14/giannakopoulos/giann-fig1-sm.png
http://www.dlib.org/dlib/november14/giannakopoulos/giann-fig2.png
http://www.dlib.org/dlib/november14/giannakopoulos/giann-fig3.png
http://www.dlib.org/dlib/november14/giannakopoulos/giann-fig4.png
http://www.dlib.org/dlib/november14/giannakopoulos/giannakopoulos.jpg
http://www.dlib.org/dlib/november14/giannakopoulos/foufoulas.jpg
http://www.dlib.org/dlib/november14/giannakopoulos/stamatogiannakis.png
http://www.dlib.org/dlib/november14/giannakopoulos/dimitropoulos.jpg
http://www.dlib.org/dlib/november14/giannakopoulos/manola.jpg
http://www.dlib.org/dlib/november14/giannakopoulos/ioannidis.png
../../../img2/transparent.gif
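Because the loop changes the parsed elements in place, the scraped fragment has to be serialized after the loop for the new src values to show up in the markup. The following is only a minimal sketch of that order of operations. It assumes a small, well-formed stand-in snippet (`SNIPPET`) instead of the real page, and it uses the standard library's `xml.etree.ElementTree`, whose element interface lxml's `etree` follows, so that it runs without a network connection. `rewrite_img_srcs` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

PAGE_URL = "http://www.dlib.org/dlib/november14/giannakopoulos/11giannakopoulos.html"

# Hypothetical, well-formed stand-in for the scraped table.
SNIPPET = (
    '<table><tr><td>'
    '<img src="giann-fig2.png" />'
    '<img src="../../../img2/space.gif" />'
    '</td></tr></table>'
)


def rewrite_img_srcs(element, base_url):
    """Make every img src under `element` absolute, modifying it in place."""
    for img in element.iter("img"):
        src = img.get("src")
        if src:  # skip images without a src attribute
            img.set("src", urljoin(base_url, src))
    return element


# Rewrite first, serialize second: the output then carries the new URLs.
table = rewrite_img_srcs(ET.fromstring(SNIPPET), PAGE_URL)
html = ET.tostring(table).decode("ascii")
print(html)
```

In the question's script, the equivalent step would be to call `etree.tostring(element[0])` only after the src attributes have been updated.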
