
Python: replace html img src after extraction with xpath

I extracted some HTML from this site, and everything I scraped displays correctly except the images, because their src values are wrong.

#!C:/Python27/python
from lxml import etree
import requests

q = "http://www.dlib.org/dlib/november14/giannakopoulos/11giannakopoulos.html"
page = requests.get(q)
tree = etree.HTML(page.text)
element = tree.xpath('./body/form/table[3]/tr/td/table[5]')
content = etree.tostring(element[0])
print "Content-type: text\n\n"
print content.strip()

I have now read the correct img src values (the ones I want) and put them in an array:

pic=[]
link = q.rsplit("/",1)
images = tree.xpath("//img/@src")
for i in images:
    if i.find('.gif') == -1:
        pic.append(link[0]+"/"+i)

How can I replace the scraped src values with the ones in the array?

I'm pretty sure this is what you're looking for.

link = q.rsplit("/",1)
images = tree.xpath("//img")

for image in images:
    # Leave the site's .gif layout images alone; prefix every other src with the page's directory URL.
    if '.gif' not in image.attrib['src']:
        image.attrib['src'] = link[0]+'/'+image.attrib['src']

for image in images:
    print image.attrib['src']

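It iterates over every selected image and, if '.gif' is not in the image's src attribute, updates that attribute to the full PNG/JPG URL you wanted. The elements are modified in place, so call etree.tostring only after this loop has run; content serialized earlier still holds the old src values.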
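
To make the ordering concrete, here is a minimal offline sketch of the same idea. The `SAMPLE_HTML` string and the `rewrite_img_src` helper are hypothetical stand-ins introduced for illustration (they are not part of the original question or answer), and the sketch is written for Python 3, where `encoding="unicode"` makes `etree.tostring` return text rather than bytes.

```python
from lxml import etree

# Hypothetical stand-in for the downloaded page, so the sketch runs without network access.
SAMPLE_HTML = (
    '<html><body>'
    '<img src="../../../img2/space.gif"/>'
    '<img src="giann-fig2.png"/>'
    '</body></html>'
)
q = "http://www.dlib.org/dlib/november14/giannakopoulos/11giannakopoulos.html"


def rewrite_img_src(tree, page_url):
    """Prefix every non-.gif img src with the directory of page_url, in place."""
    base = page_url.rsplit("/", 1)[0]
    for image in tree.xpath("//img"):
        src = image.get("src")
        # Skip images without a src and the .gif layout images.
        if src and ".gif" not in src:
            image.set("src", base + "/" + src)
    return tree


tree = etree.HTML(SAMPLE_HTML)
rewrite_img_src(tree, q)

# Serialize only after the rewrite, so the markup carries the new src values.
content = etree.tostring(tree.xpath("./body")[0], encoding="unicode")
print(content)
```

Using `image.get("src")` instead of `image.attrib['src']` is a small defensive choice in this sketch: it avoids a KeyError if some img element has no src attribute.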

Output

../../../img2/space.gif
../../../img2/search2.gif
../../../img2/space.gif
../../../img2/D-Lib-blocks.gif
../../../img2/transparent.gif
../../../img2/magazine.gif
../../../img2/transparent.gif
../../../img2/transparent.gif
../../../img2/space.gif
../../../img2/space.gif
http://www.dlib.org/dlib/november14/giannakopoulos/giann-formula1.png
http://www.dlib.org/dlib/november14/giannakopoulos/giann-fig1-sm.png
http://www.dlib.org/dlib/november14/giannakopoulos/giann-fig2.png
http://www.dlib.org/dlib/november14/giannakopoulos/giann-fig3.png
http://www.dlib.org/dlib/november14/giannakopoulos/giann-fig4.png
http://www.dlib.org/dlib/november14/giannakopoulos/giannakopoulos.jpg
http://www.dlib.org/dlib/november14/giannakopoulos/foufoulas.jpg
http://www.dlib.org/dlib/november14/giannakopoulos/stamatogiannakis.png
http://www.dlib.org/dlib/november14/giannakopoulos/dimitropoulos.jpg
http://www.dlib.org/dlib/november14/giannakopoulos/manola.jpg
http://www.dlib.org/dlib/november14/giannakopoulos/ioannidis.png
../../../img2/transparent.gif
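
The .gif entries in the output above are still relative paths. If those should become absolute as well, one possible alternative to string concatenation is the standard library's `urljoin`, which resolves `../` segments against the page URL. This is only a sketch: the `absolutize` helper is a hypothetical name, and the import shown is the Python 3 location (in Python 2 the same function lives in the `urlparse` module).

```python
from urllib.parse import urljoin

q = "http://www.dlib.org/dlib/november14/giannakopoulos/11giannakopoulos.html"


def absolutize(page_url, srcs):
    """Resolve each src against page_url; already absolute URLs pass through unchanged."""
    return [urljoin(page_url, src) for src in srcs]


resolved = absolutize(q, ["../../../img2/space.gif", "giann-fig1-sm.png"])
for url in resolved:
    print(url)
```

With this approach the '.gif' check is no longer needed to keep the paths valid, since every src is resolved the way a browser would resolve it.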

