How can I iterate over multiple tags in soup.findAll('tag1', 'tag2', 'tag3')?

I'm trying to write a Python script that automates modifying certain tags in multiple HTML files by running a single command from the terminal.

I have the basic code in place.

In my code I've done something like the following. Is there a more convenient way to do this with less code?

#modifying the 'src' of <img> tag in the soup obj
for img in soup.findAll('img'):
    img['src'] = '{% static ' + "'" + img['src'] + "'" + ' %}'

#modifying the 'href' of <link> tag in the soup obj
for link in soup.findAll('link'):
    link['href'] = '{% static ' + "'" + link['href'] + "'" + ' %}'

#modifying the 'src' of <script> tag in the soup obj
for script in soup.findAll('script'):
    script['src'] = '{% static ' + "'" + script['src'] + "'" + ' %}'

For instance, can I do it in a single for loop instead of three? It doesn't have to look like what I wrote below; I'm looking for any good-practice suggestion.

for img, link, script in soup.findAll('img', 'link', 'script'):
    rest of the code goes here....

Perhaps use a dictionary to look up the appropriate attribute for each tag? Also consider CSS selectors, which are often faster.

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://stackoverflow.com/questions/66541098/how-can-i-iterate-over-multiple-tags-in-soup-findalltag1-tag2-tag3')
soup = bs(r.content, 'lxml')

lookup = {
    'img':'src',
    'link': 'href',
    'script':'src'
}

for i in soup.select('img, link, script'):
    var = lookup[i.name]
    if i.has_attr(var):
        i[var] = '{% static ' + "'" + i[var] + "'" + ' %}'
        print(i[var])
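
The same idea can be tried without a network request. The following is a minimal sketch, assuming an inline HTML string: the sample markup and the helper name `wrap_static` are made up for illustration. It builds the selector from the dictionary and uses CSS attribute selectors such as `img[src]`, so tags that lack the attribute (an inline `<script>`, for example) are never matched and no `has_attr` check is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, for illustration only.
html = """
<html><head>
<link rel="stylesheet" href="css/main.css">
<script src="js/app.js"></script>
<script>console.log("inline");</script>
</head><body>
<img src="img/logo.png">
</body></html>
"""

# Which attribute holds the URL for each tag name.
lookup = {'img': 'src', 'link': 'href', 'script': 'src'}


def wrap_static(path):
    """Wrap a path in a Django-style static template tag (hypothetical helper)."""
    return "{% static '" + path + "' %}"


soup = BeautifulSoup(html, 'html.parser')

# Builds 'img[src], link[href], script[src]' so that only tags that
# actually carry the attribute are selected.
selector = ', '.join(f'{tag}[{attr}]' for tag, attr in lookup.items())

for tag in soup.select(selector):
    attr = lookup[tag.name]
    tag[attr] = wrap_static(tag[attr])

print(soup)
```

Keeping the tag-to-attribute mapping in one dictionary means a new tag (say, `source`) only needs one extra entry rather than another loop.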

Yes, you can. You can pass a list of tag names to the findAll method:

for element in soup.findAll(['img', 'link', 'script']):  # use find_all for bs4
    # .get() returns None instead of raising KeyError when the attribute is missing
    if element.name == 'img':
        value = element.get('src')
    elif element.name == 'link':
        value = element.get('href')
    elif element.name == 'script':
        value = element.get('src')
    else:
        continue

    print(value)
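
The list form of `find_all` also combines with a tag-to-attribute dictionary, which removes the `if`/`elif` chain. This is only a sketch under stated assumptions: the function name `staticify`, the constant `URL_ATTR`, and the sample markup are hypothetical, and `html.parser` is used so no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Tag name -> attribute that holds the URL.
URL_ATTR = {'img': 'src', 'link': 'href', 'script': 'src'}


def staticify(html):
    """Return html with img/link/script URLs wrapped in {% static '...' %}.

    Hypothetical helper, for illustration only.
    """
    soup = BeautifulSoup(html, 'html.parser')
    # find_all accepts a list of tag names and returns every matching tag.
    for element in soup.find_all(list(URL_ATTR)):
        attr = URL_ATTR[element.name]
        value = element.get(attr)  # None when the attribute is missing
        if value is not None:
            element[attr] = "{% static '" + value + "' %}"
    return str(soup)


# Made-up sample input; the inline <script> has no src and is left untouched.
print(staticify('<img src="a.png"><link href="b.css"><script>var x = 1;</script>'))
```

Wrapping the loop in a function makes it straightforward to call once per file when processing several HTML files from one command.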
