
Python beautifulsoup print by line #

I'm currently using Python BeautifulSoup to output a specific line from an HTML file. Because the HTML contains several divs with the same class, it outputs every div with that class. For example:

CONTENT:

<div class=border>aaaa</div>
<div class=border>example</div>
<div class=border>runrunrun</div>

OUTPUT:

<div class=border>aaaa</div>
<div class=border>example</div>
<div class=border>runrunrun</div>

Now I only want the second div with class border:

<div class=border>example</div>

If I view the source in Chrome, it shows the content with line numbers, so line 1 contains

<div class=border>aaaa</div>

and line 2 contains

<div class=border>example</div>

Is it possible to output by line number using BeautifulSoup?

find_all returns a list, so you can index it with [1] to get the second element.

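from bs4 import BeautifulSoup

# Sample document: three divs sharing the class "border"
html_doc = """<div class=border>aaaa</div>
<div class=border>example</div>
<div class=border>runrunrun</div>"""

soup = BeautifulSoup(html_doc, 'html.parser')

# Indexing is zero-based, so [1] is the second matching div
soup.find_all(class_="border")[1]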

returns

<div class="border">example</div>
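If the goal really is to select by source line number rather than by position in the result list, recent versions of Beautiful Soup (4.8.1 and later, as far as I know) record where each tag started in the original markup as Tag.sourceline when the html.parser or html5lib parser is used. The following is only a sketch under that assumption; tags_on_line is a hypothetical helper name, not part of the library, and the sample markup is the question's example with the closing tags corrected.

```python
from bs4 import BeautifulSoup

html_doc = """<div class="border">aaaa</div>
<div class="border">example</div>
<div class="border">runrunrun</div>"""

soup = BeautifulSoup(html_doc, "html.parser")


def tags_on_line(soup, line_number, class_name):
    """Hypothetical helper: tags with the given class that start on a 1-based source line."""
    return [
        tag
        for tag in soup.find_all(class_=class_name)
        if tag.sourceline == line_number
    ]


# Select the div that starts on line 2 of the original markup
matches = tags_on_line(soup, 2, "border")
for tag in matches:
    print(tag)
```

Note that the line numbers refer to the string handed to BeautifulSoup, which may not match what a browser shows in "view source" if the page was modified or re-serialized in between.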

If soup.find_all gave you a list of, say, 200 elements called div_list, you could loop over it by index (you want indices 1, 4, 7, and so on):

count = 1
while True:
    try:
        print(div_list[count])
        count += 3
    except IndexError:
        # count ran past the end of the list
        break

Or even shorter:

count = 1
# Use < rather than <=: index len(div_list) is already out of range
while count < len(div_list):
    print(div_list[count])
    count += 3
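The same "every third element starting at index 1" selection can also be written as a slice, which avoids the manual counter entirely. This is a minimal sketch: div_list here is a stand-in list of strings made up for illustration, but the result of find_all is a list subclass, so the same slice should work on it as well.

```python
# Stand-in for the result of soup.find_all(...); illustrative data only
div_list = [f"<div>{i}</div>" for i in range(10)]

# start at index 1, go to the end, step by 3 -> indices 1, 4, 7
selected = div_list[1::3]

for div in selected:
    print(div)
```

A slice never raises IndexError, so it also behaves sensibly when the list has fewer than two elements: the result is simply empty.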
