Reading and parsing HTML files starting from a specific line using Python

I have this Python code that I'm trying to improve so that it reads and parses some HTML files, but I want it to start from a specific line, e.g. line 415. The data I want to parse is inside a <div class="panel-body">, and there is already another <div class="panel-body"> earlier in the file that is not the one I want to target. Here's my code:

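import os
from bs4 import BeautifulSoup

folder = 'TestFolder'  # placeholder: directory containing the HTML files

for filename in os.listdir(folder):
    if filename.endswith('.html'):
        fname = os.path.join(folder, filename)
        print('Filename: {}'.format(fname))

        with open(fname, 'r', encoding='utf8') as f:
            soup = BeautifulSoup(f.read(), 'html.parser')
            # Returns every matching div, including the unwanted earlier one.
            info = soup.find_all('div', class_='panel-body')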

You can extract the lines from 415 to the end of the file and pass that block to BeautifulSoup to get the data out of the HTML. Here is the code:

import os
from itertools import islice

from bs4 import BeautifulSoup

folder = "TestFolder"
start_line = 415  # 1-based line number where the wanted markup begins

for filename in os.listdir(folder):
    if filename.endswith('.html'):
        fname = os.path.join(folder, filename)
        print('Filename: {}'.format(fname))

        with open(fname, 'r', encoding='utf8') as f:
            # islice counts from 0, so line 415 is index 414; None reads to the end.
            block = ''.join(islice(f, start_line - 1, None))

        # Parse the block as a whole so a <div> spanning several lines stays intact.
        soup = BeautifulSoup(block, 'html.parser')
        info = soup.find_all('div', class_='panel-body')
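
The line offsets are easy to get wrong, because islice counts from 0 and excludes its stop index. The following is a minimal sketch of the slicing step on its own, using an in-memory file so it runs without any HTML on disk. The helper name lines_from and the sample text are made up for illustration and are not part of the original answer or of any library.

```python
import io
from itertools import islice


def lines_from(handle, start_line, end_line=None):
    """Return the text from 1-based start_line through end_line (inclusive).

    lines_from is a hypothetical helper, not part of any library.
    With end_line=None the slice runs to the end of the file.
    """
    # islice is 0-based and its stop is exclusive, so only the start is shifted.
    return ''.join(islice(handle, start_line - 1, end_line))


# Stand-in for an open HTML file: five numbered lines.
sample = io.StringIO("line 1\nline 2\nline 3\nline 4\nline 5\n")
print(lines_from(sample, 3))  # prints lines 3, 4 and 5
```

The returned string could then be passed to BeautifulSoup in one call, rather than parsing each line separately.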
