
Python - Parsing input HTML data with BeautifulSoup and storing output data column-wise in a CSV file

I extract HTML data from a mail and parse it with BeautifulSoup. Next, I want to store the parsed data under the right headers in a CSV file. However, the text of the input data does not show up as expected in the output CSV file.

Parsed input data (fruits_html) for the CSV file:

Apples                        43        0       0                   0<br/>
Bananas                     2282        0     500                   0<br/>
Grapes                      2534        0     500                   0<br/>
Oranges                      274        0       0                   0<br/>
--------------------------------------------------------------------------------------------------<br/>

Script:

# Parse raw messages to something readable
soup = BeautifulSoup(raw_email, 'html.parser')
fruits_html = soup.find_all('span')
headers = ["Names", "Quantity", "SpareQty", "MinQty", "MaxQty"]

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output, delimiter=',')
    csv_output.writerow(headers)
    for br in soup.find_all('span'):
        csv_output.writerow([fruits_html for br in br.find_all('br')])

Desired output:

I want to store all the quantities under the right header in the CSV file. Unfortunately, my current output shows the headers in the first row and, in the second row, a large number of <br/> in different cells.
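To see where the row text actually lives, it can help to list the direct children of the span. The sketch below uses a shortened, hypothetical sample (not the real mail body) and assumes the data sits directly inside a single span. It suggests that the rows are plain text nodes sitting between the <br/> tags, so iterating over the <br/> tags themselves never reaches the text.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical two-row sample in the same shape as the mail body
sample = "<span>Apples 43 0 0 0<br/>Bananas 2282 0 500 0<br/></span>"

span = BeautifulSoup(sample, "html.parser").find("span")

# Label every direct child of the span as either a tag or a text node
kinds = []
for child in span.contents:
    if isinstance(child, Tag):
        kinds.append(("tag", child.name))
    elif isinstance(child, NavigableString):
        kinds.append(("text", str(child)))

for kind in kinds:
    print(kind)
```

The children alternate between text and br entries, which is why a fix needs to walk the text nodes rather than the tags.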

import csv
from bs4 import BeautifulSoup
from bs4.element import NavigableString
data = '''
<html>
<span>

Apples                        43        0       0                   0<br/>
Bananas                     2282        0     500                   0<br/>
Grapes                      2534        0     500                   0<br/>
Oranges                      274        0       0                   0<br/>

</span>
</html>'''

soup = BeautifulSoup(data, 'html.parser')
#print(soup.find_all("span"))
headers = ["Names", "Quantity", "SpareQty", "MinQty", "MaxQty"]

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output, delimiter=',')
    csv_output.writerow(headers)
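    for span in soup.find_all("span"):
        for item in span.contents:
            # Keep only text nodes; <br/> elements are Tag objects and are skipped
            if not isinstance(item, NavigableString):
                continue
            row = item.strip().split()
            if row:  # ignore whitespace-only text, which would write an empty row
                csv_output.writerow(row)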

With output.csv:

Names,Quantity,SpareQty,MinQty,MaxQty
Apples,43,0,0,0
Bananas,2282,0,500,0
Grapes,2534,0,500,0
Oranges,274,0,0,0
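
One caveat with splitting on whitespace: a name containing a space (a hypothetical "Red Grapes", which does not appear in the sample data) would spill into an extra column. A minimal sketch of one way around this, assuming every row ends with exactly four numeric fields, is to split from the right instead. The helper name split_row and the sample lines are illustrative only.

```python
import csv
import io

HEADERS = ["Names", "Quantity", "SpareQty", "MinQty", "MaxQty"]


def split_row(line, numeric_columns=4):
    """Split from the right so spaces inside the name are preserved."""
    parts = line.strip().rsplit(None, numeric_columns)
    if len(parts) != numeric_columns + 1:
        return None  # blank or malformed line
    # Collapse any run of whitespace inside the name to a single space
    parts[0] = " ".join(parts[0].split())
    return parts


# Hypothetical lines, already extracted from the span as plain text
lines = [
    "Apples                        43        0       0                   0",
    "Red Grapes                  2534        0     500                   0",
    "",
]

# Write to an in-memory buffer here; a real script would open output.csv
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(HEADERS)
for line in lines:
    row = split_row(line)
    if row is not None:
        writer.writerow(row)

csv_text = buffer.getvalue()
print(csv_text)
```

Lines that do not yield a name plus four fields are dropped silently in this sketch; logging them instead may be preferable when the mail format can vary.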
