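Read various files and write them to one Excel sheet using Python, Pandas
No matter what I do, I can't get all the data from my xhtml files written to a single Excel worksheet. Python appears to iterate over every file in the folder, but the output contains only the data from the last file. Any help would be great!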
#!/usr/bin/python3
# Import libraries
import pandas as pd
import openpyxl
from openpyxl import load_workbook
import glob
import time
#Path to folder
path_dir: str = r"C:\Users\Moench\Desktop\r2d2\EPUB\content1\*.xhtml"
#Read files
for filename in glob.glob(path_dir):
    #Assign the table data to a Pandas dataframe
    dfs = open(filename, 'r')
    dfs1 = pd.read_html(dfs)
    #Read data
    df2 = dfs1[0][['Unnamed: 0_level_0','Unnamed: 1_level_0','Unnamed: 2_level_0','Unnamed: 3_level_0','Unnamed: 4_level_0','Unnamed: 12_level_0','Unnamed: 13_level_0']]
    #Print result (Looks like that it goes through all files in the folder)
    # print (df2)
# Write to existing Excel-Sheet
book = load_workbook('output.xlsx')
writer = pd.ExcelWriter('output.xlsx', engine='openpyxl')
writer.book = book
ts = time.time()
df3 = df2.append(df2)
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df3.to_excel(writer, str(ts))
writer.save()
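You store the data in the same dataframe on every iteration and overwrite it each time, so you end up with only the last file's data (in fact twice, because of df2.append(df2)).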
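The overwrite can be reproduced without any files. The following is a minimal sketch using small made-up tables (the `tables` list and its `value` column are hypothetical stand-ins, not data from the question): rebinding one name inside the loop keeps only the last table, while collecting the tables in a list keeps all of them.

```python
import pandas as pd

# Three small stand-in tables (hypothetical data, not from the question's files).
tables = [pd.DataFrame({"value": [i]}) for i in range(3)]

# Pattern from the question: the same name is rebound on every pass,
# so only the last table survives the loop.
for table in tables:
    last_only = table
# Same effect as df2.append(df2); DataFrame.append was removed in pandas 2.0.
overwritten = pd.concat([last_only, last_only])

# Collecting every table first keeps all of them.
collected = pd.concat(tables, ignore_index=True)

print(overwritten["value"].tolist())  # [2, 2]
print(collected["value"].tolist())    # [0, 1, 2]
```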
Here is a slightly modified version that stores each dataframe in df_list and uses pd.concat on that list to create df3:
#!/usr/bin/python3
# Import libraries
import pandas as pd
import openpyxl
from openpyxl import load_workbook
import glob
import time
#Path to folder
path_dir: str = r"C:\Users\Moench\Desktop\r2d2\EPUB\content1\*.xhtml"
# Initiate list of dataframes
df_list = list()
#Read files
for filename in glob.glob(path_dir):
    #Assign the table data to a Pandas dataframe
    dfs = open(filename, 'r')
    dfs1 = pd.read_html(dfs)
    #Read data
    df2 = dfs1[0][['Unnamed: 0_level_0','Unnamed: 1_level_0','Unnamed: 2_level_0','Unnamed: 3_level_0','Unnamed: 4_level_0','Unnamed: 12_level_0','Unnamed: 13_level_0']]
    df_list.append(df2)
    #Print result (Looks like that it goes through all files in the folder)
    # print (df2)
# Write to existing Excel-Sheet
book = load_workbook('output.xlsx')
writer = pd.ExcelWriter('output.xlsx', engine='openpyxl')
writer.book = book
ts = time.time()
# Concatenate all dataframes into one
df3 = pd.concat(df_list, ignore_index=True)
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
df3.to_excel(writer, str(ts))
writer.save()
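Note that the writing step above relies on older pandas behaviour: assigning to writer.book and calling writer.save() no longer work in pandas 2.0 and later. As a hedged sketch of the same idea for recent versions, the helper below (the name `write_combined_sheet` and its parameters are hypothetical, introduced only for illustration) concatenates the collected dataframes and appends them as a new sheet through a context manager. It assumes openpyxl is installed.

```python
from pathlib import Path

import pandas as pd


def write_combined_sheet(frames, workbook_path, sheet_name):
    """Concatenate `frames` and write them to one sheet of `workbook_path`."""
    combined = pd.concat(frames, ignore_index=True)
    # Append a sheet when the workbook already exists, otherwise create it.
    mode = "a" if Path(workbook_path).exists() else "w"
    # The context manager saves and closes the file on exit,
    # so no explicit save call is needed.
    with pd.ExcelWriter(workbook_path, engine="openpyxl", mode=mode) as writer:
        combined.to_excel(writer, sheet_name=sheet_name, index=False)
    return combined
```

With the names from the answer, this could be called as write_combined_sheet(df_list, 'output.xlsx', str(ts)). In append mode pandas raises an error by default if a sheet with the same name already exists, which the timestamp-based sheet name is likely to avoid.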