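Create a single DataFrame by reading multiple HTML files
I have 23 HTML files that contain data in the same table format. I want to read every file into a DataFrame and merge them all into one large DataFrame for further analysis. My code is below: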
import glob
import pandas as pd
all_rec = glob.glob(r'D:\python\*.html')  # raw string keeps the backslashes literal
#print(all_rec)
list_data = []
for filename in all_rec:
    data = pd.read_html(filename)
    list_data.append(data)
list_data # sample output from two files
[[ start_time user_host query_time \
0 2020-02-19 07:01:56.411155 lrdba[lrdba] @ localhost [] 00:01:55.299187
1 2020-02-20 07:01:56.005284 lrdba[lrdba] @ localhost [] 00:01:54.210222
db sql_text
0 kvb call PROC_PROCESSINGSUMMARY(null,null)
1 kvb call PROC_PROCESSINGSUMMARY(null,null) ],
[ start_time user_host query_time \
0 2020-02-19 07:01:56.411155 lrdba[lrdba] @ localhost [] 00:01:55.299187
1 2020-02-20 07:01:56.005284 lrdba[lrdba] @ localhost [] 00:01:54.210222
db sql_text
0 kvb call PROC_PROCESSINGSUMMARY(null,null)
1 kvb call PROC_PROCESSINGSUMMARY(null,null) ]]
list_data = list_data[0]   # with list_data[0], only the first file's data is shown by the code below
list_data = list_data[-1]  # with list_data[-1], only the last file's data is shown by the code below
pd.concat(list_data, ignore_index=True)
What value should I put inside [] so that the details from all files end up in one large DataFrame?
import glob
import pandas as pd
all_rec = glob.glob(r'D:\python\*.html')
#print(all_rec)
list_data = []
for filename in all_rec:
    # pd.read_html returns a list of DataFrames (one per table in the file),
    # so extend the flat list instead of appending a nested list
    list_data.extend(pd.read_html(filename))
# remove the list_data[0] and list_data[-1] lines: no single index selects every file
# concatenate every table from every file into one DataFrame
pd.concat(list_data, ignore_index=True)
Hope this works!
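
A note on why no single index gives all files: pd.read_html returns a list of DataFrames (one per table found in a file), so appending each result builds a list of lists, and indexing it only ever selects one file. The sketch below is a minimal illustration under stated assumptions, not the original poster's code: fake_read_html is a hypothetical stand-in for pd.read_html so the example runs without any HTML files, and the file names, values, and the extra source_file column are made up for illustration.

```python
import pandas as pd


def fake_read_html(name):
    """Hypothetical stand-in for pd.read_html: returns a list of DataFrames."""
    table = pd.DataFrame(
        {
            "db": ["kvb", "kvb"],
            "query_time": ["00:01:55", "00:01:54"],
        }
    )
    return [table]  # one DataFrame per table in the file


# Made-up file names; with real data this would be the glob.glob(...) result.
file_names = ["a.html", "b.html", "c.html"]

frames = []  # flat list of DataFrames, not a list of lists
for name in file_names:
    tables = fake_read_html(name)
    for table in tables:
        # Optional: record which file each row came from.
        frames.append(table.assign(source_file=name))

# One DataFrame holding the rows of every file, with a fresh 0..n-1 index.
combined = pd.concat(frames, ignore_index=True)
print(combined.shape)  # (6, 3): 3 files x 2 rows, 2 data columns + source_file
```

With real files, swapping fake_read_html(name) for pd.read_html(name) and file_names for the glob result should behave the same way, assuming every table shares the same columns.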