Capture a single Excel worksheet from multiple workbooks into a pandas DataFrame and save it

Question

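I need to extract one Excel worksheet from each of several workbooks, load it into a DataFrame, and then save that DataFrame.

I have a spreadsheet that is generated at the end of every month (e.g.
June 2019.xlsx, May 2019.xlsx, April 2019.xlsx).
I need to grab the worksheet "Sheet1" from each workbook and combine them into a DataFrame (df1).

I would like to save this DataFrame.

Ideally, I would also like some way to append the next month's data after the initial "data grab".

I am relatively new to this, so I haven't made much progress.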

import os
import glob
import pandas as pd

# Collect every .xlsx workbook in the folder
files = glob.glob('/Users/ngove/Documents/Python Scripts/2019/*.xlsx')

# Map each workbook name (without extension) to the DataFrame read from its "Sheet1"
dfs = {}
for f in files:
    name = os.path.splitext(os.path.basename(f))[0]
    dfs[name] = pd.read_excel(f, sheet_name='Sheet1')

Answer 1

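I interpreted your statement about saving the DataFrame to mean that you want to save it as a combined Excel file. The code below combines "Sheet1" from every file in the specified folder whose name ends in xlsx.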

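import os
import pandas as pd

os.chdir("H:/Python/Reports/")  # edit this to be your path
path = os.getcwd()
files = os.listdir(path)
# Keep only the .xlsx workbooks, skipping the combined output of an earlier run
files_xlsx = [f for f in files if f.endswith('.xlsx') and f != 'Combined_Data.xlsx']

# Read "Sheet1" from every workbook and stack the results.
# DataFrame.append was removed in pandas 2.0, so collect the frames and concat once.
frames = [pd.read_excel(f, sheet_name='Sheet1') for f in files_xlsx]
df = pd.concat(frames, ignore_index=True)

# ExcelWriter.save() was also removed in pandas 2.0; the context manager saves on exit.
with pd.ExcelWriter('Combined_Data.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False)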

To pick up only the 2019 files, change the line that builds files_xlsx to:

files_xlsx = [f for f in files if f[-9:] == '2019.xlsx']
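
The same selection can also be written as a glob pattern, which is closer to the glob call in the question. The following is a minimal sketch, not part of the original answer: find_workbooks is a hypothetical helper name, and skipping names that start with ~$ (the temporary lock files Excel creates next to an open workbook) is an added assumption.

```python
from pathlib import Path


def find_workbooks(folder, pattern="*2019.xlsx"):
    """Return the sorted paths in `folder` whose names match `pattern`.

    Names starting with "~$" are skipped, since Excel creates such
    temporary lock files while a workbook is open.
    """
    return sorted(
        p for p in Path(folder).glob(pattern)
        if not p.name.startswith("~$")
    )
```

The returned paths can be passed straight to pd.read_excel in place of the names in files_xlsx.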

I borrowed most of the code from this question, updated it for xlsx, and added the part that saves the file.
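
The question also asks how to add the next month's data later, which this answer does not cover. The following is only a sketch under assumptions: append_month and the source_file column are hypothetical names introduced here, and each month is identified by the name of the workbook it came from.

```python
import pandas as pd


def append_month(combined, new_month, source):
    """Return `combined` with the rows of `new_month` added, tagged with `source`.

    If rows tagged with `source` are already present, `combined` is
    returned unchanged, so repeating an update does not duplicate data.
    """
    already_loaded = (
        "source_file" in combined.columns
        and (combined["source_file"] == source).any()
    )
    if already_loaded:
        return combined

    # Record which workbook the new rows came from
    tagged = new_month.assign(source_file=source)
    if combined.empty:
        return tagged.reset_index(drop=True)
    return pd.concat([combined, tagged], ignore_index=True, sort=False)
```

One way to use it each month would be to load the saved file with pd.read_excel, read "Sheet1" of the new workbook, pass both to append_month, and write the result back with to_excel(..., index=False). This relies on the saved file already containing the source_file column, so the initial combined file would need to be built with the same helper.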

Answer 2

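You can put all the files into one directory (for example, the current directory). Then collect all the Excel files in a list (e.g. files_xls). Loop over the files and use pandas.read_excel to get the corresponding DataFrames (e.g. list_frames).

Below you can find an example: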

import os
import pandas as pd

path = os.getcwd()  # current directory
files = os.listdir(path)  # every file in the current directory
# keep only the Excel workbooks (adjust the extensions to your files)
files_xls = [f for f in files if f.endswith(('.xls', '.xlsx', '.xlsm'))]

list_frames = []

for f in files_xls:
    print("Processing file: %s" % f)
    try:
        # read "Sheet1" into a DataFrame;
        # the read_excel parameters depend on your data format
        data = pd.read_excel(f, sheet_name='Sheet1', header=0, index_col=None)
    except Exception as exc:
        # skip workbooks that cannot be read instead of appending stale data
        print("Skipping %s: %s" % (f, exc))
        continue
    list_frames.append(data)

# finally, concatenate the frames and remove any duplicate rows
df = pd.concat(list_frames, sort=False).fillna(0)
df = df.drop_duplicates()

# then save the result (engine='xlsxwriter' requires the XlsxWriter package)
with pd.ExcelWriter("your_title" + ".xlsx", engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False)
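
If the dictionary of DataFrames from the question (keyed by workbook name) is kept instead of a list, pd.concat can stack it while recording which workbook each row came from. This is a minimal sketch, assuming every workbook has the same columns; combine_named_frames and the workbook column name are hypothetical.

```python
import pandas as pd


def combine_named_frames(frames_by_name):
    """Stack a {workbook name: DataFrame} dict into one DataFrame.

    The dictionary keys end up in a "workbook" column so each row
    can be traced back to the file it was read from.
    """
    # The dict keys become the outer level of a two-level index
    combined = pd.concat(frames_by_name, names=["workbook", "row"], sort=False)
    # Move the workbook level into a column and drop the per-file row numbers
    return combined.reset_index(level="workbook").reset_index(drop=True)
```

Keeping the workbook name in a column makes it possible to filter or re-check a single month later without rereading the files.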

I hope this helps.
