Reading an entire Excel file in chunks with pandas

Question

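What is the fastest way to read a file chunk by chunk in pandas? I am doing something like the following, which I also found on Stack Overflow. But if my file has, say, 1000 rows, how do I keep track of skiprows and skipfooter?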

import pandas as pd
# if the file contains 300 rows, this will read the middle 100
df = pd.read_excel('/path/excel.xlsx', skiprows=100, skipfooter=100)  # skipfooter replaces the removed skip_footer spelling

Answer 1

read_excel has no chunksize parameter. You can read the file first and then split it manually:

import pandas as pd

df = pd.read_excel(file_name)  # the whole file has to be read first
chunksize = 1000  # rows per chunk; set this to whatever you want
for start in range(0, len(df), chunksize):
    chunk = df.iloc[start:start + chunksize]  # the last chunk may be shorter
    # process the chunk
    ...

Unfortunately, with Excel there is no escaping reading the whole file into memory, so you have to do it this way.

If you want to do this with skiprows and skipfooter, you need to know the size of df (you can read the file once first to get it).

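import pandas as pd

path = '/path/excel.xlsx'
df = pd.read_excel(path)
df_size = df.shape[0]
columns = df.columns.values
chunksize = 1000
for i in range(0, df_size, chunksize):
    # skip the header row plus the i data rows already read, and drop every row after this chunk
    footer = max(0, df_size - i - chunksize)
    df_chunk = pd.read_excel(path, header=None, names=columns, skiprows=i + 1, skipfooter=footer)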
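
The skiprows/skipfooter bookkeeping can be checked without opening a file. The sketch below uses a hypothetical helper, chunk_bounds (it is not part of pandas), that lists how many data rows to skip before and after each chunk. It assumes the counts refer to data rows only, so the header row still has to be handled separately when the values are passed to read_excel.

```python
def chunk_bounds(total_rows, chunksize):
    """Return (skiprows, skipfooter) pairs, counted in data rows, that tile the sheet."""
    bounds = []
    for start in range(0, total_rows, chunksize):
        end = min(start + chunksize, total_rows)  # the last chunk may be shorter
        bounds.append((start, total_rows - end))
    return bounds


if __name__ == "__main__":
    total = 1000
    for skiprows, skipfooter in chunk_bounds(total, 300):
        rows_read = total - skiprows - skipfooter
        print(skiprows, skipfooter, rows_read)
```

For a 300-row file and a chunk size of 100, the second pair is (100, 100), which is the "middle 100" case from the question.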

Answer 2

You can use range. Suppose you want to process a 1000-row Excel file in chunks of 100 rows:

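import pandas as pd

total = 1000
chunksize = 100
for skip in range(0, total, chunksize):
    # range(1, skip + 1) skips the data rows already read but keeps row 0 as the header
    df = pd.read_excel('/path/excel.xlsx', skiprows=range(1, skip + 1), nrows=chunksize)
    # process df
    ...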
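
If the row count is not known in advance, the same idea can be wrapped in a generator that keeps reading until a chunk comes back short or empty. This is only a sketch: iter_excel_chunks and its reader argument are hypothetical names introduced here (reader exists so the loop can be exercised without a real workbook), and it assumes read_excel returns an empty frame once every data row has been skipped. Note that each call re-parses the file from the start, so this limits the size of each DataFrame but does not make the reading itself faster.

```python
import pandas as pd


def iter_excel_chunks(path, chunksize, reader=pd.read_excel, **kwargs):
    """Yield DataFrames of at most `chunksize` data rows until the sheet is exhausted."""
    skip = 0
    while True:
        # Keep row 0 as the header and skip the data rows already yielded.
        chunk = reader(path, skiprows=range(1, skip + 1), nrows=chunksize, **kwargs)
        if chunk.empty:
            break
        yield chunk
        if len(chunk) < chunksize:
            break  # a short chunk means the end of the sheet was reached
        skip += chunksize
```

Typical use would be `for df in iter_excel_chunks('/path/excel.xlsx', 100): ...`, with the body processing one chunk at a time.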
