
Split one Excel file into multiple files with a specific number of rows in Pandas

Let's say I have an Excel file with 101 rows. I need to split it and write it into 11 Excel files with 10 rows each, except the last one, which gets the single remaining row.

This is the code I have tried, but I get KeyError: 11:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.rand(101, 3), columns=list('ABC'))
groups = df.groupby(int(len(df.index)/10) + 1)  # raises KeyError: 11
for i, g in groups:
    g.to_excel("%s.xlsx" % i, index=False, index_label=False)

Could someone help with this issue? Thanks a lot.

Related reference: Split pandas dataframe into multiple dataframes with equal numbers of rows

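I think you need np.arange. Passing a single integer to groupby makes pandas look for a column labelled 11, hence the KeyError: 11. np.arange(len(df.index)) // 10 instead builds one group label per row (0 for rows 0-9, 1 for rows 10-19, and so on):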

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.rand(101, 3), columns=list('ABC'))
groups = df.groupby(np.arange(len(df.index)) // 10)
for i, g in groups:
    print(g)
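
The loop above only prints each group. As a minimal sketch of how the same grouping could feed the file-writing step from the question, the block below wraps it in two helpers. The names split_by_rows and write_groups and the "{}.xlsx" file pattern are assumptions made for illustration, not part of the original answer, and writing the files assumes an Excel engine such as openpyxl is installed.

```python
import numpy as np
import pandas as pd


def split_by_rows(df, rows_per_file=10):
    """Return a list of (group number, sub-frame) pairs (hypothetical helper)."""
    # One label per row: 0 for rows 0-9, 1 for rows 10-19, and so on.
    labels = np.arange(len(df.index)) // rows_per_file
    return list(df.groupby(labels))


def write_groups(groups, pattern="{}.xlsx"):
    """Write every sub-frame to its own workbook (hypothetical helper).

    Assumes an Excel writer engine such as openpyxl is available.
    """
    for i, g in groups:
        g.to_excel(pattern.format(i), index=False)


df = pd.DataFrame(data=np.random.rand(101, 3), columns=list("ABC"))
groups = split_by_rows(df)
print(len(groups))                   # number of files that would be written
print([len(g) for _, g in groups])   # rows per file
```

Calling write_groups(groups) would then write 0.xlsx through 10.xlsx, with the last file holding the single leftover row.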

I solved a similar problem as follows. The backstory to my issue was that I had created an Azure Function with an HTTP trigger, but I was overwhelming the endpoint when iterating through 2k rows of requests. So I chunked the original file into blocks of 50 rows:

import logging

import pandas as pd

CHUNK_SIZE = 50

INXL = pd.read_excel('split/031022.xlsx', engine="openpyxl")

row_count = len(INXL.index)
# start of the last chunk; clamp at 0 when the table has fewer than 50 rows
row_start = max(row_count - CHUNK_SIZE, 0)


def extract(rs, rc):
    # walk backwards from the end of the table, writing one chunk per file
    while rc > 0:
        # iloc is 0-based and excludes the end position: rows rs .. rc-1
        row_extract = INXL.iloc[rs:rc]
        with pd.ExcelWriter(f'output_{rc}.xlsx') as writer:
            row_extract.to_excel(writer, index=False)
        rc = rs
        rs = max(rs - CHUNK_SIZE, 0)


extract(row_start, row_count)
logging.info("extract completed")
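
If the files should come out in forward order instead (first 50 rows first, remainder last), the slice positions can be computed on their own. This is only a sketch of that idea; chunk_bounds is a hypothetical helper name, not something from pandas or from the answer above.

```python
def chunk_bounds(n_rows, chunk_size=50):
    """Yield (start, stop) positions that cover range(n_rows) in order.

    Hypothetical helper: each pair is meant to be used as df.iloc[start:stop].
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_rows, chunk_size):
        # the last chunk is cut short when n_rows is not a multiple of chunk_size
        yield start, min(start + chunk_size, n_rows)


bounds = list(chunk_bounds(101, 50))
print(bounds)  # [(0, 50), (50, 100), (100, 101)]
```

Each pair could then be applied as INXL.iloc[start:stop] and written with to_excel, exactly as in the loop above.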
