Pandas - Splitting dataframe into multiple Excel workbooks by column value

I'm new to pandas. I have a large Excel file, and after manipulating the data I want to split the data frame into multiple Excel workbooks. There are roughly 400 vendors, and I would like each one to have its own named workbook.

Example: SallyCreative.xlsx, JohnWorks.xlsx, AlexGraphics.xlsx

This is my approach to splitting a dataframe into multiple Excel workbooks by column value.

import pandas as pd

# Load the spreadsheet into a dataframe called 'data'. To be explicit, use a full path such as 'x:/folder/subfolder/anyexcelfile.xlsx'.
data = pd.read_excel('anyexcelfile.xlsx', engine='openpyxl')

# Change "Column Header Name" to the column used to categorise (group) the rows.
grouped = data.groupby("Column Header Name")

# The unique values found in that column, one key per group.
keys = grouped.groups.keys()

print(keys)  # quick debug check that the grouping worked

for key in keys:  # loop through each key
    # Temporary dataframe holding only the rows for the current key.
    splitdf = grouped.get_group(key)
    # Write it to an Excel file named after the key; 'splitdf' is overwritten on the next iteration.
    splitdf.to_excel(str(key) + ".xlsx", engine='xlsxwriter')
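Since each workbook is named after a value in the data, a vendor name containing a character such as "/" or ":" could make the write fail. The sketch below is one possible way to guard against that; `safe_filename` is a hypothetical helper (not part of pandas), the sample name is made up, and the character list is an assumption based on what Windows rejects, not an exhaustive rule.

```python
import re

# Characters Windows does not allow in file names; "/" is also invalid on Unix-like systems.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*]')


def safe_filename(key, replacement="_"):
    """Return a file-name-safe version of a group key (hypothetical helper)."""
    name = _INVALID_CHARS.sub(replacement, str(key)).strip()
    # Fall back to a placeholder if nothing usable is left.
    return name or "unnamed"


# Made-up vendor name, used only to show the substitution.
print(safe_filename("Sally/Creative: Design"))
```

Inside the loop, the file name could then be built as `safe_filename(key) + ".xlsx"` instead of `str(key) + ".xlsx"`.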

Try the code below; I hope it helps and gives you the solution you need.

Suppose I have data like this.

   displayName  self  created     id  field   fromString
0  A            A     2018-12-18  1   status  Backlog
1  B            B     2018-12-18  2   status  Funnel

Now I want to create a separate Excel file for each display name, A.xlsx and B.xlsx. This can be done as follows:

import pandas as pd

data_df = pd.read_excel('./data_1.xlsx')
grouped_df = data_df.groupby('displayName')

# Iterating over a groupby yields (group name, sub-dataframe) pairs.
for name, group in grouped_df:
    group.to_excel(str(name) + ".xlsx")

This generates one Excel file per display name. You can adapt the solution to your needs. Hope this helps.
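To see why there is one file per display name, it may help to look at what the groupby produces without writing anything to disk. The following is a minimal sketch on an in-memory frame; the third row (a second entry for A with "Done") is invented for illustration and is not in the sample data above.

```python
import pandas as pd

# Small stand-in for data_1.xlsx; the third row is made up for illustration.
data_df = pd.DataFrame({
    "displayName": ["A", "B", "A"],
    "id": [1, 2, 3],
    "field": ["status", "status", "status"],
    "fromString": ["Backlog", "Funnel", "Done"],
})

# Map each file that would be written to the number of rows it would contain.
planned = {
    f"{name}.xlsx": len(group)
    for name, group in data_df.groupby("displayName")
}
print(planned)
```

Each distinct value in the grouping column becomes one key, so 400 vendors would give 400 entries, and rows sharing a name end up in the same workbook.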

As asked in the comment by @Kpittman:

We can save to any directory by giving the path to that directory.

import pandas as pd

data_df = pd.read_excel('./data_1.xlsx')
grouped_df = data_df.groupby('displayName')

# The ./IO/Files/ directory must already exist before writing.
for name, group in grouped_df:
    group.to_excel("./IO/Files/" + str(name) + ".xlsx")

So instead of ./IO/Files/ you can provide your own path.
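If the target directory might not exist yet, one option is to build the path with `pathlib` and create the folder first. This is only a sketch: `workbook_path` is a hypothetical helper name, and the demo runs in a temporary directory rather than a real output folder.

```python
import tempfile
from pathlib import Path


def workbook_path(out_dir, name):
    """Build '<out_dir>/<name>.xlsx', creating out_dir if needed (hypothetical helper)."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{name}.xlsx"


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = workbook_path(Path(tmp) / "IO" / "Files", "A")
    created = target.parent.is_dir()
    print(target.name, created)
```

In the loop, the result can be passed straight to `to_excel`, for example `group.to_excel(workbook_path("./IO/Files", name))`, since `to_excel` accepts path-like objects.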

Hope it helps.
