Split a dataset into separate Excel files based on a row value in a given column in Pandas?

Question

I have a fairly large dataset that I would like to split into separate Excel files based on the names in column A (the "Agent" column in the example below). I've provided a rough example of what this dataset looks like in Ex1 below.

Using pandas, what is the most efficient way to create a new Excel file for each of the names in column A (the Agent column in this example), preferably with that name used in the file name?

For example, I would like separate files for John Doe, Jane Doe, and Steve Smith, each containing the information that follows their names (Business Name, Business ID, etc.).

Ex1

Agent        Business Name    Business ID    Revenue

John Doe     Bobs Ice Cream   12234          $400
John Doe     Car Repair       445848         $2331
John Doe     Corner Store     243123         $213
John Doe     Cool Taco Stand  2141244        $8912
Jane Doe     Fresh Ice Cream  9271499        $2143
Jane Doe     Breezy Air       0123801        $3412
Steve Smith  Big Golf Range   12938192       $9912
Steve Smith  Iron Gyms        1231233        $4133
Steve Smith  Tims Tires       82489233       $781

I believe Python / pandas would be an efficient tool for this, but I'm still fairly new to pandas, so I'm having trouble getting started.

Answer 1

Use a list comprehension with groupby on the Agent column:

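# one DataFrame per agent; groupby returns the groups sorted by agent name
dfs = [d for _, d in df.groupby('Agent')]

# use a separate loop variable so the original df is not overwritten
for d in dfs:
    print(d, '\n')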

Output

      Agent    Business Name  Business ID Revenue
4  Jane Doe  Fresh Ice Cream      9271499   $2143
5  Jane Doe       Breezy Air       123801   $3412 

      Agent    Business Name  Business ID Revenue
0  John Doe   Bobs Ice Cream        12234    $400
1  John Doe       Car Repair       445848   $2331
2  John Doe     Corner Store       243123    $213
3  John Doe  Cool Taco Stand      2141244   $8912 

         Agent   Business Name  Business ID Revenue
6  Steve Smith  Big Golf Range     12938192   $9912
7  Steve Smith       Iron Gyms      1231233   $4133
8  Steve Smith      Tims Tires     82489233    $781
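
One detail visible in the output above: the Business ID for Breezy Air prints as 123801, while the question lists it as 0123801, which suggests the column was parsed as an integer. The following is a minimal sketch of keeping the IDs as strings at load time. It assumes the data is read from CSV text; the question does not show how the data is loaded, so the inline string below is only a stand-in for the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real input file (not shown in the question).
raw = io.StringIO(
    "Agent,Business Name,Business ID,Revenue\n"
    "Jane Doe,Fresh Ice Cream,9271499,$2143\n"
    "Jane Doe,Breezy Air,0123801,$3412\n"
)

# Reading the ID column as str keeps the leading zero intact.
df = pd.read_csv(raw, dtype={"Business ID": str})
ids = df["Business ID"].tolist()
print(ids)
```

pd.read_excel accepts a dtype argument as well, so the same idea should carry over if the source is a workbook.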

Answer 2

I would loop over the groups of names, then save each group to its own Excel file:

groups = df.groupby('Agent')

for name, group in groups:
    # write .xlsx (needs an engine such as openpyxl); pandas 2.0+ can no longer write .xls
    group.to_excel(f"{name}.xlsx")

Answer 3

Use the unique values in the column to subset the data and write each subset to CSV, using the value as the file name:

import pandas as pd
for unique_val in df['Agent'].unique():
    df[df['Agent'] == unique_val].to_csv(f"{unique_val}.csv")

If you need Excel:

import pandas as pd
for unique_val in df['Agent'].unique():
    df[df['Agent'] == unique_val].to_excel(f"{unique_val}.xlsx")
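
For comparison, here is a small sketch checking that filtering by each unique value yields the same subsets as a single groupby pass. The sample frame is made up for illustration and is not the asker's data.

```python
import pandas as pd

# Made-up sample data, loosely modelled on Ex1.
df = pd.DataFrame({
    "Agent": ["John Doe", "John Doe", "Jane Doe", "Steve Smith"],
    "Revenue": [400, 2331, 2143, 9912],
})

# Approach from this answer: one boolean mask per unique value.
by_mask = {name: df[df["Agent"] == name] for name in df["Agent"].unique()}

# Alternative: let groupby split the frame in one pass.
by_group = {name: group for name, group in df.groupby("Agent")}

# Both approaches keep the original row order and index within each subset.
same = all(by_mask[name].equals(by_group[name]) for name in by_mask)
print(same)
```

The two differ when the column contains missing values: groupby drops NaN keys by default, while a comparison against NaN matches no rows, so rows without an agent need separate handling either way.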

Answer 4

Grouping is what you are looking for here. You can iterate over the groups, which gives you the grouping attributes and the data associated with that group; in your case, the Agent name and the associated business columns.

Code:

import pandas as pd
# make up some data
ex1 = pd.DataFrame([['A',1],['A',2],['B',3],['B',4]], columns = ['letter','number'])

# iterate over the grouped data and export the data frames to excel workbooks
for group_name,data in ex1.groupby('letter'):
    # you probably have more complicated naming logic
    # use index = False if you have not set an index on the dataframe to avoid an extra column of indices
    data.to_excel(group_name + '.xlsx', index = False)
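
As for the "more complicated naming logic" mentioned in the comment, one possible sketch follows. The helper names safe_filename and split_to_excel and the default output folder "agents" are hypothetical, chosen only for illustration. It assumes an Excel writer engine such as openpyxl is installed.

```python
import re
from pathlib import Path

import pandas as pd


def safe_filename(name) -> str:
    """Replace characters that are not allowed in common file systems."""
    cleaned = re.sub(r'[\\/:*?"<>|]+', "_", str(name)).strip()
    return cleaned or "unnamed"


def split_to_excel(df: pd.DataFrame, column: str, out_dir: str = "agents") -> list:
    """Write one .xlsx workbook per distinct value in `column`; return the paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, group in df.groupby(column):
        path = target / f"{safe_filename(name)}.xlsx"
        group.to_excel(path, index=False)  # requires an engine such as openpyxl
        written.append(path)
    return written
```

Calling split_to_excel(df, "Agent") on the question's data would be expected to produce files such as agents/John Doe.xlsx. Two different names that sanitize to the same string would overwrite each other, so real data may need an extra uniqueness check.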

Answer 1
0 2019-06-03 18:37:14

Answer 2
0 Accepted 2019-06-03 18:37:55

Answer 3
0 2019-06-03 18:38:34

Answer 4
0 2019-06-03 18:42:53