Break up a data-set into separate Excel files based on a certain row value in a given column in Pandas?

Question

I have a fairly large dataset that I would like to split into separate Excel files based on the names in column A (the "Agent" column in the example below). A rough example of what this data-set looks like is in Ex1 below.

Using pandas, what is the most efficient way to create a new Excel file for each of the names in the Agent column, preferably with that name used in the file title?

For example, in the given example, I would like separate files for John Doe, Jane Doe, and Steve Smith containing the information that follows their names (Business Name, Business ID, etc.).

Ex1

Agent        Business Name    Business ID    Revenue
John Doe     Bobs Ice Cream   12234          $400
John Doe     Car Repair       445848         $2331
John Doe     Corner Store     243123         $213
John Doe     Cool Taco Stand  2141244        $8912
Jane Doe     Fresh Ice Cream  9271499        $2143
Jane Doe     Breezy Air       0123801        $3412
Steve Smith  Big Golf Range   12938192       $9912
Steve Smith  Iron Gyms        1231233        $4133
Steve Smith  Tims Tires       82489233       $781

I believe Python / pandas would be an efficient tool for this, but I'm still fairly new to pandas, so I'm having trouble getting started.

Answer 1

Use a list comprehension with groupby on the Agent column:

dfs = [d for _, d in df.groupby('Agent')]

# use a separate loop variable so the original df is not overwritten
for group_df in dfs:
    print(group_df, '\n')

Output

      Agent    Business Name  Business ID Revenue
4  Jane Doe  Fresh Ice Cream      9271499   $2143
5  Jane Doe       Breezy Air       123801   $3412 

      Agent    Business Name  Business ID Revenue
0  John Doe   Bobs Ice Cream        12234    $400
1  John Doe       Car Repair       445848   $2331
2  John Doe     Corner Store       243123    $213
3  John Doe  Cool Taco Stand      2141244   $8912 

         Agent   Business Name  Business ID Revenue
6  Steve Smith  Big Golf Range     12938192   $9912
7  Steve Smith       Iron Gyms      1231233   $4133
8  Steve Smith      Tims Tires     82489233    $781
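
Note that the list comprehension throws away the group key (the `_`), which is the agent name you want for the file title. A minimal sketch that keeps it by building a dict instead. The DataFrame here is rebuilt from Ex1, and storing Business ID as strings is my assumption (it keeps the leading zero in 0123801, which the output above lost); `frames_by_agent` is just an illustrative name:

```python
import pandas as pd

# Sample rows copied from Ex1 in the question.
# Assumption: IDs are stored as strings so "0123801" keeps its leading zero.
df = pd.DataFrame({
    "Agent": ["John Doe"] * 4 + ["Jane Doe"] * 2 + ["Steve Smith"] * 3,
    "Business Name": [
        "Bobs Ice Cream", "Car Repair", "Corner Store", "Cool Taco Stand",
        "Fresh Ice Cream", "Breezy Air",
        "Big Golf Range", "Iron Gyms", "Tims Tires",
    ],
    "Business ID": [
        "12234", "445848", "243123", "2141244",
        "9271499", "0123801",
        "12938192", "1231233", "82489233",
    ],
    "Revenue": ["$400", "$2331", "$213", "$8912", "$2143", "$3412",
                "$9912", "$4133", "$781"],
})

# Map each agent name to that agent's rows; groupby sorts the keys by default.
frames_by_agent = {name: group for name, group in df.groupby("Agent")}

for name, group in frames_by_agent.items():
    print(name, len(group))
```

Each value in the dict is an ordinary DataFrame, so it can be passed to `to_excel` or `to_csv` with the key as the file name.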

Answer 2

I would loop over the groups of names, then save each group to its own Excel file:

s = df.groupby('Agent')

for name, group in s:
    # .xlsx rather than .xls: pandas 2.0 and later no longer include a writer for the old .xls format
    group.to_excel(f"{name}.xlsx")
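
A slightly more defensive version of this loop, as a sketch only. Agent names can contain characters that are awkward in file names (slashes, colons), so a helper cleans them first. `safe_filename` and `split_to_files` are hypothetical names introduced here, not pandas functions, and writing `.xlsx` assumes an Excel engine such as openpyxl is installed:

```python
import re
from pathlib import Path

import pandas as pd


def safe_filename(name) -> str:
    """Replace anything other than letters, digits, '_', '-' and spaces with '_'."""
    cleaned = re.sub(r"[^\w\- ]", "_", str(name)).strip()
    return cleaned or "unnamed"


def split_to_files(df, column, out_dir=".", suffix=".xlsx"):
    """Write one file per distinct value in `column`; return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, group in df.groupby(column):
        path = out_dir / f"{safe_filename(name)}{suffix}"
        if suffix == ".csv":
            group.to_csv(path, index=False)
        else:
            # Assumption: an Excel engine (e.g. openpyxl) is available.
            group.to_excel(path, index=False)
        written.append(path)
    return written
```

With the question's data, `split_to_files(df, "Agent", out_dir="agents")` would be expected to create `agents/Jane Doe.xlsx`, `agents/John Doe.xlsx` and `agents/Steve Smith.xlsx`. Note that two different names that clean to the same string would overwrite each other.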

Answer 3

Use the unique values in the column to subset the data and write it to csv using the name:

import pandas as pd
for unique_val in df['Agent'].unique():
    df[df['Agent'] == unique_val].to_csv(f"{unique_val}.csv")

If you need Excel:

import pandas as pd
for unique_val in df['Agent'].unique():
    df[df['Agent'] == unique_val].to_excel(f"{unique_val}.xlsx")
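
One thing to watch with the `unique()` approach, shown here on made-up data: if the Agent column has missing values, `unique()` returns NaN as one of the values, but `== NaN` never matches any row, so the loop would write an empty file for it. `groupby` skips missing keys by default. A small sketch of the difference (the three-row frame is invented for illustration):

```python
import numpy as np
import pandas as pd

# Made-up data with one missing agent name.
df = pd.DataFrame({
    "Agent": ["John Doe", "Jane Doe", np.nan],
    "Revenue": [400, 2143, 781],
})

# unique() reports the missing value as its own entry...
unique_vals = df["Agent"].unique()

# ...but an equality test never matches NaN, so this subset has no rows.
missing_subset = df[df["Agent"] == np.nan]

# Dropping missing values first avoids writing an empty file for them.
subsets = {val: df[df["Agent"] == val] for val in df["Agent"].dropna().unique()}

# groupby leaves out rows with a missing key by default.
group_names = [name for name, _ in df.groupby("Agent")]
```

If you do want a file for the rows with no agent, you would need to handle them separately, for example with `df[df["Agent"].isna()]`.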

Answer 4

Grouping is what you are looking for here. Iterating over the groups gives you, for each group, its key (the value being grouped on) and the rows that belong to it. In your case, that is the Agent name and the associated business rows.

Code:

import pandas as pd
# make up some data
ex1 = pd.DataFrame([['A',1],['A',2],['B',3],['B',4]], columns = ['letter','number'])

# iterate over the grouped data and export the data frames to excel workbooks
for group_name,data in ex1.groupby('letter'):
    # you probably have more complicated naming logic
    # use index = False if you have not set an index on the dataframe to avoid an extra column of indices
    data.to_excel(group_name + '.xlsx', index = False)
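
If you group on more than one column, `group_name` is a tuple instead of a string, so `group_name + '.xlsx'` would raise a TypeError. A small sketch of one way to handle that. The `tag` column and the `group_label` helper are made up for illustration, and the file names are only built here, not written:

```python
import pandas as pd

# Same made-up data as above, plus a hypothetical second grouping column.
ex1 = pd.DataFrame(
    [["A", 1, "x"], ["A", 2, "y"], ["B", 3, "x"], ["B", 4, "x"]],
    columns=["letter", "number", "tag"],
)


def group_label(key) -> str:
    """Turn a groupby key (scalar or tuple) into a single string."""
    if isinstance(key, tuple):
        return "_".join(str(part) for part in key)
    return str(key)


# Grouping by two columns yields tuple keys such as ("A", "x").
file_names = [
    group_label(key) + ".xlsx"
    for key, _ in ex1.groupby(["letter", "tag"])
]
print(file_names)
```

Inside the loop you would then call `data.to_excel(group_label(group_name) + '.xlsx', index=False)` as before.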

4 answers

solution1
0 2019-06-03 18:37:14

solution2
0 ACCEPTED 2019-06-03 18:37:55

solution3
0 2019-06-03 18:38:34

solution4
0 2019-06-03 18:42:53
