Split a dataset into separate Excel files based on the row values in a given column with pandas?

Question

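I have a fairly large dataset that I would like to split into separate Excel files based on the names in column A (the "Agent" column in the example below). A rough example of what the dataset looks like is given in Ex1 below.

Using pandas, what is the most efficient way to create a new Excel file for each name in column A (the Agent column in this example), preferably using the name from column A in the file name?

For example, in the given example I would like separate files for John Doe, Jane Doe, and Steve Smith, each containing the information that follows their name (Business Name, Business ID, etc.).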

Ex1

Agent        Business Name    Business ID    Revenue

John Doe     Bobs Ice Cream   12234          $400
John Doe     Car Repair       445848         $2331
John Doe     Corner Store     243123         $213
John Doe     Cool Taco Stand  2141244        $8912
Jane Doe     Fresh Ice Cream  9271499        $2143
Jane Doe     Breezy Air       0123801        $3412
Steve Smith  Big Golf Range   12938192       $9912
Steve Smith  Iron Gyms        1231233        $4133
Steve Smith  Tims Tires       82489233       $781

I believe Python/pandas is an efficient tool for this, but I am still fairly new to pandas, so I am having trouble getting started.

Answer 1

Use a list comprehension and groupby on the Agent column:

dfs = [d for _, d in df.groupby('Agent')]

# use a separate loop variable so the original df is not overwritten
for d in dfs:
    print(d, '\n')

Output:

      Agent    Business Name  Business ID Revenue
4  Jane Doe  Fresh Ice Cream      9271499   $2143
5  Jane Doe       Breezy Air       123801   $3412 

      Agent    Business Name  Business ID Revenue
0  John Doe   Bobs Ice Cream        12234    $400
1  John Doe       Car Repair       445848   $2331
2  John Doe     Corner Store       243123    $213
3  John Doe  Cool Taco Stand      2141244   $8912 

         Agent   Business Name  Business ID Revenue
6  Steve Smith  Big Golf Range     12938192   $9912
7  Steve Smith       Iron Gyms      1231233   $4133
8  Steve Smith      Tims Tires     82489233    $781
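
The list above drops the group names, which are needed to name the output files. A minimal sketch that keeps them in a dict follows; the small frame is a subset of Ex1 rebuilt by hand, and the variable name `frames` is only illustrative.

```python
import pandas as pd

# A few rows from Ex1, rebuilt by hand for illustration
df = pd.DataFrame({
    "Agent": ["John Doe", "John Doe", "Jane Doe", "Steve Smith"],
    "Business Name": ["Bobs Ice Cream", "Car Repair", "Fresh Ice Cream", "Iron Gyms"],
    "Business ID": [12234, 445848, 9271499, 1231233],
    "Revenue": ["$400", "$2331", "$2143", "$4133"],
})

# Map each agent name to the rows that belong to that agent
frames = {name: group for name, group in df.groupby("Agent")}

for name, frame in frames.items():
    print(name, len(frame))
    # To write a file per agent (needs an Excel engine such as openpyxl):
    # frame.to_excel(f"{name}.xlsx", index=False)
```

Keeping the name next to each sub-frame means the file name can be built directly from the dict key.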

Answer 2

I would iterate over the groups by name and save each group to its own Excel file:

s = df.groupby('Agent')

for name, group in s:
    # .xlsx rather than .xls: recent pandas versions no longer write the legacy .xls format
    group.to_excel(f"{name}.xlsx")
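
Using the raw group name as a file name can fail if a name contains characters such as `/` or `:`. The sketch below is one possible way to guard against that; `safe_filename` and `write_groups` are hypothetical helpers, not part of pandas, and the `fmt` parameter is an assumption added so the same loop can write CSV or Excel.

```python
import re
from pathlib import Path

import pandas as pd


def safe_filename(name):
    """Replace characters that are risky in file names with underscores."""
    return re.sub(r"[^\w\-. ]", "_", str(name)).strip()


def write_groups(df, column, out_dir, fmt="xlsx"):
    """Write one file per unique value in `column`; return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, group in df.groupby(column):
        path = out / f"{safe_filename(name)}.{fmt}"
        if fmt == "csv":
            group.to_csv(path, index=False)
        else:
            # Excel output requires an engine such as openpyxl to be installed
            group.to_excel(path, index=False)
        paths.append(path)
    return paths
```

With the question's data this could be called as `write_groups(df, "Agent", "agents")`, which would create one `.xlsx` file per agent in an `agents` folder.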

Answer 3

Use the unique values in the column to subset the data, and write each subset to a CSV file named after that value:

import pandas as pd
for unique_val in df['Agent'].unique():
    df[df['Agent'] == unique_val].to_csv(f"{unique_val}.csv")

If you need Excel:

import pandas as pd
for unique_val in df['Agent'].unique():
    df[df['Agent'] == unique_val].to_excel(f"{unique_val}.xlsx")
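
For comparison, a small sketch checking that the boolean-mask approach above selects the same rows as `groupby`. The frame is a hand-built subset of Ex1, and `by_mask` and `by_group` are illustrative names. The mask version rescans the whole column once per unique value, while `groupby` partitions the rows in one pass, which may matter for a large dataset.

```python
import pandas as pd

# Hand-built subset of Ex1 with the agents deliberately interleaved
df = pd.DataFrame({
    "Agent": ["John Doe", "Jane Doe", "John Doe", "Steve Smith"],
    "Revenue": ["$400", "$2143", "$2331", "$9912"],
})

# One boolean mask per unique value (the approach in this answer)
by_mask = {v: df[df["Agent"] == v] for v in df["Agent"].unique()}

# A single groupby pass (the approach in the other answers)
by_group = {name: group for name, group in df.groupby("Agent")}

# Both approaches keep the original row order and index within each subset
same = all(by_mask[name].equals(by_group[name]) for name in by_group)
print(same)
```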

Answer 4

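Grouping is what you are looking for here. You can iterate over the groups, which gives you the grouping attribute and the data associated with each group. In your case, that is the agent name and the associated business columns.

Code: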

import pandas as pd
# make up some data
ex1 = pd.DataFrame([['A',1],['A',2],['B',3],['B',4]], columns = ['letter','number'])

# iterate over the grouped data and export the data frames to excel workbooks
for group_name,data in ex1.groupby('letter'):
    # you probably have more complicated naming logic
    # use index = False if you have not set an index on the dataframe to avoid an extra column of indices
    data.to_excel(group_name + '.xlsx', index = False)

4 answers

Answer 1
0 2019-06-03 18:37:14

Answer 2
0 accepted 2019-06-03 18:37:55

Answer 3
0 2019-06-03 18:38:34

Answer 4
0 2019-06-03 18:42:53
