Break up a data-set into separate Excel files based on a certain row value in a given column in Pandas?

I have a fairly large dataset that I would like to split into separate Excel files based on the names in column A (the "Agent" column in the example below). A rough example of what the dataset looks like is given in Ex1 below.

Using pandas, what is the most efficient way to create a new Excel file for each name in column A (the Agent column in this example), preferably with that name used in the file title?

For example, I would like separate files for John Doe, Jane Doe, and Steve Smith, each containing the information that follows their name (Business Name, Business ID, etc.).

Ex1

Agent        Business Name    Business ID    Revenue

John Doe     Bobs Ice Cream   12234          $400
John Doe     Car Repair       445848         $2331
John Doe     Corner Store     243123         $213
John Doe     Cool Taco Stand  2141244        $8912
Jane Doe     Fresh Ice Cream  9271499        $2143
Jane Doe     Breezy Air       0123801        $3412
Steve Smith  Big Golf Range   12938192       $9912
Steve Smith  Iron Gyms        1231233        $4133
Steve Smith  Tims Tires       82489233       $781

I believe Python / pandas would be an efficient tool for this, but I'm still fairly new to pandas, so I'm having trouble getting started.

Use a list comprehension with groupby on the Agent column:

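# groupby yields (key, sub-DataFrame) pairs; keep only the sub-DataFrames
dfs = [d for _, d in df.groupby('Agent')]

# use a separate loop variable so the original df is not overwritten
for d in dfs:
    print(d, '\n')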

Output:

      Agent    Business Name  Business ID Revenue
4  Jane Doe  Fresh Ice Cream      9271499   $2143
5  Jane Doe       Breezy Air       123801   $3412 

      Agent    Business Name  Business ID Revenue
0  John Doe   Bobs Ice Cream        12234    $400
1  John Doe       Car Repair       445848   $2331
2  John Doe     Corner Store       243123    $213
3  John Doe  Cool Taco Stand      2141244   $8912 

         Agent   Business Name  Business ID Revenue
6  Steve Smith  Big Golf Range     12938192   $9912
7  Steve Smith       Iron Gyms      1231233   $4133
8  Steve Smith      Tims Tires     82489233    $781 
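
The output above can be reproduced with a small stand-in for Ex1. This is only a sketch: it assumes the table is typed in by hand, with Business ID stored as integers (which is why 0123801 shows up as 123801 in the output) and Revenue kept as strings. In practice the data would be loaded from a file. The `groups` dictionary is an illustrative name, not part of the answer.

```python
import pandas as pd

# Ex1 from the question, rebuilt by hand (assumed types: IDs as ints, revenue as strings)
df = pd.DataFrame({
    "Agent": ["John Doe"] * 4 + ["Jane Doe"] * 2 + ["Steve Smith"] * 3,
    "Business Name": [
        "Bobs Ice Cream", "Car Repair", "Corner Store", "Cool Taco Stand",
        "Fresh Ice Cream", "Breezy Air",
        "Big Golf Range", "Iron Gyms", "Tims Tires",
    ],
    "Business ID": [
        12234, 445848, 243123, 2141244,
        9271499, 123801,
        12938192, 1231233, 82489233,
    ],
    "Revenue": [
        "$400", "$2331", "$213", "$8912",
        "$2143", "$3412",
        "$9912", "$4133", "$781",
    ],
})

# groupby sorts the keys by default, so Jane Doe comes before John Doe
groups = {name: group for name, group in df.groupby("Agent")}

for name, group in groups.items():
    print(name, len(group))
```

Keeping the groups in a dictionary keyed by agent name makes it easy to look up one agent's rows later, which a plain list does not allow.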

I would loop over the groups of names, then save each group to its own Excel file:

s = df.groupby('Agent')

for name, group in s:
    # write .xlsx: pandas 2.0 and later can no longer write the legacy .xls format
    group.to_excel(f"{name}.xlsx")
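
Because the agent name goes straight into the file name, a name containing a character such as `/` or `:` could make the write fail. The following is a minimal sketch of one way to guard against that. The helper `safe_filename`, the function `split_to_excel`, and the `out_dir` parameter are hypothetical names introduced for illustration, and writing `.xlsx` assumes the openpyxl package is installed.

```python
import re
from pathlib import Path

import pandas as pd


def safe_filename(name) -> str:
    """Replace characters that are not allowed in common file systems (hypothetical helper)."""
    return re.sub(r'[\\/:*?"<>|]', "_", str(name)).strip()


def split_to_excel(df: pd.DataFrame, column: str, out_dir: str) -> list:
    """Write one .xlsx file per distinct value in `column` and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, group in df.groupby(column):
        path = out / f"{safe_filename(name)}.xlsx"
        # index=False leaves out the row-number column; .xlsx output needs openpyxl
        group.to_excel(path, index=False)
        paths.append(path)
    return paths
```

Calling `split_to_excel(df, "Agent", "agents")` would then write one workbook per agent into an `agents` folder. Note that two different names that sanitize to the same string would overwrite each other, so this is only suitable when the cleaned names stay unique.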

Use the unique values in the column to subset the data and write each subset to CSV, using the value as the file name:

import pandas as pd
for unique_val in df['Agent'].unique():
    df[df['Agent'] == unique_val].to_csv(f"{unique_val}.csv")

If you need Excel:

import pandas as pd
for unique_val in df['Agent'].unique():
    df[df['Agent'] == unique_val].to_excel(f"{unique_val}.xlsx")
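
One difference from the groupby approach is worth checking when the Agent column has missing values: `unique()` returns the missing value as an entry of its own, but an equality test never matches it, so that iteration would write an empty file. The sketch below uses a small made-up frame (not the data from the question) to show the behaviour, along with the `dropna` option of `groupby`, which is available in pandas 1.1 and later.

```python
import pandas as pd

# made-up data with one missing agent, for illustration only
df = pd.DataFrame({
    "Agent": ["John Doe", None, "Jane Doe"],
    "Revenue": [400, 10, 2143],
})

# size of each subset produced by the unique-value loop
sizes = [len(df[df["Agent"] == val]) for val in df["Agent"].unique()]
print(sizes)  # the entry for the missing value is 0

# groupby skips rows with a missing key by default...
n_default = df.groupby("Agent").ngroups
# ...and keeps them as their own group when dropna=False
n_with_missing = df.groupby("Agent", dropna=False).ngroups
print(n_default, n_with_missing)
```

If rows without an agent should still be exported, selecting them with `df[df["Agent"].isna()]` is one option.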

Grouping is what you are looking for here. You can iterate over the groups, which gives you the grouping key and the data associated with each group: in your case, the Agent name and the associated business columns.

Code:

import pandas as pd
# make up some data
ex1 = pd.DataFrame([['A',1],['A',2],['B',3],['B',4]], columns = ['letter','number'])

# iterate over the grouped data and export the data frames to excel workbooks
for group_name,data in ex1.groupby('letter'):
    # you probably have more complicated naming logic
    # use index = False if you have not set an index on the dataframe to avoid an extra column of indices
    data.to_excel(group_name + '.xlsx', index = False)
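
The effect of `index = False` mentioned in the comment can be seen without opening a workbook. This is a small sketch that uses `to_csv` only because it can return the result as a string; the `index` parameter plays the same role in `to_excel`. The variable names are illustrative.

```python
import pandas as pd

ex1 = pd.DataFrame([['A', 1], ['A', 2], ['B', 3], ['B', 4]], columns=['letter', 'number'])

# take one group; its rows keep their original index labels (2 and 3)
group_b = ex1[ex1['letter'] == 'B']

with_index = group_b.to_csv()                # first column holds the index labels
without_index = group_b.to_csv(index=False)  # only the data columns are written

print(with_index)
print(without_index)
```

With the index included, the exported file gains an unnamed first column holding 2 and 3, which is rarely wanted in a per-agent file.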


