Break up a data-set into separate Excel files based on a row value in a given column in Pandas?
I have a fairly large dataset that I would like to split into separate Excel files based on the names in column A (the "Agent" column in the example below). Ex1 below gives a rough example of what this dataset looks like.
Using pandas, what is the most efficient way to create a new Excel file for each name in column A (the Agent column in this example), preferably with that name used in the file name?
In the given example, I would like separate files for John Doe, Jane Doe, and Steve Smith, each containing the information that follows their names (Business Name, Business ID, etc.).
Ex1
Agent        Business Name    Business ID  Revenue
John Doe     Bobs Ice Cream   12234        $400
John Doe     Car Repair       445848       $2331
John Doe     Corner Store     243123       $213
John Doe     Cool Taco Stand  2141244      $8912
Jane Doe     Fresh Ice Cream  9271499      $2143
Jane Doe     Breezy Air       0123801      $3412
Steve Smith  Big Golf Range   12938192     $9912
Steve Smith  Iron Gyms        1231233      $4133
Steve Smith  Tims Tires       82489233     $781
I believe Python / pandas would be an efficient tool for this, but I'm still fairly new to pandas, so I'm having trouble getting started.
Use a list comprehension with groupby on the Agent column:
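# groupby yields (agent name, sub-frame) pairs; keep only the sub-frames
dfs = [d for _, d in df.groupby('Agent')]
# loop variable is d, not df, so the original frame is not overwritten
for d in dfs:
    print(d, '\n')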
Output:
      Agent    Business Name  Business ID  Revenue
4  Jane Doe  Fresh Ice Cream      9271499    $2143
5  Jane Doe       Breezy Air       123801    $3412

      Agent    Business Name  Business ID  Revenue
0  John Doe   Bobs Ice Cream        12234     $400
1  John Doe       Car Repair       445848    $2331
2  John Doe     Corner Store       243123     $213
3  John Doe  Cool Taco Stand      2141244    $8912

         Agent   Business Name  Business ID  Revenue
6  Steve Smith  Big Golf Range     12938192    $9912
7  Steve Smith       Iron Gyms      1231233    $4133
8  Steve Smith      Tims Tires     82489233     $781
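The list above drops the agent names, which are needed later for the file names. A minimal sketch that keeps them by building a dict instead of a list; the sample frame and the variable name `frames_by_agent` are made up for illustration:

```python
import pandas as pd

# Small stand-in for the question's data (illustrative values only)
df = pd.DataFrame({
    "Agent": ["John Doe", "John Doe", "Jane Doe", "Steve Smith"],
    "Business Name": ["Bobs Ice Cream", "Car Repair", "Fresh Ice Cream", "Iron Gyms"],
    "Revenue": ["$400", "$2331", "$2143", "$4133"],
})

# Keep the group key next to its sub-frame instead of discarding it
frames_by_agent = {name: group for name, group in df.groupby("Agent")}

for name, group in frames_by_agent.items():
    print(name, len(group))
```

Each value is an ordinary DataFrame, so `frames_by_agent["John Doe"]` can be inspected or written out on its own.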
I would loop over the groups of names, then save each group to its own Excel file:
s = df.groupby('Agent')
for name, group in s:
    # .xlsx rather than .xls: recent pandas versions can no longer write the legacy .xls format
    group.to_excel(f"{name}.xlsx")
Use the unique values in the column to subset the data, and write each subset to a CSV file named after the value:
import pandas as pd
for unique_val in df['Agent'].unique():
    df[df['Agent'] == unique_val].to_csv(f"{unique_val}.csv")
If you need Excel:
import pandas as pd
for unique_val in df['Agent'].unique():
    df[df['Agent'] == unique_val].to_excel(f"{unique_val}.xlsx")
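Since the question asks about efficiency, it may help to see that this boolean-mask approach and groupby select the same rows. The mask version compares the whole column once per agent, while groupby partitions the rows once, which is likely to matter only when there are many distinct agents. A small sketch on made-up data (the names `by_mask` and `by_group` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Agent": ["John Doe", "Jane Doe", "John Doe", "Steve Smith"],
    "Revenue": [400, 2143, 2331, 9912],
})

# One boolean mask per unique value: the column is scanned once per agent
by_mask = {name: df[df["Agent"] == name] for name in df["Agent"].unique()}

# groupby: the rows are split into groups in one go
by_group = {name: group for name, group in df.groupby("Agent")}

# Both approaches produce the same sub-frame for every agent
same = all(by_mask[name].equals(by_group[name]) for name in by_mask)
print(same)
```

One difference to keep in mind: by default groupby drops rows whose key is missing (NaN), and the mask comparison never matches NaN either, so rows without an agent end up in no file with both approaches.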
Grouping is what you are looking for here. You can iterate over the groups, which gives you the grouping key and the data associated with each group. In your case, that is the Agent name and the associated business columns.
Code:
import pandas as pd
# make up some data
ex1 = pd.DataFrame([['A', 1], ['A', 2], ['B', 3], ['B', 4]], columns=['letter', 'number'])
# iterate over the grouped data and export the data frames to excel workbooks
for group_name, data in ex1.groupby('letter'):
    # you probably have more complicated naming logic
    # use index=False if you have not set an index on the dataframe, to avoid an extra column of indices
    data.to_excel(group_name + '.xlsx', index=False)
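As one possible take on the "more complicated naming logic" mentioned in the comment, the sketch below replaces characters that are not allowed in file names and writes everything into one output folder. The helpers `safe_filename` and `write_groups` and the folder name `agents` are hypothetical, not part of pandas, and the character list is an assumption based on common Windows restrictions:

```python
import re
from pathlib import Path

import pandas as pd


def safe_filename(name) -> str:
    """Replace characters that commonly break file names (assumed list)."""
    cleaned = re.sub(r'[\\/:*?"<>|]+', "_", str(name)).strip(" .")
    return cleaned or "unnamed"


def write_groups(df: pd.DataFrame, column: str, out_dir: str) -> list:
    """Write one workbook per distinct value in `column`; return the paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, group in df.groupby(column):
        path = target / f"{safe_filename(name)}.xlsx"
        # to_excel needs an Excel writer engine such as openpyxl installed
        group.to_excel(path, index=False)
        written.append(path)
    return written


# Show the file paths two example names would map to (nothing is written here)
for example in ["John Doe", "A/B: Co"]:
    print(Path("agents") / f"{safe_filename(example)}.xlsx")
```

With the question's data loaded as `df`, calling `write_groups(df, "Agent", "agents")` would then create one workbook per agent inside the `agents` folder. Note that two different names could map to the same cleaned file name, in which case the later file overwrites the earlier one.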