
How to iterate through dataframe using df.loc and key column

I have a dataframe that returns data for each OfficeLocation.


How can I split the dataframe by OfficeLocation and write each piece into a separate Excel spreadsheet?

import pandas
import pyodbc

server = 'MyServer'
db = 'MyDB'

myparams = ['2019-01-01','2019-02-28', None]  # None substitutes NULL in sql
connection_string = pyodbc.connect('DRIVER={SQL Server};server='+server+';DATABASE='+ db+';Trusted_Connection=yes;')
df = pandas.read_sql_query('EXEC PythonTest_Align_RSrptAccountCurrentMunich @EffectiveDateFrom=?,@EffectiveDateTo=?,@ProducerLocationID=?', connection_string, params = myparams)

# sort the dataframe
df.sort_values(by=['OfficeLocation'], axis=0,inplace=True)

# use OfficeLocation as the index, but keep it as a column too
df.set_index(keys=['OfficeLocation'],drop=False,inplace=True)

# get a list of unique offices
office = df['OfficeLocation'].unique().tolist()

# now we can perform a lookup on a 'view' of the dataframe
SanDiego = df.loc['San Diego']
print(SanDiego)

# how can I iterate through each office and create an Excel file for each one?
df.loc['San Diego'].to_excel(r'\\user\name\Python\SanDIego_Office.xlsx')

So I need 3 Excel spreadsheets with data: SanDiego.xlsx, Vista.xlsx and SanBernardino.xlsx.

You can use groupby:

for location, d in df.groupby('OfficeLocation'):
    # raw f-string, so the backslashes in the UNC path are kept literally
    d.to_excel(rf'\\user\name\Python\{location}.xlsx')
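A slightly fuller sketch of the same groupby idea, in case it helps. The helper names `safe_name` and `iter_offices` and the output folder are my own, not from the question. It strips spaces so that 'San Diego' becomes SanDiego.xlsx as asked, and it drops the index first, because the question keeps OfficeLocation as both index and column and pandas may reject that label as ambiguous when grouping.

```python
import re
from pathlib import Path

import pandas as pd


def safe_name(location):
    """Turn an office name such as 'San Diego' into a file-friendly stem."""
    # Keep letters, digits, '_' and '-'; drop spaces and anything else.
    return re.sub(r"[^A-Za-z0-9_-]", "", location)


def iter_offices(df):
    """Yield (file_stem, rows) for each distinct OfficeLocation."""
    # Discard the index so 'OfficeLocation' refers only to the column.
    flat = df.reset_index(drop=True)
    for location, group in flat.groupby("OfficeLocation"):
        yield safe_name(location), group


def write_office_files(df, out_dir):
    """Write one workbook per office into out_dir and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for stem, group in iter_offices(df):
        path = out_dir / f"{stem}.xlsx"
        group.to_excel(path, index=False)  # needs an Excel engine, e.g. openpyxl
        written.append(path)
    return written
```

Calling `write_office_files(df, "office_reports")` (a hypothetical folder name) should then leave one .xlsx file per office in that folder.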

How about something as simple as this?

for loc in df["OfficeLocation"].unique():
    save_df = df[df["OfficeLocation"] == loc]
    save_df.to_excel(loc + ".xlsx")
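If you would rather keep the df.loc lookup from the question, the same loop can be sketched as below. The sample frame is made up to stand in for the stored-procedure result, and `rows_for_office` is a hypothetical helper. The one thing to watch is that `df.loc['Vista']` collapses to a Series when only one row matches, whereas passing a list keeps it a DataFrame.

```python
import pandas as pd


def rows_for_office(df, office):
    """Return the rows for one office from a frame indexed by OfficeLocation."""
    # A list selector always gives back a DataFrame, even for a single match.
    return df.loc[[office]]


# Hypothetical sample data, indexed the same way as in the question.
df = pd.DataFrame(
    {
        "OfficeLocation": ["Vista", "San Diego", "San Diego"],
        "GrossPremium": [100.0, 250.0, 75.0],
    }
).set_index("OfficeLocation", drop=False)

for office in df["OfficeLocation"].unique():
    piece = rows_for_office(df, office)
    file_name = office.replace(" ", "") + ".xlsx"
    print(file_name, len(piece))
    # piece.to_excel(file_name, index=False)  # uncomment to write the workbook
```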

EDIT

I've generated 50,000 rows of data similar to yours.

+---------------+--------------------+----------------+---------------+----------------+-----------------+------------+--------------+
| Policy Number | ProducerLocationId | OfficeLOcation | EffectiveDate | ExpirationDate | TransactionType | BondAmount | GrossPremium |
+---------------+--------------------+----------------+---------------+----------------+-----------------+------------+--------------+
| 7563299       | 8160               | Aldora         | 31/10/2018    | 28/01/2019     | Cancelled       | -61081     | -2372.303665 |
| 6754151       | 3122               | Aucilla        | 04/05/2019    | 15/06/2019     | New Business    | -80151     | -4135.443318 |
| 3121128       | 3230               | Aulander       | 11/10/2018    | 29/12/2018     | New Business    | -67563     | -28394.83428 |
| 911463        | 4041               | Aullville      | 30/11/2018    | 20/02/2019     | New Business    | -47918     | -17840.05749 |
| 5068380       | 3794               | Ava            | 10/01/2019    | 28/03/2019     | Cancelled       | -41094     | -30523.0655  |
| 2174424       | 1263               | Alcan Border   | 18/04/2019    | 10/07/2019     | Cancelled       | -73661     | -5979.278874 |
| 475464        | 9250               | Audubon        | 15/01/2019    | 17/02/2019     | New Business    | -85217     | -64988.83987 |
| 2076075       | 7405               | Alderton       | 20/08/2019    | 26/09/2019     | New Business    | -32335     | -11144.63342 |
| 3645387       | 9357               | Austwell       | 22/10/2018    | 19/12/2018     | Cancelled       | -5065      | -5013.982643 |
| 3316361       | 1335               | Aurora         | 29/09/2018    | 24/12/2018     | New Business    | -13939     | -6333.580641 |
| 1404387       | 2656               | Auburn Hills   | 04/07/2019    | 19/09/2019     | Cancelled       | -12049     | -385.3522259 |
| 6908433       | 1288               | Alcester       | 30/10/2018    | 18/01/2019     | Cancelled       | -56902     | -27341.06181 |
| 9908879       | 6012               | Alexandria     | 20/06/2019    | 21/08/2019     | Cancelled       | -76226     | -12671.06376 |
| 7850879       | 4606               | Avery          | 10/11/2018    | 21/01/2019     | Cancelled       | -54297     | -40619.42718 |
| 8437707       | 4149               | Auxvasse       | 22/09/2019    | 28/10/2019     | Cancelled       | -59584     | -19800.71077 |
| 4260681       | 1889               | Auburndale     | 06/07/2019    | 22/08/2019     | New Business    | -55035     | -18271.5442  |
| 7234116       | 2636               | Alexander      | 14/07/2019    | 31/08/2019     | New Business    | -59319     | -15711.2827  |
| 3721467       | 3765               | Alexander City | 16/10/2018    | 23/12/2018     | Cancelled       | -98431     | -26743.07459 |
| 6859964       | 7035               | Alburtis       | 04/11/2018    | 26/12/2018     | New Business    | -36917     | -11339.9049  |
| 2994719       | 6997               | Aleneva        | 09/02/2019    | 13/04/2019     | New Business    | -55739     | -46323.01608 |
| 7542794       | 8968               | Aullville      | 25/09/2018    | 09/11/2018     | Cancelled       | -44488     | -4554.278674 |
| 1340649       | 7003               | Augusta        | 30/11/2018    | 17/02/2019     | New Business    | -78405     | -71910.93325 |
| 8078558       | 7185               | Alderpoint     | 10/06/2019    | 22/07/2019     | New Business    | -37928     | -29289.29545 |
| 8198811       | 8963               | Alden          | 05/07/2019    | 15/08/2019     | Cancelled       | -97648     | -79946.41222 |
| 2510522       | 5714               | Avella         | 03/09/2019    | 02/11/2019     | New Business    | -16452     | -11230.93829 |
+---------------+--------------------+----------------+---------------+----------------+-----------------+------------+--------------+

I then wrote two functions, one using my version and the other using the groupby method.

In case anyone was wondering, the two perform similarly, but the groupby method comes out ahead: about 1 second faster per run (11.1 s vs 12.1 s) and with less variance (±183 ms vs ±556 ms).

def loop_save_unique(df):
    for loc in df["OfficeLOcation"].unique():
        save_df = df[df["OfficeLOcation"] == loc]
        save_df.to_excel("output\\test1\\" + loc + ".xlsx")

def loop_save_groupby(df):
    for location, d in df.groupby('OfficeLOcation'):
        d.to_excel(f'output\\test2\\{location}.xlsx')



%timeit loop_save_unique(df)
12.1 s ± 556 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit loop_save_groupby(df)
11.1 s ± 183 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
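Both timings include writing every workbook, which I assume accounts for most of the run time. To compare only the splitting step, something like the rough sketch below could be used outside IPython. The data is synthetic (a few office names borrowed from the table, random premiums), and `split_unique` / `split_groupby` are hypothetical stand-ins for the two functions with the Excel writing removed, so the absolute numbers will not match the ones above.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in data: 50,000 rows spread over a few office names.
rng = np.random.default_rng(0)
offices = ["Aldora", "Aucilla", "Aulander", "Ava", "Aurora"]
df = pd.DataFrame(
    {
        "OfficeLOcation": rng.choice(offices, size=50_000),
        "GrossPremium": rng.normal(size=50_000),
    }
)


def split_unique(df):
    """Boolean-mask split: compares the whole column once per office."""
    return {
        loc: df[df["OfficeLOcation"] == loc]
        for loc in df["OfficeLOcation"].unique()
    }


def split_groupby(df):
    """groupby split: lets pandas partition the rows by office."""
    return {loc: part for loc, part in df.groupby("OfficeLOcation")}


for fn in (split_unique, split_groupby):
    # Best of 3 repeats, each repeat making 5 calls.
    seconds = min(timeit.repeat(lambda: fn(df), number=5, repeat=3))
    print(f"{fn.__name__}: {seconds / 5:.4f} s per call")
```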
