
Using python comprehension with dictionary to assign values to pandas dataframe

Say I am trying to build a dataframe to print out like a table for checking sectors:

   SectorDescription               SectorCode
0  State Energy Data Systems       SEDS
1  Coal Data                       COAL
2  Petroleum Data                  PET
3  Natural Gas Data                NG
4  Electricity Data                ELEC
5  Petroleum Imports Data          PET_IMPORTS
6  Short-Term Energy Outlook Data  STEO
7  International Energy Data       INTL
8  Annual Energy Outlook Data      AEO

Right now I have:

QuandlEIASector = {"State Energy Data Systems":"SEDS",
                  "Coal Data":"COAL",
                  "Petroleum Data":"PET",
                  "Natural Gas Data":"NG",
                  "Electricity Data":"ELEC",
                  "Petroleum Imports Data":"PET_IMPORTS",
                  "Short-Term Energy Outlook Data":"STEO",
                  "International Energy Data":"INTL",
                  "Annual Energy Outlook Data":"AEO"}

What I did is:

import pandas as pd

QuandlEIASectorList = pd.DataFrame()
QuandlEIASectorList['SectorDescription'] = QuandlEIASector.keys()
QuandlEIASectorList['SectorCode'] = QuandlEIASector.values()
QuandlEIASectorList

But is there a quicker way, such as a Python comprehension one-liner, to assign column values to a pandas dataframe?

Create a Series and then convert it to a DataFrame:

QuandlEIASectorList = (pd.Series(QuandlEIASector)
                         .rename_axis('SectorDescription')
                         .reset_index(name='SectorCode'))

Similar:

QuandlEIASectorList = (pd.Series(QuandlEIASector, name='SectorCode')
                         .rename_axis('SectorDescription')
                         .reset_index())
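To see what each step of the chain contributes, here is a minimal sketch that runs it one call at a time. The two-entry dict and the names `sectors`, `s` and `df` are assumptions made for illustration, not part of the original post.

```python
import pandas as pd

# Two-entry subset of the question's dict, used only for illustration.
sectors = {"Coal Data": "COAL", "Petroleum Data": "PET"}

# Step 1: dict keys become the index, dict values become the data.
s = pd.Series(sectors)

# Step 2: give the (so far unnamed) index a name.
s = s.rename_axis("SectorDescription")

# Step 3: move the index into a regular column;
# `name` labels the column that holds the Series values.
df = s.reset_index(name="SectorCode")

print(df)
```

The index name set in step 2 is what becomes the header of the first column after `reset_index`.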

Your keys/values approach can be passed directly to the DataFrame constructor:

QuandlEIASectorList = pd.DataFrame({'SectorDescription':list(QuandlEIASector.keys()),
                                    'SectorCode': list(QuandlEIASector.values())})

Or:

QuandlEIASectorList = pd.DataFrame(list(QuandlEIASector.items()), 
                                   columns=['SectorDescription','SectorCode'])
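Since the question asks specifically for a comprehension, a variant with a list comprehension of row dicts might look like the sketch below. This is not one of the approaches timed in this answer. The three-entry dict and the names `from_comprehension`, `from_items` and `from_series` are assumptions for illustration.

```python
import pandas as pd

# Shortened copy of the question's dict, used only for illustration.
QuandlEIASector = {
    "State Energy Data Systems": "SEDS",
    "Coal Data": "COAL",
    "Petroleum Data": "PET",
}

columns = ["SectorDescription", "SectorCode"]

# One row dict per (key, value) pair, built with a list comprehension.
from_comprehension = pd.DataFrame(
    [{"SectorDescription": desc, "SectorCode": code}
     for desc, code in QuandlEIASector.items()]
)

# Two of the approaches from this answer, for comparison.
from_items = pd.DataFrame(list(QuandlEIASector.items()), columns=columns)
from_series = (pd.Series(QuandlEIASector)
                 .rename_axis("SectorDescription")
                 .reset_index(name="SectorCode"))

# Check that the variants hold the same rows in the same order.
print(from_comprehension.values.tolist() == from_items.values.tolist())
print(from_comprehension.values.tolist() == from_series.values.tolist())
```

The comprehension builds one intermediate dict per row, so it is likely to be slower than passing `items()` directly; it is shown only because it matches the form the question asked about.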

Performance for 10k keys:

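import numpy as np
import pandas as pd

# Build a 10,000-entry test dict: '0 data' -> '0 keys', '1 data' -> '1 keys', ...
QuandlEIASector = dict(zip([f'{x} data' for x in np.arange(10000)], 
                           [f'{x} keys' for x in np.arange(10000)]))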

In [73]: %%timeit
    ...: QuandlEIASectorList = pd.DataFrame()
    ...: QuandlEIASectorList['SectorDescription'] = QuandlEIASector.keys()
    ...: QuandlEIASectorList['SectorCode'] = QuandlEIASector.values()
    ...: 
5.94 ms ± 52.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [74]: %%timeit
    ...: (pd.Series(QuandlEIASector)
    ...:    .rename_axis('SectorDescription')
    ...:    .reset_index(name='SectorCode'))
    ...:                          
5.37 ms ± 261 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [75]: %%timeit
    ...: (pd.Series(QuandlEIASector, name='SectorCode')
    ...:    .rename_axis('SectorDescription')
    ...:    .reset_index())
    ...:                          
5.34 ms ± 211 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [76]: %%timeit
    ...: pd.DataFrame({'SectorDescription':list(QuandlEIASector.keys()),
    ...:               'SectorCode': list(QuandlEIASector.values())})
    ...:                                    
2.26 ms ± 20.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [77]: %%timeit
    ...: pd.DataFrame(list(QuandlEIASector.items()), 
    ...:              columns=['SectorDescription','SectorCode'])
    ...:                                    
3.15 ms ± 38.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
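The `%%timeit` cells above need IPython. To repeat the comparison in a plain script, a rough sketch using the standard-library `timeit` module could look like this. The helper names `build_dict` and `time_approaches` and the repeat count are assumptions, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd


def build_dict(n_keys):
    """Build a test dict shaped like the one used in the timings above."""
    return {f"{i} data": f"{i} keys" for i in range(n_keys)}


def time_approaches(n_keys=10_000, number=20):
    """Return the mean seconds per call for each construction approach."""
    d = build_dict(n_keys)
    cols = ["SectorDescription", "SectorCode"]
    approaches = {
        "series_reset_index": lambda: (pd.Series(d)
                                         .rename_axis(cols[0])
                                         .reset_index(name=cols[1])),
        "dict_of_lists": lambda: pd.DataFrame({cols[0]: list(d.keys()),
                                               cols[1]: list(d.values())}),
        "list_of_items": lambda: pd.DataFrame(list(d.items()), columns=cols),
    }
    # timeit.timeit returns the total time for `number` calls.
    return {name: timeit.timeit(fn, number=number) / number
            for name, fn in approaches.items()}


if __name__ == "__main__":
    for name, seconds in time_approaches().items():
        print(f"{name}: {seconds * 1e3:.2f} ms per call")
```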

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.

 