
Using python comprehension with dictionary to assign values to pandas dataframe

Suppose I am trying to build a DataFrame that prints out as a lookup table of sectors:

   SectorDescription               SectorCode
0  State Energy Data Systems       SEDS
1  Coal Data                       COAL
2  Petroleum Data                  PET
3  Natural Gas Data                NG
4  Electricity Data                ELEC
5  Petroleum Imports Data          PET_IMPORTS
6  Short-Term Energy Outlook Data  STEO
7  International Energy Data       INTL
8  Annual Energy Outlook Data      AEO

Right now I have:

QuandlEIASector = {"State Energy Data Systems":"SEDS",
                  "Coal Data":"COAL",
                  "Petroleum Data":"PET",
                  "Natural Gas Data":"NG",
                  "Electricity Data":"ELEC",
                  "Petroleum Imports Data":"PET_IMPORTS",
                  "Short-Term Energy Outlook Data":"STEO",
                  "International Energy Data":"INTL",
                  "Annual Energy Outlook Data":"AEO"}

What I did is:

import pandas as pd

QuandlEIASectorList = pd.DataFrame()
QuandlEIASectorList['SectorDescription'] = QuandlEIASector.keys()
QuandlEIASectorList['SectorCode'] = QuandlEIASector.values()
QuandlEIASectorList

But is there a Python comprehension one-liner that would assign the column values to the pandas DataFrame faster?

Create a Series, then convert it to a DataFrame:

QuandlEIASectorList = (pd.Series(QuandlEIASector)
                         .rename_axis('SectorDescription')
                         .reset_index(name='SectorCode'))

Similarly:

QuandlEIASectorList = (pd.Series(QuandlEIASector, name='SectorCode')
                         .rename_axis('SectorDescription')
                         .reset_index())
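
To make the chained calls above easier to follow, here is the second variant split into separate steps. This is only an illustrative sketch: the two-entry dictionary `sectors` and the intermediate variable names are stand-ins, not part of the original answer.

```python
import pandas as pd

# Small stand-in dictionary (a subset of the one in the question).
sectors = {"Coal Data": "COAL", "Petroleum Data": "PET"}

# Step 1: the dict keys become the index, the dict values become the data.
s = pd.Series(sectors, name="SectorCode")

# Step 2: give the (so far unnamed) index a name.
s = s.rename_axis("SectorDescription")

# Step 3: move the index into a regular column; the Series name
# is used as the name of the value column.
df = s.reset_index()

print(df)
```

The first variant differs only in where the value column gets its name: there it is passed to `reset_index(name=...)` instead of to the `Series` constructor.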

Your code should be used with the DataFrame constructor:

QuandlEIASectorList = pd.DataFrame({'SectorDescription':list(QuandlEIASector.keys()),
                                    'SectorCode': list(QuandlEIASector.values())})

Or:

QuandlEIASectorList = pd.DataFrame(list(QuandlEIASector.items()), 
                                   columns=['SectorDescription','SectorCode'])
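
Since the question asks specifically about a comprehension, one possible form is a list comprehension that builds one dict per row and hands the list to the constructor. This is a sketch that was not part of the answer and is not included in the timings, so no speed claim is made for it; the shortened dictionary and the name `rows` are assumptions for illustration.

```python
import pandas as pd

# Shortened copy of the dictionary from the question.
QuandlEIASector = {"State Energy Data Systems": "SEDS",
                   "Coal Data": "COAL",
                   "Petroleum Data": "PET"}

# One dict per row, built with a list comprehension.
rows = [{"SectorDescription": desc, "SectorCode": code}
        for desc, code in QuandlEIASector.items()]

# Passing columns= fixes the column order explicitly.
QuandlEIASectorList = pd.DataFrame(rows,
                                   columns=["SectorDescription", "SectorCode"])

print(QuandlEIASectorList)
```

The comprehension adds a Python-level loop and an intermediate dict per row, so the `items()` form above is likely the more direct way to get the same result.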

Performance with 10k keys:

import numpy as np

QuandlEIASector = dict(zip([f'{x} data' for x in np.arange(10000)], 
                           [f'{x} keys' for x in np.arange(10000)]))

In [73]: %%timeit
    ...: QuandlEIASectorList = pd.DataFrame()
    ...: QuandlEIASectorList['SectorDescription'] = QuandlEIASector.keys()
    ...: QuandlEIASectorList['SectorCode'] = QuandlEIASector.values()
    ...: 
5.94 ms ± 52.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [74]: %%timeit
    ...: (pd.Series(QuandlEIASector)
    ...:    .rename_axis('SectorDescription')
    ...:    .reset_index(name='SectorCode'))
    ...:                          
5.37 ms ± 261 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [75]: %%timeit
    ...: (pd.Series(QuandlEIASector, name='SectorCode')
    ...:    .rename_axis('SectorDescription')
    ...:    .reset_index())
    ...:                          
5.34 ms ± 211 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [76]: %%timeit
    ...: pd.DataFrame({'SectorDescription':list(QuandlEIASector.keys()),
    ...:               'SectorCode': list(QuandlEIASector.values())})
    ...:                                    
2.26 ms ± 20.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [77]: %%timeit
    ...: pd.DataFrame(list(QuandlEIASector.items()), 
    ...:              columns=['SectorDescription','SectorCode'])
    ...:                                    
3.15 ms ± 38.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
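
The timings above were taken with IPython's `%%timeit`. To repeat the comparison in a plain script, something like the following sketch could be used with the standard `timeit` module. The function names, the `candidates` dict, and the repeat counts are arbitrary choices for illustration, and the absolute numbers will depend on the machine and the pandas version.

```python
import timeit
import pandas as pd

# Synthetic 10k-key dictionary, same shape as the one benchmarked above.
data = {f"{x} data": f"{x} keys" for x in range(10000)}

def via_series(d):
    return (pd.Series(d, name="SectorCode")
              .rename_axis("SectorDescription")
              .reset_index())

def via_dict_of_lists(d):
    return pd.DataFrame({"SectorDescription": list(d.keys()),
                         "SectorCode": list(d.values())})

def via_items(d):
    return pd.DataFrame(list(d.items()),
                        columns=["SectorDescription", "SectorCode"])

candidates = {"series": via_series,
              "dict_of_lists": via_dict_of_lists,
              "items": via_items}

# Best of 3 repeats with 20 calls each, reported as milliseconds per call.
timings = {}
for name, func in candidates.items():
    best = min(timeit.repeat(lambda: func(data), number=20, repeat=3))
    timings[name] = best / 20 * 1000
    print(f"{name}: {timings[name]:.2f} ms per call")
```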

