Using a Python comprehension with a dictionary to assign values to a pandas DataFrame
Suppose I'm trying to build a DataFrame that prints out as a table for checking the sectors:
                SectorDescription   SectorCode
0       State Energy Data Systems         SEDS
1                       Coal Data         COAL
2                  Petroleum Data          PET
3                Natural Gas Data           NG
4                Electricity Data         ELEC
5          Petroleum Imports Data  PET_IMPORTS
6  Short-Term Energy Outlook Data         STEO
7       International Energy Data         INTL
8      Annual Energy Outlook Data          AEO
Right now I have this dictionary:
QuandlEIASector = {"State Energy Data Systems":"SEDS",
"Coal Data":"COAL",
"Petroleum Data":"PET",
"Natural Gas Data":"NG",
"Electricity Data":"ELEC",
"Petroleum Imports Data":"PET_IMPORTS",
"Short-Term Energy Outlook Data":"STEO",
"International Energy Data":"INTL",
"Annual Energy Outlook Data":"AEO"}
What I've done so far:
import pandas as pd
QuandlEIASectorList = pd.DataFrame()
QuandlEIASectorList['SectorDescription'] = QuandlEIASector.keys()
QuandlEIASectorList['SectorCode'] = QuandlEIASector.values()
QuandlEIASectorList
But is there a faster way, for example a Python comprehension one-liner, to assign the column values to a pandas DataFrame?
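For reference, a comprehension like the one the question asks about could look like the minimal sketch below. It uses a shortened copy of the question's dictionary and builds one row dict per key/value pair. It works, but it creates an intermediate Python dict for every row, so it is unlikely to be faster than handing the keys and values to pandas directly.

```python
import pandas as pd

# A few entries from the question's mapping (shortened for brevity)
QuandlEIASector = {
    "State Energy Data Systems": "SEDS",
    "Coal Data": "COAL",
    "Petroleum Data": "PET",
}

# List comprehension: one dict per row, keyed by column name
rows = [{"SectorDescription": desc, "SectorCode": code}
        for desc, code in QuandlEIASector.items()]

# pandas turns a list of row dicts into a frame; column order follows the dict keys
df = pd.DataFrame(rows)
print(df)
```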
Create a Series, then convert it to a DataFrame:
QuandlEIASectorList = (pd.Series(QuandlEIASector)
.rename_axis('SectorDescription')
.reset_index(name='SectorCode'))
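The chain above can be unpacked step by step. This sketch, using a two-entry stand-in for the question's dictionary, shows what each call contributes: the dict keys become the Series index, `rename_axis` gives that index a name, and `reset_index` turns the named index into a regular column while `name=` labels the values column.

```python
import pandas as pd

# Two-entry stand-in for the question's dictionary
QuandlEIASector = {"Coal Data": "COAL", "Natural Gas Data": "NG"}

# Step 1: dict keys become the index, dict values become the data
s = pd.Series(QuandlEIASector)

# Step 2: name the index so reset_index() uses it as the column name
s = s.rename_axis("SectorDescription")

# Step 3: move the index into a regular column; name= labels the values column
df = s.reset_index(name="SectorCode")
print(df)
```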
Similar, but naming the Series up front:
QuandlEIASectorList = (pd.Series(QuandlEIASector, name='SectorCode')
.rename_axis('SectorDescription')
.reset_index())
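The two Series variants differ only in where the values column gets its name. A quick check on a small sample dictionary suggests they build identical frames: when the Series already has a name, `reset_index()` reuses it as the column label.

```python
import pandas as pd

# Small sample of the question's dictionary
QuandlEIASector = {"Coal Data": "COAL", "Petroleum Data": "PET"}

# Variant 1: name the values column inside reset_index()
a = (pd.Series(QuandlEIASector)
     .rename_axis("SectorDescription")
     .reset_index(name="SectorCode"))

# Variant 2: name the Series up front; reset_index() reuses that name
b = (pd.Series(QuandlEIASector, name="SectorCode")
     .rename_axis("SectorDescription")
     .reset_index())

# Raises AssertionError if columns, values or dtypes differ
pd.testing.assert_frame_equal(a, b)
print(a.equals(b))
```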
Your approach can also be written with the DataFrame constructor:
QuandlEIASectorList = pd.DataFrame({'SectorDescription':list(QuandlEIASector.keys()),
'SectorCode': list(QuandlEIASector.values())})
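This constructor relies on `keys()` and `values()` lining up. Python guarantees that for a dict that is not modified between the two calls (and, since Python 3.7, both follow insertion order), so row i of each column comes from the same pair. A small sketch that verifies the pairing by rebuilding the dict from the two columns:

```python
import pandas as pd

# Small sample of the question's dictionary
QuandlEIASector = {"Electricity Data": "ELEC", "International Energy Data": "INTL"}

# keys() and values() of an unmodified dict iterate in matching order
df = pd.DataFrame({"SectorDescription": list(QuandlEIASector.keys()),
                   "SectorCode": list(QuandlEIASector.values())})

# Check the pairing survived: zip the two columns back into a dict
assert dict(zip(df["SectorDescription"], df["SectorCode"])) == QuandlEIASector
print(df)
```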
Or:
QuandlEIASectorList = pd.DataFrame(list(QuandlEIASector.items()),
columns=['SectorDescription','SectorCode'])
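In the `items()` version each `(key, value)` tuple becomes one row, and `columns=` supplies the labels. As a sketch on a small sample dictionary, it produces the same frame as the dict-of-lists constructor:

```python
import pandas as pd

# Small sample of the question's dictionary
QuandlEIASector = {"Coal Data": "COAL", "Annual Energy Outlook Data": "AEO"}

# Materialise the view as a list of (key, value) tuples, one tuple per row
rows = list(QuandlEIASector.items())
df = pd.DataFrame(rows, columns=["SectorDescription", "SectorCode"])

# Compare against the dict-of-lists constructor
expected = pd.DataFrame({"SectorDescription": list(QuandlEIASector.keys()),
                         "SectorCode": list(QuandlEIASector.values())})
pd.testing.assert_frame_equal(df, expected)
print(df)
```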
Performance with 10k keys:
import numpy as np
QuandlEIASector = dict(zip([f'{x} data' for x in np.arange(10000)],
[f'{x} keys' for x in np.arange(10000)]))
In [73]: %%timeit
...: QuandlEIASectorList = pd.DataFrame()
...: QuandlEIASectorList['SectorDescription'] = QuandlEIASector.keys()
...: QuandlEIASectorList['SectorCode'] = QuandlEIASector.values()
...:
5.94 ms ± 52.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [74]: %%timeit
...: (pd.Series(QuandlEIASector)
...: .rename_axis('SectorDescription')
...: .reset_index(name='SectorCode'))
...:
5.37 ms ± 261 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [75]: %%timeit
...: (pd.Series(QuandlEIASector, name='SectorCode')
...: .rename_axis('SectorDescription')
...: .reset_index())
...:
5.34 ms ± 211 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [76]: %%timeit
...: pd.DataFrame({'SectorDescription':list(QuandlEIASector.keys()),
...: 'SectorCode': list(QuandlEIASector.values())})
...:
2.26 ms ± 20.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [77]: %%timeit
...: pd.DataFrame(list(QuandlEIASector.items()),
...: columns=['SectorDescription','SectorCode'])
...:
3.15 ms ± 38.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
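In the timings above, the dict-of-lists constructor was fastest (2.26 ms, versus 5.94 ms for the question's column-by-column assignment). To rerun the comparison outside IPython, here is a sketch using the standard-library `timeit` module. The wrapper function names are hypothetical, the question's assignment is wrapped in `list()` here as an assumption for safety across pandas versions, and each approach is checked to build the same frame before it is timed. Absolute numbers will vary with machine and pandas version, so none are claimed here.

```python
import timeit

import pandas as pd

# Synthetic 10k-entry mapping, like the benchmark above (range instead of np.arange)
QuandlEIASector = {f"{x} data": f"{x} keys" for x in range(10_000)}

def assign_columns():
    # The question's approach: empty frame, then one column at a time
    df = pd.DataFrame()
    df["SectorDescription"] = list(QuandlEIASector.keys())
    df["SectorCode"] = list(QuandlEIASector.values())
    return df

def series_reset_index():
    return (pd.Series(QuandlEIASector)
            .rename_axis("SectorDescription")
            .reset_index(name="SectorCode"))

def dict_of_lists():
    return pd.DataFrame({"SectorDescription": list(QuandlEIASector.keys()),
                         "SectorCode": list(QuandlEIASector.values())})

def list_of_items():
    return pd.DataFrame(list(QuandlEIASector.items()),
                        columns=["SectorDescription", "SectorCode"])

candidates = [assign_columns, series_reset_index, dict_of_lists, list_of_items]

# All approaches must build the same frame before their speed is compared
reference = dict_of_lists()
for fn in candidates:
    assert fn().equals(reference), fn.__name__

for fn in candidates:
    # Best of 3 repeats, 10 calls each; report the per-call time in ms
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__:>20}: {best * 1e3:.2f} ms")
```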