
Processing/Transposing Pandas Dataframe

I have the following pandas DataFrame:

Id    Category
1     type 2
1     type 3
1     type 2
2     type 1
2     type 2

I need to process and transpose the DataFrame above into:

Id Category_type_1 Category_type_2 Category_type_3
1          0               2              1
2          1               1              0

I would appreciate it if someone could show the simplest way to write this in Python.

pd.crosstab(df['Id'], df['Category'])
Out: 
Category  type 1  type 2  type 3
Id                              
1              0       2       1
2              1       1       0
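
The crosstab output keeps the original category labels as column names and leaves Id as the index, while the question asks for columns named like Category_type_1 and Id as a regular column. A minimal sketch of that last step, assuming the sample data from the question (the renaming rule and the variable names `counts` and `result` are my own choices, not part of the original answer):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [1, 1, 1, 2, 2],
    "Category": ["type 2", "type 3", "type 2", "type 1", "type 2"],
})

# Count how often each Category occurs for each Id.
counts = pd.crosstab(df["Id"], df["Category"])

# Rename "type 1" -> "Category_type_1", and so on.
counts.columns = ["Category_" + c.replace(" ", "_") for c in counts.columns]

# Turn the Id index back into an ordinary column.
result = counts.reset_index()
print(result)
```

Assigning a plain list to `counts.columns` also drops the "Category" label from the column axis, so the printed header contains only the new names.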

I would use groupby together with size:

df.groupby(df.columns.tolist()).size().unstack().fillna(0)
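
One thing to be aware of: `.unstack()` introduces NaN for missing Id/Category pairs, so the following `.fillna(0)` leaves the counts as floats. A small variation, assuming the sample data from the question, is to pass `fill_value=0` to `unstack` so the counts stay integers. The explicit column list is my substitution for `df.columns.tolist()` and is equivalent here because the frame has only these two columns:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [1, 1, 1, 2, 2],
    "Category": ["type 2", "type 3", "type 2", "type 1", "type 2"],
})

# Count each (Id, Category) pair, then move Category into the columns.
# fill_value=0 fills the missing pairs during the unstack itself,
# so no NaN is created and the counts keep an integer dtype.
counts = df.groupby(["Id", "Category"]).size().unstack(fill_value=0)
print(counts)
```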


Use pivot_table:

print (df.pivot_table(index='Id', columns='Category', aggfunc=len, fill_value=0))
Category  type 1  type 2  type 3
Id                              
1              0       2       1
2              1       1       0
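
The call above has no value column to aggregate, because Id and Category are both used as keys. I have not checked this on every pandas release, but it may return an empty result on some versions, so treat that as an assumption to verify. A variant that does not depend on a value column is `aggfunc="size"`, which counts rows per group. A minimal sketch with the question's sample data:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [1, 1, 1, 2, 2],
    "Category": ["type 2", "type 3", "type 2", "type 1", "type 2"],
})

# aggfunc="size" counts the rows in each (Id, Category) group,
# so it works even though no value column is left to aggregate.
table = df.pivot_table(index="Id", columns="Category",
                       aggfunc="size", fill_value=0)
print(table)
```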

Timings

Small DataFrame - len(df)=5

In [63]: %timeit df.groupby(df.columns.tolist()).size().unstack().fillna(0)
1000 loops, best of 3: 1.33 ms per loop

In [64]: %timeit (df.pivot_table(index='Id', columns='Category', aggfunc=len, fill_value=0))
100 loops, best of 3: 3.77 ms per loop

In [65]: %timeit pd.crosstab(df['Id'], df['Category'])
100 loops, best of 3: 4.82 ms per loop

Large DataFrame - len(df)=5k

df = pd.concat([df]*1000).reset_index(drop=True)

In [59]: %timeit df.groupby(df.columns.tolist()).size().unstack().fillna(0)
1000 loops, best of 3: 1.73 ms per loop

In [60]: %timeit (df.pivot_table(index='Id', columns='Category', aggfunc=len, fill_value=0))
100 loops, best of 3: 4.64 ms per loop

In [61]: %timeit pd.crosstab(df['Id'], df['Category'])
100 loops, best of 3: 5.46 ms per loop

Very large DataFrame - len(df)=5m

df = pd.concat([df]*1000000).reset_index(drop=True)

In [55]: %timeit df.groupby(df.columns.tolist()).size().unstack().fillna(0)
1 loop, best of 3: 514 ms per loop

In [56]: %timeit (df.pivot_table(index='Id', columns='Category', aggfunc=len, fill_value=0))
1 loop, best of 3: 907 ms per loop

In [57]: %timeit pd.crosstab(df['Id'], df['Category'])
1 loop, best of 3: 822 ms per loop
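
The numbers above come from IPython's %timeit and will differ by machine and pandas version. To re-run a comparison outside IPython, something like the following sketch with the standard `timeit` module should work. The repetition factor, the `candidates` dictionary and the loop counts are my own choices, and I have swapped in `unstack(fill_value=0)` and `aggfunc="size"` for the groupby and pivot_table variants, so this is not an exact replay of the measurements above:

```python
import timeit
import pandas as pd

# Sample data copied from the question.
base = pd.DataFrame({
    "Id": [1, 1, 1, 2, 2],
    "Category": ["type 2", "type 3", "type 2", "type 1", "type 2"],
})

# Repeat the 5 sample rows 1000 times to get 5000 rows.
df = pd.concat([base] * 1000).reset_index(drop=True)

# The three approaches from the answers, wrapped as zero-argument callables.
candidates = {
    "groupby/size/unstack": lambda: df.groupby(["Id", "Category"]).size().unstack(fill_value=0),
    "pivot_table": lambda: df.pivot_table(index="Id", columns="Category",
                                          aggfunc="size", fill_value=0),
    "crosstab": lambda: pd.crosstab(df["Id"], df["Category"]),
}

timings = {}
for name, fn in candidates.items():
    # Best of 3 repeats of 10 calls each, converted to milliseconds per call.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    timings[name] = best * 1000
    print(f"{name}: {timings[name]:.2f} ms per call")
```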

