Counting occurrences per year in a pandas DataFrame, by subgroup

Question

Consider the pandas DataFrame given by

import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'location': [1, 2, 3, 1, 2],
    'date': [pd.to_datetime('01-01-{}'.format(year)) for year in [2015, 2016, 2015, 2017, 2018]]
}).set_index('id')

which looks like this

    location       date
id                     
1          1 2015-01-01
1          2 2016-01-01
1          3 2015-01-01
2          1 2017-01-01
2          2 2018-01-01

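Now, for each year that appears in the date column, I want to create a column that counts the occurrences of that year per id. The resulting DataFrame should look like this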

    location       date  2015  2016  2017  2018
id                                             
1          1 2015-01-01     2     1     0     0
1          2 2016-01-01     2     1     0     0
1          3 2015-01-01     2     1     0     0
2          1 2017-01-01     0     0     1     1
2          2 2018-01-01     0     0     1     1

I imagine something with groupby and transform could work, but I can't figure out the best solution.

My own solution is

df['year'] = df['date'].dt.year
# Count rows per (id, year); missing combinations become 0
counts = pd.pivot_table(
    df, values='date', index='id', columns='year', aggfunc='count'
).fillna(0).astype(int)
df = pd.merge(df, counts, left_index=True, right_index=True).drop('year', axis=1)

Answer 1

`get_dummies`

# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
df.join(pd.get_dummies(df.date.dt.year).groupby(level=0).sum())

         date  location  2015  2016  2017  2018
id                                             
1  2015-01-01         1     2     1     0     0
1  2016-01-01         2     2     1     0     0
1  2015-01-01         3     2     1     0     0
2  2017-01-01         1     0     0     1     1
2  2018-01-01         2     0     0     1     1
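
To make the `get_dummies` one-liner easier to follow, here is a minimal sketch that splits it into its intermediate steps. The names `dummies`, `counts` and `result` are introduced here for illustration only, and the dates are written as ISO strings rather than with the question's format call.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'location': [1, 2, 3, 1, 2],
    'date': pd.to_datetime(['2015-01-01', '2016-01-01', '2015-01-01',
                            '2017-01-01', '2018-01-01']),
}).set_index('id')

# One indicator column per year, one row per original row (index is still id)
dummies = pd.get_dummies(df['date'].dt.year)

# Summing the indicators within each id gives the per-id, per-year counts
counts = dummies.groupby(level=0).sum()

# Joining on the index repeats each id's counts on every row of that id
result = df.join(counts)
print(result)
```

The join works because `counts` has one row per unique id, so each row of `df` picks up the counts of its own id.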

`factorize`

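import numpy as np

# Integer codes (i, j) and unique labels (r, c) for the ids and the years
i, r = pd.factorize(df.index)
j, c = pd.factorize(df.date.dt.year)
shape = len(r), len(c)
b = np.zeros(shape, dtype=np.int64)
# Add 1 at each (id, year) position; repeated pairs accumulate
np.add.at(b, (i, j), 1)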

df.join(pd.DataFrame(b, r, c).rename_axis('id'))

         date  location  2015  2016  2017  2018
id                                             
1  2015-01-01         1     2     1     0     0
1  2016-01-01         2     2     1     0     0
1  2015-01-01         3     2     1     0     0
2  2017-01-01         1     0     0     1     1
2  2018-01-01         2     0     0     1     1
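
The `factorize` approach relies on `np.add.at` accumulating over repeated index pairs. The sketch below, on standalone data mirroring the question's ids and years, contrasts it with plain fancy-index addition, which counts a repeated pair only once. The variable `naive` is introduced here purely for the comparison.

```python
import numpy as np
import pandas as pd

ids = pd.Index([1, 1, 1, 2, 2], name='id')
years = pd.Series([2015, 2016, 2015, 2017, 2018])

i, r = pd.factorize(ids)    # i: row position per record, r: unique ids
j, c = pd.factorize(years)  # j: column position per record, c: unique years

b = np.zeros((len(r), len(c)), dtype=np.int64)
np.add.at(b, (i, j), 1)     # unbuffered: the pair (0, 0) occurs twice and adds 2

naive = np.zeros_like(b)
naive[i, j] += 1            # buffered: the repeated pair (0, 0) adds only 1

print(b)
print(naive)
```

Note that `factorize` orders the labels by first appearance, so the columns come out sorted here only because the years first appear in ascending order.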

Answer 2

Create a helper DataFrame by grouping on id and the year with groupby, then applying size and unstack, and join it to the original df:

df1 = df.join(df.groupby(['id', df['date'].dt.year]).size().unstack(fill_value=0), on='id')
print(df1)
    location       date  2015  2016  2017  2018
id                                             
1          1 2015-01-01     2     1     0     0
1          2 2016-01-01     2     1     0     0
1          3 2015-01-01     2     1     0     0
2          1 2017-01-01     0     0     1     1
2          2 2018-01-01     0     0     1     1

Details:

print(df.groupby(['id', df['date'].dt.year]).size().unstack(fill_value=0))

date  2015  2016  2017  2018
id                          
1        2     1     0     0
2        0     0     1     1

Another solution with crosstab:

df1 = df.join(pd.crosstab(df.index, df['date'].dt.year), on='id')

print(pd.crosstab(df.index, df['date'].dt.year))
date   2015  2016  2017  2018
row_0                        
1         2     1     0     0
2         0     0     1     1

Answer 1 · 4 · accepted · 2018-09-10 10:59:39
Answer 2 · 3 · 2018-09-10 10:52:35
