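Pandas DataFrame to dictionary: values in the first column become the keys, and the corresponding values in the second column are collected into lists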

Question

I have a very large pandas DataFrame that looks like this:

        t   gid
0   2010.0  67290
1   2020.0  92780
2   2040.0  92780
3   2060.0  92780
4   2090.0  92780
5   2110.0  92780
6   2140.0  92780
7   2190.0  92780
8   2010.0  69110
9   2010.0  78420
10  2020.0  78420
11  2020.0  78420
12  2030.0  78420
13  2040.0  78420

I want to convert it into a dictionary such that:

gid_to_t[gid] == the list of all t values for that gid,

for example: gid_to_t[92780] == [2020,2040,2060,2090,2110...]

I know I can do the following:

gid_to_t = {}
for i,gid in enumerate(list(sps.gid)):
    gid_to_t[gid] = list(sps[sps.gid==gid].t)

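But this takes far too long, and I would be glad to find a faster way.

Thanks

Edit

I tested the approaches suggested in the comments. Here is the data: https://drive.google.com/open?id=1d3zUkc543hm8CZ_ZyzAzdbmQUE_G55bU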

import pandas as pd
df1 = pd.read_pickle('stack.pkl')

%timeit -n 2 df1.groupby('gid')['t'].apply(list).to_dict()
2 loops, best of 3: 4.76 s per loop
%timeit -n 2 df1.groupby('gid')['t'].apply(lambda x: x.tolist()).to_dict()
2 loops, best of 3: 4.21 s per loop
%timeit -n 2 df1.groupby('gid', sort=False)['t'].apply(list).to_dict()
2 loops, best of 3: 4.84 s per loop
%timeit -n 2 {name: group.tolist() for name, group in df1.groupby('gid')['t']}
2 loops, best of 3: 4 s per loop
%timeit -n 2 {name: group.tolist() for name, group in df1.groupby('gid', sort=False)['t']}
2 loops, best of 3: 3.96 s per loop
%timeit -n 2 {name: group['t'].tolist() for name, group in df1.groupby('gid', sort=False)}
2 loops, best of 3: 7.16 s per loop

Answer 1

Try building the dictionary by calling to_dict on the Series of lists produced by groupby:

#if necessary convert column to int
df.t = df.t.astype(int)
d = df.groupby('gid')['t'].apply(list).to_dict()
print (d)
{92780: [2020, 2040, 2060, 2090, 2110, 2140, 2190], 
 67290: [2010], 
 78420: [2010, 2020, 2020, 2030, 2040], 
 69110: [2010]}

print (d[78420])
[2010, 2020, 2020, 2030, 2040]

If performance matters, add the sort=False parameter to groupby:

d = df.groupby('gid', sort=False)['t'].apply(list).to_dict()
d = {name: group.tolist() for name, group in df.groupby('gid', sort=False)['t']}
d = {name: group['t'].tolist() for name, group in df.groupby('gid', sort=False)}
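
To see what sort=False actually changes, here is a minimal sketch that rebuilds the sample frame from the question (the names sorted_d and unsorted_d are only for illustration). With the default sort=True the groups come back in ascending key order; with sort=False they come back in order of first appearance. The lists themselves should be identical either way, so skipping the sort only affects key order.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "t": [2010.0, 2020.0, 2040.0, 2060.0, 2090.0, 2110.0, 2140.0,
          2190.0, 2010.0, 2010.0, 2020.0, 2020.0, 2030.0, 2040.0],
    "gid": [67290, 92780, 92780, 92780, 92780, 92780, 92780,
            92780, 69110, 78420, 78420, 78420, 78420, 78420],
})

# Default: groups are sorted by key
sorted_d = {gid: grp.tolist() for gid, grp in df.groupby("gid")["t"]}

# sort=False: groups keep the order in which each gid first appears
unsorted_d = {gid: grp.tolist() for gid, grp in df.groupby("gid", sort=False)["t"]}

print(list(sorted_d))    # keys in ascending order
print(list(unsorted_d))  # keys in order of first appearance
```

Since a dict lookup does not depend on key order, sort=False is safe here unless you later rely on iterating the keys in sorted order.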

Answer 2

Another answer, this one without apply:

d = {name: group.tolist() for name, group in df.groupby('gid')['t']}

{67290: [2010.0],
 69110: [2010.0],
 78420: [2010.0, 2020.0, 2020.0, 2030.0, 2040.0],
 92780: [2020.0, 2040.0, 2060.0, 2090.0, 2110.0, 2140.0, 2190.0]}
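
For comparison, a variant that skips groupby altogether is to walk the two columns once and append into a collections.defaultdict. This is not taken from either answer and I have not timed it against the question's data, so treat it as a sketch to benchmark rather than a recommendation. The sample frame is rebuilt from the question.

```python
from collections import defaultdict

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "t": [2010.0, 2020.0, 2040.0, 2060.0, 2090.0, 2110.0, 2140.0,
          2190.0, 2010.0, 2010.0, 2020.0, 2020.0, 2030.0, 2040.0],
    "gid": [67290, 92780, 92780, 92780, 92780, 92780, 92780,
            92780, 69110, 78420, 78420, 78420, 78420, 78420],
})

# Single pass over both columns; tolist() yields plain Python ints/floats
acc = defaultdict(list)
for gid, t in zip(df["gid"].tolist(), df["t"].tolist()):
    acc[gid].append(t)

# Convert to a regular dict so missing keys raise KeyError again
gid_to_t = dict(acc)
print(gid_to_t[78420])
```

Keys end up in order of first appearance, the same as with groupby(..., sort=False).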

Answer 1
3 2017-03-21 19:57:26

Answer 2
1 Accepted 2017-03-21 20:07:26
