From pandas DataFrame to dictionary, so that each value in column one becomes a key and the corresponding values in column two are collected in a list
I have a very big pandas DataFrame like the following:
         t    gid
0   2010.0  67290
1   2020.0  92780
2   2040.0  92780
3   2060.0  92780
4   2090.0  92780
5   2110.0  92780
6   2140.0  92780
7   2190.0  92780
8   2010.0  69110
9   2010.0  78420
10  2020.0  78420
11  2020.0  78420
12  2030.0  78420
13  2040.0  78420
and I want to convert it to a dictionary such that:
gid_to_t[gid] == list of all t's for that gid,
for example, gid_to_t[92780] == [2020,2040,2060,2090,2110...]
I know I can do the following:
gid_to_t = {}
for i,gid in enumerate(list(sps.gid)):
    gid_to_t[gid] = list(sps[sps.gid==gid].t)
but it takes too long, and I would be happy to find a faster way.
Thanks
EDIT
I've checked the methods suggested in the comments. This is the data: https://drive.google.com/open?id=1d3zUkc543hm8CZ_ZyzAzdbmQUE_G55bU
import pandas as pd
df1 = pd.read_pickle('stack.pkl')
%timeit -n 2 df1.groupby('gid')['t'].apply(list).to_dict()
2 loops, best of 3: 4.76 s per loop
%timeit -n 2 df1.groupby('gid')['t'].apply(lambda x: x.tolist()).to_dict()
2 loops, best of 3: 4.21 s per loop
%timeit -n 2 df1.groupby('gid', sort=False)['t'].apply(list).to_dict()
2 loops, best of 3: 4.84 s per loop
%timeit -n 2 {name: group.tolist() for name, group in df1.groupby('gid')['t']}
2 loops, best of 3: 4 s per loop
%timeit -n 2 {name: group.tolist() for name, group in df1.groupby('gid', sort=False)['t']}
2 loops, best of 3: 3.96 s per loop
%timeit -n 2 {name: group['t'].tolist() for name, group in df1.groupby('gid', sort=False)}
2 loops, best of 3: 7.16 s per loop
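For anyone who wants to repeat this kind of comparison outside IPython, here is a minimal sketch using the standard-library timeit module in place of the %timeit magic. The data is synthetic (the row count, value ranges and seed are made up for illustration, not taken from stack.pkl), so the absolute numbers will not match the timings above.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: sizes and ranges are assumptions.
rng = np.random.default_rng(0)
n_rows = 10_000
df1 = pd.DataFrame({
    "t": rng.integers(2000, 2200, size=n_rows).astype(float),
    "gid": rng.integers(0, 1_000, size=n_rows),
})


def with_apply():
    # groupby + apply(list), then convert the resulting Series to a dict
    return df1.groupby("gid")["t"].apply(list).to_dict()


def with_comprehension():
    # iterate over the groups directly and build the dict by hand
    return {name: group.tolist()
            for name, group in df1.groupby("gid", sort=False)["t"]}


# Both variants should produce the same mapping before we compare speed.
assert with_apply() == with_comprehension()

for fn in (with_apply, with_comprehension):
    # number=2, repeat=3 mirrors "%timeit -n 2" with its best-of-3 report
    best = min(timeit.repeat(fn, number=2, repeat=3)) / 2
    print(f"{fn.__name__}: {best:.4f} s per loop")
```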
Try creating a dictionary with to_dict from a Series of lists created by groupby:
#if necessary convert column to int
df.t = df.t.astype(int)
d = df.groupby('gid')['t'].apply(list).to_dict()
print (d)
{92780: [2020, 2040, 2060, 2090, 2110, 2140, 2190],
67290: [2010],
78420: [2010, 2020, 2020, 2030, 2040],
69110: [2010]}
print (d[78420])
[2010, 2020, 2020, 2030, 2040]
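To try this without the original file, the snippet below rebuilds the sample rows from the question as a small DataFrame and applies the same groupby call. The variable name gid_to_t follows the question; the way the frame is constructed here is just one convenient way to reproduce it.

```python
import pandas as pd

# Sample rows copied from the table in the question
df = pd.DataFrame({
    "t": [2010.0, 2020.0, 2040.0, 2060.0, 2090.0, 2110.0, 2140.0, 2190.0,
          2010.0, 2010.0, 2020.0, 2020.0, 2030.0, 2040.0],
    "gid": [67290] + [92780] * 7 + [69110] + [78420] * 5,
})

# Optional: store t as integers instead of floats
df["t"] = df["t"].astype(int)

# One list of t values per gid, then Series -> dict
gid_to_t = df.groupby("gid")["t"].apply(list).to_dict()

print(gid_to_t[78420])
```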
If performance is important, add the sort=False parameter to groupby:
d = df.groupby('gid', sort=False)['t'].apply(list).to_dict()
d = {name: group.tolist() for name, group in df.groupby('gid', sort=False)['t']}
d = {name: group['t'].tolist() for name, group in df.groupby('gid', sort=False)}
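As a small illustration of what sort=False changes: the lists are the same either way, but the groups come out in order of first appearance rather than sorted by key. The sketch below uses a shortened version of the sample data and assumes a Python version where dicts preserve insertion order (3.7+).

```python
import pandas as pd

# Shortened sample: gids appear in the order 67290, 92780, 69110, 78420
df = pd.DataFrame({
    "t": [2010, 2020, 2040, 2010, 2010, 2020],
    "gid": [67290, 92780, 92780, 69110, 78420, 78420],
})

d_sorted = df.groupby("gid")["t"].apply(list).to_dict()
d_unsorted = df.groupby("gid", sort=False)["t"].apply(list).to_dict()

# Same key -> list mapping in both cases
assert d_sorted == d_unsorted

# Only the key order differs
print(list(d_sorted))    # keys sorted by gid
print(list(d_unsorted))  # keys in order of first appearance
```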
One more answer that doesn't use apply.
d = {name: group.tolist() for name, group in df.groupby('gid')['t']}
{67290: [2010.0],
69110: [2010.0],
78420: [2010.0, 2020.0, 2020.0, 2030.0, 2040.0],
92780: [2020.0, 2040.0, 2060.0, 2090.0, 2110.0, 2140.0, 2190.0]}
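For comparison, here is a sketch of a variant that skips groupby entirely and fills a collections.defaultdict in a single pass over the two columns. This is not one of the methods timed in the question, and whether it is faster on the real data has not been measured here; treat it as something to benchmark rather than a recommendation.

```python
from collections import defaultdict

import pandas as pd

# Small sample in the same shape as the question's data
df = pd.DataFrame({
    "t": [2010.0, 2020.0, 2040.0, 2010.0],
    "gid": [67290, 92780, 92780, 69110],
})

gid_to_t = defaultdict(list)
# tolist() converts each column to plain Python values once, up front
for gid, t in zip(df["gid"].tolist(), df["t"].tolist()):
    gid_to_t[gid].append(t)

# Convert back to a regular dict so missing keys raise KeyError again
gid_to_t = dict(gid_to_t)
print(gid_to_t)
```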