Slicing a Pandas DataFrame based on values in a list column

Question

I have a Pandas DataFrame with a million rows (ids), where one of the columns holds a list of tokens in each row. For example:

df = pd.DataFrame({'id': [1, 2, 3, 4], 'token_list': [['a', 'b', 'c'], ['c', 'd'], ['a', 'e', 'f'], ['c', 'f']]})

I want to create a dictionary with all the unique tokens ('a', 'b', 'c', 'd', 'e', 'f', which I already have as a separate list) as keys and, as values, all the ids that each token is associated with. For example, {'a': [1, 3], 'b': [1], 'c': [1, 2, 4], ...} and so on.

My problem is that there are 12000 such tokens, and I do not want to use loops to run through each row of the frame. Also, isin does not seem to work.

Answer 1

Use np.repeat together with numpy.concatenate to flatten the data first, then groupby with list, and finally to_dict:

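import numpy as np
# repeat each id once per token in its list
a = np.repeat(df['id'], df['token_list'].str.len())
# flatten all token lists into a single 1-D array
b = np.concatenate(df['token_list'].values)

# group the repeated ids by token and collect each group into a list
d = a.groupby(b).apply(list).to_dict()
print(d)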

{'c': [1, 2, 4], 'a': [1, 3], 'b': [1], 'd': [2], 'e': [3], 'f': [3, 4]}

Detail:

print (a)
0    1
0    1
0    1
1    2
1    2
2    3
2    3
2    3
3    4
3    4
Name: id, dtype: int64

print (b)
['a' 'b' 'c' 'c' 'd' 'a' 'e' 'f' 'c' 'f']
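
The reason the groupby works is that the two flattened arrays line up position by position: element i of a is the id of the row that token b[i] came from. A minimal sketch that makes this pairing visible, assuming the sample frame from the question (the name pairs is only for illustration, and plain NumPy arrays are used instead of a Series to keep the output short):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'token_list': [['a', 'b', 'c'], ['c', 'd'], ['a', 'e', 'f'], ['c', 'f']],
})

# one id per token: id 1 three times, id 2 twice, and so on
a = np.repeat(df['id'].to_numpy(), df['token_list'].str.len().to_numpy())
# every token, flattened in row order
b = np.concatenate(df['token_list'].to_numpy())

# pair each id with the token at the same position
pairs = [(int(i), str(t)) for i, t in zip(a, b)]
print(pairs[:5])
# [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'c'), (2, 'd')]
```

Grouping a by b then simply collects the first element of every pair that shares the same token.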

Answer 2

df.set_index('id')['token_list'].\
    apply(pd.Series).stack().reset_index(name='V').\
       groupby('V')['id'].apply(list).to_dict()
Out[359]: {'a': [1, 3], 'b': [1], 'c': [1, 2, 4], 'd': [2], 'e': [3], 'f': [3, 4]}
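
This answer reshapes the lists into one row per (id, token) pair and then groups by token. On pandas 0.25 or later, DataFrame.explode should do the same reshaping in one step. This is not part of the original answer, only a sketch under that version assumption, and the names long_df and token_to_ids are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'token_list': [['a', 'b', 'c'], ['c', 'd'], ['a', 'e', 'f'], ['c', 'f']],
})

# explode gives every token its own row and repeats the id alongside it
long_df = df.explode('token_list')

# group the ids by token and collect each group into a list
token_to_ids = long_df.groupby('token_list')['id'].apply(list).to_dict()
print(token_to_ids)
```

Avoiding apply(pd.Series) may matter on a million rows, since that call builds one Series object per row, but it would be worth timing both versions on the real data before relying on that.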

2 answers

Answer 1
2 2017-11-13 15:24:59

Answer 2
2 2017-11-13 15:28:45
