
Slicing a Pandas DataFrame based on a value present in a column which is a list of lists

I have a Pandas DataFrame with a million rows (ids), where one of the columns holds a list of tokens in each row. For example:

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4], 'token_list': [['a', 'b', 'c'], ['c', 'd'], ['a', 'e', 'f'], ['c', 'f']]})

I want to create a dictionary with all the unique tokens ('a', 'b', 'c', 'd', 'e', 'f', which I already have as a separate list) as keys, and all the ids each token is associated with as values. For example, {'a': [1, 3], 'b': [1], 'c': [1, 2, 4], ...} and so on.

My problem is that there are 12000 such tokens, and I do not want to use loops to run through each row of the frame. And "is in" does not seem to work.

Use np.repeat with numpy.concatenate to flatten first, then groupby with list, and finally to_dict:

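import numpy as np

# Repeat each id once per token in its list
a = np.repeat(df['id'], df['token_list'].str.len())
# Flatten all the token lists into a single 1-D array
b = np.concatenate(df['token_list'].values)

# Group the repeated ids by token and collect each group into a list
d = a.groupby(b).apply(list).to_dict()
print(d)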

{'c': [1, 2, 4], 'a': [1, 3], 'b': [1], 'd': [2], 'e': [3], 'f': [3, 4]}

Detail:

print(a)
0    1
0    1
0    1
1    2
1    2
2    3
2    3
2    3
3    4
3    4
Name: id, dtype: int64

print(b)
['a' 'b' 'c' 'c' 'd' 'a' 'e' 'f' 'c' 'f']
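
To make the alignment between a and b explicit, here is a minimal self-contained sketch, assuming the same example frame as in the question. The names lengths, ids, tokens, pairs and token_to_ids are illustrative and not part of the original answer. It pairs each repeated id with its token in a two-column frame before grouping:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'token_list': [['a', 'b', 'c'], ['c', 'd'], ['a', 'e', 'f'], ['c', 'f']],
})

# Number of tokens in each row: 3, 2, 3, 2
lengths = df['token_list'].str.len()

# Repeat each id once per token, and flatten the token lists;
# both arrays end up with one entry per (id, token) pair
ids = np.repeat(df['id'].to_numpy(), lengths.to_numpy())
tokens = np.concatenate(df['token_list'].to_numpy())

# Long-form frame: one row per (id, token) pair
pairs = pd.DataFrame({'id': ids, 'token': tokens})

# Collect the ids for each token into a list
token_to_ids = pairs.groupby('token')['id'].apply(list).to_dict()
print(token_to_ids)
```

Because both arrays are built in the same row order, position i of ids always belongs to position i of tokens, which is what lets groupby match them up.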
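
Alternatively, expand each list into columns with apply(pd.Series), stack the result into long form, and then group the ids by token:

(df.set_index('id')['token_list']
   .apply(pd.Series).stack().reset_index(name='V')
   .groupby('V')['id'].apply(list).to_dict())
Out[359]: {'a': [1, 3], 'b': [1], 'c': [1, 2, 4], 'd': [2], 'e': [3], 'f': [3, 4]}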
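
As a further option that neither answer above uses, DataFrame.explode (available in pandas 0.25 and later) flattens a list column directly. The sketch below assumes the example frame from the question. The known_tokens list is a hypothetical stand-in for the separate token list the question mentions. Since apply(pd.Series) builds one Series per row, it may be slow on a million rows, so explode is likely worth trying first.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'token_list': [['a', 'b', 'c'], ['c', 'd'], ['a', 'e', 'f'], ['c', 'f']],
})

# One row per (id, token) pair; the id is repeated for each token
exploded = df.explode('token_list')

# Collect the ids for each token into a list
token_to_ids = exploded.groupby('token_list')['id'].apply(list).to_dict()

# Hypothetical: restrict the result to a token list you already have,
# giving an empty list for any token that never occurs in the frame
known_tokens = ['a', 'b', 'c', 'd', 'e', 'f', 'z']
by_known_token = {t: token_to_ids.get(t, []) for t in known_tokens}
print(by_known_token)
```

The final dictionary comprehension loops over the token list only, not over the rows of the frame, so it stays cheap even with 12000 tokens.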
