

Attach index from list to a list of lists to create pandas df

I was wondering whether it is possible to create a dataframe from a list of lists, where each item in index_list is attached as the index label for every value in the corresponding inner list of lst:

index_list = ['phase1', 'phase2', 'phase3']
lst = [['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['h', 'i', 'j']]


Thank you for any help!

Edit: the inner lists are not necessarily the same size.

You can use pd.Series.explode here.

pd.Series(lst,index=index_list).explode()
phase1    a
phase1    b
phase1    c
phase2    d
phase2    e
phase2    f
phase2    g
phase3    h
phase3    i
phase3    j
dtype: object
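
Since the question asks for a dataframe rather than a Series, here is a minimal sketch of one way to finish the job. The column names 'value' and 'phase' are assumptions chosen for illustration, and Series.explode needs pandas 0.25 or later.

```python
import pandas as pd

index_list = ['phase1', 'phase2', 'phase3']
lst = [['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['h', 'i', 'j']]

# Build a Series of lists keyed by phase, then emit one row per inner element.
s = pd.Series(lst, index=index_list).explode()

# Option 1: keep the repeated labels as the index of a one-column DataFrame.
df = s.to_frame(name='value')

# Option 2: move the labels into an ordinary column instead.
flat = s.rename_axis('phase').reset_index(name='value')
```

Option 1 matches the layout printed above; option 2 may be easier to work with if the repeated index labels get in the way of later lookups.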

Another solution, using np.repeat and np.concatenate:

r_len = [len(r) for r in lst]
pd.Series(np.concatenate(lst), index=np.repeat(index_list,r_len))

phase1    a
phase1    b
phase1    c
phase2    d
phase2    e
phase2    f
phase2    g
phase3    h
phase3    i
phase3    j
dtype: object
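
The same idea as a self-contained sketch that ends in a dataframe. The intermediate names index and values and the column name 'value' are assumptions for illustration. Note that np.concatenate converts all elements to a common NumPy dtype, so this variant suits inner lists whose items share a type.

```python
import numpy as np
import pandas as pd

index_list = ['phase1', 'phase2', 'phase3']
lst = [['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['h', 'i', 'j']]

# Length of each inner list, used as the repeat count for its label.
r_len = [len(r) for r in lst]

# Repeat every label once per element of its inner list.
index = np.repeat(index_list, r_len)

# Flatten the inner lists into a single 1-D array.
values = np.concatenate(lst)

df = pd.DataFrame({'value': values}, index=index)
```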

Timeit results:


In [501]: %%timeit
     ...: pd.Series(lst,index=index_list).explode()
     ...:
     ...:
363 µs ± 16.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [503]: %%timeit
     ...: r_len = [len(r) for r in lst]
     ...: pd.Series(np.concatenate(lst), index=np.repeat(index_list,r_len))
     ...:
     ...:
236 µs ± 17.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

This problem looks similar to R's expand.grid() function, which is listed in the pandas cookbook (bottom of the page). This function lets you create a dataframe with all combinations of the given input values.

First define a function:

import itertools
import pandas as pd

def expand_grid(data_dict):
    rows = itertools.product(*data_dict.values())
    return pd.DataFrame.from_records(rows, columns=data_dict.keys())

Then you can use it like so:

df = expand_grid({'index': ['phase1', 'phase2', 'phase3'],
                  'Col1': [['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['h', 'i', 'j']]})
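
A quick sketch of what this call returns, for comparison with the output the question asks for. Because itertools.product pairs every label with every inner list, the result has 3 × 3 = 9 rows, each holding a whole list in Col1, rather than one row per letter. The second half shows one possible way to get one row per element with DataFrame.explode (pandas 0.25 or later). The names grid and paired are assumptions for illustration, and list(data_dict) is used here in place of data_dict.keys().

```python
import itertools
import pandas as pd

def expand_grid(data_dict):
    # Cartesian product of all value lists, one row per combination.
    rows = itertools.product(*data_dict.values())
    return pd.DataFrame.from_records(rows, columns=list(data_dict))

index_list = ['phase1', 'phase2', 'phase3']
lst = [['a', 'b', 'c'], ['d', 'e', 'f', 'g'], ['h', 'i', 'j']]

# Every label paired with every inner list: 9 rows, lists as cell values.
grid = expand_grid({'index': index_list, 'Col1': lst})

# Pair each label only with its own inner list, then emit one row per element.
paired = pd.DataFrame({'index': index_list, 'Col1': lst}).explode('Col1')
```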
